Pandas Rank – Rank Your Data – pd.df.rank()

Pandas Rank computes the rank of each data point within a larger dataset. It is extremely useful for filtering for the first or second item of a sub-dataset. We will look at two methods today:

  1. Rank data within your entire DataFrame
  2. Rank data within subgroups (group by)
1. pd.DataFrame.rank()
2. pd.DataFrame.groupby().rank()

Pseudo code: For a given data point, rank its value within the total DataFrame or Series.
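
To make the pseudo code concrete, here is a minimal sketch that computes ranks by hand and compares them with .rank(). The Series values are made up for illustration, and the double-argsort trick only matches .rank() when there are no ties.

```python
import numpy as np
import pandas as pd

# Hypothetical values with no ties
s = pd.Series([30, 10, 20])

# Position of each value in sorted order, shifted so the smallest value gets rank 1
manual_rank = np.argsort(np.argsort(s.to_numpy())) + 1

# The same thing, done by pandas (ranks come back as floats)
pandas_rank = s.rank()

print(manual_rank.tolist())  # [3, 1, 2]
print(pandas_rank.tolist())  # [3.0, 1.0, 2.0]
```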

Pandas Rank

Pandas Rank - Rank values in your DataFrame relative to the other values in your data. pd.DataFrame.rank()

There are two core concepts you’ll need to grasp with .rank(): Rank order (ascending or not) and method (how to rank data points with the same value).

  • Rank Order: Ascending means you are climbing something, “I am ascending stairs.” This means you are going up in number. With ascending=True, Pandas will start at your lowest values and go up, meaning your lowest values get the lowest ranks and your highest values get the highest ranks. Usually I use ascending=False so the highest value has rank 1.
  • Method: There are many ways you can handle data points of the same value. Should you force a distinct rank? Or should tied values share a rank that ends in .5? Check out the parameters below for a list of how to handle these.

Rank Pro Tip: Group By

Did you know that .rank() also works on a group by? Strictly speaking it acts as a transformation rather than an aggregation: it returns one rank per original row instead of one value per group. Simply call .rank() on the result of your group by and you’ll get the ranks specific to each subgroup in your DataFrame.

Check out the code sample below for a preview of this.

Rank Parameters

  • axis (Default=0): Believe it or not, you can rank along either axis. By default (axis=0) each column is ranked on its own, so the rows are compared with one another. Set axis=1 to rank across the columns within each row instead. 99% of the time we are ranking rows.
  • method (‘average’, ‘min’, ‘max’, ‘first’, ‘dense’): What should you do with your data points that have the same value? First, think of them as a group, then see which method you want:
    • average: Use the average rank of the group and apply to all items
    • min: Take the lowest rank of the group and apply to all items
    • max: Take the highest rank of the group and apply to all items
    • first: Ranks are assigned in order the data point appears in the DataFrame or Series. This is essentially forcing a unique rank on each item.
    • dense: Like min, but the rank increases by exactly 1 between groups, so no ranks are skipped. We don’t use this one often.
  • numeric_only (Default=False in recent pandas versions; older versions defaulted to None): If True, only rank your numeric columns. If False, .rank() will also rank your strings.
  • ascending (Default=True): True if you want the ranks in ascending order, False if you do not.
  • pct (Default=False): You can also normalize your ranks by setting pct=True. This expresses each rank as a fraction of the number of values being ranked, so every rank falls between 0 and 1.
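
The axis parameter is the easiest one to mix up, so here is a small sketch of the difference. The student names and scores are made up for illustration; they are not part of the data used later in this post.

```python
import pandas as pd

# Hypothetical scores: one row per student, one column per subject
scores = pd.DataFrame(
    {"math": [90, 70, 80], "art": [60, 95, 80]},
    index=["Ana", "Ben", "Cy"],
)

# axis=0 (default): each column is ranked on its own, comparing the rows
by_column = scores.rank(axis=0)

# axis=1: each row is ranked on its own, comparing the columns
by_row = scores.rank(axis=1)

print(by_column)
print(by_row)
```

With axis=1, Cy's two identical scores tie, so both get the average rank of 1.5.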

Let’s take a look at a code sample


In [27]:
import pandas as pd
import numpy as np

Pandas Rank

Pandas rank is a simple but helpful function that ranks your data points in relation to each other. Not only does it apply to an entire Series, but you can also use it with a group by to rank within each group.

We will run through 5 examples:

  1. "Hello World" of Pandas Rank
  2. Ranking Ascending True/False
  3. Ranking with different methods
  4. Ranking via pct
  5. Ranking with Group By

But first, let's create our DataFrame

In [36]:
np.random.seed(seed=42)

df = pd.DataFrame(data=np.random.normal(loc=100, scale=50, size=(8,2)),
                  columns=('Parks', 'Schools'),
                  index=['San Francisco', 'San Diego', 'Los Angeles',
                         'New York', 'Chicago', 'Denver', 'Seattle', 'Portland']
                 )
df = df.astype(int)
df
Out[36]:
               Parks  Schools
San Francisco    124       93
San Diego        132      176
Los Angeles       88       88
New York         178      138
Chicago           76      127
Denver            76       76
Seattle          112        4
Portland          13       71

1. "Hello World" of Pandas Rank

Let's start off with a simple example to see how rank works. Generally we call .rank() on a Series. Rarely do we want to get ranks for all DataFrame values, but you may.

To demonstrate, I'll copy my original DataFrame, then attach a rank column.

In [38]:
df_copy = df.copy()
df_copy['park_rank'] = df_copy['Parks'].rank()
df_copy
Out[38]:
               Parks  Schools  park_rank
San Francisco    124       93        6.0
San Diego        132      176        7.0
Los Angeles       88       88        4.0
New York         178      138        8.0
Chicago           76      127        2.5
Denver            76       76        2.5
Seattle          112        4        5.0
Portland          13       71        1.0
In [40]:
df_copy = df.copy()
df_copy.rank()
Out[40]:
               Parks  Schools
San Francisco    6.0      5.0
San Diego        7.0      8.0
Los Angeles      4.0      4.0
New York         8.0      7.0
Chicago          2.5      6.0
Denver           2.5      3.0
Seattle          5.0      1.0
Portland         1.0      2.0

2. Ranking Ascending True/False

Notice how the lowest numbers have the lowest ranks? That's not usually how my brain works. It is more intuitive to me to have the highest numbers get the lowest ranks (Ex: the highest number is ranked #1). To do this, set ascending=False.

In [41]:
df_copy = df.copy()
df_copy['park_rank'] = df_copy['Parks'].rank(ascending=False)
df_copy
Out[41]:
               Parks  Schools  park_rank
San Francisco    124       93        3.0
San Diego        132      176        2.0
Los Angeles       88       88        5.0
New York         178      138        1.0
Chicago           76      127        6.5
Denver            76       76        6.5
Seattle          112        4        4.0
Portland          13       71        8.0
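
A quick way to convince yourself how the two orders relate: with the default method='average', a row's ascending rank and descending rank always add up to n + 1, where n is the number of values. The sketch below checks this on the Parks values from the output above, typed in by hand so the block runs on its own.

```python
import pandas as pd

# Parks values copied from the DataFrame above
parks = pd.Series(
    [124, 132, 88, 178, 76, 76, 112, 13],
    index=["San Francisco", "San Diego", "Los Angeles", "New York",
           "Chicago", "Denver", "Seattle", "Portland"],
)

asc_rank = parks.rank(ascending=True)
desc_rank = parks.rank(ascending=False)

# With method='average' (the default), the two ranks mirror each other
rank_sum = asc_rank + desc_rank
print(rank_sum.unique())  # [9.] because n + 1 = 8 + 1
```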

3. Ranking With Different Methods

Let's say that you had a group of identical values. How would you want to rank them? Let's explore a few different methods we can choose.

For a list of the methods and how they affect ranks, see the Rank Parameters section above.

First I need a DataFrame with repeated values.

In [55]:
df2 = pd.DataFrame([1,2,3,4,5,3,5,6,7,7,9], columns=['Sample']).sort_values(by='Sample')
df2
Out[55]:
    Sample
0        1
1        2
2        3
5        3
3        4
4        5
6        5
7        6
8        7
9        7
10       9
In [57]:
df2['average_rank'] = df2['Sample'].rank(method='average')
df2['min_rank'] = df2['Sample'].rank(method='min')
df2['max_rank'] = df2['Sample'].rank(method='max')
df2['first_rank'] = df2['Sample'].rank(method='first')
df2['dense_rank'] = df2['Sample'].rank(method='dense')
df2
Out[57]:
    Sample  average_rank  min_rank  max_rank  first_rank  dense_rank
0        1           1.0       1.0       1.0         1.0         1.0
1        2           2.0       2.0       2.0         2.0         2.0
2        3           3.5       3.0       4.0         3.0         3.0
5        3           3.5       3.0       4.0         4.0         3.0
3        4           5.0       5.0       5.0         5.0         4.0
4        5           6.5       6.0       7.0         6.0         5.0
6        5           6.5       6.0       7.0         7.0         5.0
7        6           8.0       8.0       8.0         8.0         6.0
8        7           9.5       9.0      10.0         9.0         7.0
9        7           9.5       9.0      10.0        10.0         7.0
10       9          11.0      11.0      11.0        11.0         8.0

4. Ranking Via PCT

You can also normalize your ranks to fit between 0-1 using pct=True

In [58]:
df_copy = df.copy()
df_copy['park_rank'] = df_copy['Parks'].rank(ascending=False, pct=True)
df_copy
Out[58]:
               Parks  Schools  park_rank
San Francisco    124       93     0.3750
San Diego        132      176     0.2500
Los Angeles       88       88     0.6250
New York         178      138     0.1250
Chicago           76      127     0.8125
Denver            76       76     0.8125
Seattle          112        4     0.5000
Portland          13       71     1.0000
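
For data without missing values, pct=True works out to the ordinary rank divided by the number of values. Here is a quick check on the Parks values from the output above, typed in by hand so the block runs on its own.

```python
import pandas as pd

# Parks values copied from the DataFrame above
parks = pd.Series([124, 132, 88, 178, 76, 76, 112, 13])

pct_rank = parks.rank(ascending=False, pct=True)

# Same result by hand: plain rank divided by how many values there are
manual_pct = parks.rank(ascending=False) / parks.count()

print(pct_rank.tolist())
print(manual_pct.tolist())
```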

5. Ranking with Group By

Finally, let's check out ranking within subgroups. You can use .rank() on your group by function as well.

Let's create a DataFrame that will play nicely for this example

In [62]:
df = pd.DataFrame([('Foreign Cinema', 'Restaurant', 289.0),
                   ('Liho Liho', 'Restaurant', 224.0),
                   ('500 Club', 'bar', 80.5),
                   ('The Square', 'bar', 25.30),
                   ('Chambers', 'bar', 35.89)],
           columns=('name', 'type', 'AvgBill')
                 )
df
Out[62]:
             name        type  AvgBill
0  Foreign Cinema  Restaurant   289.00
1       Liho Liho  Restaurant   224.00
2        500 Club         bar    80.50
3      The Square         bar    25.30
4        Chambers         bar    35.89
In [65]:
df['sub_group_rank'] = df.groupby('type')['AvgBill'].rank(ascending=False)
df
Out[65]:
             name        type  AvgBill  sub_group_rank
0  Foreign Cinema  Restaurant   289.00             1.0
1       Liho Liho  Restaurant   224.00             2.0
2        500 Club         bar    80.50             1.0
3      The Square         bar    25.30             3.0
4        Chambers         bar    35.89             2.0
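
Once you have a rank per subgroup, filtering for the first item of each group is a one-liner. This is a sketch that rebuilds the same DataFrame so it runs on its own; the name top_per_type is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame(
    [("Foreign Cinema", "Restaurant", 289.0),
     ("Liho Liho", "Restaurant", 224.0),
     ("500 Club", "bar", 80.5),
     ("The Square", "bar", 25.30),
     ("Chambers", "bar", 35.89)],
    columns=["name", "type", "AvgBill"],
)

# Rank each place within its own type, highest bill first
df["sub_group_rank"] = df.groupby("type")["AvgBill"].rank(ascending=False)

# Keep only the top-ranked place of each type
top_per_type = df[df["sub_group_rank"] == 1]
print(top_per_type)
```

One thing to watch: with the default method='average', two places tied for the top spot would both get 1.5 and drop out of this filter. Passing method='min' or method='first' to .rank() avoids that.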
