Pandas Rank computes the rank of each data point within a larger dataset. It is extremely useful for filtering the first or second item of a sub-dataset. We will look at two methods today:

- Rank data within your **entire DataFrame**
- Rank data within **subgroups (group by)**

1. pd.DataFrame.rank()
2. pd.DataFrame.groupby().rank()

**Pseudo code: For a given data point, rank its value within the total DataFrame or Series.**

## Pandas Rank

There are two core concepts you’ll need to grasp with `.rank()`: **rank order** (ascending or not) and **method** (how to rank data points with the same value).

**Rank Order:** Ascending means you are climbing something, as in “I am ascending stairs.” The numbers go up. With `ascending=True`, Pandas will start at your lowest values and **go up**, meaning your lowest values will have the lowest rank and your highest values will have the highest rank. Usually I use `ascending=False` so the highest value has `rank=1`.

**Method:** There are several ways to handle data points with the same value. Should you force a distinct rank? Or should tied points share a rank that ends in .5? Check out the parameters below for a list of how to handle these.

### Rank Pro Tip: Group By

Did you know that `.rank()` can be used with a group by too? Simply call `.rank()` on top of your group by and you’ll get the ranks specific to each *subgroup* in your DataFrame. Strictly speaking it behaves like a transformation rather than an aggregation: you get one rank back for every row, not one value per group.

Check out the code sample below for a preview of this.

### Rank Parameters

**axis (Default=0):** Believe it or not, you can rank along either axis. By default (`axis=0`) each value is ranked against the other rows in its column. Set `axis=1` to rank each value against the other columns in its row. 99% of the time we are ranking rows.

**method (‘average’, ‘min’, ‘max’, ‘first’, ‘dense’)**: What should you do with data points that have the same value? First, think of them as a group, then pick the method you want:

- `average`: Use the average rank of the group and apply it to all items. This is the default.
- `min`: Take the lowest rank of the group and apply it to all items.
- `max`: Take the highest rank of the group and apply it to all items.
- `first`: Ranks are assigned in the order the data points appear in the DataFrame or Series. This essentially *forces* a unique rank on each item.
- `dense`: Like `min`, but the rank increases by only 1 between groups. We don’t use this one often.
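
To make the tie-breaking rules concrete, here is a minimal sketch on a made-up Series (the `scores` values are hypothetical, not taken from the examples further down). The two 20s tie for positions 2 and 3, and each method resolves that tie differently.

```python
import pandas as pd

# Hypothetical scores; the two 20s tie for positions 2 and 3.
scores = pd.Series([10, 20, 20, 30])

tie_methods = pd.DataFrame({
    "score": scores,
    "average": scores.rank(method="average"),  # 1.0, 2.5, 2.5, 4.0
    "min": scores.rank(method="min"),          # 1.0, 2.0, 2.0, 4.0
    "max": scores.rank(method="max"),          # 1.0, 3.0, 3.0, 4.0
    "first": scores.rank(method="first"),      # 1.0, 2.0, 3.0, 4.0
    "dense": scores.rank(method="dense"),      # 1.0, 2.0, 2.0, 3.0
})
print(tie_methods)
```

Notice that only `dense` leaves no gap after the tie: the 30 gets rank 3 instead of 4.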

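**numeric_only:** When `True`, only your numeric columns are ranked. When `False`, `.rank()` will also rank your strings. The default has differed between pandas versions (it is `False` in pandas 2.x), so pass it explicitly if it matters to you.

**ascending (Default=True):** `True` if you want the ranks in ascending order (lowest value gets rank 1), `False` if you want the highest value to get rank 1.

**pct (Default=False):** You can also normalize your ranks by setting `pct=True`. This expresses each rank as a fraction of the total, so all ranks fall between 0 and 1.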
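
Since `axis` is the parameter people trip over most, here is a small sketch of the difference. The `amenities` DataFrame and its numbers are invented for illustration only.

```python
import pandas as pd

# Hypothetical data: two cities, three amenity counts each.
amenities = pd.DataFrame(
    {"Parks": [30, 5], "Schools": [10, 20], "Libraries": [20, 15]},
    index=["City A", "City B"],
)

# axis=0 (default): compare the cities against each other within each column.
by_column = amenities.rank(axis=0)

# axis=1: compare the columns against each other within each city.
by_row = amenities.rank(axis=1)

print(by_column)
print(by_row)
```

With `axis=1`, City A's row becomes 3, 1, 2 because Parks is its largest value and Schools its smallest.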

Let’s take a look at a code sample.

```
import pandas as pd
import numpy as np
```

### Pandas Rank

Pandas rank is a simple but helpful function that ranks your data points in relation to each other. Not only does it apply to an entire Series, but you can also use it with a group by.

We will run through 5 examples:

- "Hello World" of Pandas Rank
- Ranking Ascending True/False
- Ranking with different methods
- Ranking via pct
- Ranking with Group By

But first, let's create our DataFrame

```
np.random.seed(seed=42)  # fixed seed so the sample data is reproducible
df = pd.DataFrame(
    data=np.random.normal(loc=100, scale=50, size=(8, 2)),
    columns=('Parks', 'Schools'),
    index=['San Francisco', 'San Diego', 'Los Angeles',
           'New York', 'Chicago', 'Denver', 'Seattle', 'Portland'],
)
df = df.astype(int)
df
```

### 1. "Hello World" of Pandas Rank

Let's start off with a simple example to see how rank works. Generally we call `.rank()` on a Series. Rarely do we want ranks for all DataFrame values, but you can get them.

To demonstrate, I'll copy my original DataFrame, then attach a rank column.

```
df_copy = df.copy()
df_copy['park_rank'] = df_copy['Parks'].rank()
df_copy
```

```
df_copy = df.copy()
df_copy.rank()
```
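
If it is unclear what ranking a whole DataFrame means, the sketch below may help. It uses a small made-up DataFrame (the `small` values are hypothetical) to show that `.rank()` on a DataFrame ranks each column on its own, exactly as if you had ranked each column as a separate Series.

```python
import pandas as pd

# Hypothetical values, chosen only to keep the example small.
small = pd.DataFrame({"Parks": [124, 132, 88], "Schools": [93, 176, 88]})

all_ranks = small.rank()             # ranks every column independently
parks_only = small["Parks"].rank()   # ranks a single Series

# The 'Parks' column of the DataFrame result matches the Series result.
print(all_ranks["Parks"].equals(parks_only))
```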

### 2. Ranking Ascending True/False

Notice how the lowest numbers have the lowest ranks? That's not usually how my brain works. It is more intuitive to me for the highest numbers to have the lowest ranks (Ex: the highest number is ranked #1). To do this, set `ascending=False`.

```
df_copy = df.copy()
df_copy['park_rank'] = df_copy['Parks'].rank(ascending=False)
df_copy
```

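### 3. Ranking With Different Methods

The `method` parameter controls how tied values are ranked. To compare the options, let's create a DataFrame with some repeated values and apply each method to it.

```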
df2 = pd.DataFrame([1,2,3,4,5,3,5,6,7,7,9], columns=['Sample']).sort_values(by='Sample')
df2
```

```
df2['average_rank'] = df2['Sample'].rank(method='average')
df2['min_rank'] = df2['Sample'].rank(method='min')
df2['max_rank'] = df2['Sample'].rank(method='max')
df2['first_rank'] = df2['Sample'].rank(method='first')
df2['dense_rank'] = df2['Sample'].rank(method='dense')
df2
```
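
Methods and rank order can be combined. As a sketch, `method='min'` together with `ascending=False` gives the familiar leaderboard style of "1st, 2nd, 2nd, 4th". The `points` Series and its names are made up for illustration.

```python
import pandas as pd

# Hypothetical leaderboard; Ben and Cy are tied on 85 points.
points = pd.Series([90, 85, 85, 70], index=["Ana", "Ben", "Cy", "Dee"])

# Highest score gets rank 1; tied scores share the best (lowest) rank.
# astype(int) is safe here because there are no missing values.
place = points.rank(method="min", ascending=False).astype(int)
print(place)  # Ana 1, Ben 2, Cy 2, Dee 4
```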

### 4. Ranking Via PCT

You can also normalize your ranks to fall between 0 and 1 using `pct=True`.

```
df_copy = df.copy()
df_copy['park_rank'] = df_copy['Parks'].rank(ascending=False, pct=True)
df_copy
```
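
To see where the normalized numbers come from, the sketch below compares `pct=True` with dividing the ordinary rank by the number of values. The `values` Series is hypothetical, and the comparison assumes the default `method='average'` with no missing data.

```python
import pandas as pd

# Hypothetical values with no ties and no missing data.
values = pd.Series([10, 20, 30, 40])

pct_rank = values.rank(pct=True)          # 0.25, 0.50, 0.75, 1.00
manual = values.rank() / values.count()   # rank divided by number of values

print(pct_rank.tolist())
print(manual.tolist())
```

Note that the largest value lands at 1.0 and the smallest at 1/n rather than 0.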

### 5. Ranking with Group By

Finally, let's check out ranking within subgroups. You can call `.rank()` on a group by as well.

Let's create a DataFrame that will play nicely for this example.

```
df = pd.DataFrame(
    [('Foreign Cinema', 'Restaurant', 289.0),
     ('Liho Liho', 'Restaurant', 224.0),
     ('500 Club', 'bar', 80.5),
     ('The Square', 'bar', 25.30),
     ('Chambers', 'bar', 35.89)],
    columns=('name', 'type', 'AvgBill'),
)
df
```

```
df['sub_group_rank'] = df.groupby('type')['AvgBill'].rank(ascending=False)
df
```
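
Once each row has a rank within its subgroup, filtering for the first or second item per group is a plain boolean mask. The sketch below rebuilds the same sample data under a new name (`venues`, chosen here for illustration) so it runs on its own.

```python
import pandas as pd

# Same sample data as above, rebuilt so this block is self-contained.
venues = pd.DataFrame(
    [("Foreign Cinema", "Restaurant", 289.0),
     ("Liho Liho", "Restaurant", 224.0),
     ("500 Club", "bar", 80.5),
     ("The Square", "bar", 25.30),
     ("Chambers", "bar", 35.89)],
    columns=("name", "type", "AvgBill"),
)

# Rank AvgBill within each type, highest bill first.
venues["sub_group_rank"] = venues.groupby("type")["AvgBill"].rank(ascending=False)

# Keep the most expensive venue of each type, then the second most expensive.
top_per_type = venues[venues["sub_group_rank"] == 1]
second_per_type = venues[venues["sub_group_rank"] == 2]

print(top_per_type["name"].tolist())     # ['Foreign Cinema', '500 Club']
print(second_per_type["name"].tolist())  # ['Liho Liho', 'Chambers']
```

One caveat: with the default `method='average'`, tied values can receive ranks like 1.5, so an `== 1` filter would miss them. If ties are possible in your data, consider `method='first'` or `method='min'` depending on whether you want exactly one row or all tied rows per group.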

Check out more Pandas functions on our Pandas Page