Pandas Sort By Column – pd.DataFrame.sort_values()

One of the beautiful thinks about Pandas is the ability to sort datasets.

Through sorting, you’re able to see your relevant data at the top (or bottom) of your table. There isn’t a ton you need to know out of the box. The magic starts to happen when you sort multiple columns and use sort keys.

YourDataFrame.sort_values('your_column_to_sort')

Pseudo code: Take a DataFrame column (or columns) and sort by Ascending or Descending order.

Reasons to sort a dataset include:

  • Seeing top ranked items in order
  • Sorting strings in alphabetical order

Pandas Sort

Let’s take a look at the different parameters you can pass pd.DataFrame.sort_values():

  • by – Single name, or list of names, that you want to sort by. This can either be column names, or index names. Pass a list of names when you want to sort by multiple columns.
  • axis (Default: ‘index’ or 0) – This is the axis to be sorted. You need to tell Pandas, do you want to sort the rows (axis=’index’ or 0)? Or do you want to sort the columns (axis=’columns’ or 1)?
  • ascending (Default: True) – You can pass a single boolean (True or False) or a list of booleans ([True, False]) if you’re sorting by multiple columns. See the diagram below for the difference between Ascending vs Descending.
  • inplace (Default: False) – If true, then your new sort order will write over your current DataFrame. It will change it in place. If false, then your DataFrame will be returned to you.
    • I usually do inplace=True if I’m working with the DataFrame later in my code. I’ll do inplace=False if I’m just viewing the sort order visually.
  • kind (default ‘quicksort’) – Your choice of which sorting algorithm you’d like to use. This won’t matter much unless you’re dealing with huge datasets. Even then you’d have to know the differences and tradeoffs with using each one.
  • na_position (Default: ‘last’) – You can tell pandas where you would like to put your NAs (if you have them). At the top (‘first’), or at the bottom (‘last’).
  • ignore_index (Default: False) – If false, then your index values will move with the sorting. This is useful when you want to see how the rows have moved around. However, if you want your index to remain in order, then set ignore_index=True and it’ll remain 0, 1, 2, 3, …, n-1.
  • key – Super awesome parameter! With key you can pass a function that, based on your column or row, will return a derived value that will be the key which is sorted on. Check out below for an example
    • Example: Say you wanted to sort by the absolute value of a column. You could create a derived column with absolute values and sort that, but that feels cumbersome. Instead, sort your column with a key function that grabs the absolute values.

Pandas Ascending vs Descending

One key decision you’ll need to make…do you want your values sorted highest to lowest (descending) or lowest to highest (ascending)?

Pandas Sort Values - The difference between ascending vs descending
  • Ascending = Values that are lowest will appear first or on top
  • Descending = Values that are highest will appear first or on top

Here’s a Jupyter notebook showing the different ways to sort pandas DataFrame.

In [1]:
import pandas as pd

Pandas Sort Values

Sort Values will help you sort a DataFrame (or series) by a specific column or row. Let's take a look at some examples:

  1. Sort DataFrame by a single column
  2. Sort DataFrame by mulitple columns
  3. Sort DataFrame by a single row
  4. Apply a key to sort on - Example: Sort by absolute value

First, let's create our DataFrame of city temperatures

In [2]:
df = pd.DataFrame.from_dict({
     "San Francisco": [67, 72, 49, 56],
     "Chicago": [102, 75, 80, -3],
     "Fairbanks": [45, 5, -10, 80],
     "Miami": [67, 87, 90, 75]
    })
df
Out[2]:
San FranciscoChicagoFairbanksMiami
0671024567
17275587
24980-1090
356-38075

Sort DataFrame by a single column

In order to sort a dataframe by a single column, all you need to do is call YourDataFrame.sort_values('your_column')

In this case let's sort our DataFrame by Miami temperatures

In [11]:
df.sort_values('Miami')
Out[11]:
San FranciscoChicagoFairbanksMiami
0671024567
356-38075
17275587
24980-1090

Notice how it sorted the DataFrame by the Miami column lowest to highest. This is because the default sort order is ascending=True. If you wanted to reverse order, you can set ascending=False.

In [4]:
df.sort_values('Miami', ascending=False)
Out[4]:
San FranciscoChicagoFairbanksMiami
24980-1090
17275587
356-38075
0671024567

Sort DataFrame by multiple columns

In order to sort by multiple columns, I'm going to create another DataFrame that will work better

In [12]:
df2 = pd.DataFrame.from_dict({
     "500 Club": ["Bar", 34.64],
     "Liho Liho": ["Restaurant", 200.45], 
     "Foreign Cinema": ["Restaurant", 180.45],
     "The Square": ["Bar", 45.54]
    }, orient='index', columns=['Type', 'AvgBill'])
df2
Out[12]:
TypeAvgBill
500 ClubBar34.64
Liho LihoRestaurant200.45
Foreign CinemaRestaurant180.45
The SquareBar45.54

Now say that I wanted to first sort 'Type' alphabetically (so that Bar is on top of Restaurant), then descending order (highest > lowest) for AvgBill. In order to do this I need to

  1. Pass a list of column names into the "by=" parameter and
  2. Pass a list of booleans into the "ascending" parameter that tells pandas which columns I want ascending vs descending
In [6]:
df2.sort_values(by=['Type', 'AvgBill'], ascending=[True, False])
Out[6]:
TypeAvgBill
The SquareBar45.54
500 ClubBar34.64
Liho LihoRestaurant200.45
Foreign CinemaRestaurant180.45

Notice how I sorted ascending=True for first column 'Type', then ascending=False for the second column 'AvgBill'

Sort DataFrame by a single row

Now let's move over to the row side. Infrequently, I want to sort my columns in a specific order (I'm more of a tall table person than wide table person, so this doesn't happen often).

In order to do this, I need to tell pandas that I want to sort by rows and which row I want to sort by. Let's return to our original DataFrame. I'm going to sort by the index=3 label.

To do this, I need to set axis=1

In [13]:
df
Out[13]:
San FranciscoChicagoFairbanksMiami
0671024567
17275587
24980-1090
356-38075
In [14]:
df.sort_values(by=3, axis=1)
Out[14]:
ChicagoSan FranciscoMiamiFairbanks
0102676745
17572875
2804990-10
3-3567580

Now the columns of my DataFrame have been sorted by index=3 in ascending order!

Sort Columns By Special Key

In this case, I want to sort my column by the absolute value of it's contents. Check out Fairbanks, right now -10 is the lowest value, however I'll sort on the absolute value so that 5 appears on top.

What is happening under the hood? Pandas is applying a function (much like pandas apply) to each column value. The result of that function will be the thing (or key) that will be sorted on.

In [15]:
df.sort_values(by='Fairbanks', key=pd.Series.abs)
Out[15]:
San FranciscoChicagoFairbanksMiami
17275587
24980-1090
0671024567
356-38075

Now check out how we sorted on Fairbanks with the lowest numbers on top (ascending), and the value 5 is above the value -10. This is because we supplied the key to be the absolute value of the column contents!

Unfortunately, you can't call multiple key functions if you're using multiple columns. You'll need to apply the same function to all columns you're sorting on.

Link to code above

Check out more Pandas functions on our Pandas Page

Official Documentation