Pandas Index Max – pd.DataFrame.idxmax()

Pandas idxmax is a convenient function that returns the index label of the maximum value along a specified axis. It is especially useful for finding which column holds the maximum value for each row, or which row holds the maximum value for each column.

Say you have a list of students as your rows, and test scores in your columns. Which test was the highest for each student? .idxmax() is a one-liner that’ll tell you.

1. pd.DataFrame.idxmax(axis='your_axis_for_max')

Pseudo code: For a given axis, tell me where the highest value is on the other axis.
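
To make the pseudo code concrete, here is a minimal sketch on a tiny, made-up DataFrame. The names and scores are invented for illustration and fixed so the result is reproducible.

```python
import pandas as pd

# Hypothetical scores, hard-coded so the outcome is predictable
scores = pd.DataFrame(
    {"Test1": [70, 95, 60], "Test2": [88, 80, 91]},
    index=["Bob", "Sally", "Frank"],
)

# For each row (student), the column label that holds the row's maximum
best_test = scores.idxmax(axis=1)

# For each column (test), the row label that holds the column's maximum
best_student = scores.idxmax(axis=0)

print(best_test)
print(best_student)
```

Here Bob's best test is Test2 and Sally's is Test1, while Sally tops Test1 and Frank tops Test2.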


Index Max Parameters

Small tip: idxmax returns index labels, not positions, so it works best when your index is consistent. If you call set_index() or reset_index() along the way, the labels idxmax gives back change with the index, which can be confusing.
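
A quick sketch of why the tip above matters, using made-up scores: the same data returns different answers from idxmax depending on what the index looks like at the time of the call.

```python
import pandas as pd

# Hypothetical scores indexed by student name
scores = pd.DataFrame(
    {"Test1": [70, 95, 60], "Test2": [88, 80, 91]},
    index=["Bob", "Sally", "Frank"],
)

# With names as the index, idxmax returns names
by_name = scores.idxmax(axis=0)

# After dropping the index, the labels are 0, 1, 2, so idxmax returns integers
by_position = scores.reset_index(drop=True).idxmax(axis=0)

print(by_name)
print(by_position)
```

Both results point at the same rows, but only the first one is readable without looking the student up again.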

  • axis: Which axis to search along. axis=1 finds the column with the highest value for each row. axis=0 (the default) finds the row with the highest value for each column.
  • skipna (Default=True): Rarely used. By default pandas will skip the NA values. If you don’t want to skip them, set skipna=False.
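
Here is a small sketch of the default skipna=True behavior, with a made-up missing score. What skipna=False does when a row contains NA has varied between pandas versions, so it is deliberately not exercised here.

```python
import numpy as np
import pandas as pd

# Hypothetical scores: Sally missed Test1, so her score is NaN
scores = pd.DataFrame(
    {"Test1": [70.0, np.nan], "Test2": [88.0, 80.0]},
    index=["Bob", "Sally"],
)

# skipna=True is the default, so the NaN is ignored when looking for the max
best_test = scores.idxmax(axis=1)

print(best_test)
```

Sally's result is Test2, because her missing Test1 score is skipped rather than treated as a value.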

Let’s look at a fun example of students and test scores

In [1]:
import pandas as pd
import numpy as np

Pandas idxmax

For each row or column, Pandas idxmax tells you the label on the other axis where the highest value sits. It is super useful when your columns are observation points (vs. categorical variables).

We will run through 3 examples:

  1. Find which row's column has its highest value
  2. Find which column's row has its highest value
  3. Using a larger dataframe, find which students scored highest on each test.

But first, let's create our DataFrame

In [2]:

# 4 students x 3 tests of random scores between 0 and 99
df = pd.DataFrame(data=np.random.randint(0, 100, (4, 3)),
                  columns=['Test1', 'Test2', 'Test3'],
                  index=['Bob', 'Sally', 'Frank', 'Patty'])

1. Find which row's column has its highest value

To find out which column has the highest value for a given row, we call idxmax(axis=1). The resulting series is indexed by row and holds the label of the column containing that row's highest value.

Notice below how the series tells us which test was highest for each of our students.

In [3]:
df.idxmax(axis=1)
Bob      Test2
Sally    Test1
Frank    Test2
Patty    Test3
dtype: object
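
The series above tells you which test was highest, but not what the score was. One way to see both side by side is to pair idxmax with max, sketched here on made-up scores (the summary column names are my own choice):

```python
import pandas as pd

# Hypothetical scores, fixed for reproducibility
scores = pd.DataFrame(
    {"Test1": [70, 95, 60], "Test2": [88, 80, 91], "Test3": [75, 60, 99]},
    index=["Bob", "Sally", "Frank"],
)

# One row per student: the label of the best test and the score achieved on it
summary = pd.DataFrame({
    "best_test": scores.idxmax(axis=1),
    "best_score": scores.max(axis=1),
})

print(summary)
```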

2. Find which column's row has its highest value

Say we wanted to find the inverse: which row has the highest value for each column? Put another way, which student scored highest on each test? To do this, set axis=0 to switch to a column view.

Now we only have 3 items in our resulting series, one for each test with the top student in each.

In [4]:
df.idxmax(axis=0)
Test1    Frank
Test2      Bob
Test3    Patty
dtype: object
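
One thing to keep in mind with "top student" questions is ties. When several rows share the maximum, idxmax returns only the first matching label. The sketch below uses made-up scores with deliberate ties and shows one possible way to list every tied student. The dict comprehension is my own workaround, not something idxmax does for you.

```python
import pandas as pd

# Hypothetical scores with ties: Bob and Sally tie on Test1, Sally and Frank on Test2
scores = pd.DataFrame(
    {"Test1": [90, 90, 60], "Test2": [70, 85, 85]},
    index=["Bob", "Sally", "Frank"],
)

# idxmax reports only the first student with the top score in each column
top = scores.idxmax(axis=0)

# List every student whose score equals the column maximum
tied = {
    col: scores.index[scores[col] == scores[col].max()].tolist()
    for col in scores.columns
}

print(top)
print(tied)
```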

3. Using a larger dataframe, find which students scored highest on each test.

Let's expand our student base. We will create a dataframe with 100 students and 10 tests to see which ones did the best.

Wow that's a lot of students and test scores. Let's also find how many there are.

In [5]:
num_students = 100
num_tests = 10

# One row per student, one column per test, random scores between 0 and 99
df = pd.DataFrame(data=np.random.randint(0, 100, (num_students, num_tests)),
                  columns=["Test{}".format(x) for x in range(1, num_tests + 1)],
                  index=["Student{}".format(x) for x in range(1, num_students + 1)])

print("There are {:,} test scores".format(len(df) * len(df.columns)))
There are 1,000 test scores

100 rows × 10 columns

In [6]:
# Highest-scoring test for each student
df.idxmax(axis=1)
Student1      Test2
Student2      Test2
Student3      Test9
Student4      Test4
Student5      Test9
              ...
Student96     Test6
Student97     Test3
Student98     Test8
Student99     Test8
Student100    Test9
Length: 100, dtype: object
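
The output above is per student. To answer the per-test question on the larger frame, one option is axis=0, sketched below. The seed and the use of np.random.default_rng are my own choices to make the run reproducible, and the value_counts summary is an extra I added for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
num_students, num_tests = 100, 10

df = pd.DataFrame(
    rng.integers(0, 100, size=(num_students, num_tests)),
    columns=[f"Test{i}" for i in range(1, num_tests + 1)],
    index=[f"Student{i}" for i in range(1, num_students + 1)],
)

# One entry per test: the (first) student with the highest score on it
top_student_per_test = df.idxmax(axis=0)

# How many students had each test as their personal best
best_test_counts = df.idxmax(axis=1).value_counts()

print(top_student_per_test)
print(best_test_counts)
```

The first result has 10 entries, one per test, and the counts in the second add up to the 100 students.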
