Pandas Where – pd.DataFrame.where()

Pandas Where will replace values where your condition is False. It is useful when you have values that do not meet a criteria, and they need replacing.

We don’t often use this function, but it can be a handy one liner instead of iterating through a DataFrame or Series with .apply().

1. pd.DataFrame.where(cond=condition_to_check, other="Value To Fill")

Pseudo code: Where a condition is false, replace a value.

Pandas Where

Pandas Where - Replace a value in a DataFrame or Series Where a condition is False.

Where

.where() has two main parameters, cond and other. Both of these are flexible to take Series, DataFrame or callable.

  • cond: Which stands for condition. This can either be a Series, DataFrame, or callable (function). WHERE this condition is false, pandas will replace values. I know, it’s a bit counter intuitive.
  • Other What to replace your values with. This again can be a scalar (single value), Series, DataFrame, or callable.
  • Other Parameters: To check out the other parameters that are not used that often, see the official documentation.

Let’s run through each iteration of the above parameters


In [1]:
import pandas as pd
import numpy as np

Pandas Where

We will run through 3 examples:

  1. Pandas Where with DataFrame condition
  2. Pandas Where with Series condition
  3. Pandas Where with Callable condition
  4. Pandas Where with Callable other
  5. Pandas Where with DataFrame other

But first, let's create our DataFrame of students and test scores

In [2]:
np.random.seed(seed=42)

df = pd.DataFrame(data=np.random.randint(0,100,(4,3)),
                  columns=('Test1', 'Test2', 'Test3'),
                  index=['Bob', 'Sally', 'Frank', 'Patty']
                 )
df
Out[2]:
Test1Test2Test3
Bob519214
Sally716020
Frank828674
Patty748799

1. Pandas Where with DataFrame condition

To start off, let's use a DataFrame in our condition parameter. I want to see all test scores that are above 90. This means to evaluate the scores as FALSE I need to look for all scores BELOW 90.

I'll replace the >90 with "A+." Notice how I first evaluate the DataFrame, check out where the FALSES are. These will get replaced.

In [3]:
df<90
Out[3]:
Test1Test2Test3
BobTrueFalseTrue
SallyTrueTrueTrue
FrankTrueTrueTrue
PattyTrueTrueFalse
In [4]:
df.where(df<90, "A+")
Out[4]:
Test1Test2Test3
Bob51A+14
Sally716020
Frank828674
Patty7487A+

2. Pandas Where with Series condition

Say you wanted to fill an entire row based off of the condition of a single Pandas Series. You can also pass a series as the conditional.

Here, I'll replace a all values in the row where Test3<90

In [5]:
df['Test3']<90
Out[5]:
Bob       True
Sally     True
Frank     True
Patty    False
Name: Test3, dtype: bool
In [6]:
df.where(df['Test3']<90, "A+")
Out[6]:
Test1Test2Test3
Bob519214
Sally716020
Frank828674
PattyA+A+A+

3. Pandas Where with Callable condition

You can also pass a callable condition to your where cond parameter. This means you can pass a function to evaluate your value. Here I'll use a lambda function, but you can use a regular function as well.

In [7]:
df.where(lambda x: x<75, "A+")
Out[7]:
Test1Test2Test3
Bob51A+14
Sally716020
FrankA+A+74
Patty74A+A+

4. Pandas Where with Callable other

Say you wanted to replace your FALSE conditions with a dynamic value. You can also pass a callable into your other parameter

In [8]:
df.where(df<90, lambda x: x**2)
Out[8]:
Test1Test2Test3
Bob51846414
Sally716020
Frank828674
Patty74879801

5. Pandas Where with DataFrame other

Finally, you can pass a DataFrame into your other parameter to replace your False values. Think of this as selectively copying over values from another DataFrame to your originally DataFrame.

I'll start by making another DataFrame. Notice how only the values that are FALSE in the condition below are the ones that are replaced.

In [9]:
np.random.seed(seed=41)

df2 = pd.DataFrame(data=np.random.randint(0,100,(4,3)),
                  columns=('Test1', 'Test2', 'Test3'),
                  index=['Bob', 'Sally', 'Frank', 'Patty']
                 )
df2
Out[9]:
Test1Test2Test3
Bob643512
Sally988065
Frank892385
Patty26563
In [10]:
df.where(df<90, df2)
Out[10]:
Test1Test2Test3
Bob513514
Sally716020
Frank828674
Patty74873

Link to code above

Check out more Pandas functions on our Pandas Page

Official Documentation