Pandas Query Data With Categorical Variables

Pandas Query is a great way to filter your data. But what if you have categorical variables. In this case, you’re gonna want to treat them a bit differently.

The important part to remember is that you will need to put your python expression inside of a string. After that it’s all peaches and cream.

1. pd.DataFrame.query("your_column == 'your value')

Pseudo code: For a given column or columns return the values that resolve to true

Pandas Query With Categorical Variables

Let’s take a look at a code sample


In [1]:
import pandas as pd
import numpy as np
np.random.seed(seed=42)

Pandas Query With Categorical Variables

Pandas Query is the other way to filter data, the one that you don't usually use but you might want to consider.

In a previous video, we did .query() with continuous variables (Ex: 1.2, 3, .003). In this tutorial we will be looking at categorical variables (Ex: red, blue, green).

We will run through 2 examples:

  1. When a categorical variable matches a single value
  2. When a categorical variable is in a list of items

First, let's create our DataFrame

In [7]:
df = pd.DataFrame.from_dict({"Name": ['Liho Liho', 'Tompkins', 'The Square', 'Chambers'],
                             "Type": ['Bar', 'Restaurant', 'Bar', 'Restaurant'],
                             "Location": ['San Francisco', 'Los Angeles', 'New York', 'San Francisco']})
df
Out[7]:
NameTypeLocation
0Liho LihoBarSan Francisco
1TompkinsRestaurantLos Angeles
2The SquareBarNew York
3ChambersRestaurantSan Francisco

1. When a categorical variable matches a single value

The most confusing part about .query() is that you need to write a string as if it were regular python code. Below we will query for locations that have "Bar" in the 'type' column.

Notice how we need to wrap bar in quotes even through it is already in a string. This is because python will read the greater string, and evaluate it as if it were not a string.

In [8]:
df.query('Type == "Bar"')
Out[8]:
NameTypeLocation
0Liho LihoBarSan Francisco
2The SquareBarNew York

Notice that all the other rows which don't satisfies this query are not returned

2. When a categorical variable is in a list of items

The same goes for when you're using 'in' to see if a variable 'is in' a list of items.

Below we are looking for any establishment that is in San Francisco or Los Angeles.

In [12]:
df.query('Location in ["San Francisco", "Los Angeles"]')
Out[12]:
NameTypeLocation
0Liho LihoBarSan Francisco
1TompkinsRestaurantLos Angeles
3ChambersRestaurantSan Francisco

When in doubt, make sure that you're conditional statement works when you're not using .query() first.

Like this!

In [14]:
df['Location'].isin(["San Francisco", "Los Angeles"])
Out[14]:
0     True
1     True
2    False
3     True
Name: Location, dtype: bool
In [ ]:
 

Link to code above

Check out more Pandas functions on our Pandas Page

Official Documentation