Pandas Cross Tab – pd.crosstab()

For all intensive purposes, you can think of Pandas Crosstab as the same things as Pandas Pivot Table.

When to use pd.crosstab(): When you are starting with non-DataFrame based data. This could be a list of lists or dictionaries.

When to use pd.pivot_table(): When you are starting with a Dataframe.

pandas.crosstab(index=your_new_index(rows),
                columns=your_pivoted_columns,
                values=your_new_values)

Pseudo Code: With your list of arrays, construct a pivot table and return a DataFrame

Pandas Crosstab

Pandas Crosstab - Very similar to pandas pivot table

In fact, Pandas Crosstab is so similar to Pandas Pivot Table, that crosstab uses pivot table within it’s source code.

Resample Main Parameters

  • index – This is what your want your new rows to be aggregated (or grouped) on. This will generally be the main subject of your data analysis. In the picture above, we chose index=list_1 which were peoples names.
  • columns – This is the parameter you want your index to be split or cut by. This is generally a secondary column, or a spectrum of categories you want to further look into.
  • values – The values you want to show at the intersection of your index and column. By default, this will be a frequency count (how often values occur. However, if you specify an aggfunc, then you’ll summarize your values according to the function you specified.
  • aggfunc – Short for aggregate function, this is how you will summarize your values. Often times this will be sum, or average. Most commonly used ones include: min, max, average, sum, standard deviation, or specifying your own.
  • Other Parameters – For a list of other lesser used parameters. Check out the official documentation.

Now the fun part, let’s take a look at a code sample

In [1]:
import pandas as pd

Pandas Crosstab

Pandas crosstab is extremely similar to pandas pivot table. In fact, cross tab uses pivot table in its source code.

You use crosstab when you want to transform 3 or more columns into a summarization table. It's mostly used when your data does not start as a DataFrame. But rather lists of items.

Examples we'll run through:

  1. Simple crosstab exercises
  2. Simple crosstab exercises with sum aggregate function
  3. Exploring crosstab parameters

But first, let's start with a couple of lists of restaurants in San Francisco:

In [2]:
res_names = ['FC', 'LL', 'FC', '5C', 'TS', 'FC', '5C']
purchase_type = ['Food', 'Food', 'Food', 'Drink', 'Food', 'Drink', 'Drink']
price = [12, 25, 32, 10, 15, 22, 18]

print ('Restaurant Names: {}'.format(res_names))
print ('Purchase Type: {}'.format(purchase_type))
print ('Price: {}'.format(price))
Restaurant Names: ['FC', 'LL', 'FC', '5C', 'TS', 'FC', '5C']
Purchase Type: ['Food', 'Food', 'Food', 'Drink', 'Food', 'Drink', 'Drink']
Price: [12, 25, 32, 10, 15, 22, 18]

1. Simple crosstab exercises

When you create a crosstab table, you'll need to specify what you want on the rows, how to split the columns, and what you'd like to include in the values.

Notice that I need to pass a list of lists to index and columns. To do this I'll wrap my res_names in a list.

In [3]:
pd.crosstab(index=[res_names], columns=[purchase_type])
Out[3]:
col_0DrinkFood
row_0
5C20
FC12
LL01
TS01

2. Simple crosstab exercises with sum aggregate function

By default (in the example above) crosstab will count the frequencies in which an intersection happens. Notice how '5C' and 'Drink' intersection happens twice, so it's listed as '2' in the values.

But what if we wanted to summarize the price by summing them up? You can do that by passing values and aggfunc

In [4]:
pd.crosstab(index=[res_names], columns=[purchase_type], values=price, aggfunc=sum)
Out[4]:
col_0DrinkFood
row_0
5C28.0NaN
FC22.044.0
LLNaN25.0
TSNaN15.0

3. Exploring crosstab parameters

Crosstab comes with many other parameters you can use. Check out the documentation for reference.

In [23]:
pd.crosstab(index=[res_names],
            columns=[purchase_type],
            values=price,
            aggfunc=lambda x: x.sum()**2, # Setting a custom agg function
            rownames=["Restaurants"], # Giving a title to my rows
            colnames=['Food Types'], # Giving a title to my columns
            margins=True, # Adding margins (Subtotals on the ends)
            margins_name="Totals") # Give my subtotals a title
Out[23]:
Food TypesDrinkFoodTotals
Restaurants
5C784.0NaN784
FC484.01936.04356
LLNaN625.0625
TSNaN225.0225
Totals2500.07056.017956

Link to code above

Check out more Pandas functions on our Pandas Page

Official Documentation