Pandas Apply – pd.DataFrame.apply()

Pandas apply is the Swiss Army knife of the pandas family: a general-purpose workhorse.

Pandas apply will run a function on your DataFrame Columns, DataFrame rows, or a pandas Series. This is very useful when you want to apply a complicated function or special aggregation across your data.

Here’s an example:

YourDataFrame.apply(yourfunction, axis=0)

Pseudo code: Iterate through a DataFrame’s columns or rows, and apply a certain function to the data.

Example: Below we show the two ways apply iterates through a DataFrame: either column by column, or row by row.

Pandas apply explanation. You can iterate through a DataFrame's columns or a DataFrame's rows.

Keep in mind! When apply “receives” a column or a row, it’s actually receiving a pandas Series, not a list. So when you’re writing your custom functions, make sure you access the data through its index labels.
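To make the Series point concrete, here's a minimal sketch. The DataFrame and the two helper functions (value_range, line_total) are made up for illustration and are not part of the notebook in this post.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

def value_range(col):
    # col is a pandas Series, so Series methods like .max() and .min() work.
    return col.max() - col.min()

def line_total(row):
    # With axis=1 the Series is indexed by column name, so look values up by label.
    return row["price"] * row["qty"]

col_ranges = df.apply(value_range)         # axis=0 (default): one result per column
row_totals = df.apply(line_total, axis=1)  # axis=1: one result per row

print(col_ranges)
print(row_totals)
```

With axis=0 the Series index is the DataFrame's row index. With axis=1 it is the column names, which is why row["price"] works.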

One very common use case for .apply() is pandas apply with a lambda. This is when you pass a Python lambda function to run on each column, row, or value of your data. Python lambda functions are small anonymous functions, handy for one-off logic you don't plan to reuse.
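Here's a quick sketch of the lambda pattern. The two-row DataFrame borrows names from the example later in this post, and the "bill for two" calculation is just an invented stand-in for whatever one-off logic you need.

```python
import pandas as pd

# Small sample frame; the values are only for illustration.
df = pd.DataFrame({"name": ["500 Club", "The Square"],
                   "AvgBill": [80.5, 25.3]})

# Row-wise lambda (axis=1): build a label from two columns of each row.
labels = df.apply(lambda row: f"{row['name']}: {row['AvgBill']:.2f}", axis=1)

# Series.apply with a lambda: the function is called once per value.
bill_for_two = df["AvgBill"].apply(lambda bill: bill * 2)

print(labels)
print(bill_for_two)
```

If you find yourself reusing the same lambda in several places, it's probably time to promote it to a named function.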


Let’s take a look at the different parameters you can pass pd.DataFrame.apply():

  • func (required) – This is where most of the magic happens. You’ll pass a function into ‘func’, which then gets applied to your data. You can use a custom function (below) or a built-in function.
  • axis (Default: 0) – Sets whether the function is applied to each column or to each row. It is a bit counterintuitive compared with other places: axis=0 or ‘index’ tells pandas to apply the function to each column, while axis=1 or ‘columns’ tells pandas to apply the function to each row.
  • raw (Default: False) – Tells .apply() whether to pass your function a Series (False) or a NumPy ndarray (True). Sometimes you’ll only be applying a quick NumPy function to a column. In that case, passing just the values (raw=True) instead of a pandas Series can speed up your code.
  • result_type (Default: None) – You’ll likely not use this parameter too often because pandas does some guessing as to what you want. I’ve honestly never had a use for this yet. It only takes effect when axis=1: ‘expand’ turns list-like results into columns, ‘reduce’ returns a Series where possible instead of expanding list-like results, and ‘broadcast’ broadcasts the results back to the original shape of the DataFrame.
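The raw and result_type parameters are easier to see in action. Below is a small sketch on a made-up numeric DataFrame; min_max is a hypothetical helper written only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data, used only for this illustration.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

# raw=True: each column reaches the function as a NumPy ndarray, not a Series.
col_sums = df.apply(np.sum, raw=True)

def min_max(row):
    # Returns a list-like result for each row.
    return [row.min(), row.max()]

# Default result_type with axis=1: a Series whose values are the returned lists.
as_lists = df.apply(min_max, axis=1)

# result_type="expand": each element of the returned list becomes its own column.
as_columns = df.apply(min_max, axis=1, result_type="expand")

print(col_sums)
print(as_lists)
print(as_columns)
```

Note that with raw=True your function can't rely on Series features such as labels or .index, since it only sees the bare values.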

Here’s a Jupyter notebook example of pandas df apply showing how to apply a function to the columns and rows of a DataFrame.

In [1]:
import pandas as pd

Pandas Apply

We will run through 3 examples:

  1. How to iterate through columns
  2. How to iterate through rows
  3. Trying out a pandas function with apply

But first, let's create our DataFrame.

In [2]:
df = pd.DataFrame([('Foreign Cinema', 'Restaurant', 289.0),
                   ('Liho Liho', 'Restaurant', 224.0),
                   ('500 Club', 'bar', 80.5),
                   ('The Square', 'bar', 25.30)],
                  columns=('name', 'type', 'AvgBill'))
df
Out[2]:
             name        type  AvgBill
0  Foreign Cinema  Restaurant    289.0
1       Liho Liho  Restaurant    224.0
2        500 Club         bar     80.5
3      The Square         bar     25.3

Then I'm going to create a very quick function that simply prints out whatever apply passes to it.

In [10]:
def print_outcome(x):
    # Show the type and contents of whatever apply passes in.
    print(type(x))
    print(x)
    print()

Iterating through columns!

When you want to iterate through columns, make sure that you have axis=0, which is also the default.

In [13]:
df.apply(print_outcome, axis=0)
<class 'pandas.core.series.Series'>
0    Foreign Cinema
1         Liho Liho
2          500 Club
3        The Square
Name: name, dtype: object

<class 'pandas.core.series.Series'>
0    Restaurant
1    Restaurant
2           bar
3           bar
Name: type, dtype: object

<class 'pandas.core.series.Series'>
0     289
1     224
2    80.5
3    25.3
Name: AvgBill, dtype: object

Out[13]:
name       None
type       None
AvgBill    None
dtype: object

Notice that apply went through column by column and printed out a Series. These series contain the values of each row in that column.
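The print function above returns nothing, which is why the output is all None. As a rough sketch of a column-wise function that returns something useful, here's the same DataFrame with a hypothetical count_unique helper (not part of the original notebook).

```python
import pandas as pd

# Same data as the DataFrame created earlier in this post.
df = pd.DataFrame([('Foreign Cinema', 'Restaurant', 289.0),
                   ('Liho Liho', 'Restaurant', 224.0),
                   ('500 Club', 'bar', 80.5),
                   ('The Square', 'bar', 25.30)],
                  columns=('name', 'type', 'AvgBill'))

def count_unique(col):
    # col is one column as a Series; return a single value for it.
    return col.nunique()

# axis=0: count_unique runs once per column, giving one number per column.
unique_counts = df.apply(count_unique, axis=0)
print(unique_counts)
```

Because each call returns a single value, apply collects the results into a Series indexed by column name.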

Iterating through rows!

Let's do the same thing, but iterate through rows (axis=1) instead of columns.

In [14]:
df.apply(print_outcome, axis=1)
<class 'pandas.core.series.Series'>
name       Foreign Cinema
type           Restaurant
AvgBill               289
Name: 0, dtype: object

<class 'pandas.core.series.Series'>
name        Liho Liho
type       Restaurant
AvgBill           224
Name: 1, dtype: object

<class 'pandas.core.series.Series'>
name       500 Club
type            bar
AvgBill        80.5
Name: 2, dtype: object

<class 'pandas.core.series.Series'>
name       The Square
type              bar
AvgBill          25.3
Name: 3, dtype: object

Out[14]:
0    None
1    None
2    None
3    None
dtype: object

Same thing, but this time we iterated through the rows. Each call received a Series with one value for each column. Notice how the column names ended up being the index of each Series.

Pandas function with apply

I want to concatenate the values in each row into one string, separated by ", ". In order to do this, I'll use a custom function.

In [47]:
def combine_strings(row):
    return "{}, {}, {}".format(row['name'], row['type'], row['AvgBill'])

df.apply(combine_strings, axis=1)
Out[47]:
0    Foreign Cinema, Restaurant, 289.0
1         Liho Liho, Restaurant, 224.0
2                  500 Club, bar, 80.5
3                The Square, bar, 25.3
dtype: object

In my custom function above, combine_strings takes one argument, 'row', which is a Series of values from one row of our DataFrame. I then pulled out each value by its column name (name, type, AvgBill) and combined them using Python's format.
