Pandas Mean – Get Average pd.DataFrame.mean()

You’re anything but average! Jokes aside, Pandas Mean is a fundamental function that is in every data scientist’s, analyst’s, and data monkey’s toolkit.

pandas.DataFrame.mean()
pandas.Series.mean()

Pandas Mean returns the average of your data along a specified axis. If the function is applied to a DataFrame, pandas returns a Series with one mean per column (or per row, depending on the axis). If .mean() is applied to a Series, pandas returns a scalar (a single number).
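
Here is a minimal sketch of that difference in return types. The DataFrame and its column names (`a`, `b`) are made up for illustration only.

```python
import pandas as pd

# Toy data; the column names are placeholders, not from the examples in this post
toy = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

col_means = toy.mean()      # DataFrame -> Series, one value per column
a_mean = toy["a"].mean()    # Series -> scalar

print(type(col_means).__name__)  # Series
print(col_means["b"])            # 20.0
print(a_mean)                    # 2.0
```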

Pseudo Code: With your Series or DataFrame, return the average of the values across a specified axis

No matter what field of data you're working in, you're going to need a good grasp of mean, median, and mode. With mean, pandas will return the average value of your data.

You can choose which axis to average over, and this is a wonderful feature. Go down the rows to get one mean per column (the default), or go across the columns to get one mean per row.

Mean is also included within Pandas Describe.
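
As a quick check of that claim, the sketch below compares the "mean" row of .describe() with the result of .mean() on a small made-up DataFrame (the data and column names are placeholders).

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

summary = toy.describe()

# describe() returns a DataFrame of summary statistics; one of its rows is labeled "mean"
print(summary.loc["mean"])

# That row holds the same values that .mean() returns
print((summary.loc["mean"] == toy.mean()).all())  # True
```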

Mean Parameters

The most important decision you need to make is the axis: do you want to take the average across rows or columns?

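  • axis (Default: 0) = Set axis='index' or 0 to average down the rows and get one mean per column. Set axis='columns' or 1 to average across the columns and get one mean per row.
  • skipna (Default: True) = Exclude the NA/null values when computing the result. If you set skipna=False and there is an NA in your data, pandas will return NaN for that average.
  • level = If you have a MultiIndex, older versions of pandas let you pass the name (or int) of a level to compute the mean per group. This argument was deprecated in pandas 1.3 and removed in 2.0, so on current versions use df.groupby(level=...).mean() instead.
  • numeric_only = You'll only need to worry about this if you have mixed data types in your columns. Set numeric_only=True to average only the numeric columns. From pandas 2.0 the default is False, which means non-numeric columns raise a TypeError unless you set it.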
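
To make the last two parameters more concrete, here is a rough sketch on hypothetical data: a made-up `sales` table with a two-level index (`store`, `day`) and one text column. It assumes you want a per-level mean without relying on the `level` argument, so it uses groupby instead.

```python
import pandas as pd

# Hypothetical sales data with a two-level index (store, day)
idx = pd.MultiIndex.from_tuples(
    [("north", "Mon"), ("north", "Tue"), ("south", "Mon"), ("south", "Tue")],
    names=["store", "day"],
)
sales = pd.DataFrame(
    {"units": [10, 20, 30, 50], "note": ["a", "b", "c", "d"]},
    index=idx,
)

# numeric_only=True averages "units" and leaves the text column out
overall = sales.mean(numeric_only=True)

# Mean per value of the "store" index level, written with groupby
per_store = sales.groupby(level="store")["units"].mean()

print(overall["units"])    # 27.5
print(per_store["north"])  # 15.0
print(per_store["south"])  # 40.0
```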

Now the fun part, let’s take a look at a code sample

In [32]:
import pandas as pd
import numpy as np

Pandas Mean

Pandas will take the average of your data across rows or columns. You pick.

Let's run through 3 examples:

  1. Mean across columns
  2. Mean across rows
  3. Skipping NAs

But first, let's create our DataFrame

In [33]:
np.random.seed(seed=42)

# 4 people x 3 days of random integers from 0 to 99
df = pd.DataFrame(data=np.random.randint(0, 100, (4, 3)),
                  columns=('Monday', 'Tuesday', 'Wednesday'),
                  index=('Bob', 'Sally', 'Frank', 'Claire'))
df
Out[33]:
        Monday  Tuesday  Wednesday
Bob         51       92         14
Sally       71       60         20
Frank       82       86         74
Claire      74       87         99

1. Mean across columns

Unfortunately, referring to 'rows' and 'columns' in pandas can get confusing. The way I think about it is 'what axis do you want to cross to take the mean?'

Meaning, if you want to cross over rows and take the column average, then you need to set axis='index' or axis=0. This means you jump down across rows and take the column average.

Notice here how axis='index' gives me the column average.

In [35]:
df.mean(axis='index')
Out[35]:
Monday       69.50
Tuesday      81.25
Wednesday    51.75
dtype: float64
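
If the axis names still feel slippery, it can help to rebuild the column average by hand. The sketch below types in the same values shown in the DataFrame above (rather than regenerating them randomly) and checks that axis='index', axis=0, and a manual sum-divided-by-row-count all agree.

```python
import pandas as pd

# Same values as the DataFrame above, typed in directly
df = pd.DataFrame(
    {
        "Monday": [51, 71, 82, 74],
        "Tuesday": [92, 60, 86, 87],
        "Wednesday": [14, 20, 74, 99],
    },
    index=["Bob", "Sally", "Frank", "Claire"],
)

by_name = df.mean(axis="index")
by_number = df.mean(axis=0)
by_hand = df.sum(axis=0) / len(df)  # add down each column, divide by the number of rows

print(by_name.equals(by_number))  # True
print(by_hand["Monday"])          # 69.5
print(by_hand["Tuesday"])         # 81.25
```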

2. Mean across rows

On the flip side, if you want to jump to the right across columns, then you need to set axis='columns' or axis=1. This essentially means you're taking the row averages.

Here axis='columns', so I get the row average.

In [37]:
df.mean(axis='columns')
Out[37]:
Bob       52.333333
Sally     50.333333
Frank     80.666667
Claire    86.666667
dtype: float64
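
One thing you might do with row averages is attach them back to the table. This is only a sketch of one possible approach: the `Average` column name is my own choice, and the values are typed in from the DataFrame above.

```python
import pandas as pd

# Same values as the DataFrame above, typed in directly
df = pd.DataFrame(
    {
        "Monday": [51, 71, 82, 74],
        "Tuesday": [92, 60, 86, 87],
        "Wednesday": [14, 20, 74, 99],
    },
    index=["Bob", "Sally", "Frank", "Claire"],
)

row_means = df.mean(axis="columns")      # same result as df.mean(axis=1)
with_avg = df.assign(Average=row_means)  # returns a new DataFrame; df itself is unchanged
best = row_means.idxmax()                # index label with the highest row average

print(with_avg.round(2))
print(best)  # Claire
```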

3. Skipping NAs

Finally, let's take a look at how to skip NAs in .mean(). By default pandas will skip these for you, but say you wanted a sensitive .mean() function, meaning you wanted the result to flag any 'NA' value. Then set skipna=False. Note that pandas does not throw an error in this case; it returns NaN whenever an NA is present.

In [39]:
df = pd.DataFrame([('Foreign Cinema', 'Restaurant', 289.0),
                   ('Liho Liho', 'Restaurant', 224.0),
                   ('500 Club', 'bar', 80.5),
                   ('The Square', 'bar', np.nan)],
                  columns=('name', 'type', 'AvgBill'))
df
Out[39]:
             name        type  AvgBill
0  Foreign Cinema  Restaurant    289.0
1       Liho Liho  Restaurant    224.0
2        500 Club         bar     80.5
3      The Square         bar      NaN
In [40]:
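# numeric_only=True skips the text columns (in pandas 2.0+, a bare df.mean() raises a TypeError here)
df.mean(numeric_only=True)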
Out[40]:
AvgBill    197.833333
dtype: float64
In [41]:
df.mean(numeric_only=True, skipna=False)
Out[41]:
AvgBill   NaN
dtype: float64
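
To see what skipping actually does, the sketch below rebuilds the default mean by hand from the same AvgBill values: the NaN is left out of both the sum and the count. It also includes `strict_mean`, a hypothetical helper of my own (not part of pandas), in case you really do want an error instead of a quiet NaN.

```python
import numpy as np
import pandas as pd

# Same AvgBill values as above
bills = pd.Series([289.0, 224.0, 80.5, np.nan], name="AvgBill")

default_mean = bills.mean()            # skipna=True: the NaN is ignored
by_hand = bills.sum() / bills.count()  # 593.5 / 3, since count() only counts non-NA values
strict = bills.mean(skipna=False)      # NaN, because one value is missing


def strict_mean(values: pd.Series) -> float:
    """Hypothetical helper: raise instead of quietly returning NaN."""
    if values.isna().any():
        raise ValueError("input contains missing values")
    return values.mean()


print(round(default_mean, 6))  # 197.833333
print(round(by_hand, 6))       # 197.833333
print(strict)                  # nan

try:
    strict_mean(bills)
except ValueError as err:
    print(f"strict_mean refused: {err}")
```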

4. Bonus: You can call .mean() on a Series too

In [42]:
df['AvgBill'].mean()
Out[42]:
197.83333333333334
