You’re anything but average! Jokes aside, Pandas Mean is a fundamental function that is in every data scientist’s, analyst’s, and data monkey’s toolkit.
Pandas Mean will return the average of your data across a specified axis. If the function is applied to a DataFrame, pandas will return a Series with the mean across an axis. If .mean() is applied to a Series, then pandas will return a scalar (single number).
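To make the DataFrame-versus-Series distinction concrete, here is a minimal sketch. The data and the variable names (df, col_means, a_mean) are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

col_means = df.mean()    # DataFrame -> Series (one mean per column)
a_mean = df["a"].mean()  # Series -> scalar

print(type(col_means).__name__)  # Series
print(a_mean)                    # 2.0
```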
Pseudo Code: With your Series or DataFrame, return the average of the values across a specified axis
No matter what kind of data work you do, you're going to need a good grasp of mean, median, and mode. With mean, pandas will return the average value of your data.
You do have to choose which axis you want to average over, but that flexibility is a wonderful feature: you can average across rows or across columns.
Mean is also included within Pandas Describe.
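As a quick check of that claim, the sketch below compares the "mean" entry of .describe() with .mean() directly. The Series values are hypothetical.

```python
import pandas as pd

# Hypothetical data, used only for illustration
scores = pd.Series([2, 4, 6, 8])

summary = scores.describe()  # count, mean, std, min, quartiles, max

print(summary["mean"])  # 5.0
print(scores.mean())    # 5.0
```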
The most important decision you need to make is the axis: do you want to take the average across rows or columns?
- axis = Choose whether you want the average of each column (axis='index' or 0, the default for a DataFrame) or of each row (axis='columns' or 1).
- skipna (Default: True) = Exclude the NA/null values when computing the result. If you set skipna=False and there is an NA in your data, pandas will return “NaN” for your average.
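- level = If you have a MultiIndex, older pandas versions let you pass the name (or int) of a level to compute the mean per group. This argument was deprecated in pandas 1.3 and removed in 2.0; on current versions use .groupby(level=...).mean() instead.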
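- numeric_only = You'll only need to worry about this if you have mixed data types in your columns. For all-numeric data, leave it at the default. Note that in pandas 2.0 and later the default is False, so averaging a DataFrame that contains text columns raises a TypeError unless you pass numeric_only=True.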
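As a quick preview of these parameters, here is a minimal sketch. The data, column names, and index names are hypothetical. The last part uses groupby(level=...) as a stand-in for the level argument, on the assumption that you are running a pandas version where level is no longer available.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type data: one text column, two numeric columns
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Pune"],
    "temp": [10.0, 20.0, np.nan],
    "rain": [1.0, 2.0, 3.0],
})

# numeric_only=True ignores the text column; NaN is skipped by default
means = df.mean(numeric_only=True)

# skipna=False lets the missing value propagate into the result
strict = df.mean(numeric_only=True, skipna=False)

# Per-level means on a MultiIndex, via groupby instead of level=
idx = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1)], names=["group", "id"]
)
s = pd.Series([1.0, 3.0, 5.0], index=idx)
by_group = s.groupby(level="group").mean()

print(means)
print(strict)
print(by_group)
```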
Now the fun part: let's take a look at a code sample.
import pandas as pd
import numpy as np
Pandas will take the average of your data across rows or columns. You pick.
Let's run through 3 examples:
- Mean across columns
- Mean across rows
- Skipping NAs
But first, let's create our DataFrame
np.random.seed(seed=42)
df = pd.DataFrame(data=np.random.randint(0, 100, (4, 3)),
                  columns=('Monday', 'Tuesday', 'Wednesday'),
                  index=('Bob', 'Sally', 'Frank', 'Claire'))
df
1. Mean across columns
Unfortunately, referring to 'rows' and 'columns' in pandas can get confusing. The way I think about it is 'what axis do you want to cross to take the mean?'
Meaning, if you want to cross over rows and take the column average, then you need to set axis='index' or axis=0. This means you jump down across rows and take the column average.
Notice that calling df.mean(axis='index') gives me the column averages.
Monday       69.50
Tuesday      81.25
Wednesday    51.75
dtype: float64
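For a self-contained version of this step, the sketch below writes the DataFrame out with literal values instead of relying on NumPy's random state. The numbers are chosen to be consistent with the averages shown above; treat them as an assumption about what the seeded generator produced.

```python
import pandas as pd

# Literal values (assumed to match the seeded random data)
df = pd.DataFrame(
    [[51, 92, 14], [71, 60, 20], [82, 86, 74], [74, 87, 99]],
    columns=["Monday", "Tuesday", "Wednesday"],
    index=["Bob", "Sally", "Frank", "Claire"],
)

# Cross over the rows (the index) to get one average per column
column_means = df.mean(axis="index")
print(column_means)
```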
2. Mean across rows
On the flip side, if you want to jump to the right across columns, then you need to set axis='columns' or axis=1. This essentially means you're taking the row averages.
Here df.mean(axis='columns') gives me the row averages.
Bob       52.333333
Sally     50.333333
Frank     80.666667
Claire    86.666667
dtype: float64
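The same idea as a self-contained sketch, again with literal values that are assumed to match the seeded random data (they are consistent with the row averages shown above).

```python
import pandas as pd

# Literal values (assumed to match the seeded random data)
df = pd.DataFrame(
    [[51, 92, 14], [71, 60, 20], [82, 86, 74], [74, 87, 99]],
    columns=["Monday", "Tuesday", "Wednesday"],
    index=["Bob", "Sally", "Frank", "Claire"],
)

# Cross over the columns to get one average per row
row_means = df.mean(axis="columns")
print(row_means)
```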
3. Skipping NAs
Finally, let's take a look at how .mean() handles NAs. By default pandas will skip these for you, but say you wanted a sensitive .mean() function, meaning you wanted it to return NaN whenever there was an NA value instead of silently ignoring it. Then set skipna=False. Note that this does not throw an error; the result is simply NaN.
df = pd.DataFrame([('Foreign Cinema', 'Restaurant', 289.0),
                   ('Liho Liho', 'Restaurant', 224.0),
                   ('500 Club', 'bar', 80.5),
                   ('The Square', 'bar', np.nan)],
                  columns=('name', 'type', 'AvgBill'))
df
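With the default skipna=True, the NaN is ignored and the mean is taken over the three remaining bills:
AvgBill    197.833333
dtype: float64
With skipna=False, the NaN propagates and the result is NaN:
AvgBill   NaN
dtype: float64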
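The calls behind these two results can be sketched as follows. The numeric_only=True argument is an addition here, on the assumption that you are running pandas 2.0 or later, where the text columns would otherwise raise a TypeError. The variable names default_mean and strict_mean are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [("Foreign Cinema", "Restaurant", 289.0),
     ("Liho Liho", "Restaurant", 224.0),
     ("500 Club", "bar", 80.5),
     ("The Square", "bar", np.nan)],
    columns=["name", "type", "AvgBill"],
)

# Default behavior: skipna=True, so the NaN row is ignored
default_mean = df.mean(numeric_only=True)

# Sensitive behavior: any NaN in a column makes that column's mean NaN
strict_mean = df.mean(numeric_only=True, skipna=False)

print(default_mean)
print(strict_mean)
```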
Check out more Pandas functions on our Pandas Page