I once had a data teacher who told me, “You need to get intimate with your data.” One of the best ways to do this is with pandas describe.
pandas.DataFrame.describe()
pandas.Series.describe()
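Pandas Describe does exactly what it sounds like: it describes your data. Calling describe() on a Series returns a Series of descriptive statistics; calling it on a DataFrame returns a DataFrame with one such summary column per described column. Depending on the data type, this summary will tell you: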
- The count of values
- The number of unique values
- The top (most frequent) value
- The frequency of your top value
- The mean, standard deviation, min and max values
- The percentiles of your data: 25%, 50%, 75% by default
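Which of these rows you actually get depends on the data type. Here is a minimal sketch with two made-up Series (the values are invented for illustration): numeric data gets count, mean, std, min, percentiles and max, while string data gets count, unique, top and freq.

```python
import pandas as pd

# Hypothetical numeric data: trunk diameters in inches
diameters = pd.Series([3.0, 6.0, 6.0, 12.0, 24.0], name="diameter")
# Hypothetical string data: species names
species = pd.Series(["Oak", "Pine", "Oak", "Elm", "Oak"], name="species")

# Numeric Series: count, mean, std, min, 25%, 50%, 75%, max
numeric_summary = diameters.describe()
# String Series: count, unique, top (most frequent value), freq (its count)
text_summary = species.describe()

print(numeric_summary)
print(text_summary)
```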
Pseudo Code: With your Series or DataFrame, return a summary that tells us what the distribution of values looks like.
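To make the pseudo code concrete for a numeric Series, here is a rough hand-rolled equivalent. It is a sketch of what describe() reports, not pandas' actual implementation, and the helper name manual_describe is hypothetical. It relies on two pandas defaults: std() is the sample standard deviation (ddof=1) and quantile() uses linear interpolation.

```python
import pandas as pd


def manual_describe(s: pd.Series, percentiles=(0.25, 0.5, 0.75)) -> pd.Series:
    """Hypothetical helper: rebuild the numeric part of describe() by hand."""
    stats = {
        "count": s.count(),  # counts non-missing values only
        "mean": s.mean(),
        "std": s.std(),      # sample standard deviation (ddof=1)
        "min": s.min(),
    }
    for p in percentiles:
        # Label 0.25 as "25%", and so on; quantile() interpolates linearly
        stats[f"{p:.0%}"] = s.quantile(p)
    stats["max"] = s.max()
    return pd.Series(stats, name=s.name, dtype="float64")


# Made-up values, including one missing entry that count() skips
s = pd.Series([3.0, 6.0, None, 12.0, 24.0], name="diameter")
manual = manual_describe(s)
print(manual)
print(s.describe())
```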
Pandas Describe
In order to evaluate a dataset, you need to get a feel for your data. This means you need to get an intuitive sense of how your data is distributed and what spectrum of values you have. This is the first step to launching a successful data analysis.
Often, the process of ‘getting to know your data’ is called Exploratory Data Analysis (EDA).

Pandas Describe Parameters
The describe function is pretty straightforward, but there are a few parameters you may want to play with.
- percentiles = By default, pandas includes the 25th, 50th, and 75th percentiles. However, you can ask for whichever ones you want: pass a list of fractions between 0 and 1 to percentiles and pandas will do the rest. The median (50%) is always included.
- include = You may want to ‘describe’ all of your columns, or only the numeric ones. By default, pandas describes only your numeric columns (plus datetime columns in pandas 2.0 and later). Pass ‘all’ to include every column, or pass a list of data types to describe just those.
- exclude = The inverse of include: tell pandas which column data types you would like to leave out. Simply pass a list of data types here.
- datetime_is_numeric = By default, pandas treats your datetimes as categorical objects, meaning it will not calculate things like the ‘average date’. If you set datetime_is_numeric=True, pandas will compute the mean, min, max, and percentiles of your datetimes. This parameter exists in pandas 1.1 through 1.5; pandas 2.0 removed it and always summarizes datetimes as numbers.
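The percentiles, include, and exclude parameters described above can be tried on a small made-up DataFrame (values invented for illustration). Note that include and exclude take lists of dtypes; np.number selects every numeric dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: one numeric column and one text column
df = pd.DataFrame({
    "diameter": [3.0, 6.0, 6.0, 12.0, 24.0, 30.0],
    "species": ["Oak", "Pine", "Oak", "Elm", "Oak", "Pine"],
})

# percentiles: fractions between 0 and 1; pandas adds the median (50%) itself
custom = df.describe(percentiles=[0.1, 0.9])

# include: a list of dtypes to describe, here only numeric columns
numbers_only = df.describe(include=[np.number])

# exclude: the inverse, so every column that is NOT numeric
no_numbers = df.describe(exclude=[np.number])

print(custom)
print(numbers_only)
print(no_numbers)
```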
Now for the fun part: let’s take a look at a code sample.
import pandas as pd
Pandas Describe
Pandas Describe will do all of the hard work for you. Well...most of it. Calling .describe() on your dataset will produce a series of descriptive statistics that allow you to get to know your data better.
We will run through 3 examples:
- Default Describe - Let's see what comes out by default
- Including all columns via 'include'
- Treating datetimes like numbers via datetime_is_numeric=True
But first, let's use our San Francisco Tree dataset as our DataFrame. You can download this dataset at the GitHub link below. Watch out, it's 193K rows.
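# Load the tree list, parsing PlantDate as datetimes
df = pd.read_csv('../data/Street_Tree_List.csv', parse_dates=['PlantDate'])
# Keep just the four columns we'll describe
df = df[['TreeID', 'qSpecies', 'PlantDate', 'DBH']]
# DBH is the trunk diameter at breast height, so give it a descriptive name
df = df.rename(columns={'DBH': 'tree_diameter'})
df.head()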
1. Default Describe - Let's see what comes out by default
By default, .describe() returns a set of descriptive statistics. Let's see what they are.
You can see that although we have 4 columns in our dataset, only 2 of them are returned by default. This is because, in pandas 1.x, .describe() returns only the numeric columns by default. In pandas 2.0 and later, datetime columns such as PlantDate are included as well, so you would see 3.
df.describe()
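If you don't have the tree CSV handy, a tiny made-up frame shows the same default behavior. The column names are borrowed from the raw tree dataset, but the values are invented. PlantDate is left out on purpose, because whether datetime columns appear by default changed between pandas 1.x and 2.x.

```python
import pandas as pd

# Hypothetical stand-in for the tree data (values invented)
df = pd.DataFrame({
    "TreeID": [101, 102, 103, 104],
    "qSpecies": ["Oak", "Pine", "Oak", "Elm"],
    "DBH": [3.0, 6.0, 12.0, 24.0],
})

# Default call: the string column qSpecies is silently dropped
summary = df.describe()
print(summary)
```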
2. Including all columns via 'include'
If you want to include all columns in describe, set include='all'.
You'll notice that pandas needs to put 'NaN' for descriptive statistics that do not apply to a column. For example, 'qSpecies' (a string column) does not have a 25th percentile, and the numeric columns have no 'unique', 'top', or 'freq'.
df.describe(include='all')
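The NaN behavior can be checked directly on a small made-up frame (column names borrowed from the raw tree dataset, values invented). Looking up individual cells with .loc shows which statistics do not apply to which column.

```python
import pandas as pd

# Hypothetical stand-in for the tree data (values invented)
df = pd.DataFrame({
    "TreeID": [101, 102, 103, 104],
    "qSpecies": ["Oak", "Pine", "Oak", "Elm"],
    "DBH": [3.0, 6.0, 12.0, 24.0],
})

summary = df.describe(include="all")
print(summary)

# Statistics that don't apply to a column come back as NaN
print(summary.loc["25%", "qSpecies"])  # strings have no percentiles
print(summary.loc["unique", "DBH"])    # numeric columns get no unique count
```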
3. Treating datetimes like numbers via datetime_is_numeric=True
Finally, let's end by calling .describe() on a Series. We'll do it on our 'PlantDate' column and see the difference between treating dates like objects and treating them like numbers.
Notice how the first call gives us no percentiles or min/max, while the second one does.
df['PlantDate'].describe()
df['PlantDate'].describe(datetime_is_numeric=True)
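Because datetime_is_numeric only exists in pandas 1.1 through 1.5 (pandas 2.0 removed it and always treats datetimes as numbers), the second call above raises an error on newer versions. Here is a minimal version-aware sketch on a few made-up planting dates; the simple major-version check is an assumption about how you might handle both releases, not an official pandas idiom.

```python
import pandas as pd

# Hypothetical planting dates
dates = pd.Series(
    pd.to_datetime(["2001-03-15", "2005-07-01", "2010-11-20", "2010-11-20"]),
    name="PlantDate",
)

major = int(pd.__version__.split(".")[0])
if major >= 2:
    # pandas 2.x: datetimes are always summarized as numbers
    summary = dates.describe()
else:
    # pandas 1.1-1.5: opt in to the numeric treatment explicitly
    summary = dates.describe(datetime_is_numeric=True)

# Expect count, mean, min, percentiles, and max for the dates
print(summary)
```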