Often when you’re doing exploratory data analysis (EDA), you’ll need to get a better *feel* for a column. One of the best ways to do this is to understand the *distribution of values* within your column. This is where *Pandas Value Counts* comes in.

The Pandas **Series.value_counts()** function returns a Series containing the counts of unique values in your Series. By default the resulting Series is in descending order, so the first element is the most frequent value.

1. YourDataFrame['your_column'].value_counts()
2. YourSeries.value_counts()

I usually do this when I want to get a bit more intimate with my data. My workflow goes:

- Run pandas.Series.nunique() first – This will count how many unique values I have. If it’s over 100K, it’ll slow down my computer once I call value_counts
- Run pandas.Series.value_counts() – This will tell me which values appear most frequently
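That two-step workflow can be sketched as follows (the Series here is made up for illustration):

```
import pandas as pd

# Hypothetical example Series for the nunique -> value_counts workflow
s = pd.Series(['bar', 'Restaurant', 'bar', 'bar', 'Restaurant'])

# Step 1: check how many unique values there are
print(s.nunique())        # 2 – small enough to proceed

# Step 2: count how often each unique value appears
print(s.value_counts())   # 'bar' appears 3 times, 'Restaurant' 2 times
```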

**Pseudo code**: Take a DataFrame column (or Series) and find the distinct values. Then count how many times each distinct value occurs.

*Hint: You can also do this across unique rows in a DataFrame by calling pandas.DataFrame.value_counts()*
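A quick sketch of that hint, with a made-up DataFrame (note that DataFrame.value_counts was added in pandas 1.1):

```
import pandas as pd

# Hypothetical DataFrame; DataFrame.value_counts counts unique *rows*
df = pd.DataFrame({'name': ['Liho Liho', 'Liho Liho', '500 Club'],
                   'type': ['Restaurant', 'Restaurant', 'bar']})

# Each distinct (name, type) row combination is counted
print(df.value_counts())
```

The result is indexed by the row combination, so ('Liho Liho', 'Restaurant') gets a count of 2 and ('500 Club', 'bar') a count of 1.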

## Pandas Value Counts

By default, you don’t need to input any parameters when counting the values. Let’s take a look at the different parameters you can pass to pd.Series.value_counts():

- **normalize** (Default: False): If True, you’ll get back the *relative frequencies* of unique values. This means that instead of counts, the Series returned will show the percent each unique value makes up of the whole series.
- **sort** (Default: True): This will return your values in frequency order. The exact order is determined by the next parameter (ascending).
- **ascending** (Default: False): If True, your values are returned in ascending order (lowest ones on top). By default your highest values appear first.
- **bins**: Sometimes you’re working with a continuous variable (think a range of numbers vs discrete labels). In this case you’ll have too many unique values to pull signal from your data. If you set bins (Ex: [0, .25, .5, .75, 1]), each value is assigned a bin based on where it falls, and value_counts will count the bin frequency instead of the distinct value frequency. Check out the video or code below for more.
- **dropna** (Default: True): This will either count (False) or not count (True) the NaNs in your Series.
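The notebook below demonstrates normalize, ascending, and bins, but not dropna, so here’s a minimal sketch of that one (the Series with a NaN is made up for illustration):

```
import pandas as pd
import numpy as np

# Hypothetical Series containing a NaN, to show the dropna parameter
s = pd.Series(['bar', 'bar', np.nan, 'Restaurant'])

# Default (dropna=True): the NaN is left out of the counts
print(s.value_counts())

# dropna=False: the NaN gets its own row in the result
print(s.value_counts(dropna=False))
```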

Here’s a Jupyter notebook showing how to use value_counts in Pandas

```
import pandas as pd
```

### Pandas Value Counts

Pandas Value Counts will count the frequency of the unique values in your series. Or simply, "count how many times each value occurs."

We will run through 3 examples:

- Counting frequency of unique values in a series
- Counting *relative* frequency of unique values in a series (normalizing)
- Counting a continuous series using bins

First, let's create our DataFrame

```
df = pd.DataFrame([('Foreign Cinema', 'Restaurant', 289.0),
('Liho Liho', 'Restaurant', 224.0),
('500 Club', 'bar', 80.5),
('The Square', 'bar', 25.30),
('Liho Liho', 'Restaurant', 124.0),
('The Square', 'bar', 53.30),
('Liho Liho', 'Restaurant', 324.0),
('500 Club', 'bar', 40.5),
('Salzburg', 'bar', 123.5)],
columns=('name', 'type', 'AvgBill')
)
df
```

### Counting frequency of unique values in a series

Then let's call value_counts on our "name" column. This will look at the distinct values within that column, and count how many times they appear.

```
df['name'].value_counts()
```

We could also have the series returned in reverse order (lowest values first) by setting ascending=True. Remember, ascending means to go up, so you'll start low and go up to the highest values.

```
df['name'].value_counts(ascending=True)
```

### Counting relative frequency of unique values in a series (normalizing)

Say you didn't want the count of each unique value, but rather to see how frequently each value appears compared to the *whole series.* In order to do this, you'll set normalize=True

```
df['name'].value_counts(normalize=True)
```

Let's break this down quickly. There are a total of 9 items in the Series (run "len(df)" if you don't believe me.)

From value_counts above, we saw that "Liho Liho" appeared 3 times. Since it appears 3 times out of 9 rows, we can do 3 / 9, which equals .333. This is the relative frequency of "Liho Liho" in this series.
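You can check that arithmetic directly. Here's a quick sketch with a stand-in Series built to match those counts (3 "Liho Liho" out of 9 values):

```
import pandas as pd

# Stand-in Series: "Liho Liho" appears 3 times out of 9 values
names = pd.Series(['Liho Liho'] * 3 + ['The Square'] * 2 +
                  ['500 Club'] * 2 + ['Foreign Cinema', 'Salzburg'])

counts = names.value_counts()
freqs = names.value_counts(normalize=True)

# normalize=True is equivalent to dividing each count by the series length
print(counts['Liho Liho'] / len(names))   # 0.3333...
print(freqs['Liho Liho'])                 # 0.3333...
```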

### Counting a continuous series using bins

Now let's say we have a longer series of continuous values. Think of continuous values as a list of numbers that don't serve as labels. For example: [.2, .23, .43, .85, .13]. Say we thought that .2 and .23 were close enough and wanted to count them together. Unfortunately, if we did value_counts regularly, we would count .2 and .23 as separate values.

If you want to group them together, this is where *bins* comes in. In order to create a list of random continuous numbers, I'm going to use numpy

```
import numpy as np
np.random.seed(seed=42) # To make sure the same values appear each time
random_numbers = np.random.random(size=(10, 1))
random_numbers = pd.DataFrame(random_numbers, columns=['rand_num'])
random_numbers
```

Now I want to split my data into 3 bins and count how many values fall in each bin.

```
random_numbers['rand_num'].value_counts(bins=3)
```

In this case, bins is returning buckets that are evenly spaced. But what if you wanted to create your own buckets? No problem, just pass a list of values that describe your bucket edges.

```
random_numbers['rand_num'].value_counts(bins=[0, .2, .6, 1])
```

Check out more Pandas functions on our Pandas Page