Pandas Group By, the foundation of any data analysis. This is a MUST know function when working with the pandas library. 95% of analysis will require some form of grouping and aggregating data.

This post will focus directly on *how* to do a group by in Pandas. To learn *what* is a group by check out our future business analytics post.

Pandas Group By will aggregate your data around distinct values within your ‘group by’ columns. The result will apply a function (an aggregate function) to your data.

When it comes to group by functions, you’ll need two things from pandas

- The
**group by**function – The function that tells pandas how you would like to consolidate your data. - An
**aggregate**function – The function that tells pandas what you would like to do with your consolidated data.

We are going to go over** 3 ways to do Group Bys in Pandas**. They are listed from quickest to most complete. Pick the one that works best for your situation.

1. pd.DataFrame.groupby('column_to_group_by') \ ['col_to_agg'] \ .your_aggregate_function() 2. pd.DataFrame.groupby('column_to_group_by').agg({ 'col_to_agg1' : aggregate_function1, 'col_to_agg2' : aggregate_function2, }) 3. pd.DataFrame.groupby('column_to_group_by').agg( new_column_name1=pd.NamedAgg(column='col_to_agg1', aggfunc=aggfunc1), new_column_name2=pd.NamedAgg(column='col_to_agg2', aggfunc=aggfunc2) )

**Pseudo code: For a given column, find the distinct groups within that column. Then combine all of the values in those groups from another column and apply a function.**

## Pandas Group By – 3 Methods

### Method 1 – Single Aggregate Function

1. pd.DataFrame.groupby('column_to_group_by') \ ['col_to_agg'] \ .your_aggregate_function()

In method 1 we are doing the most simple type of group by in pandas. This method only has 1 aggregate function. You start by defining the column (or columns) you’d like to group by, then the column you’d like to aggregate, then specify your aggregate function.

This method works great when you’re looking for a quick group by. The result returned will be a Pandas Series.

### Method 2 – Multiple Aggregate Functions

2. pd.DataFrame.groupby('column_to_group_by').agg({ 'col_to_agg1' : aggregate_function1, 'col_to_agg2' : aggregate_function2, })

Say you have different aggregate functions you’d like to apply to different columns. The way we like to do this is with method 2: using a dictionary within .agg(). Unfortunately with this method you can not specify your new column names.

To do this, you’ll need to call the column you want to group by, the column(s) you want to aggregate, and then finally an aggregate function for each column. If you want to see a list of potential aggregate functions, check out the Pandas Series documentation.

### Method 3 – Multiple Aggregate Functions with new column names

3. pd.DataFrame.groupby('column_to_group_by').agg( new_column_name1=pd.NamedAgg(column='col_to_agg1', aggfunc=aggfunc1), new_column_name2=pd.NamedAgg(column='col_to_agg2', aggfunc=aggfunc2) )

In method 3 you’ll get to specify your new column names. This method is the most explicit and flexible. It takes the longest to write out, but the output is clear so it’s a winner in our book.

Let’s take a look at code examples

```
import pandas as pd
```

### Pandas Group By¶

We have 3 main methods we use to do group bys. Each has their own advantages and disadvantages. Check out each one and see which works best for your case.

We will run through 3 examples:

- Method 1 - Quick and simple group by
- Method 1 - Quick and simple group by with multiple columns
- Method 2 - Different columns with different aggregate functions
- Method 3 - Different columns with different aggregate functions and new column names

But first, let's create our DataFrame

```
df = pd.DataFrame([('Liho Liho', 200, 45.32),
('Chambers', 350, 65.33),
('The Square', 15, 12.45),
('Tosca Cafe', 35, 180.34),
('Liho Liho', 98, 145.42),
('Chambers', 205, 25.35)],
columns=('name', 'Customers', 'AvgBill')
)
df
```

### 1. Method 1 - Quick and simple group by.¶

The simplest group by takes a single 'group by column,' single 'column to aggregate' and finally an aggregate function. A series is returned to you

- Group By Column = 'name'
- Column To Aggregate = 'AvgBill'
- Aggregate function = .sum()

```
df.groupby('name')['AvgBill'].sum()
```

### 2. Method 1 - Quick and simple group by with multiple columns¶

In order to do an aggregate function on multiple columns, simply pass a list of columns into your 'columns to aggregate.'

- Group By Column = 'name'
- Column To Aggregate = ['Customers', 'AvgBill']
- Aggregate function = .sum()

```
df.groupby('name')[['Customers', 'AvgBill']].sum()
```

### 3. Method 2 - Different columns with different aggregate functions¶

In order to apply different aggregate functions to different columns, you'll need to use the .agg() function. This helpful function allows you to specify each column and the specific function you'd like to apply to it.

You'll need to pass a *dictionary* to your .agg() function. Keys=Column name you'd like to aggregate, values=aggregate function.

Here I'm taking the MAX number from the Customers column and the mean of the AvgBill Column

```
df.groupby('name').agg({
'Customers' : max,
'AvgBill' : pd.Series.mean
})
```

### 4. Method 3 - Different columns with different aggregate functions and new column names¶

This final method, although long, is nice because you can rename the output columns.

Example for the first columns **New column name** = max_customers (Note: I agree it's weird that a name of something is pulled from a variable vs a string) **column** = Column to aggregate: Customers **aggfunc** = The agg function you'd like to apply - max

```
df.groupby('name').agg(
max_customers = pd.NamedAgg(column='Customers', aggfunc=max),
mean_avg_bill = pd.NamedAgg(column='AvgBill', aggfunc=pd.Series.mean),
)
```

Check out more Pandas functions on our Pandas Page