Ah, sum. The backbone of any good mathematical operation. Pandas sum gives you the power to sum entire rows or columns.
You can use this function on a DataFrame or a Series.
I use Pandas Sum for series addition mostly. Especially when counting the number of “True” entries when filtering my rows.
Pseudo Code: With your DataFrame, return the sum of your rows or columns.
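Counting "True" entries with a Series sum can be sketched as follows. This is a minimal illustration: the values are borrowed from the AvgBill1 column used later in this post, and the threshold of 100 and the variable names are my own assumptions.

```python
import pandas as pd

# Values borrowed from the AvgBill1 column used later in the post
bills = pd.Series([234.0, 135.0, 23.0, 53.0], name="AvgBill1")

# Comparing a Series to a value gives a boolean Series (the filter mask).
# The threshold of 100 is an arbitrary choice for illustration.
is_large = bills > 100

# True counts as 1 and False as 0, so sum() counts the matching rows
num_large = is_large.sum()

# Plain Series addition: the total of every entry
total = bills.sum()

print(num_large, total)
```

Summing the boolean mask gives you the count without actually filtering the rows first.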
Sum has simple parameters. 90% of the time you’ll just be using ‘axis’ but it’s worth learning a few more.
- axis – Axis to sum on. The way I remember this: to sum across rows (one total per column) set axis=0; to sum across columns (one total per row) set axis=1.
- skipna (Default: True) – If your dataset has NA/null values, then you may want to skip ’em. With skipna=False, any NA in a row or column makes that sum NA.
- level (Default: None) – In case you have a multi index, you can specify which part of that multi index you want to sum across. Check out an example below. Note that this parameter was deprecated in pandas 1.3 and removed in pandas 2.0; on newer versions, use df.groupby(level=...).sum() instead.
- numeric_only (Default: None in older versions, False in pandas 2.0+) – This means you’ll only include float, int, and booleans. You’ll find that sometimes your column will have mixed datatypes (strings, functions, etc.). This parameter will help you skip those when summing.
- min_count (Default: 0) – The minimum number of non-NA entries required in order to compute a sum. If the number of entries is below your minimum, then .sum() will return an NA.
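Here is a small sketch of how skipna, min_count, and numeric_only change the result. The DataFrames and variable names below are made up for illustration and are not part of the examples that follow.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "a" has one missing value, column "b" is all missing
df_na = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, np.nan]})

# Default skipna=True: NAs are ignored, so an all-NA column sums to 0.0
skipped = df_na.sum()

# skipna=False: any NA in a column makes that column's sum NA
not_skipped = df_na.sum(skipna=False)

# min_count=1: require at least one non-NA entry, otherwise return NA
at_least_one = df_na.sum(min_count=1)

# Hypothetical mixed-type data: numeric_only=True drops the string column
mixed = pd.DataFrame({"count": [3, 9], "color": ["Red", "Green"]})
numbers_only = mixed.sum(numeric_only=True)

print(skipped, not_skipped, at_least_one, numbers_only, sep="\n\n")
```

min_count=1 is handy when you'd rather see an NA than a misleading 0 for a column with no data at all.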
Now the fun part: let’s take a look at a code sample.
import pandas as pd
We will run through 4 examples:
- Summing across rows
- Summing across columns
- Multi Level Index Sum
- Summing a Series
But first, let's create our DataFrame
df = pd.DataFrame([(234.0, 289.0),
                   (135.0, 224.0),
                   (23.0, 80.5),
                   (53.0, 25.30)],
                  columns=('AvgBill1', 'AvgBill2'))
df
1. Summing across rows
Let's sum across rows. In order to do this I need to set my axis=0. This feels a bit counterintuitive because one total per column is returned. Remember, you're summing each column across all of its rows.
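The call for this step would look something like the sketch below. The DataFrame is rebuilt so the snippet runs on its own, and col_totals is just a name I picked.

```python
import pandas as pd

# Same data as the DataFrame created above
df = pd.DataFrame(
    [(234.0, 289.0), (135.0, 224.0), (23.0, 80.5), (53.0, 25.30)],
    columns=["AvgBill1", "AvgBill2"],
)

# axis=0 collapses the rows, leaving one total per column
col_totals = df.sum(axis=0)
print(col_totals)
```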
AvgBill1    445.0
AvgBill2    618.8
dtype: float64
2. Summing across columns
Let's sum across columns. In order to do this I need to set my axis=1.
Notice how I've summed across my columns, and the result is the total of each row.
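A sketch of the call for this step, again rebuilding the DataFrame so the snippet runs on its own (row_totals is a name I picked):

```python
import pandas as pd

# Same data as the DataFrame created above
df = pd.DataFrame(
    [(234.0, 289.0), (135.0, 224.0), (23.0, 80.5), (53.0, 25.30)],
    columns=["AvgBill1", "AvgBill2"],
)

# axis=1 collapses the columns, leaving one total per row
row_totals = df.sum(axis=1)
print(row_totals)
```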
0    523.0
1    359.0
2    103.5
3     78.3
dtype: float64
3. Multi Level Index Sum
If you have a multi level index, then you can tell pandas which index you'd like to sum across.
Let's make a multi level DataFrame first.
data = (['Apple','Red',3,1.29],
        ['Apple','Green',9,0.99],
        ['Pear','Red',25,2.59],
        ['Pear','Green',26,2.79],
        ['Lime','Green',99,0.39])
df = pd.DataFrame(data, columns=['Fruit','Color','Count','Price'])
df = df.set_index(['Fruit', 'Color'])
df
Now let's sum across each level and see what we get. Notice that depending on the level we choose, the rows are grouped differently, so we get different totals.
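One way to sketch the per-level sums is shown below. On older pandas versions df.sum(level='Fruit') did this directly, but that keyword was removed in pandas 2.0, so this sketch assumes a recent version and uses groupby(level=...) instead. The names by_fruit and by_color are mine.

```python
import pandas as pd

# Same multi level DataFrame as above
data = [
    ["Apple", "Red", 3, 1.29],
    ["Apple", "Green", 9, 0.99],
    ["Pear", "Red", 25, 2.59],
    ["Pear", "Green", 26, 2.79],
    ["Lime", "Green", 99, 0.39],
]
df = pd.DataFrame(data, columns=["Fruit", "Color", "Count", "Price"])
df = df.set_index(["Fruit", "Color"])

# Sum within each Fruit, collapsing the Color level
by_fruit = df.groupby(level="Fruit").sum()

# Sum within each Color, collapsing the Fruit level
by_color = df.groupby(level="Color").sum()

print(by_fruit)
print(by_color)
```

Grouping on 'Fruit' gives one row per fruit, while grouping on 'Color' gives one row per color, each with the Count and Price totals for that group.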
Check out more Pandas functions on our Pandas Page