Pandas Head – Preview Data – DataFrame.head()

Pandas Head might be the single function we use most in the library. It is essential for quickly checking that your data is correct, especially in a notebook-type environment.

Pandas Head returns the first n rows of your DataFrame, which makes it the quickest way to confirm your data is what you think it is. Its close cousin, Pandas Tail, returns the last n rows.

1. pd.DataFrame.head(n=number_of_rows)

Pseudo code: Return the first n rows of a Pandas DataFrame.
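One way to think about it: for any integer n, .head(n) returns the same rows as positional slicing with .iloc[:n]. This is a sketch of the behavior, not the library's source code. The tiny "score" DataFrame below is made up purely for illustration.

```python
import pandas as pd

# A small, made-up DataFrame used only for illustration
df = pd.DataFrame({"score": [10, 20, 30, 40, 50, 60, 70]})

n = 3
first_rows = df.head(n)

# head(n) returns the same rows as positional slicing with .iloc[:n]
print(first_rows.equals(df.iloc[:n]))  # True
print(first_rows)
```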

Pandas Head

Pandas Head - Return the top N rows from your DataFrame. It is very useful for previewing your data.

Head Parameters

.head() only has one parameter and it’s super easy: How many rows do you want to preview?

  • n (Default=5): The number of rows you’d like to preview. By default Pandas will show you 5 rows. I most often use the default, but occasionally I’ll use n=1 or n=10.
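Here's a quick sketch of the n parameter in action, using a made-up 7-row DataFrame. The pandas documentation describes two behaviors worth knowing. First, asking for more rows than exist simply returns the whole DataFrame. Second, a negative n returns every row except the last |n|.

```python
import pandas as pd

# Hypothetical 7-row DataFrame for demonstrating the n parameter
df = pd.DataFrame({"score": [10, 20, 30, 40, 50, 60, 70]})

one_row = df.head(n=1)           # just the first row
default_rows = df.head()         # n defaults to 5
too_many = df.head(10)           # n larger than the DataFrame: all 7 rows, no error
all_but_last_two = df.head(-2)   # negative n: every row except the last 2

print(len(one_row), len(default_rows), len(too_many), len(all_but_last_two))  # 1 5 7 5
```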

Pandas Tail works exactly like Pandas Head, except that it returns the last rows of a DataFrame instead of the first rows.
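For a positive n, .tail(n) returns the same rows as .iloc[-n:]. A negative n returns every row except the first |n|. The sketch below checks both on a made-up 7-row DataFrame.

```python
import pandas as pd

# Hypothetical 7-row DataFrame for demonstrating .tail()
df = pd.DataFrame({"score": [10, 20, 30, 40, 50, 60, 70]})

last_rows = df.tail()             # last 5 rows by default
last_two = df.tail(2)             # same rows as df.iloc[-2:]
all_but_first_two = df.tail(-2)   # negative n: every row except the first 2

print(last_two.equals(df.iloc[-2:]))      # True
print(list(all_but_first_two["score"]))   # [30, 40, 50, 60, 70]
```

One edge case applies: df.tail(0) returns an empty DataFrame. By contrast, df.iloc[-0:] is the same as df.iloc[0:], which is the whole DataFrame. The .iloc equivalence therefore only holds for non-zero n.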

If you want a random sample of N rows instead, check out Pandas Sample.
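Because .sample() is random, two calls will usually return different rows. If you need a repeatable sample, for example in a notebook you plan to share, pass random_state. Here is a minimal sketch on a made-up DataFrame; the seed 42 is an arbitrary choice.

```python
import pandas as pd

# Hypothetical 7-row DataFrame for demonstrating .sample()
df = pd.DataFrame({"score": [10, 20, 30, 40, 50, 60, 70]})

# Without random_state, each call may return different rows
random_rows = df.sample(3)

# Fixing random_state makes the "random" rows repeatable
sample_a = df.sample(n=3, random_state=42)
sample_b = df.sample(n=3, random_state=42)
print(sample_a.equals(sample_b))  # True
```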

Let’s look at an example


In [1]:
import pandas as pd
import numpy as np

Pandas Head

Pandas Head is pretty straightforward: it's a great way to view the first N rows of your DataFrame. Let's check out how to use .head(), plus two other ways to pull rows from your DataFrame: .tail() and .sample().

Examples:

  1. Pandas Head - Default on DataFrame + Series
  2. Pandas Head - Custom N Rows
  3. Pulling Rows from your DataFrame: .tail() and .sample()

But first, let's create a tall DataFrame to work with:

In [2]:
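# Seed the random number generator so the scores below are reproducible
np.random.seed(seed=42)

num_students = 1000

# 1,000 students x 3 scores, each a random integer from 0 to 9
df = pd.DataFrame(data=np.random.randint(0, 10, (num_students, 3)),
                  columns=['Score1', 'Score2', 'Score3'],
                  index=["Student{}".format(x) for x in range(1, num_students + 1)])

print("Your DataFrame is {:,} rows long".format(len(df)))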
Your DataFrame is 1,000 rows long

1. Pandas Head - Default

Let's check out the top rows of our dataset. To do this, call .head() on the DataFrame. It works on both a DataFrame and a Series.

Notice how Pandas will return 5 rows with .head() by default.

In [3]:
df.head()
Out[3]:
          Score1  Score2  Score3
Student1       6       3       7
Student2       4       6       9
Student3       2       6       7
Student4       4       3       7
Student5       7       2       5
In [4]:
df['Score2'].head()
Out[4]:
Student1    3
Student2    6
Student3    6
Student4    3
Student5    2
Name: Score2, dtype: int64
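.head() returns a regular DataFrame, or a Series when called on a single column, so it chains naturally onto other operations. A handy pattern is to sort first and then take the top rows. This is our own illustration, not part of the example above. The block rebuilds the same student DataFrame so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuild the student DataFrame from the example above
np.random.seed(seed=42)
num_students = 1000
df = pd.DataFrame(
    data=np.random.randint(0, 10, (num_students, 3)),
    columns=["Score1", "Score2", "Score3"],
    index=["Student{}".format(x) for x in range(1, num_students + 1)],
)

# .head() on a DataFrame returns a DataFrame; on a Series it returns a Series
print(type(df.head()).__name__)            # DataFrame
print(type(df["Score2"].head()).__name__)  # Series

# Sort by Score1 (highest first), then preview the top 3 rows
top_three = df.sort_values("Score1", ascending=False).head(3)
print(top_three)
```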

2. Pandas Head - Custom N Rows

If you want to see the first N rows of your DataFrame, all you need to do is pass an integer to .head().

Here I want to see the first 3 rows of my DataFrame. You can either specify n=3, or simply pass in 3.

In [5]:
df.head(n=3)
Out[5]:
          Score1  Score2  Score3
Student1       6       3       7
Student2       4       6       9
Student3       2       6       7
In [6]:
df.head(3)
Out[6]:
          Score1  Score2  Score3
Student1       6       3       7
Student2       4       6       9
Student3       2       6       7

3. Pulling Rows from your DataFrame: .tail() and .sample()

Say you don't want to see the top rows, but rather the bottom rows, or even a random sample of rows. This is where .tail() and .sample() come in.

.tail() shows you the 'tail' (the bottom rows) of your dataset, 5 rows by default.

In [7]:
df.tail()
Out[7]:
             Score1  Score2  Score3
Student996        5       2       7
Student997        1       0       8
Student998        2       0       6
Student999        9       4       4
Student1000       0       8       6
In [8]:
df.tail(2)
Out[8]:
             Score1  Score2  Score3
Student999        9       4       4
Student1000       0       8       6

.sample() returns a random sample of rows. By default you get just 1 row, but you can also ask for N rows. The rows are chosen at random, so your output may not match the one below.

In [9]:
df.sample()
Out[9]:
            Score1  Score2  Score3
Student152       0       7       9
In [10]:
df.sample(3)
Out[10]:
            Score1  Score2  Score3
Student936       3       6       8
Student910       4       7       9
Student984       0       3       1
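Instead of a fixed count, .sample() also accepts frac, which is the fraction of rows to return. Here's a sketch on the same 1,000-row student DataFrame. random_state=0 is an arbitrary seed, added so the result is repeatable.

```python
import numpy as np
import pandas as pd

# Rebuild the student DataFrame from the example above
np.random.seed(seed=42)
num_students = 1000
df = pd.DataFrame(
    data=np.random.randint(0, 10, (num_students, 3)),
    columns=["Score1", "Score2", "Score3"],
    index=["Student{}".format(x) for x in range(1, num_students + 1)],
)

# frac asks for a fraction of rows instead of a count: 1% of 1,000 rows is 10 rows
one_percent = df.sample(frac=0.01, random_state=0)
print(len(one_percent))  # 10
```

n and frac are alternatives: pass one or the other, because passing both raises a ValueError.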


Check out more Pandas functions on our Pandas Page
