Pandas Set Index – pd.DataFrame.set_index()

Sometimes the default row numbers just don’t cut it as an index, and you need to set a different index on your DataFrame.

You may want to replace your current index with another column in your DataFrame. No problem with pd.DataFrame.set_index()!


You’re usually doing this when you want to set your index to a list of names, or unique ids. For example, you imported a CSV but forgot to set your index_col.
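Here is a minimal sketch of that CSV situation. The CSV text is made up for illustration and is read from an in-memory string rather than a real file; it compares setting the index at read time with index_col against fixing it afterwards with set_index.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = (
    "name,type,AvgBill\n"
    "Foreign Cinema,Restaurant,289.0\n"
    "500 Club,bar,80.5\n"
)

# Option 1: choose the index while reading the file.
at_read = pd.read_csv(io.StringIO(csv_text), index_col="name")

# Option 2: read first, then promote the column to the index afterwards.
after_read = pd.read_csv(io.StringIO(csv_text)).set_index("name")

# Compare the two results.
print(at_read.equals(after_read))
```

Either route should leave you with "name" as the index and the remaining columns untouched.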

Pseudo code: Take a DataFrame column (or columns) and set it as your DataFrame index.

The questions you’ll have to ask yourself are:

  • Do you want to drop the column you’re setting as the index?
  • Do you want the transformation to happen in place, or do you want a new DataFrame returned to you?
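To make the second question concrete, here is a small sketch of the two behaviors. The two-row DataFrame and the variable names are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"name": ["500 Club", "The Square"], "AvgBill": [80.5, 25.3]})

# Default (inplace=False): df is left alone and a new DataFrame comes back.
indexed = df.set_index("name")

# inplace=True: the DataFrame is modified directly and the call returns None.
df_copy = df.copy()
result = df_copy.set_index("name", inplace=True)

print(result)               # the return value of the in-place call
print(list(df.columns))     # the original still has its "name" column
```

A common slip is writing df = df.set_index("name", inplace=True), which replaces df with None.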

Pandas Set Index

Pandas Set Index - Swap one of your DataFrame columns and make it your new index

Let’s take a look at the different parameters you can pass pd.DataFrame.set_index():

  • keys: What you want to be the new index. This is either 1) the name of a DataFrame column, 2) a Pandas Series, Index, or NumPy array of the same length as your DataFrame, or 3) a list combining any of these.
  • drop (Default: True): If True, the column you promote to the index is removed from the DataFrame’s columns. It has no effect when you pass a Series or array, since there is no column to remove.
  • append (Default: False): append=True attaches the new index to your current index instead of replacing it, which creates a multi-index. My guess is most people will rarely use this.
  • inplace (Default: False): If True, the DataFrame is modified in place and nothing is returned. If False, you get a new DataFrame with the new index returned to you.
  • verify_integrity (Default: False): If True, the new index is checked for duplicates and an error is raised if any are found. The check is off by default because skipping it is faster, so you usually won’t need to touch this.
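Two of these parameters are easier to see than to describe. The sketch below uses a small restaurant DataFrame (built from a dict purely for illustration) to show keys given as a list of column names, and verify_integrity=True rejecting a column with repeated values.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Foreign Cinema", "Liho Liho", "500 Club", "The Square"],
    "type": ["Restaurant", "Restaurant", "bar", "bar"],
    "AvgBill": [289.0, 224.0, 80.5, 25.3],
})

# A list of column names builds a multi-index from those columns.
by_type_and_name = df.set_index(["type", "name"])
print(list(by_type_and_name.index.names))

# "type" contains repeated values, so verify_integrity=True raises ValueError.
try:
    df.set_index("type", verify_integrity=True)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc
print(duplicate_error)
```

Without verify_integrity=True the second call would succeed and simply give you an index with duplicate labels.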

Here’s a Jupyter notebook showing how to set index in Pandas

In [1]:
import pandas as pd

Pandas Set Index | pd.DataFrame.set_index()

We will run through 3 examples:

  1. Setting a new index from an existing column
  2. Setting a new index from a new np.array
  3. Setting a new index with append=True so we create a multi index

But first, let's create our DataFrame

In [2]:
df = pd.DataFrame([('Foreign Cinema', 'Restaurant', 289.0),
                   ('Liho Liho', 'Restaurant', 224.0),
                   ('500 Club', 'bar', 80.5),
                   ('The Square', 'bar', 25.30)],
                  columns=('name', 'type', 'AvgBill'))
df

             name        type  AvgBill
0  Foreign Cinema  Restaurant    289.0
1       Liho Liho  Restaurant    224.0
2        500 Club         bar     80.5
3      The Square         bar     25.3

Next I'm going to set my index as the "name" column in my DataFrame.

Setting a new index from an existing column

In [3]:
df.set_index('name')

                      type  AvgBill
name
Foreign Cinema  Restaurant    289.0
Liho Liho       Restaurant    224.0
500 Club               bar     80.5
The Square             bar     25.3

Check it out! My name column has now become my index. Also notice that the 'name' column no longer appears among the columns of the result (df itself is unchanged, because inplace defaults to False). If we had set drop=False, it would have stayed as a column and also become the index.
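As a quick sketch of the drop=False case, using a cut-down two-row DataFrame for illustration:

```python
import pandas as pd

df = pd.DataFrame({"name": ["500 Club", "The Square"], "type": ["bar", "bar"]})

# drop=False keeps "name" as a regular column and also uses it as the index.
kept = df.set_index("name", drop=False)

print(list(kept.columns))
print(kept.index.tolist())
```

This can be handy when you want label-based lookups but still need the column for later operations such as grouping or merging.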

Setting a new index from a new np.array

Now I'm going to set my index, but in this case, instead of using a DataFrame column, I'm going to build the new index values from scratch and wrap them in a pd.Series.

Warning: Make sure to wrap your values in a pd.Series or a np.ndarray when you pass them to set_index. A plain list of strings won't work, because set_index treats it as a list of column names.

In [4]:
df.set_index(pd.Series(["MyRes1","MyRes2","MyRes3","MyRes4"]))

                  name        type  AvgBill
MyRes1  Foreign Cinema  Restaurant    289.0
MyRes2       Liho Liho  Restaurant    224.0
MyRes3        500 Club         bar     80.5
MyRes4      The Square         bar     25.3

Now my index is set to the creatively named values in the pd.Series I passed.
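The warning about plain lists is worth seeing in action. This sketch, again on an illustrative two-row DataFrame, passes the same labels first as a flat list and then as a NumPy array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["500 Club", "The Square"], "AvgBill": [80.5, 25.3]})
labels = ["MyRes1", "MyRes2"]

# A flat list of strings is looked up as column names, which fails here
# because df has no columns called "MyRes1" or "MyRes2".
try:
    df.set_index(labels)
    list_error = None
except KeyError as exc:
    list_error = exc

# Wrapping the same labels in an array makes them the index values instead.
relabeled = df.set_index(np.array(labels))
print(relabeled.index.tolist())
```

Because the labels did not come from a column, nothing is dropped and "name" stays in the result.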

Setting a new index with append=True so we create a multi index

Now I'm going to set append=True. This adds the index I'm supplying as a second level while keeping the old one. In the earlier examples, the old index was replaced.

In [5]:
df.set_index(pd.Series(["MyRes1","MyRes2","MyRes3","MyRes4"]), append=True)

                    name        type  AvgBill
0 MyRes1  Foreign Cinema  Restaurant    289.0
1 MyRes2       Liho Liho  Restaurant    224.0
2 MyRes3        500 Club         bar     80.5
3 MyRes4      The Square         bar     25.3
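If you do end up with a multi-index like this, rows are addressed by a tuple with one entry per level. A minimal sketch, using a two-row DataFrame for illustration:

```python
import pandas as pd

df = pd.DataFrame({"name": ["500 Club", "The Square"], "AvgBill": [80.5, 25.3]})

# append=True keeps the original row numbers as the first index level.
multi = df.set_index(pd.Series(["MyRes1", "MyRes2"]), append=True)

print(multi.index.nlevels)

# Look up one cell with a (row number, new label) tuple.
print(multi.loc[(1, "MyRes2"), "AvgBill"])
```

The extra level makes lookups more verbose, which is one reason to reach for append=True only when you really need both levels.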
