Sometimes using row numbers as your index just doesn’t cut it, and you need to set a different index on your pandas DataFrame.
You may want to replace your current index with another column in your DataFrame. No problem with pd.DataFrame.set_index()!
You’re usually doing this when you want to set your index to a list of names, or unique ids. For example, you imported a CSV but forgot to set your index_col.
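To make the CSV case concrete, here is a minimal sketch. The CSV text is made up for illustration and stands in for a real file; the point is only that calling set_index after the fact should give you the same result as passing index_col when you read the file.

```python
import io
import pandas as pd

# Hypothetical CSV contents, standing in for a file on disk
csv_text = "name,type,AvgBill\nLiho Liho,Restaurant,224.0\n500 Club,bar,80.5\n"

# Read without index_col: pandas assigns a default 0..n-1 index
df = pd.read_csv(io.StringIO(csv_text))

# Fix it after the fact by promoting the 'name' column to the index
df_fixed = df.set_index("name")

# The same result when index_col is given at read time
df_direct = pd.read_csv(io.StringIO(csv_text), index_col="name")

print(df_fixed)
```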
Pseudo code: Take a DataFrame column (or columns) and set it as your DataFrame index.
The questions you’ll have to ask yourself are:
- Do you want to drop the column you’re setting as the index?
- Do you want the transformation to happen in place, or do you want a new DataFrame returned to you?
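Both questions map onto a keyword argument. The sketch below uses a small made-up DataFrame (the variable names are mine, not from the notebook) to show the default behavior next to drop=False and inplace=True.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Liho Liho", "500 Club"], "AvgBill": [224.0, 80.5]})

# Default: a new DataFrame is returned and 'name' moves out of the columns
by_name = df.set_index("name")

# drop=False: 'name' becomes the index but also stays as a regular column
kept = df.set_index("name", drop=False)

# inplace=True: the DataFrame is modified directly and nothing is returned
df_copy = df.copy()
result = df_copy.set_index("name", inplace=True)

print(by_name.columns.tolist())  # 'name' is gone from the columns
print(kept.columns.tolist())     # 'name' is still a column
print(result)                    # None
```

Note that the original df is untouched in the first two calls; only df_copy changes.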
Pandas Set Index
Let’s take a look at the different parameters you can pass pd.DataFrame.set_index():
- keys: What you want to be the new index. This is either 1) the name of a DataFrame column (or a list of column names) or 2) a pandas Series, Index, or NumPy array of the same length as your DataFrame.
- drop (Default: True): If True, the column you’re referencing is removed from the DataFrame’s columns once it becomes the index. This has no effect when you pass a Series or array, since there is no column to drop.
- append (Default: False): append=True adds the new key as an extra level alongside your current index instead of replacing it, which creates a multi-index. My guess is most people will never use this.
- inplace (Default: False): If True, the operation happens on the DataFrame in place and nothing is returned. If False, you’ll get a new DataFrame with the new index returned to you.
- verify_integrity (Default: False): If True, pandas checks your new index for duplicates and raises an error if it finds any. You won’t need to touch this unless you’re getting into advanced performance territory.
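Since verify_integrity is the least obvious of these, here is a minimal sketch of what it does. The data is made up so that the new index contains a duplicate; by default that is allowed, while verify_integrity=True should raise a ValueError.

```python
import pandas as pd

df = pd.DataFrame(
    {"type": ["Restaurant", "Restaurant", "bar"], "AvgBill": [289.0, 224.0, 80.5]}
)

# Default (verify_integrity=False): duplicate index labels are allowed
dup_ok = df.set_index("type")

# verify_integrity=True: pandas refuses to build an index with duplicates
try:
    df.set_index("type", verify_integrity=True)
    raised = False
except ValueError as err:
    raised = True
    print(err)
```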
Here’s a Jupyter notebook showing how to set the index in pandas:
import pandas as pd
Pandas Set Index | pd.DataFrame.set_index()
We will run through 3 examples:
- Setting a new index from an existing column
- Setting a new index from a new np.array
- Setting a new index with append=True so we create a multi index
But first, let's create our DataFrame
df = pd.DataFrame([('Foreign Cinema', 'Restaurant', 289.0),
                   ('Liho Liho', 'Restaurant', 224.0),
                   ('500 Club', 'bar', 80.5),
                   ('The Square', 'bar', 25.30)],
                  columns=('name', 'type', 'AvgBill'))
df
Next I'm going to set my index as the "name" column in my DataFrame.
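The notebook cell for this step isn't reproduced here, so the following is a sketch of what it might look like. It rebuilds the same DataFrame so it can run on its own; df_named is a name I've picked for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ("Foreign Cinema", "Restaurant", 289.0),
        ("Liho Liho", "Restaurant", 224.0),
        ("500 Club", "bar", 80.5),
        ("The Square", "bar", 25.30),
    ],
    columns=["name", "type", "AvgBill"],
)

# Promote the 'name' column to the index; a new DataFrame is returned
df_named = df.set_index("name")

print(df_named)

# Rows can now be looked up by name instead of by row number
print(df_named.loc["500 Club", "AvgBill"])
```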
Check it out! My name column has now become my index. Also notice that the 'name' column has been dropped from the DataFrame's columns. If we had set drop=False, it would have stayed as a regular column as well as becoming the index.
Setting a new index from a new np.array
Now I'm going to set my index, but in this case, instead of a DataFrame column I'm going to create a new set of labels from scratch to be the new index.
Warning: Make sure to use a pd.Series or a np.ndarray when you pass it to set_index. A plain list won't work, because pandas reads a list as a list of column names to look up.
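Here is a sketch of this step under a few assumptions: the labels and the Series name "label" are made up for illustration, and the DataFrame is a trimmed-down version of the one above. It also shows the plain-list pitfall, which should surface as a KeyError.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Foreign Cinema", "Liho Liho", "500 Club", "The Square"],
        "AvgBill": [289.0, 224.0, 80.5, 25.30],
    }
)

# Hypothetical labels, one per row
labels = ["a", "b", "c", "d"]

# A NumPy array of the same length as the DataFrame works
df_arr = df.set_index(np.array(labels))

# So does a Series; its name becomes the name of the index
df_ser = df.set_index(pd.Series(labels, name="label"))

# A plain list is treated as column names, and none of these exist
try:
    df.set_index(labels)
    list_failed = False
except KeyError:
    list_failed = True

print(df_ser)
```

No column is dropped in either working case, because the new index did not come from a column.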
Now my index is set to the creatively named pd.Series I passed.
Setting a new index with append=True so we create a multi index
Now I'm going to set append=True. This will set the new index I'm supplying as well as keep the old one. Previously the old one was dropped.
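A minimal sketch of the append=True case, again on a trimmed-down version of the DataFrame (df_multi is my own name for the result). The original row numbers stay as the first index level and 'name' is added as a second level.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Foreign Cinema", "Liho Liho", "500 Club", "The Square"],
        "AvgBill": [289.0, 224.0, 80.5, 25.30],
    }
)

# Keep the existing row-number index and add 'name' as a second level
df_multi = df.set_index("name", append=True)

print(df_multi)

# Rows are now addressed by a (row number, name) pair
print(df_multi.loc[(2, "500 Club"), "AvgBill"])
```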
Check out more Pandas functions on our Pandas Page