Want to replace values in your DataFrame with something else? No problem. That is where pandas replace comes in.
Pandas DataFrame.replace() is a small but powerful function that replaces (or swaps) values in your DataFrame with another value. What starts as a simple function can quickly be expanded to cover most of your scenarios.
YourDataFrame.replace(to_replace='what you want to replace', value='what you want to replace with')
This function is very similar to setting a value via DataFrame.at[] or DataFrame.iloc/loc. However, with .replace(), pandas does the searching for you.
Beginner Pandas users will have fun doing simple replaces, but the kung-fu Pandas master will go 3 levels deep.
Pseudo code: Find current values within my DataFrame, then replace them with another value.
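To make the comparison with .loc concrete, here is a minimal sketch. The small DataFrame and the names by_loc and by_replace are made up for illustration and are not part of the original walkthrough.

```python
import pandas as pd

# Hypothetical example data (not from the notebook below)
df = pd.DataFrame({'X': [1, 2, 3], 'Y': [2, 5, 2]})

# With .loc you have to locate the matching cells yourself, column by column
by_loc = df.copy()
by_loc.loc[by_loc['X'] == 2, 'X'] = 20
by_loc.loc[by_loc['Y'] == 2, 'Y'] = 20

# With .replace() pandas searches every column for you
by_replace = df.replace(2, 20)

print(by_loc)
print(by_replace)
```

Both approaches should end up with the same values; .replace() simply saves you from writing one condition per column.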
Pandas Replace
.replace() starts off easy, but quickly gets nuanced as you dig deeper. Here are the most common ways to use pandas replace.
Code | Plain Language |
df.replace(0, 5) | Replace all of the 0s in your DataFrame with 5s |
df.replace([0, 1, 2, 3], 4) | Replace all the 0s, 1s, 2s, 3s in your DataFrame with 4s |
df.replace([0, 1, 2, 3], [4, 3, 2, 1]) | Replace all the 0s with 4s, 1s with 3s, 2s with 2s, and 3s with 1s. Note: if you pass two lists, they must both be the same length |
df.replace({0: 10, 1: 100}) | Using a dict – Replace 0s with 10s, and 1s with 100s. |
df.replace({'A': 0, 'B': 5}, 100) | Replace 0’s in column “A” with 100, and replace 5s in column “B” with 100 |
df.replace({'C': {1: 100, 3: 300}}) | Using a dict – Within column “C” replace 1s with 100 and 3s with 300 |
df.replace(to_replace=r'^ba.$', value='new', regex=True) | Replace anything that matched the regex ‘^ba.$’ with “new” |
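The regex row is the only one in this table that the notebook further down does not walk through, so here is a small sketch of it. The DataFrame named words and its contents are assumptions chosen so that some strings match the pattern and some do not.

```python
import pandas as pd

# Hypothetical example data with a mix of 3- and 4-letter strings
words = pd.DataFrame({'A': ['bat', 'foo', 'bait'],
                      'B': ['abc', 'bar', 'xyz']})

# '^ba.$' matches "ba" followed by exactly one more character,
# so 'bat' and 'bar' match but 'bait' (two extra characters) does not
result = words.replace(to_replace=r'^ba.$', value='new', regex=True)
print(result)
```

Without regex=True the pattern would be treated as a literal string, and nothing in this data would match it.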
Replace Parameters
- to_replace: The value, list of values, or regex pattern that you would like to replace. If you pass a dict, it can also hold the replacement values. There are a ton of details here; we recommend referring to the official documentation for more.
- value: The values that will do the replacing. Note: You can leave this out if the dict in your to_replace parameter already holds the replacement values.
- inplace (Default: False): If True, the operation is done in place (writing over your current DataFrame). If False, a new DataFrame is returned to you and the original is left untouched.
- limit: The max number of consecutive rows you would like to forward or back fill. Example: You may want to fill from values that are 2-3 rows away, but do you really want to fill from values that are 30 rows away?
- regex: Whether to_replace should be interpreted as a regular expression.
- method: The fill method to use when to_replace is a scalar, list, or tuple and value is None. Note: method and limit are deprecated as of pandas 2.1, so check your version before relying on them.
- pad/ffill – Take the value that comes before the one you're replacing, and fill it going forward
- bfill – Take the value that comes after the one you're replacing, and fill it going backward.
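The inplace parameter is easiest to see side by side. This is a minimal sketch; the one-column DataFrame and the variable names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({'X': [1, 2, 3]})

# Default (inplace=False): a new DataFrame comes back, df is left alone
replaced = df.replace(2, 20)
original_after_copy = df['X'].tolist()

# inplace=True: df itself is overwritten
df.replace(2, 20, inplace=True)
original_after_inplace = df['X'].tolist()

print(original_after_copy)
print(original_after_inplace)
```

Assigning the returned DataFrame to a name (as with replaced above) is usually the easier style to follow, since the original stays available for comparison.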
Here’s a Jupyter notebook showing how to use replace in Pandas
import pandas as pd
Pandas Replace
Pandas Replace will replace values in your DataFrame with another value. This function starts simple, but gets flexible & fun later on.
We will run through 7 examples:
- Single 1<>1 replace across your whole DataFrame
- Single Many<>1 replace across your whole DataFrame
- Many 1<>1 replaces across your whole DataFrame
- Many 1<>1 replaces across your whole DataFrame via a dictionary
- 1<>1 column-specific replaces across multiple columns via a dictionary
- Many 1<>1 column-specific replaces via a dictionary
- Backfill a value with another value in the row below.
Let's create our DataFrame
df = pd.DataFrame({'X': [1, 2, 3, 4, 5],
                   'Y': [5, 6, 7, 8, 9],
                   'Z': ['z', 'y', 'x', 'w', 'v']})
df
1. Single 1<>1 replace across your whole DataFrame
Here we will find all instances of a single value in our DataFrame, and replace them with something else.
Here all of the 2s are being replaced with 20s
df.replace(to_replace=2, value=20)
2. Single Many<>1 replace across your whole DataFrame
Here we will pass a list of values in our DataFrame that we want to replace with something else
We will replace all 1s, 3s, and 5s with 20
df.replace(to_replace=[1,3,5], value=20)
3. Many 1<>1 replaces across your whole DataFrame
Here we will pass two lists: one with the values that need replacing, and one with the values that will do the replacing
Notice that the 1s get replaced with 10s, the 3s with 30s and the 5s with 50s
df.replace(to_replace=[1,3,5], value=[10,30,50])
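One detail worth checking is what happens when the two lists overlap, i.e. a replacement value is itself in the list of values to replace. The sketch below uses a hypothetical DataFrame named nums; in this case each original value is swapped once by position, and the freshly written values are not replaced again.

```python
import pandas as pd

# Hypothetical example data
nums = pd.DataFrame({'A': [0, 1, 2, 3, 4]})

# 0 -> 4, 1 -> 3, 2 -> 2, 3 -> 1, matched up by list position.
# The new 3 (written in place of the 1) is not turned into a 1 afterwards.
swapped = nums.replace([0, 1, 2, 3], [4, 3, 2, 1])
print(swapped)
```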
4. Many 1<>1 replaces across your whole DataFrame via a dictionary
Here we will pass a dictionary. The dictionary keys are the values we want to replace and the dictionary values are the values doing the replacing.
We are replacing 1s with 10s, 'z's with 'zz's, and 'v's with 'vvv's
df.replace(to_replace={1: 10, 'z': 'zz', 'v': 'vvv'})
5. 1<>1 column-specific replaces across multiple columns via a dictionary
One interesting feature of pandas.replace is that you can specify values to replace per column. Example: you may want to only replace the 1s in your first column, but not in your second column.
To do this, you need to have a nested dict. The parent dict will have the column you want to specify, the child dict will have the values to replace.
Here we are replacing the 5s in column X (only) with 50s
df.replace(to_replace={'X': {5: 50}})
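Since the value 5 appears in both column X and column Y of our DataFrame, it makes a good check that the nested dict really is column-specific. A minimal sketch, with the names only_x and everywhere introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, 4, 5],
                   'Y': [5, 6, 7, 8, 9],
                   'Z': ['z', 'y', 'x', 'w', 'v']})

# Nested dict: only the 5 in column X is touched
only_x = df.replace({'X': {5: 50}})

# Plain scalar replace for comparison: the 5 in column Y changes too
everywhere = df.replace(5, 50)

print(only_x)
print(everywhere)
```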
6. Many 1<>1 column-specific replaces via a dictionary
We'll do the same thing here, but multiple values within multiple columns
Here we are doing a few replaces:
- In column "X": Replace 1s with 10s and 4s with 40s
- In column "Y": Replace 8s with 80s and 9s with 99s
- In column "Z": Replace 'z's with 'zzz's, 'y's with 'yyy's and 'x's with 'xx's
df.replace(to_replace={'X': {1: 10, 4: 40},
                       'Y': {8: 80, 9: 99},
                       'Z': {'z': 'zzz', 'y': 'yyy', 'x': 'xx'}})
7. Backfill a value with another value in the row below.
For this example, we will specify to_replace with value=None. However, this time we will also set method='bfill', which will fill a value from the row below it.
Here we are replacing 1, 2, 'w', and 4 with the values in the next row below them. This is most helpful when you have NAs (look into using .fillna()) or when you want to overwrite.
Notice that both 1 and 2 are replaced in column X: with method='bfill', the 3 fills both of them.
df.replace([1, 2, 'w', 4], value=None, method='bfill')
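Because the method parameter is deprecated in recent pandas releases, a similar result can be sketched with .mask() followed by .bfill(). This is a swapped-in technique rather than the notebook's own approach, and the per-column targets dict is an assumption made for this example.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, 4, 5],
                   'Y': [5, 6, 7, 8, 9],
                   'Z': ['z', 'y', 'x', 'w', 'v']})

# Hypothetical per-column targets: the values we want to overwrite
targets = {'X': [1, 2, 4], 'Z': ['w']}

# isin() flags the target cells, mask() blanks them out (NaN),
# and bfill() fills each blank from the next valid row below it
backfilled = df.mask(df.isin(targets)).bfill()
print(backfilled)
```

One difference to keep in mind: masking introduces missing values along the way, so the integer column X comes back as floats unless you cast it again.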
Check out more Pandas functions on our Pandas Page