Pandas Find – pd.Series.str.find()

If you’re looking for how to find data or a cell within a Pandas DataFrame or Series, check out a future post: Locating Data Within A DataFrame. This post is about finding substrings within a series of strings.

Often you may want to know where a substring sits within a larger string. You could be trying to extract an address, remove a piece of text, or simply find the first instance of a substring.

pd.Series.str.find() helps you locate substrings within larger strings. It works much like =FIND() in Excel or Google Sheets, with two differences: positions are counted from 0 rather than 1, and a missing substring returns -1 instead of an error.

Example: “day” is a substring of “Monday.” However, “day” is not a substring of “November,” since “day” does not appear in “November.”

Pseudo code: “Monday”.find(“day”) returns 3. Positions are counted from 0, so “day” starts at position 3, which is the 4th character in “Monday.”
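The pseudo code above maps directly onto Python's built-in string method, which behaves the same way as the pandas version on a single string. A minimal sketch, keeping in mind that Python counts positions from 0:

```python
# Plain Python strings count positions from 0, so the first character is at 0.
word = "Monday"
position = word.find("day")        # "day" begins at index 3 (the 4th character)
missing = "November".find("day")   # no match, so find() gives -1

print(position)  # 3
print(missing)   # -1
```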

But first, what is a string and substring?

  • String = Data type within Python that represents text
  • Substring = A piece of text within a larger piece of text

To find where a substring exists (if it exists at all) within each string of a series, call pd.Series.str.find().

Pandas Find

Pandas find returns an integer giving the location of a substring: the number of characters from the left, counting from 0. It returns -1 if the substring does not exist.
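Because -1 marks "not found", one common way to use the result is to compare it against -1 and filter. The sketch below uses a small made-up series (days is not part of the original notebook) to show the idea:

```python
import pandas as pd

# Hypothetical sample data for illustration
days = pd.Series(["Monday", "November", "Sunday"])

positions = days.str.find("day")   # position of "day" in each string, or -1
contains_day = positions != -1     # True where the substring was found
matches = days[contains_day]       # keep only the strings that contain "day"

print(positions.tolist())  # [3, -1, 3]
print(matches.tolist())    # ['Monday', 'Sunday']
```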


Find has two important optional arguments: start and end.

  • Start (default = 0): Where you want .find() to start looking for your substring. By default you’ll start at the beginning of the string (location 0).
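  • End (default = None, meaning the end of the string): Where you want .find() to stop looking for your substring. The character at this position is not included in the search.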

Note: You would only use start & end if you didn’t want to search the entire string.
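A minimal sketch of how start and end narrow the search. The words series is made-up sample data, not part of the original notebook. Note that the returned positions are still measured from the beginning of the full string, not from start.

```python
import pandas as pd

# Hypothetical sample data for illustration
words = pd.Series(["banana", "bandana"])

first_a = words.str.find("a")                   # search the whole string
later_a = words.str.find("a", start=2)          # ignore positions 0 and 1
windowed = words.str.find("n", start=0, end=2)  # look only at positions 0 and 1

print(first_a.tolist())   # [1, 1]
print(later_a.tolist())   # [3, 4]
print(windowed.tolist())  # [-1, -1]
```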

In [2]:
import pandas as pd

Pandas Find | pd.Series.str.find()

Say you have a series of strings and you want to find the position of a substring.

Pandas .find() will return the location (number of characters from the left) of a certain substring. Let's look at an example.

First, create a series of strings. Note: You can also do this with a column in a pandas DataFrame.

In [23]:
my_string_series = pd.Series(['San Francisco',
                              'Chicago',
                              'Traveling',
                              'Pandas',
                              'Remote Worker'], name="string_series")

Now say we want to find if and where the substring "cago" sits within each string in our series. In order to do this, we call .find("cago").

In [24]:
find_result = my_string_series.str.find("cago") # Calling .find and passing "cago"
find_result.name = "find_result" # naming the series so we can call it later
find_result # displaying the results
Out[24]:
0   -1
1    3
2   -1
3   -1
4   -1
Name: find_result, dtype: int64

To view the output easily, let's concat our original series with the result.

In [25]:
pd.concat([my_string_series, find_result], axis=1)
Out[25]:
   string_series  find_result
0  San Francisco           -1
1        Chicago            3
2      Traveling           -1
3         Pandas           -1
4  Remote Worker           -1

Notice how San Francisco, Traveling, Pandas, and Remote Worker all return -1 for .find(). This is because the substring "cago" does not exist within those strings.

However, "Chicago" returns 3. This is because "cago" starts at position 3 within Chicago!
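As noted earlier, the same call works on a DataFrame column. The sketch below assumes a hypothetical DataFrame with a column named city, stores the positions in a new column, and then keeps only the rows where "cago" was found:

```python
import pandas as pd

# Hypothetical DataFrame; the column name "city" is an assumption for illustration
df = pd.DataFrame({"city": ["San Francisco", "Chicago", "Traveling"]})

# .str.find() works on a DataFrame column exactly as it does on a Series
df["cago_position"] = df["city"].str.find("cago")

# Keep only rows where the substring exists (position is not -1)
found = df[df["cago_position"] != -1]

print(df["cago_position"].tolist())  # [-1, 3, -1]
print(found["city"].tolist())        # ['Chicago']
```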

Let's try some more. This time, I want to find the first instance of the letter "o" within each string in our series.

In [26]:
find_result = my_string_series.str.find("o")
pd.concat([my_string_series, find_result], axis=1)
Out[26]:
   string_series  string_series
0  San Francisco             12
1        Chicago              6
2      Traveling             -1
3         Pandas             -1
4  Remote Worker              3

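Traveling and Pandas return -1 because they do not contain "o", while San Francisco, Chicago, and Remote Worker do. Both columns are labeled string_series here because the result of .str.find() keeps the original series name and we did not rename it this time.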

What if I only wanted to search each string from position 3 up to (but not including) position 8? Then we would pass start= and end=.

In [27]:
find_result = my_string_series.str.find("o", start=3, end=8)
pd.concat([my_string_series, find_result], axis=1)
Out[27]:
   string_series  string_series
0  San Francisco             -1
1        Chicago              6
2      Traveling             -1
3         Pandas             -1
4  Remote Worker              3

In this case, San Francisco does contain the letter "o", but not between positions 3 and 7 (end=8 is exclusive), so .find() returns -1 for San Francisco. Chicago and Remote Worker still return 6 and 3, because their first "o" falls inside that window.
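To see exactly which characters were searched, it can help to slice each string with the same bounds. This is a sketch rather than part of the original notebook; it uses .str.slice() only as a way to inspect the window, and then moves end by one position to show that the end position itself is excluded:

```python
import pandas as pd

cities = pd.Series(["San Francisco", "Chicago", "Remote Worker"])

# The window searched by start=3, end=8 matches the slice [3:8]
window = cities.str.slice(3, 8)
result = cities.str.find("o", start=3, end=8)

print(window.tolist())  # [' Fran', 'cago', 'ote W']
print(result.tolist())  # [-1, 6, 3]

# end is exclusive: the "o" at position 12 in "San Francisco"
# is only found once end is at least 13
stop_at_12 = cities.str.find("o", end=12)
stop_at_13 = cities.str.find("o", end=13)

print(stop_at_12.tolist())  # [-1, 6, 3]
print(stop_at_13.tolist())  # [12, 6, 3]
```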
