Pandas To Datetime – String to Date – pd.to_datetime()

One of the Top 10 Pandas functions you must know is Pandas To Datetime. It's a need-to-have in your data analysis toolkit. The wonderful thing about to_datetime() is its flexibility: it can read most of the date formats you'll throw at it.

Pandas To Datetime (pd.to_datetime()) converts the string representation of a date into an actual datetime object. This matters whenever you use Pandas date functionality such as resample, which only works on datetime-like data.
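To make the resample point concrete, here is a minimal sketch. The DataFrame and its column names (date, sales) are made up for illustration and are not from any real dataset.

```python
import pandas as pd

# Hypothetical data: the dates start out as plain strings
df = pd.DataFrame({
    "date": ["2020-01-15", "2020-01-20", "2020-02-03", "2020-02-25"],
    "sales": [10, 20, 30, 40],
})

# Convert the strings to datetimes, then use them as the index
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")

# resample() needs a datetime-like index; "MS" buckets rows by month start
monthly_sales = df["sales"].resample("MS").sum()
print(monthly_sales)
```

If you skip the conversion step, the index holds strings and resample has nothing datetime-like to group on.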

1. pd.to_datetime(your_date_data, format="Your_datetime_format")

If you walk away with anything from this post, make sure it’s an understanding of how to use format codes when converting dates. Check out the code sample below.

Pseudo code: Given format, convert a string into a datetime object.

Pandas To DateTime - Convert your date strings into pandas DateTime objects
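As a small sketch of the pseudo code above, the snippet below parses one made-up string whose day comes before the month. The sample string is an assumption chosen for illustration; the format codes are the standard strftime-style ones.

```python
import pandas as pd

# Made-up input: day-month-year followed by a 24-hour time
raw = "01-02-2020 14:30"

# %d = day, %m = month, %Y = four-digit year, %H = hour, %M = minute
parsed = pd.to_datetime(raw, format="%d-%m-%Y %H:%M")
print(parsed)
```

Because the format spells out that the day comes first, there is no guessing about whether this is January 2 or February 1.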

To DateTime Parameters

.to_datetime() has a ton of parameters, and they are all important to understand. After you become familiar with them, you’ll need to understand the date format codes below.

  • arg: This is the ‘thing’ that you want to convert to a datetime object. Pandas gives you a ton of flexibility; you can pass an int, float, string, datetime, list, tuple, Series, DataFrame, or dict. That’s a ton of input options!
  • format (Default=None): *Very Important* The format parameter will instruct Pandas how to interpret your strings when converting them to DateTime objects. The format must use the format codes below. See examples below.
  • origin (Default='unix'): An origin is simply a reference date: where do you want your universe of timestamps to start? By default it is set to 'unix', which is 1970-01-01. 'julian' is January 1, 4713 BC. You can even set your own origin.
  • unit: Say you pass an int as your arg (like 20203939). With unit, you can specify what unit your int is counted in, away from the origin. In the example here, if we set unit='s', pandas will interpret 20203939 as 20,203,939 seconds away from the origin. Available units are [D, s, ms, us, ns].
  • dayfirst: This parameter helps pandas understand if your ‘day’ is first in your format (ex: 01/02/2020 > 2020-02-01). I suggest playing with other parameters first before you try this one.
  • yearfirst: Same as the dayfirst parameter above. This will help pandas parse your dates if your year is first. Try the format code options first.
  • utc (Default=None): If you want your DateTime objects to be timezone-aware (meaning each datetime object also has a timezone) and you want that timezone to be UTC, then set utc=True.
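Two of the parameters above, dayfirst and utc, are easier to see than to describe. This is a rough sketch using made-up date strings; the variable names are only for illustration.

```python
import pandas as pd

# dayfirst: the same string read month-first (the default) and day-first
month_first = pd.to_datetime("01/02/2020")
day_first = pd.to_datetime("01/02/2020", dayfirst=True)
print(month_first)
print(day_first)

# utc=True returns a timezone-aware Timestamp in UTC
aware = pd.to_datetime("2020-02-01 12:00", utc=True)
print(aware)

# A string that carries its own offset is shifted to UTC
shifted = pd.to_datetime("2020-02-01 12:00+02:00", utc=True)
print(shifted)
```

When you know the exact layout of your strings, an explicit format is usually the safer choice over dayfirst, because it leaves nothing to inference.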

DateTime Format Codes

One extremely important concept to understand is DateTime format codes. This is how you instruct Pandas what format your DateTime string is in. It’s magic every time you see it work. In fact, I look forward to gross strings with dates in them just to parse. See documentation.

Format Code | Description | Examples
%a | Weekday, abbreviated | Mon, Tue, Sat
%A | Weekday, full name | Monday, Tuesday, Saturday
%w | Weekday, decimal. 0=Sunday | 1, 2, 6
%d | Day of month, zero-padded | 01, 02, 21
%b | Month, abbreviated | Jan, Feb, Sep
%B | Month, full name | January, February, September
%m | Month number, zero-padded | 01, 02, 09
%y | Year, without century, zero-padded | 02, 95, 99
%Y | Year, with century | 1990, 2020
%H | Hour (24 hour), zero-padded | 01, 22
%I | Hour (12 hour), zero-padded | 01, 12
%p | AM or PM | AM, PM
%M | Minute, zero-padded | 01, 02, 43
%S | Second, zero-padded | 01, 32, 59
%f | Microsecond, zero-padded | 000001, 000342, 999999
%z | UTC offset ±HHMM[SS[.ffffff]] | +0000, -1030, -030712.345216
%Z | Time zone name | UTC, EST, CST
%j | Day of year, zero-padded | 001, 365, 023
%U | Week # of year, zero-padded. Sunday first day of week | 00, 01, 51
%W | Week # of year, zero-padded. Monday first day of week | 00, 02, 51
%c | Appropriate date and time | Thu Feb 1 21:30:00 1990
%x | Appropriate date | 02/01/90
%X | Appropriate time | 21:22:00
%% | Literal '%'. Use this when you have a % sign in your format. | %
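Here is a short sketch that combines several codes from the table. The input strings are made up, and the example assumes an English locale, since %b and %p match locale-specific text.

```python
import pandas as pd

# Made-up string: day, abbreviated month, year, then a 12-hour clock time
text = "03 Feb 2020 09:30 PM"
parsed = pd.to_datetime(text, format="%d %b %Y %I:%M %p")
print(parsed)

# %j is the day of the year, so day 032 of 2020 is February 1
from_day_of_year = pd.to_datetime("2020-032", format="%Y-%j")
print(from_day_of_year)

# The same codes work in reverse when formatting a Timestamp as text
print(parsed.strftime("%A, %B %d, %Y"))
```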

Let’s run through each iteration of the above parameters

In [1]:
import pandas as pd

Pandas To Datetime

Pandas to datetime is a beautiful function that allows you to convert your strings into DateTimes. This is extremely useful when working with Time Series data.

Let's convert strings to datetimes:

  1. Basic conversion with scalar string
  2. Convert Pandas Series to datetime
  3. Convert Pandas Series to datetime w/ custom format
  4. Convert Unix integer (days) to datetime
  5. Convert integer (seconds) to datetime

The hardest part about this jupyter notebook will be creating the messy strings to convert. Forgive the plumbing you'll see.

1. Basic conversion with scalar string

To convert any string to a datetime, you'll need to start with .to_datetime(). This is called directly from the pandas library.

For this first one, I'll show the types of the variables to demonstrate going from a string to a datetime.

In [2]:
string_to_convert = '2020-02-01'
print(f'Your string: {string_to_convert}')
print(f'Your string_to_convert type: {type(string_to_convert)}')
print()

# Convert your string
new_date = pd.to_datetime(string_to_convert)

print(f'Your new date is: {new_date}')
print(f'Your new type is: {type(new_date)}')
Your string: 2020-02-01
Your string_to_convert type: <class 'str'>

Your new date is: 2020-02-01 00:00:00
Your new type is: <class 'pandas._libs.tslibs.timestamps.Timestamp'>

2. Convert Pandas Series to datetime

Instead of passing a single string, I usually pass a Series of strings that need converting.

Then, I'll replace a DataFrame column with the new Datetime column.

First, I'll make my Series:

In [3]:
s = pd.Series(['2020-02-01',
               '2020-02-02',
               '2020-02-03',
               '2020-02-04'])
s
0    2020-02-01
1    2020-02-02
2    2020-02-03
3    2020-02-04
dtype: object
In [4]:
s = pd.to_datetime(s)
s
0   2020-02-01
1   2020-02-02
2   2020-02-03
3   2020-02-04
dtype: datetime64[ns]
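The step of replacing a DataFrame column with its datetime version could look roughly like this. The DataFrame and its column names (date, value) are hypothetical.

```python
import pandas as pd

# Hypothetical DataFrame with dates stored as strings
df = pd.DataFrame({
    "date": ["2020-02-01", "2020-02-02", "2020-02-03", "2020-02-04"],
    "value": [1, 2, 3, 4],
})

# Overwrite the string column with its datetime version
df["date"] = pd.to_datetime(df["date"])
print(df.dtypes)

# The .dt accessor is now available on the converted column
print(df["date"].dt.day_name())
```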

3. Convert Pandas Series to datetime w/ custom format

Let's get into the awesome power of Datetime conversion with format codes. Say you have a messy string with a date inside and you need to convert it to a date. You need to tell pandas how to convert it and this is done via format codes.

Look how cool that is! We can pass any string along with a format, and pandas will parse the dates.

In [5]:
s = pd.Series(['My 3date is 01199002',
               'My 3date is 02199015',
               'My 3date is 03199020',
               'My 3date is 09199204'])
s
0    My 3date is 01199002
1    My 3date is 02199015
2    My 3date is 03199020
3    My 3date is 09199204
dtype: object
In [6]:
s = pd.to_datetime(s, format="My 3date is %m%Y%d")
s
0   1990-01-02
1   1990-02-15
2   1990-03-20
3   1992-09-04
dtype: datetime64[ns]

4. Convert Unix integer (days) to datetime

You can also convert integers into Datetimes. You'll need to keep two things in mind:

  1. What is your reference point?
  2. What is the unit of your integer?

Reference point = What time do you want to start 'counting' your units from?

Unit = Is your integer in terms of # of days, seconds, years, etc.?

In [7]:
pd.to_datetime(14554, unit='D', origin='unix')
Timestamp('2009-11-06 00:00:00')

5. Convert integer (seconds) to datetime

More often, you'll have a unix timestamp that is expressed in seconds, meaning seconds away from the default origin of 1970-01-01.

For example, at the time of this post, we are 1,600,355,888 seconds away from 1970-01-01. That's a lot of seconds!

In [8]:
pd.to_datetime(1600355888, unit='s', origin='unix')
Timestamp('2020-09-17 15:18:08')
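The same conversion works on a whole Series, and getting the unit right matters. Below is a minimal sketch with made-up epoch values: the same two instants stored once in seconds and once in milliseconds.

```python
import pandas as pd

# Made-up epoch values: two instants in seconds, then in milliseconds
seconds = pd.Series([1600355888, 1600355889])
millis = seconds * 1000

from_seconds = pd.to_datetime(seconds, unit="s")
from_millis = pd.to_datetime(millis, unit="ms")
print(from_seconds)

# Both routes describe the same instants
print((from_seconds == from_millis).all())

# Wrong unit: reading seconds as milliseconds lands in January 1970
wrong = pd.to_datetime(1600355888, unit="ms")
print(wrong)
```

If your converted dates cluster around 1970, a mismatched unit is a likely culprit.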

Bonus: 6. Change your origin or reference point

Say your dataset only has # of days after a certain time, but no datetimes. You could either add all of those days via a pd.Timedelta().

Or you could convert them to datetimes with a different origin. Let's check this out from 2020-02-01.

Below, we convert 160 into +160 days after 2020-02-01.

In [9]:
pd.to_datetime(160, unit='D', origin='2020-02-01')
Timestamp('2020-07-10 00:00:00')
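For comparison, here is a sketch of both routes mentioned above, the Timedelta addition and the custom origin, applied to the same numbers. The Series of day offsets is made up for illustration.

```python
import pandas as pd

origin = pd.Timestamp("2020-02-01")

# Option 1: add a Timedelta to the reference date
via_timedelta = origin + pd.Timedelta(days=160)

# Option 2: let to_datetime count days from a custom origin
via_origin = pd.to_datetime(160, unit="D", origin=origin)
print(via_timedelta == via_origin)

# The custom origin also works on a whole Series of day offsets
offsets = pd.Series([0, 30, 160])
dates = pd.to_datetime(offsets, unit="D", origin="2020-02-01")
print(dates)
```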
