Pandas Date Time Functions – to_datetime(), date_range(), resample() and tz_localize()


Introduction

As data scientists or machine learning engineers, we often encounter datasets that contain dates and times. In this article, we will look at pandas functions that help us handle date and time data. The functions covered are to_datetime(), date_range(), resample() and tz_localize(). We will go through the syntax and examples of each function to understand how it is used.

Importing Pandas Library

We will start the tutorial by importing the pandas library.

In [1]:
import pandas as pd
import numpy as np

Pandas To_Datetime : to_datetime()

The pandas to_datetime() function converts its argument (a scalar, a list-like, a Series or a DataFrame) to datetime.

Syntax

pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=True)

arg : int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like – This is the object to convert to datetime.

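errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’ – This parameter controls what happens when a value cannot be parsed. With ‘raise’, invalid parsing raises an exception; with ‘coerce’, the invalid value is set to NaT; with ‘ignore’, invalid parsing returns the input.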

dayfirst : bool, default False – This is used to specify a date parse order if arg is str or its list-likes. If True, parses dates with the day first.

yearfirst : bool, default False – This is used to specify a date parse order if arg is str or its list-likes. If True, parses dates with the year first.

utc : bool, default None – If True, the result is returned as UTC (for list-like input, a UTC DatetimeIndex).

format : str, default None – The strftime format used to parse the time, e.g. ‘%d/%m/%Y’.

exact : bool, default True – It behaves as: – If True, require an exact format match. – If False, allow the format to match anywhere in the target string.

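unit : str, default ‘ns’ – The unit of arg (D, s, ms, us, ns) when arg is an integer or float number. For example, with unit=‘s’ the number is read as a count of seconds since origin.

infer_datetime_format : bool, default False – If True and no format is given, attempt to infer the format of the datetime strings, and if it can be inferred, switch to a faster method of parsing them. This parameter is deprecated in pandas 2.0 and later, where a strict version of this behavior is the default.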

origin : scalar, default ‘unix’ – This parameter is used to define the reference date. The numeric values would be parsed as number of units (defined by unit) since this reference date.

cache : bool, default True – If True, use a cache of unique, converted dates to apply the datetime conversion.
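A few of the parameters above are easier to follow with a small sketch. The input strings and numbers below are made up for illustration and are not taken from any dataset in this article.

```python
import pandas as pd

# format: parse a string that follows a known strftime pattern (day-month-year here)
a = pd.to_datetime("25-03-2018", format="%d-%m-%Y")

# dayfirst: read an ambiguous string as day/month/year, i.e. 4 March 2018
b = pd.to_datetime("04/03/2018", dayfirst=True)

# unit: treat a number as a count of seconds since the default origin (the Unix epoch)
c = pd.to_datetime(86400, unit="s")

# origin: count 3 days from a custom reference date instead of the epoch
d = pd.to_datetime(3, unit="D", origin=pd.Timestamp("2020-01-01"))

print(a, b, c, d, sep="\n")
```

Here a is 25 March 2018, b is 4 March 2018, c is one day after the epoch (2 January 1970) and d is 4 January 2020.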

Example 1: Creating datetime from multiple dataframe columns

In this example, a datetime is assembled from multiple columns of a DataFrame. The columns must carry recognizable names such as year, month and day.

In [2]:
df = pd.DataFrame({'year': [2018, 2019],
                    'month': [3, 7],
                    'day': [25, 9]})
In [3]:
df
Out[3]:
   year  month  day
0  2018      3   25
1  2019      7    9
In [4]:
pd.to_datetime(df)
Out[4]:
0   2018-03-25
1   2019-07-09
dtype: datetime64[ns]

Example 2: Using errors parameter of pandas to_datetime function

The to_datetime() function has a parameter named errors that controls what happens when a value cannot be parsed. In this example, we pass two different values for it.

The date 1600-04-12 lies outside the range that the nanosecond-precision datetime64[ns] type can represent (roughly 1677-09-21 to 2262-04-11), so it cannot be converted to a pandas Timestamp. With “ignore”, the function does not raise an error; as the output shows, it falls back to a plain Python datetime object instead of a Timestamp. With “coerce”, the value that cannot be converted is replaced by NaT, i.e. Not a Time. Note that recent pandas releases deprecate “ignore”.

In [5]:
pd.to_datetime('16000412', format='%Y%m%d', errors='ignore')
Out[5]:
datetime.datetime(1600, 4, 12, 0, 0)
In [6]:
pd.to_datetime('16000412', format='%Y%m%d', errors='coerce')
Out[6]:
NaT
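The “coerce” option is most useful on a whole column that mixes valid and invalid entries. The following is a minimal sketch with a made-up Series (the strings are illustrative only):

```python
import pandas as pd

# A hypothetical column in which one entry is not a date at all
raw = pd.Series(["2019-07-09", "not a date", "2019-07-11"])

# Entries that cannot be parsed become NaT instead of raising an error
parsed = pd.to_datetime(raw, errors="coerce")

# Count how many entries failed to parse
n_bad = int(parsed.isna().sum())
print(parsed)
print("unparseable entries:", n_bad)
```

Counting the NaT values afterwards is a quick way to see how many rows need cleaning before further analysis.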

Pandas Date_Range : date_range()

This pandas function returns a fixed-frequency DatetimeIndex.

Syntax

pandas.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=None, **kwargs)

start : str or datetime-like, optional – This is the starting point for generating dates.

end : str or datetime-like, optional – This is the ending point for generating dates.

periods : int, optional – This parameter decides the number of periods to generate.

freq : str or DateOffset, default ‘D’ – The frequency of the generated dates. Frequency strings can have multiples, e.g. ‘2D’ for every second day.

tz : str or tzinfo, optional – The time zone name (for example ‘Asia/Hong_Kong’) used to return a localized DatetimeIndex. By default, the resulting DatetimeIndex is timezone-naive.

normalize : bool, default False – If True, the start/end dates are normalized to midnight before the date range is generated.

name : str, default None – This is the name of the resulting DatetimeIndex.

closed : {None, ‘left’, ‘right’}, optional – Makes the interval closed with respect to the given frequency on the ‘left’ side, the ‘right’ side, or both sides (None, the default). In pandas 1.4 and later this parameter is deprecated in favor of inclusive.

**kwargs – Additional keyword arguments.
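As a rough illustration of the freq, tz, name and normalize parameters described above, here is a small sketch. The dates and the index name are arbitrary choices for the example.

```python
import pandas as pd

# freq with a multiple: one timestamp every 2 days
every_other_day = pd.date_range(start="2020-01-01", periods=4, freq="2D")

# tz returns a timezone-aware index; name labels the resulting index
localized = pd.date_range(start="2020-01-01", periods=3, freq="D",
                          tz="UTC", name="day")

# normalize=True moves the 15:30 start time to midnight before generating dates
normalized = pd.date_range(start="2020-01-01 15:30", periods=2, normalize=True)

print(every_other_day)
print(localized)
print(normalized)
```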

Example 1: Providing start and end parameters

In this example, the start and end parameters of the pandas date_range() function are specified. These parameters give the first and last dates of the range; since no frequency is passed, one date per day is generated.

In [7]:
pd.date_range(start='1/1/2019', end='1/01/2020')
Out[7]:
DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05', '2019-01-06', '2019-01-07', '2019-01-08',
               '2019-01-09', '2019-01-10',
               ...
               '2019-12-23', '2019-12-24', '2019-12-25', '2019-12-26',
               '2019-12-27', '2019-12-28', '2019-12-29', '2019-12-30',
               '2019-12-31', '2020-01-01'],
              dtype='datetime64[ns]', length=366, freq='D')

Example 2: Specifying start and periods parameters

Here we will not provide the end parameter to the date_range() function. Instead, the periods parameter tells it how many dates to generate from the start date.

In [8]:
pd.date_range(start='1/14/2020', periods=8)
Out[8]:
DatetimeIndex(['2020-01-14', '2020-01-15', '2020-01-16', '2020-01-17',
               '2020-01-18', '2020-01-19', '2020-01-20', '2020-01-21'],
              dtype='datetime64[ns]', freq='D')
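The same eight dates can also be generated by anchoring the range at its end instead of its start. This is a small sketch of that variation, not part of the original example:

```python
import pandas as pd

# Eight daily dates counted forward from a start date (as in the example above)
forward = pd.date_range(start="2020-01-14", periods=8)

# Eight daily dates counted backward from an end date
backward = pd.date_range(end="2020-01-21", periods=8)

# Both calls describe the same range, 2020-01-14 through 2020-01-21
print(forward.equals(backward))
```

In general, date_range() needs enough of start, end, periods and freq to pin down the range; here freq falls back to its daily default.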

Pandas Resample : resample()

The pandas resample() function is used for the resampling of time-series data.

Syntax

pandas.DataFrame.resample(rule, axis, closed, label, convention, kind, loffset, base, on, level)

rule : DateOffset, Timedelta or str – This parameter is the offset string or object representing target conversion.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0 – This is the axis where resampling takes place.

closed : {‘right’, ‘left’}, default None – Which side of the bin interval is closed. The default is ‘left’ for most frequencies; month-end, quarter-end, year-end and weekly frequencies default to ‘right’.

label : {‘right’, ‘left’}, default None – Which bin edge is used to label each bucket. The defaults follow the same rule as for closed.

convention : {‘start’, ‘end’, ‘s’, ‘e’}, default ‘start’ – This parameter is used for periodindex only, it controls whether to use the start or end of rule.

kind : {‘timestamp’, ‘period’}, optional, default None – This parameter is used to pass ‘timestamp’ to convert the resulting index to a DateTimeIndex or ‘period’ to convert it to a PeriodIndex.

loffset : timedelta, default None – For adjusting the resampled time labels.

base : int, default 0 – For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals.

on : str, optional – For a DataFrame, the column to use for resampling instead of the index. The column must be datetime-like.

level : str or int, optional – For a MultiIndex DataFrame, the level (name or number) to use for resampling. The level must be datetime-like.
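The level parameter is the least obvious of these, so here is a minimal sketch of it. The store and date index levels and the sales column are hypothetical names invented for this illustration.

```python
import pandas as pd

# A hypothetical MultiIndex: one store, four consecutive days
days = pd.date_range("2020-01-01", periods=4, freq="D")
idx = pd.MultiIndex.from_product([["a"], days], names=["store", "date"])
df = pd.DataFrame({"sales": [1, 2, 3, 4]}, index=idx)

# Resample on the datetime-like "date" level into 2-day bins and sum each bin
two_day = df.resample("2D", level="date").sum()
print(two_day)
```

The first bin covers 1 and 2 January (1 + 2 = 3) and the second covers 3 and 4 January (3 + 4 = 7).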

Example 1: Using pandas resample() for downsampling

Using the resample() function of pandas, we can downsample time-series data, i.e. group it into coarser time bins and aggregate each bin. In this example, a Series with one value per minute is used. (The ‘T’ frequency alias means minutes; pandas 2.2 and later prefer ‘min’.)

In [9]:
index = pd.date_range('1/1/2019', periods=9, freq='T')
In [10]:
series = pd.Series(range(9), index=index)
In [11]:
series
Out[11]:
2019-01-01 00:00:00    0
2019-01-01 00:01:00    1
2019-01-01 00:02:00    2
2019-01-01 00:03:00    3
2019-01-01 00:04:00    4
2019-01-01 00:05:00    5
2019-01-01 00:06:00    6
2019-01-01 00:07:00    7
2019-01-01 00:08:00    8
Freq: T, dtype: int64

Using the resample() function, the series is grouped into 3-minute bins and the values in each bin are summed.

In [12]:
series.resample('3T').sum()
Out[12]:
2019-01-01 00:00:00     3
2019-01-01 00:03:00    12
2019-01-01 00:06:00    21
Freq: 3T, dtype: int64
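To see what the closed and label parameters change, the same kind of series can be resampled with both set to ‘right’. This is a sketch under the assumption that the ‘min’ alias is used in place of ‘T’ so that it runs on both older and newer pandas versions.

```python
import pandas as pd

# The same 9-minute series as above, built with the 'min' alias
index = pd.date_range("2019-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

# Bins are now closed on the right edge and labelled by that right edge,
# so 00:03 covers the values at 00:01, 00:02 and 00:03
right_closed = series.resample("3min", closed="right", label="right").sum()
print(right_closed)
```

The result has four bins instead of three: the value at 00:00 falls into its own bin labelled 00:00, and the remaining sums are 6, 15 and 15.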

Example 2: Resampling over columns

In this example, resample() uses the on parameter to group rows by a datetime column (week_starting) instead of the index, and then computes the monthly mean of the remaining columns.

In [13]:
d = {'cost': [18, 17, 19, 50, 60, 40, 100, 50],
     'quantity': [100, 40, 50, 10, 11, 9, 13, 14]}
In [14]:
df = pd.DataFrame(d)
In [15]:
df
Out[15]:
   cost  quantity
0    18       100
1    17        40
2    19        50
3    50        10
4    60        11
5    40         9
6   100        13
7    50        14
In [16]:
df['week_starting'] = pd.date_range('01/01/2020',periods=8,freq='W')
In [17]:
df.resample('M', on='week_starting').mean()
Out[17]:
               cost  quantity
week_starting                
2020-01-31     26.0     50.00
2020-02-29     62.5     11.75
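Passing on= should be equivalent to first moving that column into the index. The sketch below checks this on the same data; it uses the month-start alias ‘MS’ rather than ‘M’ (an assumption made only so the snippet runs unchanged across pandas versions), so the bins are labelled with the first day of each month.

```python
import pandas as pd

df = pd.DataFrame({
    "cost": [18, 17, 19, 50, 60, 40, 100, 50],
    "quantity": [100, 40, 50, 10, 11, 9, 13, 14],
})
df["week_starting"] = pd.date_range("2020-01-01", periods=8, freq="W")

# Resample directly on the datetime column
via_on = df.resample("MS", on="week_starting").mean()

# Same result after moving the column into the index first
via_index = df.set_index("week_starting").resample("MS").mean()

print(via_on)
print(via_index)
```

Both variants give the same monthly means as the example above: 26.0 and 62.5 for cost, 50.0 and 11.75 for quantity.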

Pandas Tz_localize : tz_localize()

The pandas tz_localize() function localizes tz-naive datetime array/index to tz-aware datetime array/index.

Syntax

pandas.DataFrame.tz_localize(*args, **kwargs)

tz : str, pytz.timezone, dateutil.tz.tzfile or None – This is the time zone to assign to the timestamps. Passing None will remove the time zone information while preserving local time.

This function returns the array/index localized to the specified time zone. The wall-clock times stay the same; only the time zone information is attached.

Example 1: Simple example of pandas tz_localize()

In this example, we will see how tz_localize() is used. A tz-naive DatetimeIndex is localized in turn to the US/Eastern, CET (Central European Time) and Asia/Calcutta time zones by passing each name to the tz parameter. Note that the clock time (09:00) does not change; only the UTC offset is attached.

In [18]:
tz_naive = pd.date_range('2019-07-09 09:00', periods=3)
In [19]:
tz_naive
Out[19]:
DatetimeIndex(['2019-07-09 09:00:00', '2019-07-10 09:00:00',
               '2019-07-11 09:00:00'],
              dtype='datetime64[ns]', freq='D')
In [20]:
tz_date = tz_naive.tz_localize(tz='US/Eastern')
In [21]:
tz_date
Out[21]:
DatetimeIndex(['2019-07-09 09:00:00-04:00', '2019-07-10 09:00:00-04:00',
               '2019-07-11 09:00:00-04:00'],
              dtype='datetime64[ns, US/Eastern]', freq='D')

In the example below, the CET (Central European Time) time zone is passed to the pandas tz_localize() function.

In [22]:
tz_euro = tz_naive.tz_localize(tz='CET')
In [23]:
tz_euro
Out[23]:
DatetimeIndex(['2019-07-09 09:00:00+02:00', '2019-07-10 09:00:00+02:00',
               '2019-07-11 09:00:00+02:00'],
              dtype='datetime64[ns, CET]', freq='D')

In the example below, the Asia/Calcutta time zone is passed to the pandas tz_localize() function.

In [24]:
tz_ind = tz_naive.tz_localize(tz='Asia/Calcutta')
In [25]:
tz_ind
Out[25]:
DatetimeIndex(['2019-07-09 09:00:00+05:30', '2019-07-10 09:00:00+05:30',
               '2019-07-11 09:00:00+05:30'],
              dtype='datetime64[ns, Asia/Calcutta]', freq='D')
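Because tz_localize() only attaches a zone, it can help to contrast it with tz_convert(), the related pandas method that shifts already-localized times into another zone. The sketch below is an illustration of that difference and of passing None to drop the zone again.

```python
import pandas as pd

tz_naive = pd.date_range("2019-07-09 09:00", periods=3)

# Localizing keeps the wall-clock time (09:00) and attaches the zone
eastern = tz_naive.tz_localize("US/Eastern")
print(eastern[0].hour)   # 9

# Converting keeps the instant but changes the wall clock:
# 09:00 at UTC-04:00 is 13:00 in UTC
as_utc = eastern.tz_convert("UTC")
print(as_utc[0].hour)    # 13

# Passing None removes the zone and keeps the local time
back = eastern.tz_localize(None)
print(back.equals(tz_naive))
```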

Conclusion

In this tutorial, we learned about pandas functions that help us handle datetime data: to_datetime(), date_range(), resample() and tz_localize(). We looked at the syntax and examples of each function to understand its usage. With a good grasp of these functions, we can manage datasets that contain datetime data and handle related tasks more easily.
