
Difference Between Two Dates In Pandas Dataframe

I have many columns in a DataFrame, and I need to find the time difference between two columns named in_time and out_time and store it in a new column of the same DataFrame. The f…

Solution 1:

You need to convert the strings to datetime dtype. You can then subtract any date you like and call dt.days on the resulting Series:

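In [15]:
import datetime as dt
import pandas as pd

df = pd.DataFrame({'date':['2015-09-25T01:45:34.372Z']})
df

Out[15]:
                       date
0  2015-09-25T01:45:34.372Z

In [19]:
df['date'] = pd.to_datetime(df['date'])
# datetime minus datetime gives a timedelta Series; .dt.days extracts the day count.
# Newer pandas parses the trailing 'Z' as UTC, so there you must subtract a
# timezone-aware value such as pd.Timestamp.now(tz='UTC') instead.
df['day'] = (df['date'] - dt.datetime.now()).dt.days
df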

Out[19]:
                     date  day
0 2015-09-25 01:45:34.372 -252
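Applied to the question's in_time and out_time columns, the same idea might look like the sketch below. It assumes both columns hold ISO 8601 strings ending in 'Z', like the sample above. Recent pandas versions parse that 'Z' as UTC and return timezone-aware values; passing utc=True makes this explicit, and any "now" you subtract then also has to be timezone-aware.

```python
import pandas as pd

# Sample data in the string format shown in this thread (assumed input format).
df = pd.DataFrame({
    'in_time':  ['2015-09-25T01:45:34.372Z', '2015-09-25T01:45:34.372Z'],
    'out_time': ['2015-09-25T01:45:37.372Z', '2015-09-26T01:45:34.372Z'],
})

# Parse both columns; utc=True gives timezone-aware (UTC) datetimes.
df['in_time'] = pd.to_datetime(df['in_time'], utc=True)
df['out_time'] = pd.to_datetime(df['out_time'], utc=True)

# Subtracting two datetime columns yields a timedelta column.
df['diff'] = df['out_time'] - df['in_time']

# Day count between each in_time and "now"; "now" must be UTC-aware too.
df['day'] = (df['in_time'] - pd.Timestamp.now(tz='UTC')).dt.days

print(df[['diff', 'day']])
```

Using pd.to_datetime on the whole column avoids a Python-level loop over the rows.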

Solution 2:

Well, it depends on the format your times are stored in. I'd recommend working with datetime objects.

If in_time and out_time are currently strings, convert them with datetime.strptime():

from datetime import datetime

# Parse strings such as '2015-09-25T01:45:34.372Z' into datetime objects
f = lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%S.%fZ')
df.in_time = df.in_time.apply(f)
df.out_time = df.out_time.apply(f)
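If you'd rather not call strptime() once per row, pandas can parse a whole column at once. The sketch below assumes the same string format as the lambda above and passes that format string to pd.to_datetime; the trailing 'Z' is matched as a literal character, so, like the strptime() version, the result is timezone-naive.

```python
import pandas as pd

# Example data in the assumed format (same values as the answer's example).
df = pd.DataFrame({
    'in_time':  ['2015-09-25T01:45:34.372Z', '2015-09-25T01:45:34.372Z'],
    'out_time': ['2015-09-25T01:45:37.372Z', '2015-09-26T01:45:34.372Z'],
})

# Same format string as the strptime() lambda; 'Z' is treated as a literal.
fmt = '%Y-%m-%dT%H:%M:%S.%fZ'
df['in_time'] = pd.to_datetime(df['in_time'], format=fmt)
df['out_time'] = pd.to_datetime(df['out_time'], format=fmt)

print(df.dtypes)  # both columns are now datetime64
```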

Then you can simply subtract them and assign the result (a timedelta) to a new column named 'days':

df['days'] = df.out_time - df.in_time

Example: (3 seconds and 1 day differences)

In[5]: df = pd.DataFrame({'in_time':['2015-09-25T01:45:34.372Z','2015-09-25T01:45:34.372Z'],
                          'out_time':['2015-09-25T01:45:37.372Z','2015-09-26T01:45:34.372Z']})
In[6]: df
Out[6]: 
                    in_time                  out_time
0  2015-09-25T01:45:34.372Z  2015-09-25T01:45:37.372Z
1  2015-09-25T01:45:34.372Z  2015-09-26T01:45:34.372Z

In[7]: type(df.loc[0,'in_time'])
Out[7]: str

In[8]: df.in_time = df.in_time.apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%S.%fZ'))   
In[9]: df.out_time = df.out_time.apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%S.%fZ'))

In[10]: df    # same values, but they are now Timestamps rather than strings
Out[10]: 
                  in_time                out_time
0 2015-09-25 01:45:34.372 2015-09-25 01:45:37.372
1 2015-09-25 01:45:34.372 2015-09-26 01:45:34.372

In[11]: type(df.loc[0,'in_time'])
Out[11]: pandas.tslib.Timestamp

And the creation of the new column:

In[12]: df['days'] = df.out_time - df.in_time
In[13]: df
Out[13]: 
                  in_time                out_time            days
0 2015-09-25 01:45:34.372 2015-09-25 01:45:37.372 0 days 00:00:03
1 2015-09-25 01:45:34.372 2015-09-26 01:45:34.372 1 days 00:00:00

Now you can adjust the output format. For example, to express the difference in minutes:

In[14]: df.days = df.days.apply(lambda x: x.total_seconds()/60)
In[15]: df
Out[15]: 
                  in_time                out_time     days
0 2015-09-25 01:45:34.372 2015-09-25 01:45:37.372     0.05
1 2015-09-25 01:45:34.372 2015-09-26 01:45:34.372  1440.00
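In[14] converts each value with apply() and a lambda. A vectorized alternative, sketched here assuming both columns are already datetime64, uses the .dt accessor of the timedelta Series. It also avoids storing minutes in a column named 'days'.

```python
import pandas as pd

# Columns already converted to datetime64 (same instants as the example above).
df = pd.DataFrame({
    'in_time':  pd.to_datetime(['2015-09-25 01:45:34.372', '2015-09-25 01:45:34.372']),
    'out_time': pd.to_datetime(['2015-09-25 01:45:37.372', '2015-09-26 01:45:34.372']),
})

delta = df['out_time'] - df['in_time']          # timedelta64 Series

df['minutes'] = delta.dt.total_seconds() / 60   # fractional minutes
df['hours'] = delta.dt.total_seconds() / 3600   # fractional hours
df['whole_days'] = delta.dt.days                # days component of each timedelta

print(df[['minutes', 'hours', 'whole_days']])
```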

Note: I made some assumptions about the in_time and out_time format, for example that you're using a 24-hour clock (hence %H rather than %I). To adapt the format, have a look at the strptime() documentation.
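To illustrate the %H versus %I point, here is a small sketch with hypothetical sample strings: the same instant written with a 24-hour clock and with a 12-hour clock plus an AM/PM marker. Note that %p matches the locale's AM/PM strings; this assumes an English/C locale.

```python
from datetime import datetime

# Hypothetical inputs: identical instant, two clock conventions.
t24 = datetime.strptime('2015-09-25 13:45:34', '%Y-%m-%d %H:%M:%S')
t12 = datetime.strptime('2015-09-25 01:45:34 PM', '%Y-%m-%d %I:%M:%S %p')

print(t24 == t12)  # True
print(t12.hour)    # 13
```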

Note 2: If you can, it's better to design your program to use datetime objects from the start instead of storing strings and converting them later.

