Pandas to_datetime ValueError: Unknown string format
I have a column in my (pandas) dataframe:
data['Start Date'].head()
type(data['Start Date'])
Output:
1/7/13
1/7/13
1/7/13
16/7/13
16/7/13
Solution 1:
I think the problem is in the data: a problematic string exists. You can try checking the length of the strings in the column Start Date:
import pandas as pd
import io
temp=u"""Start Date
1/7/13
1/7/1
1/7/13 12 17
16/7/13
16/7/13"""
data = pd.read_csv(io.StringIO(temp), sep=";", parse_dates=False)
#data['Start Date']= pd.to_datetime(data['Start Date'],dayfirst=True)
print(data)
     Start Date
0        1/7/13
1         1/7/1
2  1/7/13 12 17
3       16/7/13
4       16/7/13
#check if the length is more than 7
print(data[data['Start Date'].str.len() > 7])
     Start Date
2  1/7/13 12 17
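The length check catches values that are too long, but not ones that are too short. A stricter check could compare each string against the expected shape. This is a minimal sketch, assuming the dates should look like day/month/two-digit year; the pattern and the name bad_rows are illustrative assumptions, not part of the original answer:

```python
import io
import pandas as pd

temp = u"""Start Date
1/7/13
1/7/1
1/7/13 12 17
16/7/13
16/7/13"""
data = pd.read_csv(io.StringIO(temp), sep=";")

# assumed shape: 1-2 digit day, 1-2 digit month, 2 digit year, nothing else
pattern = r"^\d{1,2}/\d{1,2}/\d{2}$"

# rows whose string does not match the assumed shape
bad_rows = data[~data["Start Date"].str.match(pattern)]
print(bad_rows)
```

With this sample data, both the short value 1/7/1 and the long value 1/7/13 12 17 are reported.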
Alternatively, you can locate the problematic rows another way, e.g. take only part of the column and check whether the datetime conversion succeeds:
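#read first 3 rows (copy, so the assignment below does not touch a slice of the original)
data = data.iloc[:3].copy()
#if this raises a ValueError, a problematic value is among these rows
data['Start Date'] = pd.to_datetime(data['Start Date'], dayfirst=True)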
These are only tips for tracking down the bad values, though.
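To pinpoint the exact rows rather than a range, one option is to parse each value on its own and collect the ones that fail. This is a minimal sketch, assuming an explicit format of %d/%m/%y (an assumption about the data); the helper name find_unparseable is hypothetical. A strict format can reject values that a more lenient parser would accept.

```python
import io
from datetime import datetime

import pandas as pd

temp = u"""Start Date
1/7/13
1/7/1
1/7/13 12 17
16/7/13
16/7/13"""
data = pd.read_csv(io.StringIO(temp), sep=";")


def find_unparseable(series, fmt="%d/%m/%y"):
    """Return the index labels of values that do not parse with fmt."""
    failed = []
    for label, value in series.items():
        try:
            datetime.strptime(value, fmt)
        except (ValueError, TypeError):
            # ValueError: string does not fit fmt; TypeError: missing value (NaN)
            failed.append(label)
    return failed


failed = find_unparseable(data["Start Date"])
print(data.loc[failed])
```

This loops in Python, so it is slower than a vectorized call, but it is only meant for diagnosing the data.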
EDIT:
Thanks to joris for the suggestion to add the parameter errors='coerce' to to_datetime:
temp=u"""Start Date
1/7/13
1/7/1
1/7/13 12 17
16/7/13
16/7/13 12 04"""
data = pd.read_csv(io.StringIO(temp), sep=";")
#add parameter errors='coerce', so unparseable values become NaT
data['Start Date']= pd.to_datetime(data['Start Date'], dayfirst=True, errors='coerce')
print(data)
  Start Date
0 2013-07-01
1 2001-07-01
2        NaT
3 2013-07-16
4        NaT
#index of the rows with null (NaT) values, stored in variable idx
idx = data[data['Start Date'].isnull()].index
print(idx)
Int64Index([2, 4], dtype='int64')
#read csv again
data = pd.read_csv(io.StringIO(temp), sep=";")
#find problematic rows, where datetime is not parsed
print(data.iloc[idx])
      Start Date
2   1/7/13 12 17
4  16/7/13 12 04
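Re-reading the CSV can be avoided by writing the parsed dates to a new column and keeping the original strings next to them. This is a minimal sketch under that assumption; the column name Start Date parsed is hypothetical. Which values end up as NaT may depend on the pandas version, so no output is shown.

```python
import io
import pandas as pd

temp = u"""Start Date
1/7/13
1/7/1
1/7/13 12 17
16/7/13
16/7/13 12 04"""
data = pd.read_csv(io.StringIO(temp), sep=";")

# keep the raw strings and store the parsed result in a separate column
data["Start Date parsed"] = pd.to_datetime(
    data["Start Date"], dayfirst=True, errors="coerce"
)

# rows where parsing failed although the raw value was not missing
mask = data["Start Date parsed"].isnull() & data["Start Date"].notnull()
print(data.loc[mask, "Start Date"])
```

Because the raw column is untouched, the unparsed strings can be inspected or corrected directly and then converted again.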