Indexing Timeseries By Date String
Solution 1:
Try indexing with a Timestamp object:
>>> import pandas as pd
>>> from pandas import Timestamp
>>> url = 'http://ichart.finance.yahoo.com/table.csv?s=SPY&d=12&e=4&f=2012&g=d&a=01&b=01&c=2001&ignore=.csv'
>>> df = pd.read_csv(url, index_col='Date', parse_dates=True)
>>> s = df['Close']
>>> s[Timestamp('2012-12-04')]
141.25
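If the Yahoo URL is not reachable, the same lookup can be tried on a small hand-built series. This is only a sketch: the dates and the prices other than 141.25 are invented for illustration, and the name `s` simply mirrors the snippet above.

```python
import pandas as pd

# Stand-in for df['Close']; only 141.25 comes from the answer above,
# the other two prices are made up.
dates = pd.to_datetime(['2012-12-03', '2012-12-04', '2012-12-05'])
s = pd.Series([140.0, 141.25, 142.0], index=dates, name='Close')

# A full Timestamp is an exact label, so no partial-date matching is involved.
value = s[pd.Timestamp('2012-12-04')]
print(value)
```

Because the key is a complete Timestamp, this is an ordinary label lookup and does not depend on the index being sorted.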
Solution 2:
When the time series is not ordered and you give a partial timestamp (e.g. a date rather than a datetime), it's not clear which datetime should be selected.
It can't be assumed that there is only one datetime per date, although that happens to be true in this example. There are several options here, but it seems safer to throw an error than to guess the user's motives. (We could return a series/list, similar to .ix['2011-01'], but this may be confusing if a number is returned in other cases. We could try to return a "closest match", but this doesn't really make sense either.)
In the ordered case it's easier: we pick the first datetime with the selected date.
You can see this behaviour in this simple example:
import pandas as pd
from numpy.random import randn
from random import shuffle
rng = pd.date_range(start='2011-01-01', end='2011-12-31')
rng2 = list(rng)
shuffle(rng2) # not in order
rng3 = list(rng)
del rng3[20] # in order, but no freq
ts = pd.Series(randn(len(rng)), index=rng)
ts2 = pd.Series(randn(len(rng)), index=rng2)
ts3 = pd.Series(randn(len(rng)-1), index=rng3)
ts.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-01 00:00:00, ..., 2011-12-31 00:00:00]
Length: 365, Freq: D, Timezone: None
ts['2011-01-01']
# -1.1454418070543406
ts2.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-04-16 00:00:00, ..., 2011-03-10 00:00:00]
Length: 365, Freq: None, Timezone: None
ts2['2011-01-01']
#...error which you describe
TimeSeriesError: Partial indexing only valid for ordered time series
ts3.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-01 00:00:00, ..., 2011-12-31 00:00:00]
Length: 364, Freq: None, Timezone: None
ts3['2011-01-01']
1.7631554507355987
rng4 = pd.date_range(start='2011-01-01', end='2011-01-31', freq='H')
ts4 = pd.Series(randn(len(rng4)), index=rng4)
ts4['2011-01-01'] == ts4[0]
# True # it picks the first element with that date
I don't think this is a bug; nevertheless, I posted it as an issue on GitHub.
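One way to avoid the error is to check whether the index is ordered and sort it before looking anything up. The following is a minimal sketch, not the library's own handling of the problem: it uses a reversed date range instead of a shuffled one so that the result is deterministic, and the names `ts_unordered` and `ts_sorted` are made up for the example.

```python
import pandas as pd

rng = pd.date_range(start='2011-01-01', end='2011-12-31')

# Reversed index: deterministic and definitely not in increasing order.
ts_unordered = pd.Series(range(len(rng)), index=rng[::-1])
print(ts_unordered.index.is_monotonic_increasing)

# sort_index() returns a new series ordered by date.
ts_sorted = ts_unordered.sort_index()
print(ts_sorted.index.is_monotonic_increasing)

# 2011-01-01 was the last label of the reversed index, so it holds the last value.
first_day = ts_sorted[pd.Timestamp('2011-01-01')]
print(first_day)
```

The exact behaviour of date-string lookups has changed between pandas versions, so the sketch uses a full Timestamp for the final lookup.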
Solution 3:
While the pandas tutorial was instructive, I think the original question posed deserves a direct answer. I ran into the same problem converting Yahoo chart info to a DataFrame that could be sliced, etc. I found that the only thing that was required was:
import pandas as pd
import datetime as dt
def dt_parser(date):
    return dt.datetime.strptime(date, '%Y-%m-%d') + dt.timedelta(hours=16)
url = 'http://ichart.finance.yahoo.com/table.csv?s=SPY&d=12&e=4&f=2012&g=d&a=01&b=01&c=2001&ignore=.csv'
df = pd.read_csv(url, index_col=0, parse_dates=True, date_parser=dt_parser)
df.sort_index(inplace=True)
s = df['Close']
s['2012-12-04'] # now should work
The "trick" was to include my own date_parser. I'm guessing that there is some better way to do this within read_csv, but this at least produced a DataFrame that was indexed and could be sliced.
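A possible alternative to a custom date_parser, offered only as a sketch, is to let read_csv parse the dates and then shift the whole index by 16 hours afterwards. The inline CSV below stands in for the Yahoo download; its second row is invented, and the 16:00 offset is taken from dt_parser above.

```python
import io
import pandas as pd

# Stand-in for the downloaded file, newest row first.
# The 140.0 row is made up for illustration.
csv_text = """Date,Close
2012-12-04,141.25
2012-12-03,140.0
"""

df = pd.read_csv(io.StringIO(csv_text), index_col='Date', parse_dates=True)

# Move every date to 16:00, as dt_parser does row by row.
df.index = df.index + pd.Timedelta(hours=16)

# Sort so that the index is ordered before any lookup or slicing.
df = df.sort_index()

s = df['Close']
close = s[pd.Timestamp('2012-12-04 16:00')]
print(close)
```

This keeps the parsing inside read_csv and does the time shift in a single vectorized step.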