Indexing Timeseries By Date String

November 16, 2024

Given a timeseries, s, with a datetime index, I expected to be able to index the timeseries by the date string. Am I misunderstanding how this should work? import pandas as pd url =

Solution 1:

Try indexing with a Timestamp object:

>>> import pandas as pd
>>> from pandas import Timestamp
>>> url = 'http://ichart.finance.yahoo.com/table.csv?s=SPY&d=12&e=4&f=2012&g=d&a=01&b=01&c=2001&ignore=.csv'
>>> df = pd.read_csv(url, index_col='Date', parse_dates=True)
>>> s = df['Close']
>>> s[Timestamp('2012-12-04')]
141.25
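The same lookup can be tried without the download. What follows is a minimal sketch on a small in-memory series: only the 2012-12-04 close of 141.25 comes from the output above, and the neighbouring prices are made-up placeholders. It assumes a reasonably recent pandas, where Timestamp is available as pd.Timestamp.

```python
import pandas as pd

# Hypothetical closing prices; only the 2012-12-04 value is taken from the answer above.
idx = pd.to_datetime(['2012-12-03', '2012-12-04', '2012-12-05'])
s = pd.Series([141.45, 141.25, 141.50], index=idx, name='Close')

# Look the value up with an explicit Timestamp rather than a bare string.
key = pd.Timestamp('2012-12-04')
value = s[key]          # plain indexing, as in the answer
value_loc = s.loc[key]  # explicit label-based indexing gives the same result

print(value, value_loc)
```

Using .loc makes it explicit that the key is a label and not a position.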

Solution 2:

When the time series is not ordered and you give a partial timestamp (e.g. a date, rather than a datetime) it's not clear which datetime should be selected.

It can't be assumed that there is only one datetime object per date, although that happens to be the case in this example. There are several options here, but it seems safer to throw an error than to guess a user's motives. (We could return a series/list similar to .ix['2011-01'], but this may be confusing if a number is returned in other cases. We could try to return a "closest match", but this doesn't really make sense either.)

In the ordered case it's easier: we pick the first datetime with the selected date.

You can see this behaviour in this simple example:

import pandas as pd
from numpy.random import randn
from random import shuffle
rng = pd.date_range(start='2011-01-01', end='2011-12-31')
rng2 = list(rng)
shuffle(rng2) # not in order
rng3 = list(rng)
del rng3[20] # in order, but no freq

ts = pd.Series(randn(len(rng)), index=rng)
ts2 = pd.Series(randn(len(rng)), index=rng2)
ts3 = pd.Series(randn(len(rng)-1), index=rng3)

ts.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-01 00:00:00, ..., 2011-12-31 00:00:00]
Length: 365, Freq: D, Timezone: None

ts['2011-01-01']
# -1.1454418070543406

ts2.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-04-16 00:00:00, ..., 2011-03-10 00:00:00]
Length: 365, Freq: None, Timezone: None

ts2['2011-01-01']
#...error which you describe
TimeSeriesError: Partial indexing only valid for ordered time series

ts3.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-01 00:00:00, ..., 2011-12-31 00:00:00]
Length: 364, Freq: None, Timezone: None

ts3['2011-01-01']
1.7631554507355987


rng4 = pd.date_range(start='2011-01-01', end='2011-01-31', freq='H')
ts4 = pd.Series(randn(len(rng4)), index=rng4)

ts4['2011-01-01'] == ts4[0]
# True # it picks the first element with that date

I don't think this is a bug; nevertheless, I posted it as an issue on GitHub.
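For anyone trying this on a recent pandas release, here is a hedged sketch of two related behaviours. The outputs above come from an older version and may differ from what you see today. The sketch assumes a shuffled daily index like ts2 and a short hourly index like ts4. Sorting the index first makes the date-string lookup unambiguous. On an index finer than one day, a date string selects every row on that date, while an exact Timestamp selects a single value. The seed, the values and the 48-hour length are illustrative choices, not part of the original example.

```python
import numpy as np
import pandas as pd

# Daily index in shuffled order, like ts2 above (fixed seed for repeatability).
rng = pd.date_range(start='2011-01-01', end='2011-12-31')
perm = np.random.default_rng(0).permutation(len(rng))
ts2 = pd.Series(perm.astype(float), index=rng[perm])  # value k sits at day k of the year

# Sort by the index so that date-string lookups work on an ordered series.
ts2_sorted = ts2.sort_index()
is_ordered = ts2_sorted.index.is_monotonic_increasing
new_year = ts2_sorted.loc['2011-01-01']  # one value: the string matches the daily resolution
january = ts2_sorted.loc['2011-01']      # coarser string: selects the whole month

# Hourly index, like ts4 above. The frequency is passed as a Timedelta
# to avoid differences in alias spelling between pandas versions.
rng4 = pd.date_range(start='2011-01-01', periods=48, freq=pd.Timedelta(hours=1))
ts4 = pd.Series(np.arange(len(rng4), dtype=float), index=rng4)
one_day = ts4.loc['2011-01-01']                 # Series holding that day's 24 rows
midnight = ts4.loc[pd.Timestamp('2011-01-01')]  # exact timestamp: a single value

print(is_ordered, len(january), len(one_day), midnight)
```

If a single element is what you want from a finer-grained index, passing a full Timestamp avoids relying on how a partial string is interpreted.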

Solution 3:

While the pandas tutorial was instructive, I think the original question posed deserves a direct answer. I ran into the same problem converting Yahoo chart info to a DataFrame that could be sliced, etc. I found that the only thing that was required was:

import pandas as pd
import datetime as dt

def dt_parser(date):
    return dt.datetime.strptime(date, '%Y-%m-%d') + dt.timedelta(hours=16)

url = 'http://ichart.finance.yahoo.com/table.csv?s=SPY&d=12&e=4&f=2012&g=d&a=01&b=01&c=2001&ignore=.csv'
df = pd.read_csv(url, index_col=0, parse_dates=True, date_parser=dt_parser)
df.sort_index(inplace=True)
s = df['Close']
s['2012-12-04']     # now should work

The "trick" was to include my own date_parser. I'm guessing that there is some better way to do this within read_csv, but this at least produced a DataFrame that was indexed and could be sliced.
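As a rough, self-contained variation on this approach, the sketch below reads a few rows from an in-memory string instead of the Yahoo URL, lets read_csv parse the dates itself, and sorts the index before indexing by date string. It leaves out the custom date_parser and the 16-hour offset, so it is not a drop-in equivalent of the code above. The sample rows are hypothetical stand-ins, apart from the 141.25 close quoted in Solution 1.

```python
import io
import pandas as pd

# Hypothetical rows standing in for the downloaded CSV (newest first).
# Only the 2012-12-04 close of 141.25 comes from the answers above.
csv_text = """Date,Close
2012-12-04,141.25
2012-12-03,141.45
2012-11-30,142.15
"""

# Parse the Date column into a DatetimeIndex while reading.
df = pd.read_csv(io.StringIO(csv_text), index_col='Date', parse_dates=True)

# Sort oldest-first so the index is monotonic increasing.
df = df.sort_index()
s = df['Close']

print(s.index.is_monotonic_increasing)  # True
print(s.loc['2012-12-04'])              # 141.25
```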
