
Average Over A Specific Time Period

I have a fairly large table in Python, loaded from an .h5 file. The start of the table looks something like this: table = [WIND REL DIRECTION [deg]] [WIND SPEED [kts]] \ 735

Solution 1:

resample is your friend.

import numpy as np
import pandas as pd
import matplotlib.dates as pltd  # assumption: pltd refers to matplotlib.dates

idx = pltd.num2date(table.index)
df = pd.DataFrame({'direction': np.random.randn(10), 'speed': np.random.randn(10)}, index=idx)

>>> df
                                  direction     speed
2014-05-28 08:53:59.971204+00:00   0.205429  0.699439
2014-05-28 08:54:01.008002+00:00   0.383199 -0.392261
2014-05-28 08:54:04.031995+00:00  -2.146569 -0.325526
2014-05-28 08:54:04.982402+00:00   1.572352  1.289276
2014-05-28 08:54:06.019200+00:00   0.880394 -0.440667
2014-05-28 08:54:11.980795+00:00  -1.343758  0.615725
2014-05-28 08:54:13.017603+00:00  -1.713043  0.552017
2014-05-28 08:54:13.968000+00:00  -0.350017  0.728910
2014-05-28 08:54:15.004798+00:00  -0.619273  0.286762
2014-05-28 08:54:16.041596+00:00   0.459747  0.524788

# how='mean' is the default here; newer pandas versions drop how= and use df.resample('15S').mean()
>>> df.resample('15S', how='mean')
                           direction     speed
2014-05-28 08:53:45+00:00   0.205429  0.699439
2014-05-28 08:54:00+00:00  -0.388206  0.289639
2014-05-28 08:54:15+00:00  -0.079763  0.405775
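For newer pandas versions, the same idea can be sketched with the method-chaining form of resample. The readings below are invented for illustration, and the lowercase '15s' alias is used on the assumption that it is accepted by the installed pandas version.

```python
import pandas as pd

# Hypothetical, irregularly spaced wind readings (values invented for illustration).
index = pd.to_datetime([
    "2014-05-28 08:53:59",
    "2014-05-28 08:54:01",
    "2014-05-28 08:54:04",
    "2014-05-28 08:54:16",
])
df = pd.DataFrame(
    {"direction": [10.0, 20.0, 40.0, 30.0], "speed": [5.0, 7.0, 9.0, 4.0]},
    index=index,
)

# Bin the rows into 15-second windows and average each column per window.
averaged = df.resample("15s").mean()
print(averaged)
```

Each window is labelled by its left edge, so the reading at 08:53:59 lands in the 08:53:45 window, as in the output above.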

Performance is similar to the method provided by @LondonRob. I used a DataFrame with 1 million rows to test.

# Sizes and period counts must be integers, so use 10**6 rather than the float 1e6.
df = pd.DataFrame({'direction': np.random.randn(10**6), 'speed': np.random.randn(10**6)}, index=pd.date_range(start='2015-1-1', periods=10**6, freq='1S'))

>>> %timeit df.resample('15S')
100 loops, best of 3: 15.6 ms per loop

>>> %timeit df.groupby(pd.TimeGrouper(freq='15S')).mean()
100 loops, best of 3: 15.7 ms per loop
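Beyond timing, it may be worth checking that the two approaches give the same averages. The following is a minimal sketch on a small random sample, not the 1-million-row benchmark; it assumes a pandas version where pd.Grouper is available in place of pd.TimeGrouper.

```python
import numpy as np
import pandas as pd

n = 1_000  # small hypothetical sample, not the benchmark size
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"direction": rng.standard_normal(n), "speed": rng.standard_normal(n)},
    index=pd.date_range(start="2015-01-01", periods=n, freq="1s"),
)

# Two ways of averaging over 15-second windows.
via_resample = df.resample("15s").mean()
via_groupby = df.groupby(pd.Grouper(freq="15s")).mean()

# Compare the resulting values numerically.
same = np.allclose(via_resample.to_numpy(), via_groupby.to_numpy())
print(same)
```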

Solution 2:

I think this is the "right" way to do this. (Although it seems a little bit underdocumented to me. Anyway it works!)

You need to do a groupby on your DataFrame and use something called a TimeGrouper (in newer pandas versions, which no longer ship TimeGrouper, use pd.Grouper(freq=...) instead).

It works like this:

import pandas as pd
import numpy as np

# Create a dataframe. You can ignore all this bit!
periods = 60 * 60
random_dates = pd.date_range('2015-12-25', periods=periods, freq='s')
random_speeds = np.random.randint(100, size=periods)
random_directions = np.random.random(periods)
df = pd.DataFrame({'date': random_dates, 'wind_speed': random_speeds, 'wind_direction': random_directions})
df = df.set_index('date')

# Here's where the magic happens:
# pd.Grouper(freq=...) replaces pd.TimeGrouper, which newer pandas versions no longer provide.
grouped15s = df.groupby(pd.Grouper(freq='15S'))
averages_ws_15s = grouped15s.wind_speed.mean()

Or, if you insist on having spaces in your column names, that last line will become:

averages_ws_15s = grouped15s['Wind Speed'].mean()

This results in the following:

date
2015-12-25 00:00:00    45.800000
2015-12-25 00:00:15    48.466667
2015-12-25 00:00:30    38.066667
2015-12-25 00:00:45    54.866667
2015-12-25 00:01:00    34.866667
2015-12-25 00:01:15    37.000000
2015-12-25 00:01:30    47.133333
etc.
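For column labels with spaces, such as the headers shown in the question, a small sketch might look like the following. Only the column labels come from the question; the data values are invented, and pd.Grouper with the lowercase '15s' alias is assumed to be available in the installed pandas version.

```python
import pandas as pd

# Hypothetical one-per-second readings; only the column labels come from the question.
index = pd.date_range("2015-12-25", periods=30, freq="s")
df = pd.DataFrame(
    {
        "WIND REL DIRECTION [deg]": [90.0] * 15 + [180.0] * 15,
        "WIND SPEED [kts]": [10.0] * 15 + [20.0] * 15,
    },
    index=index,
)

grouped15s = df.groupby(pd.Grouper(freq="15s"))

# Bracket indexing works for any column label, including ones with spaces.
averages_ws_15s = grouped15s["WIND SPEED [kts]"].mean()

# A list of labels averages several columns in one call.
averages_all_15s = grouped15s[["WIND REL DIRECTION [deg]", "WIND SPEED [kts]"]].mean()
print(averages_ws_15s)
```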
