Comparing Single Dataframe Value To Previous 10 In Same Column

In a dataframe, I would like to count how many of the prices from the previous 10 days are greater than today's price. Result would look like this: price ct>prev10 50.00 51

Solution 1:

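You can apply a function over a rolling window of the series with `Series.rolling(...).apply(...)` (the older top-level `pd.rolling_apply` has been removed from pandas). I used a window length of 5 given the small size of the sample data, but you can easily change it; to compare against the previous 10 days, use a window of 11 (the 10 previous prices plus today's).

The lambda function counts how many items in the rolling group (excluding the last item) are greater than the last item.

import pandas as pd

df = pd.DataFrame({'price': [50, 51, 52, 50.5, 51, 50, 50.5, 53, 52, 49, 51]})

window = 5  # Given that sample data only contains 11 values.
# raw=True passes each window as a NumPy array, so group[-1] is the current price.
df['price_count'] = df.price.rolling(window).apply(
    lambda group: (group[:-1] > group[-1]).sum(), raw=True)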
>>> df
    price  price_count
0    50.0          NaN
1    51.0          NaN
2    52.0          NaN
3    50.5          NaN
4    51.0            1
5    50.0            4
6    50.5            2
7    53.0            0
8    52.0            1
9    49.0            4
10   51.0            2

In the example above, the first group is the prices with index values 0-4. You can see what is happening with:

>>> group = df.price[:window].values
>>> group
array([ 50. ,  51. ,  52. ,  50.5,  51. ])

Now, do your comparison of the previous four prices to the current price:

>>> group[:-1] > group[-1]
array([False, False,  True, False], dtype=bool)

Then, you are just summing the boolean values:

>>> sum(group[:-1] > group[-1])
1

This is the value stored at index 4, the last row of the first window.
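To apply the same idea to the question's 10-day lookback, the window has to cover the 10 previous prices plus the current one. Here is a minimal sketch, assuming a pandas version where `Series.rolling(...).apply` accepts `raw=True` (0.23 or later). The helper name `count_prev_greater` is made up for illustration and is not part of pandas.

```python
import pandas as pd

def count_prev_greater(prices, n_prev):
    """Count how many of the previous n_prev values exceed each value.

    Hypothetical helper, not part of pandas. Rows with fewer than n_prev
    earlier values are left as NaN.
    """
    window = n_prev + 1  # the previous values plus the current one
    return prices.rolling(window).apply(
        lambda group: (group[:-1] > group[-1]).sum(),
        raw=True,  # pass plain NumPy arrays so group[-1] is positional
    )

df = pd.DataFrame({'price': [50, 51, 52, 50.5, 51, 50, 50.5, 53, 52, 49, 51]})

# Same result as the window=5 example above: compare with the previous 4 prices.
df['price_count'] = count_prev_greater(df['price'], n_prev=4)

# The question's case: compare with the previous 10 prices (window of 11).
df['ct_prev10'] = count_prev_greater(df['price'], n_prev=10)
```

With only 11 sample rows, `ct_prev10` is NaN everywhere except the last row, which is the only one with 10 earlier prices.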

Solution 2:

Here's a vectorized approach using NumPy, whose broadcasting support makes it possible to build and compare all the windows at once -

import numpy as np
import pandas as pd

# Sample input dataframe
df = pd.DataFrame({'price': [50, 51, 52, 50.5, 51, 50, 50.5, 53, 52, 49, 51]})

# Convert to numpy array for counting purposes
A = np.array(df['price'])

W = 5  # Window size

# Initialize another column for storing counts
df['price_count'] = np.nan

# Get counts and store as a new column in dataframe
C = (A[np.arange(A.size-W+1)[:,None] + np.arange(W-1)] > A[W-1:][:,None]).sum(1)
# Assign with .loc to avoid chained assignment, which may fail to update df
df.loc[df.index[W-1:], 'price_count'] = C

Sample run -

>>> df
    price
0    50.0
1    51.0
2    52.0
3    50.5
4    51.0
5    50.0
6    50.5
7    53.0
8    52.0
9    49.0
10   51.0
>>> A = np.array(df['price'])
>>> W = 5  # Window size
>>> df['price_count'] = np.nan
>>> C = (A[np.arange(A.size-W+1)[:,None] + np.arange(W-1)] > A[W-1:][:,None]).sum(1)
>>> df.loc[df.index[W-1:], 'price_count'] = C
>>> df
    price  price_count
0    50.0          NaN
1    51.0          NaN
2    52.0          NaN
3    50.5          NaN
4    51.0            1
5    50.0            4
6    50.5            2
7    53.0            0
8    52.0            1
9    49.0            4
10   51.0            2
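
The index expression inside `C` is easier to follow when it is split into steps. Below is a small sketch of what the broadcasting builds. The names `starts`, `offsets`, `idx`, `windows` and `counts_view` are introduced here for illustration only. It also cross-checks the counts against `numpy.lib.stride_tricks.sliding_window_view`, which assumes NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

A = np.array([50, 51, 52, 50.5, 51, 50, 50.5, 53, 52, 49, 51])
W = 5  # window size: 4 previous prices plus the current one

# One row per window; each row holds the positions of the W-1 previous prices.
starts = np.arange(A.size - W + 1)[:, None]   # shape (7, 1)
offsets = np.arange(W - 1)                    # shape (4,)
idx = starts + offsets                        # broadcasts to shape (7, 4)

# Compare each row of previous prices with that window's current price.
counts = (A[idx] > A[W - 1:][:, None]).sum(axis=1)

# Same comparison on ready-made windows: the last column is the current price.
windows = sliding_window_view(A, W)           # shape (7, 5)
counts_view = (windows[:, :-1] > windows[:, -1:]).sum(axis=1)
```

The first row of `idx` is `[0, 1, 2, 3]`, i.e. the four prices that precede index 4, and both `counts` and `counts_view` match the `price_count` values shown above from index 4 onward.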
