Modify Function To Return Dataframe With Specified Values
With reference to the test data below and the function I use to identify values within variable thresh of each other, can anyone please help me modify this to show the desired output?
Solution 1:
Use mask and sub: apply closeCols2 to each row (axis=1), then subtract the result from every column (axis=0).
df2.mask(df2.sub(df2.apply(closeCols2, 1, thresh=thresh), 0).abs() > thresh)
    AAA   BBB  CCC  DDD  EEE
0   NaN   NaN  100   98  103
1   NaN   NaN   50   50   50
2   NaN  30.0   25   25   25
3   7.0   NaN   10   10   10
4   9.0  11.0   10   10   10
5  10.0  10.0   11   11   11
Note:
I'd redefine closeCols to include thresh as a parameter. Then you could pass it in the apply call.
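from itertools import combinations

def closeCols2(df, thresh):
    # df is a single row here; return the largest value among all pairs
    # of entries that lie within thresh of each other (None if no pair does).
    max_value = None
    for k1, k2 in combinations(df.keys(), 2):
        if abs(df[k1] - df[k2]) < thresh:
            if max_value is None:
                max_value = max(df[k1], df[k2])
            else:
                max_value = max(max_value, max(df[k1], df[k2]))
    return max_value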
df2.apply(closeCols2, 1, thresh=5)
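To see the pieces working together, here is a minimal end-to-end sketch. The sample frame is hypothetical (it is not the asker's test data), and the function is a condensed copy of closeCols2, so treat it as an illustration rather than the exact original setup.

```python
from itertools import combinations

import pandas as pd


def closeCols2(row, thresh):
    """Largest value among pairs in the row that are closer than thresh."""
    max_value = None
    for k1, k2 in combinations(row.keys(), 2):
        if abs(row[k1] - row[k2]) < thresh:
            pair_max = max(row[k1], row[k2])
            max_value = pair_max if max_value is None else max(max_value, pair_max)
    return max_value


# Hypothetical sample data, assumed for illustration only.
df2 = pd.DataFrame({
    "AAA": [4, 9, 10],
    "BBB": [40, 11, 10],
    "CCC": [100, 10, 11],
    "DDD": [98, 10, 11],
    "EEE": [103, 10, 11],
})
thresh = 5

# One reference value per row, then hide every cell further than thresh from it.
anchor = df2.apply(closeCols2, axis=1, thresh=thresh)
result = df2.mask(df2.sub(anchor, axis=0).abs() > thresh)
print(result)
```

In this made-up frame only the first row has outliers (4 and 40), so those two cells become NaN and the remaining rows pass through unchanged.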
extra credit
I vectorized and embedded your closeCols for some mind-numbing fun.
Notice there is no apply.
- numpy broadcasting to get all combinations of columns subtracted from each other.
- np.abs
- <= 5
- sum(-1): I arranged the broadcasting such that the difference of, say, row 0, column AAA with all of row 0 will be laid out across the last dimension. The -1 in sum(-1) says to sum across the last dimension.
- <= 1: all values are less than 5 away from themselves. So I want the sum of these to be greater than 1. Thus, we mask all less than or equal to one.
import numpy as np

v = df2.values
df2.mask((np.abs(v[:, :, None] - v[:, None]) <= 5).sum(-1) <= 1)
    AAA   BBB  CCC  DDD  EEE
0   NaN   NaN  100   98  103
1   NaN   NaN   50   50   50
2   NaN  30.0   25   25   25
3   7.0   NaN   10   10   10
4   9.0  11.0   10   10   10
5  10.0  10.0   11   11   11
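The broadcasting steps can be easier to follow on a single row with the intermediate arrays named. The sketch below uses a hypothetical one-row array (not the asker's data) and the same hard-coded tolerance of 5 as the one-liner above.

```python
import numpy as np

# Hypothetical single row, assumed for illustration only.
v = np.array([[4, 40, 100, 98, 103]])

# v[:, :, None] has shape (rows, cols, 1); v[:, None] has shape (rows, 1, cols).
# Broadcasting yields every pairwise difference within each row.
diffs = np.abs(v[:, :, None] - v[:, None])  # shape (rows, cols, cols)

close = diffs <= 5        # True where two entries of a row are within 5
counts = close.sum(-1)    # neighbours per cell, counting the cell itself
mask = counts <= 1        # True where a cell is close only to itself

print(diffs.shape)  # (1, 5, 5)
print(counts)       # [[1 1 3 3 3]]
print(mask)         # [[ True  True False False False]]
```

Here 4 and 40 have no neighbour other than themselves, so they are the cells df2.mask would turn into NaN.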