Modify Function To Return Dataframe With Specified Values
With reference to the test data below and the function I use to identify values within a variable thresh of each other, can anyone please help me modify it to show the desired output?
Solution 1:
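Use mask and sub. apply calls closeCols2 on each row (axis=1), and sub with axis=0 subtracts that per-row result from every column:
df2.mask(df2.sub(df2.apply(closeCols2, 1), 0).abs() > thresh)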
    AAA   BBB  CCC  DDD  EEE
0   NaN   NaN  100   98  103
1   NaN   NaN   50   50   50
2   NaN  30.0   25   25   25
3   7.0   NaN   10   10   10
4   9.0  11.0   10   10   10
5  10.0  10.0   11   11   11
Note: I'd redefine closeCols to include thresh as a parameter. Then you could pass it in the apply call.
from itertools import combinations

def closeCols2(df, thresh):
    # df is a single row here, because apply is called with axis=1
    max_value = None
    for k1, k2 in combinations(df.keys(), 2):
        if abs(df[k1] - df[k2]) < thresh:
            if max_value is None:
                max_value = max(df[k1], df[k2])
            else:
                max_value = max(max_value, max(df[k1], df[k2]))
    return max_value
df2.apply(closeCols2, 1, thresh=5)
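To see the two pieces working together, here is a minimal end-to-end sketch. The frame demo and its values are made up for illustration (they are not the question's test data), and close_cols is just a condensed restatement of closeCols2 above.

```python
from itertools import combinations

import pandas as pd


def close_cols(row, thresh):
    """Return the largest value in the row that is within thresh of another value."""
    max_value = None
    for k1, k2 in combinations(row.keys(), 2):
        if abs(row[k1] - row[k2]) < thresh:
            pair_max = max(row[k1], row[k2])
            max_value = pair_max if max_value is None else max(max_value, pair_max)
    return max_value


# Hypothetical data, NOT the original test data from the question.
demo = pd.DataFrame({
    "AAA": [1, 60, 7],
    "BBB": [40, 20, 80],
    "CCC": [100, 50, 10],
    "DDD": [98, 50, 10],
})
thresh = 5

# One reference value per row; thresh is forwarded by apply as a keyword argument.
row_max = demo.apply(close_cols, axis=1, thresh=thresh)

# Subtract each row's reference value from every column (axis=0 aligns on the index),
# then blank out anything further than thresh away from it.
result = demo.mask(demo.sub(row_max, axis=0).abs() > thresh)
print(result)
```

With this made-up frame, AAA survives only in the last row (7 is within 5 of 10), BBB is masked everywhere, and CCC and DDD are left untouched.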
extra credit
I vectorized and embedded your closeCols for some mind-numbing fun. Notice there is no apply. The steps are:
numpy broadcasting: gets all combinations of values within a row subtracted from each other.
np.abs: takes the absolute value of those differences.
<= 5: flags the differences that fall within the threshold.
sum(-1): I arranged the broadcasting such that the differences of, say, row 0, column AAA with all of row 0 are laid out across the last dimension. The -1 in sum(-1) says to sum across the last dimension.
<= 1: every value is less than 5 away from itself, so each count is at least 1. I want the count to be greater than 1, so we mask everything less than or equal to one.
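import numpy as np
v = df2.values
# count, for every cell, the values in its row within 5 of it; mask cells whose only match is themselves
df2.mask((np.abs(v[:, :, None] - v[:, None]) <= 5).sum(-1) <= 1)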
    AAA   BBB  CCC  DDD  EEE
0   NaN   NaN  100   98  103
1   NaN   NaN   50   50   50
2   NaN  30.0   25   25   25
3   7.0   NaN   10   10   10
4   9.0  11.0   10   10   10
5  10.0  10.0   11   11   11
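If the one-liner is hard to follow, it may help to split it into named steps and look at the intermediate shapes. The sketch below does that on a small made-up frame (demo and its values are hypothetical, not the question's data); diffs and close_counts are illustrative names only.

```python
import numpy as np
import pandas as pd

# Hypothetical data, NOT the original test data from the question.
demo = pd.DataFrame({
    "AAA": [1, 60, 7],
    "BBB": [40, 20, 80],
    "CCC": [100, 50, 10],
    "DDD": [98, 50, 10],
})
thresh = 5

v = demo.to_numpy()  # shape (3, 4): rows x columns

# v[:, :, None] has shape (3, 4, 1) and v[:, None] has shape (3, 1, 4).
# Broadcasting gives shape (3, 4, 4): diffs[r, i, j] = |v[r, i] - v[r, j]|.
diffs = np.abs(v[:, :, None] - v[:, None])

# For each cell, count the values in the same row within thresh of it.
# The count always includes the cell itself, so it is at least 1.
close_counts = (diffs <= thresh).sum(-1)  # shape (3, 4)

# A count of 1 means the cell has no neighbour other than itself, so mask it.
result = demo.mask(close_counts <= 1)
print(close_counts)
print(result)
```

Note that this version compares with <= thresh, while closeCols2 uses a strict < and then measures distance from the row's maximum, so the two approaches are not guaranteed to agree on every input.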