
Pandas Groupby Count And Then Conditional Mean

I have a dataframe like this:

  col1  col2
0    a   100
1    a   200
2    a   150
3    b  1000
4    c   400
5    c   200

What I want to do is group by col1 and count the number…
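To make the answers below easy to run, here is one way to build the sample frame. The column names and values come from the question; the construction itself is only a convenience sketch.

```python
import pandas as pd

# Sample data from the question: col1 is the grouping key, col2 holds the values
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "c", "c"],
    "col2": [100, 200, 150, 1000, 400, 200],
})

# Number of rows in each group (a: 3, b: 1, c: 2)
sizes = df.groupby("col1").size()
print(sizes)
```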

Solution 1:

Use GroupBy.mean, then DataFrame.where with a condition built from Series.value_counts:

df.groupby('col1').mean().where(df['col1'].value_counts().ge(2)).reset_index()

# optionally, select only the columns you need:
# (df.groupby('col1')[['col2']]
#    .mean()
#    .where(df['col1'].value_counts().ge(2)).reset_index())

Output

  col1   col2
0    a  150.0
1    b    NaN
2    c  300.0
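A self-contained version of this answer, using the column selection from the commented alternative, might look like the sketch below. The names means and keep are only illustrative. The condition is a boolean Series indexed by the col1 values, and where aligns it to the grouped frame by index label, so the order in which value_counts returns the groups does not matter.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "c", "c"],
    "col2": [100, 200, 150, 1000, 400, 200],
})

# Mean of col2 per group, as a one-column DataFrame indexed by col1
means = df.groupby("col1")[["col2"]].mean()

# True for groups with at least 2 rows; indexed by the col1 values
keep = df["col1"].value_counts().ge(2)

# where keeps values where the aligned condition is True, NaN elsewhere
result = means.where(keep).reset_index()
print(result)
```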

If you really want blanks instead of NaN (note that this turns col2 into an object column):

df.groupby('col1').mean().where(df['col1'].value_counts().ge(2), '').reset_index()

  col1 col2
0    a  150
1    b     
2    c  300
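If the blanks are only needed for display, a different approach (not part of the original answer) is to keep col2 numeric and render missing values as empty strings when printing, via the na_rep parameter of DataFrame.to_string. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "c", "c"],
    "col2": [100, 200, 150, 1000, 400, 200],
})

g = df.groupby("col1")["col2"]

# Keep the result numeric: NaN for groups with fewer than 2 rows
result = g.mean().where(g.size().ge(2)).reset_index()

# Show missing values as blanks only in the printed text
text = result.to_string(na_rep="")
print(text)
```

This way col2 stays float64, so later arithmetic or filtering on it still works.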

Solution 2:

Use a custom aggregation function with GroupBy.agg:

import numpy as np
df.groupby('col1').agg(lambda d: np.nan if len(d) == 1 else d.mean())

       col2
col1       
a     150.0
b       NaN
c     300.0
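A related variant, which is not from this answer, keeps the count visible next to the mean. Named aggregation computes both in a single agg call, and then where blanks out the mean for small groups. The output column names n and mean_col2 are hypothetical choices. Note that "count" counts non-missing values rather than rows.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "c", "c"],
    "col2": [100, 200, 150, 1000, 400, 200],
})

# Named aggregation: each keyword is an output column, given as (input column, function)
stats = df.groupby("col1").agg(n=("col2", "count"), mean_col2=("col2", "mean"))

# Keep the mean only for groups with at least 2 values; others become NaN
stats["mean_col2"] = stats["mean_col2"].where(stats["n"].ge(2))
print(stats)
```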

Solution 3:

I'd go with GroupBy.size and DataFrame.mask:

g = df.groupby('col1')
g.mean().mask(g.size().eq(1))

      col2
col1       
a     150.0
b       NaN
c     300.0
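The same idea generalizes to any minimum group size by testing size() < min_count instead of size() == 1. The helper name mean_if_enough and its min_count parameter are hypothetical. The sketch also assumes that every non-key column can be averaged.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "c", "c"],
    "col2": [100, 200, 150, 1000, 400, 200],
})


def mean_if_enough(frame, key, min_count=2):
    """Hypothetical helper: per-group mean, NaN where a group has < min_count rows."""
    g = frame.groupby(key)
    # size() counts rows per group; mask replaces values where the condition is True
    return g.mean().mask(g.size().lt(min_count))


print(mean_if_enough(df, "col1"))               # same result as above
print(mean_if_enough(df, "col1", min_count=3))  # only group "a" keeps its mean
```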

Solution 4:

import numpy as np
df.groupby('col1')['col2'].apply(lambda x: x.mean() if x.count() >= 2 else np.nan)

col1
a    150.0
b      NaN
c    300.0
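One subtle difference: this solution checks x.count(), which counts non-missing values, while Solutions 2 and 3 check len(d) or size(), which count rows including NaN. On the question's data they agree. The small frame below is made up (not from the question) to show a case where they diverge.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "b" has two rows, but one of them is NaN
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "col2": [100.0, 200.0, 1000.0, np.nan],
})

# count() ignores NaN, so "b" has only one usable value and gets NaN
by_count = df.groupby("col1")["col2"].apply(
    lambda x: x.mean() if x.count() >= 2 else np.nan
)

# size() counts rows, so "b" passes the check; mean() then skips the NaN
g = df.groupby("col1")["col2"]
by_size = g.mean().mask(g.size().lt(2))

print(by_count)
print(by_size)
```

Which behaviour is right depends on whether "count" in the question means rows or actual values.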

Edit: timing comparison of the answers (IPython %timeit):

%timeit df.groupby('col1')['col2'].apply(lambda x: x.mean() if x.count() >= 2 else np.nan)
2.36 ms ± 255 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# piRSquared
%timeit df.groupby('col1').agg(lambda d: np.nan if len(d) == 1 else d.mean())
5.9 ms ± 30 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# ansev
%timeit df.groupby('col1').mean().where(df['col1'].value_counts().ge(2)).reset_index()
7.01 ms ± 23.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
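%timeit is an IPython magic. Outside IPython, a comparable measurement can be sketched with the standard-library timeit module. The sketch first checks that the three approaches return the same values. reset_index is dropped from the where version so all three results are Series that can be compared directly. The function names are illustrative. Timings depend on data size, pandas version and hardware, so no numbers are shown here.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "c", "c"],
    "col2": [100, 200, 150, 1000, 400, 200],
})


def with_apply():
    return df.groupby("col1")["col2"].apply(
        lambda x: x.mean() if x.count() >= 2 else np.nan
    )


def with_agg():
    return df.groupby("col1").agg(lambda d: np.nan if len(d) == 1 else d.mean())["col2"]


def with_where():
    return df.groupby("col1").mean().where(df["col1"].value_counts().ge(2))["col2"]


candidates = {"apply": with_apply, "agg": with_agg, "where": with_where}

# Sanity check: every approach should produce the same per-group values
reference = with_apply()
for fn in candidates.values():
    pd.testing.assert_series_equal(fn(), reference, check_names=False)

NUMBER = 20  # calls per timing run
for name, fn in candidates.items():
    runs = timeit.repeat(fn, number=NUMBER, repeat=5)
    best = min(runs) / NUMBER
    print(f"{name}: {best * 1e3:.3f} ms per call (best of 5 runs)")
```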
