
Pandas Groupby With Dropna Set To True Generating Wrong Output

In the following snippet: import pandas as pd import numpy as np df = pd.DataFrame( { 'a': [1, 2, 3, 4, 5, 6, 7, 8, 9], 'b': [1, np.nan, 1, np.nan, 2, 1, 2, n

Solution 1:

It's because None compares equal to None:

>>> None == None
True
>>> 

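You have to use np.nan, which never compares equal to itself (the np.NaN alias was removed in NumPy 2.0, so spell it np.nan):

>>> np.nan == np.nan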
False
>>> 
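The comparison above can be checked side by side with how pandas itself detects missing values. This is a minimal sketch, not part of the original answer: the variable names are illustrative, and it only assumes that pd.isna is the usual way to test for missing values.

```python
import numpy as np
import pandas as pd

# None compares equal to itself; NaN does not (IEEE 754 floating-point rule).
none_equal = None == None
nan_equal = np.nan == np.nan

# pandas treats both as missing, so test with pd.isna rather than ==.
none_missing = pd.isna(None)
nan_missing = pd.isna(np.nan)

# In a numeric Series, None is converted to NaN when the Series is built.
s = pd.Series([1, None, 2])

print(none_equal, nan_equal, none_missing, nan_missing)
print(s.dtype, s.isna().tolist())
```

Because equality is unreliable for NaN, pd.isna (or Series.isna) is the safer test when checking which rows a groupby would treat as missing keys.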

So try this:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "b": [1, np.nan, 1, np.nan, 2, 1, 2, np.nan, 1]
    }
)
# dropna=False keeps the rows whose group key is NaN
df_again = df.groupby("b", dropna=False).apply(lambda x: x)

With dropna=False the NaN rows are kept, so df and df_again are the same:

>>> df
   a    b
0  1  1.0
1  2  NaN
2  3  1.0
3  4  NaN
4  5  2.0
5  6  1.0
6  7  2.0
7  8  NaN
8  9  1.0
>>> df_again
   a    b
0  1  1.0
1  2  NaN
2  3  1.0
3  4  NaN
4  5  2.0
5  6  1.0
6  7  2.0
7  8  NaN
8  9  1.0
>>> df.equals(df_again)
True
>>> 
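The exact frame returned by an identity apply has varied between pandas releases, so the effect of dropna may be easier to see with a plain aggregation. The following is a sketch under that assumption. It reuses the frame from this answer, and the names sums_dropped and sums_kept are illustrative only. The dropna argument itself requires pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "b": [1, np.nan, 1, np.nan, 2, 1, 2, np.nan, 1],
    }
)

# Default (dropna=True): rows whose key is NaN are left out of every group.
sums_dropped = df.groupby("b")["a"].sum()

# dropna=False: NaN becomes a group key of its own.
sums_kept = df.groupby("b", dropna=False)["a"].sum()

print(sums_dropped)
print(sums_kept)
```

With the default, only the keys 1.0 and 2.0 appear. With dropna=False a third group keyed by NaN collects the rows where b is missing, so the group sums add up to the sum of the whole column.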

Solution 2:

This was a bug introduced in pandas 1.2.0, as described here, and it was fixed here.
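Since the behaviour depends on the installed release, it can help to check which pandas version is in use. A minimal sketch: the names installed and on_1_2_series are illustrative, and the check only compares against the 1.2 series named above. It does not assume which release contains the fix.

```python
import pandas as pd

# Take the leading "major.minor" part of the version string, e.g. "1.2.0" -> (1, 2).
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
installed = (major, minor)

# 1.2.0 is the release this answer names as introducing the bug.
on_1_2_series = installed == (1, 2)

print(pd.__version__, "is a 1.2.x release:", on_1_2_series)
```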

