Pandas Groupby With Dropna Set To True Generating Wrong Output
In the following snippet: import pandas as pd import numpy as np df = pd.DataFrame( { 'a': [1, 2, 3, 4, 5, 6, 7, 8, 9], 'b': [1, np.nan, 1, np.nan, 2, 1, 2, n
Solution 1:
It's because None and None are the same thing:
>>> None == None
True
>>>
You have to use np.nan, which never compares equal to itself:
>>> np.nan == np.nan
False
>>>
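Since NaN never compares equal to itself, an equality test cannot be used to find missing values. A minimal sketch of the usual alternative, assuming `pd.isna` is acceptable here; the Series `s` is a small illustrative example and not part of the question:

```python
import numpy as np
import pandas as pd

# NaN is never equal to itself, so == cannot be used to detect it.
nan_equals_itself = np.nan == np.nan

# pd.isna treats both None and NaN as missing.
none_is_missing = pd.isna(None)
nan_is_missing = pd.isna(np.nan)

# Illustrative Series (not from the question): pandas stores the None
# in a numeric column as NaN, so the dtype becomes float64.
s = pd.Series([1, None, 2])
missing_mask = s.isna().tolist()

print(nan_equals_itself, none_is_missing, nan_is_missing)
print(s.dtype, missing_mask)
```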
So try this:
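import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "b": [1, np.nan, 1, np.nan, 2, 1, 2, np.nan, 1],
    }
)
# group_keys=False keeps the original index; pandas 2.0+ would otherwise add the group key to it
df_again = df.groupby("b", dropna=False, group_keys=False).apply(lambda x: x)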
Now df and df_again would be the same:
>>> df
   a    b
0  1  1.0
1  2  NaN
2  3  1.0
3  4  NaN
4  5  2.0
5  6  1.0
6  7  2.0
7  8  NaN
8  9  1.0
>>> df_again
   a    b
0  1  1.0
1  2  NaN
2  3  1.0
3  4  NaN
4  5  2.0
5  6  1.0
6  7  2.0
7  8  NaN
8  9  1.0
>>> df.equals(df_again)
True
>>>
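To see what dropna actually changes, it can help to compare group sizes instead of round-tripping through apply. This is only a sketch using the same data as above; the names `sizes_with_nan` and `sizes_without_nan` are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "b": [1, np.nan, 1, np.nan, 2, 1, 2, np.nan, 1],
    }
)

# dropna=False keeps a separate group for the NaN keys.
sizes_with_nan = df.groupby("b", dropna=False).size()

# dropna=True (the default) leaves the NaN keys out of the groups.
sizes_without_nan = df.groupby("b", dropna=True).size()

print(sizes_with_nan)
print(sizes_without_nan)
```

With this data the first result has three groups covering all nine rows, while the second has two groups covering six rows.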