
Validating Dataframe Column Data

I have the pseudocode below, which I need to implement using pandas: if group_min_size && group_max_size if group_min_size == 0 && group_max_size > 0 if

Solution 1:

Work through the conditions step by step. Begin by creating a boolean Series for each comparison:

min_equal_0 = df['group_min_size'] == 0
min_above_0 = df['group_min_size'] > 0
min_above_equal_2 = df['group_min_size'] >= 2
min_below_2 = df['group_min_size'] < 2
max_equal_0 = df['group_max_size'] == 0
max_above_0 = df['group_max_size'] > 0
max_above_equal_2 = df['group_max_size'] >= 2
max_below_2 = df['group_max_size'] < 2

Now we can look at creating our masks according to the pseudo-code:

first_mask = ~(min_equal_0 & max_above_0 & (max_below_2 | max_above_equal_2))
second_mask = ~(max_equal_0 & min_above_0 & (min_below_2 | min_above_equal_2))

If we combine the two:

>> first_mask & second_mask

0    False
1     True
2    False
3    False
4     True
5     True
6     True
7     True
8     True
dtype: bool
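To try the steps above end to end, here is a minimal self-contained sketch. The sample DataFrame is hypothetical (the question's actual data is not shown), so the result will not match the output above; the mask definitions are copied unchanged from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the question's real DataFrame is not shown.
df = pd.DataFrame({
    "group_min_size": [0, 1, 3, 0, np.nan, 2],
    "group_max_size": [5, 4, 0, 0, 3, np.nan],
})

# One boolean Series per comparison.
min_equal_0 = df["group_min_size"] == 0
min_above_0 = df["group_min_size"] > 0
min_above_equal_2 = df["group_min_size"] >= 2
min_below_2 = df["group_min_size"] < 2
max_equal_0 = df["group_max_size"] == 0
max_above_0 = df["group_max_size"] > 0
max_above_equal_2 = df["group_max_size"] >= 2
max_below_2 = df["group_max_size"] < 2

# A row fails the first mask when min is 0 while max is positive,
# and fails the second mask when max is 0 while min is positive.
first_mask = ~(min_equal_0 & max_above_0 & (max_below_2 | max_above_equal_2))
second_mask = ~(max_equal_0 & min_above_0 & (min_below_2 | min_above_equal_2))

valid = first_mask & second_mask
print(valid.tolist())  # [False, True, False, True, True, True]
```

Note that the last two rows, which each contain a NaN, come out as True: every comparison against NaN is False, so the negated masks let those rows through.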

If you want rows containing NaN to be treated as False, add not-null checks to the mask:

min_is_not_null = df['group_min_size'].notnull()
max_is_not_null = df['group_max_size'].notnull()
>> min_is_not_null & max_is_not_null & first_mask & second_mask
0    False
1     True
2    False
3    False
4    False
5     True
6     True
7     True
8     True
dtype: bool
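The reason the not-null checks matter can be seen in a small sketch. The data and the names `passes_without_null_check`, `both_present`, `strict`, and `valid_rows` are hypothetical, chosen only for illustration; the point is that comparisons against NaN return False, so a negated condition turns NaN rows into True unless they are excluded explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 is missing the min, row 2 is missing the max.
df = pd.DataFrame({
    "group_min_size": [1, np.nan, 2],
    "group_max_size": [4, 3, np.nan],
})

# NaN == 0 is False, so negating the comparison yields True for the NaN row.
passes_without_null_check = ~(df["group_min_size"] == 0)

# Require both columns to be present before accepting a row.
both_present = df["group_min_size"].notnull() & df["group_max_size"].notnull()
strict = both_present & passes_without_null_check

# A boolean mask can be used directly to select the valid rows.
valid_rows = df[strict]
print(valid_rows)
```

Here only the first row survives the stricter mask, since the other two rows each have a missing value.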
