
Checking A Pandas Dataframe For Outliers

[Figure: Plot of sensor]

I have an experiment on a sensor that contains 8 electrodes. The image above is a plot of the electrode output vs. time. As you can see on the plot, one of the 8 elec…

Solution 1:

Scatter plots and distribution plots are good for spotting outliers. But since the question is about pandas DataFrames, here's how I would do it.

df.describe()

This gives you a summary table with the count, mean, standard deviation, min, max, and the 25th, 50th and 75th percentiles of each numeric column. Look at a column's max: if it is far above the 75th percentile, that column probably contains outliers.
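A minimal sketch of the describe() check, using a hypothetical DataFrame with one 'Sensor Value' column of made-up readings (not the asker's data). To turn "far above the 75th percentile" into a concrete test, it borrows Tukey's upper fence (75th percentile + 1.5 × interquartile range), which is a swapped-in rule rather than part of the original answer.

```python
import pandas as pd

# Hypothetical sensor readings; 95.0 is an injected spike
df = pd.DataFrame({"Sensor Value": [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 95.0]})

summary = df.describe()  # rows: count, mean, std, min, 25%, 50%, 75%, max
print(summary)

# Assumption: a max counts as "far above" the 75th percentile when it exceeds
# the 75th percentile by more than 1.5 interquartile ranges (Tukey's upper fence).
iqr = summary.loc["75%"] - summary.loc["25%"]
suspicious = summary.loc["max"] > summary.loc["75%"] + 1.5 * iqr

# Names of the columns whose max looks suspicious
print(suspicious[suspicious].index.tolist())
```

Because describe() returns a DataFrame indexed by statistic name, the same two lines work unchanged when there are several columns (e.g. one per electrode).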

Then df['Sensor Value'].value_counts() gives you the frequency of each value. Outliers show up here as extreme values that occur only rarely.
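A small sketch of the value_counts() step on hypothetical readings. It assumes the readings repeat (for example quantized ADC values); with continuous floating-point data nearly every value may occur only once, so frequency alone will not separate outliers from normal readings.

```python
import pandas as pd

# Hypothetical readings: most values repeat, the spike appears once
df = pd.DataFrame({"Sensor Value": [1.0, 1.0, 1.1, 1.1, 1.0, 1.1, 1.0, 95.0]})

counts = df["Sensor Value"].value_counts()  # value -> frequency, most frequent first
print(counts)

# Values seen only once are candidates worth inspecting more closely
rare_values = counts[counts == 1].index.tolist()
print(rare_values)
```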

Collect their index labels and drop them with df.drop(indexes_list, inplace=True).
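A sketch of collecting the index labels and dropping those rows. The cutoff of 10 is a hypothetical threshold chosen only for this toy data; in practice it would come from one of the checks in this answer. It uses the non-inplace form of df.drop, which returns a new DataFrame and leaves df untouched; inplace=True would modify df directly instead.

```python
import pandas as pd

df = pd.DataFrame({"Sensor Value": [1.0, 1.1, 0.9, 95.0, 1.2]})

# Hypothetical cutoff: values above 10 are treated as outliers here
indexes_list = df.index[df["Sensor Value"] > 10].tolist()

# Drop those rows by index label; returns a new DataFrame
cleaned = df.drop(indexes_list)
print(cleaned)
```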

EDIT: You could also flag as outliers any values outside mean +/- 3 * standard deviation.

Example code:

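# rows whose value in column col lies more than 3 standard deviations from the mean, on either side
outliers = df[(df[col] - df[col].mean()).abs() > 3 * df[col].std()]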
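The one-liner above handles a single column. A sketch extending the mean +/- 3 standard deviation check to several columns at once, e.g. one per electrode, is below; the column names electrode_1 and electrode_2 and the data are hypothetical. Note that with very few samples this rule cannot fire at all: using pandas' default sample standard deviation (ddof=1), no point among n samples can be more than (n - 1) / sqrt(n) standard deviations from the mean, which is below 3 for n of 10 or fewer. The toy data therefore uses 20 samples.

```python
import pandas as pd

# Hypothetical data: two electrode columns, 20 samples each; electrode_2 gets one spike
base = [1.0, 1.1, 0.9, 1.0] * 5
df = pd.DataFrame({"electrode_1": base, "electrode_2": list(base)})
df.loc[7, "electrode_2"] = 50.0

# Absolute z-score per cell: |x - column mean| / column std (pandas std uses ddof=1)
z = (df - df.mean()).abs() / df.std()

# Flag a row if any electrode is more than 3 standard deviations from its column mean
outlier_rows = df[(z > 3).any(axis=1)]
print(outlier_rows)

# Remove the flagged rows
cleaned = df.drop(outlier_rows.index)
```

Flagging whole rows (any(axis=1)) suits time-series data where each row is one time step; if only the faulty electrode's reading should be discarded, the boolean frame z > 3 could instead be used to mask individual cells.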
