
Seaborn Kde Plot Plotting Probabilities Instead Of Density (histplot Without Bars)

I have a question about seaborn kdeplot. In histplot one can set up which stats they want to have (counts, frequency, density, probability) and if used with the kde argument, it al

Solution 1:

The y-axis of a histplot with stat="probability" corresponds to the probability that a value belongs to a certain bar. The value of 0.23 for the highest bar means that there is a probability of about 23% that a flipper length is between 189.7 and 195.6 mm (the edges of that specific bin). Note that by default, 10 bins are spread out between the minimum and maximum value encountered.

The y-axis of a kdeplot is similar to a probability density function. The height of the curve is proportional to the approximate probability of a value being within a bin of width 1 of the corresponding x-value. A value of 0.031 for x=191 means there is a probability of about 3.1 % that the length is between 190.5 and 191.5.
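The relation between curve height and probability can be checked numerically. The following is a minimal sketch, not seaborn's implementation: it uses a hand-written Gaussian kernel estimate, made-up flipper lengths, and an assumed bandwidth of 6 (all three are illustrative assumptions), and compares "height times bin width" with the exact probability that the same estimate assigns to the bin.

```python
import math

def gaussian_kde_pdf(x, data, bandwidth):
    """Density estimate at x: the average of Gaussian kernels centered on the data points."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)

def gaussian_kde_prob(lo, hi, data, bandwidth):
    """Exact probability that the same estimate assigns to the interval [lo, hi]."""
    def cdf(x):
        scale = bandwidth * math.sqrt(2.0)
        return sum(0.5 * (1.0 + math.erf((x - d) / scale)) for d in data) / len(data)
    return cdf(hi) - cdf(lo)

# Made-up flipper lengths in mm (illustrative only, not the penguins dataset)
data = [181, 186, 190, 191, 193, 195, 197, 201, 210, 215, 217, 222]
bandwidth = 6.0  # assumed smoothing width, chosen by hand
x, binwidth = 191.0, 1.0

density = gaussian_kde_pdf(x, data, bandwidth)
approx = density * binwidth  # curve height times bin width
exact = gaussian_kde_prob(x - binwidth / 2, x + binwidth / 2, data, bandwidth)

print(f"density at x={x}: {density:.4f}")
print(f"density * binwidth: {approx:.4f}")
print(f"exact probability of the bin: {exact:.4f}")
```

The two numbers agree closely because the curve is nearly flat over a bin this narrow. For wider bins the product is only an approximation.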

Now, to directly get probability values next to a kdeplot, first a bin width needs to be chosen. Then the y-values can be multiplied by that bin width to give the approximate probability of an x-value being within a bin of that width. The PercentFormatter provides a way to set such a correspondence, using ax.yaxis.set_major_formatter(PercentFormatter(1/binwidth)).
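As a small illustration of what the formatter does with a tick value, the sketch below formats a single density by hand. It calls PercentFormatter.format_pct directly so that no axis is needed; decimals=1 is an assumption made here to get a fixed number of digits, and 0.031 is the curve height quoted above for x=191.

```python
from matplotlib.ticker import PercentFormatter

binwidth = 5
# xmax is the data value that should be displayed as 100%
formatter = PercentFormatter(xmax=1 / binwidth, decimals=1)

density = 0.031  # kde height near x=191, taken from the text above
# display_range is only used when decimals=None, so any value works here
label = formatter.format_pct(density, display_range=1)
print(label)
```

The label is the density multiplied by the bin width, written as a percentage. It approximates the probability of a value landing in a 5 mm bin around that x-value, under the assumption that the curve is roughly flat across the bin.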

The code below illustrates an example with a binwidth of 5 mm, and how a histplot can match a kdeplot.

import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import PercentFormatter

fig, ax1 = plt.subplots()
penguins = sns.load_dataset("penguins")
binwidth = 5
sns.histplot(data=penguins, x="flipper_length_mm", kde=True, stat="probability", color="r", label="Probabilities",
             binwidth=binwidth, ax=ax1)
ax2 = ax1.twinx()
sns.kdeplot(data=penguins, x="flipper_length_mm", color="k", label="kde density", ls=':', lw=5, ax=ax2)
ax2.set_ylim(0, ax1.get_ylim()[1] / binwidth)  # similar limits on the y-axis to align the plots
ax2.yaxis.set_major_formatter(PercentFormatter(1 / binwidth))  # show axis such that 1/binwidth corresponds to 100%
ax2.set_ylabel(f'Probability for a bin width of {binwidth}')
ax1.legend(loc='upper left')
ax2.legend(loc='upper right')
plt.show()

example plot

PS: To only show the kdeplot with a probability, the code could be:

binwidth = 5
ax = sns.kdeplot(data=penguins, x="flipper_length_mm")
ax.yaxis.set_major_formatter(PercentFormatter(1 / binwidth))  # show axis such that 1/binwidth corresponds to 100%
ax.set_ylabel(f'Probability for a bin width of {binwidth}')

Another option could be to draw a histplot with kde=True, and remove the generated bars. To be interpretable, a binwidth should be set. With binwidth=1 you'd get the same y-axis as a density plot. Setting kde_kws={'cut': 3} lets the kde go smoothly to about zero; by default, the kde curve of a histplot is cut off at the minimum and maximum of the data.

ax = sns.histplot(data=penguins, x="flipper_length_mm", binwidth=1, kde=True, stat='probability', kde_kws={'cut': 3})
ax.containers[0].remove() # remove the bars
ax.relim() # the axis limits need to be recalculated without the bars
ax.autoscale_view()
