Distinct Contiguous Blocks In Pandas Dataframe

May 11, 2024

I have a pandas dataframe looking like this:

x1 = [np.nan, 'a', 'a', 'a', np.nan, np.nan, 'b', 'b', 'c', np.nan, 'b', 'b', np.nan]
ty1 = pd.DataFrame({'name': x1})

Do you know how I can get

Solution 1:

You can use shift and cumsum to create 'id's for each contiguous block:

In [5]: blocks = (ty1 != ty1.shift()).cumsum()

In [6]: blocks
Out[6]:
    name
0      1
1      2
2      2
3      2
4      3
5      4
6      5
7      5
8      6
9      7
10     8
11     8
12     9
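
To make the two steps visible separately, here is a minimal self-contained sketch of the same idea applied to the `name` column as a Series. The variable names `changed` and `block_id` are illustrative and do not come from the answer above.

```python
import numpy as np
import pandas as pd

x1 = [np.nan, 'a', 'a', 'a', np.nan, np.nan, 'b', 'b', 'c', np.nan, 'b', 'b', np.nan]
name = pd.Series(x1, name='name')

# True wherever a row differs from the row before it. NaN never compares
# equal to anything (itself included), so every NaN row starts a new block.
changed = name != name.shift()

# The running sum of the change flags gives each contiguous block its own id.
block_id = changed.cumsum()

print(block_id.tolist())
```

Because each NaN row is flagged as a change, consecutive NaN rows each receive their own id, which is why rows 4 and 5 get different ids in the output above.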

You are only interested in those blocks that are not NaN, so filter for that:

In [7]: blocks = blocks[ty1['name'].notnull()]

In [8]: blocks
Out[8]:
    name
1      2
2      2
3      2
6      5
7      5
8      6
10     8
11     8

And then, we can get the first and last index for each 'id':

In [10]: blocks.groupby('name').apply(lambda x: (x.index[0], x.index[-1]))
Out[10]:
name
2      (1, 3)
5      (6, 7)
6      (8, 8)
8    (10, 11)
5      (6, 7)
6      (8, 8)
8    (10, 11)
dtype: object

Whether this last step is necessary depends on what you want to do with the result (storing tuples as elements in a dataframe is not really recommended). Having the 'id's may already be enough.
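
If separate columns are preferable to tuples, one possible variation is to aggregate each block into its value plus its first and last row index. This is only a sketch and not part of the answer above: the column names `row`, `block`, `value`, `start`, and `end` are hypothetical, and it assumes a pandas version that supports named aggregation (0.25 or later).

```python
import numpy as np
import pandas as pd

x1 = [np.nan, 'a', 'a', 'a', np.nan, np.nan, 'b', 'b', 'c', np.nan, 'b', 'b', np.nan]
ty1 = pd.DataFrame({'name': x1})

# Block ids as in the answer: a new id starts wherever the value changes.
block_id = (ty1['name'] != ty1['name'].shift()).cumsum()

# Attach the ids and expose the row index as a regular column named 'row'.
df = ty1.assign(block=block_id).rename_axis('row').reset_index()

# Keep only the non-NaN rows, then build one summary row per block.
valid = df[df['name'].notna()]
summary = valid.groupby('block').agg(
    value=('name', 'first'),  # the repeated value inside the block
    start=('row', 'min'),     # first row index of the block
    end=('row', 'max'),       # last row index of the block
)

print(summary)
```

The result has one row per non-NaN block, so the two separate 'b' runs stay distinct and can be filtered or merged like any other dataframe.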
