
Pandas - Identify Unique Triplets From A Df

I have a dataframe which represents unique items. Each item is uniquely identified by a set of varA, varB, and varC (so each item has 0 to n values for varA, varB, or varC). My df

Solution 1:

You can combine boolean masks built with duplicated (pd.Series.duplicated) using &:

If you want to keep the first occurrence of a duplicated value:

myfilter = ~df.varA.duplicated(keep='first') & \
           ~df.varB.duplicated(keep='first') & \
           ~df.varC.duplicated(keep='first')
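To check what this mask selects, here is a minimal sketch on a small frame. The sample data is an assumption, reconstructed from the output table further down in this answer (the question's own df is cut off), with missing values as NaN.

```python
import numpy as np
import pandas as pd

# Sample data: assumed, reconstructed from the output table in this answer
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "varA": ["a", "d", "a", "m", "Z"],
    "varB": ["b", "e", "k", "e", np.nan],
    "varC": ["c", "f", "l", np.nan, "t"],
})

# A row passes only if, in every column, its value has not appeared in an earlier row
myfilter = (~df.varA.duplicated(keep="first")
            & ~df.varB.duplicated(keep="first")
            & ~df.varC.duplicated(keep="first"))

print(myfilter.tolist())  # [True, True, False, False, True]
```

Row 2 fails because "a" already appeared in varA at row 0, and row 3 fails because "e" already appeared in varB at row 1. Note that duplicated treats NaN as equal to NaN, so a second missing value in the same column would also mark that row as a duplicate.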

If you don't want to keep any of them (every occurrence of a repeated value is dropped):

myfilter = ~df.varA.duplicated(keep=False) & \
           ~df.varB.duplicated(keep=False) & \
           ~df.varC.duplicated(keep=False)
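For comparison, a minimal sketch of the keep=False variant on the same kind of assumed sample data (again reconstructed from the output table in this answer, not taken from the question):

```python
import numpy as np
import pandas as pd

# Sample data: assumed, reconstructed from the output table in this answer
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "varA": ["a", "d", "a", "m", "Z"],
    "varB": ["b", "e", "k", "e", np.nan],
    "varC": ["c", "f", "l", np.nan, "t"],
})

# keep=False flags every occurrence of a repeated value, so a row survives
# only if each of its values is unique within its column
strict = (~df.varA.duplicated(keep=False)
          & ~df.varB.duplicated(keep=False)
          & ~df.varC.duplicated(keep=False))

print(strict.tolist())  # [False, False, False, False, True]
```

Here rows 0 and 2 are both dropped because they share "a", and rows 1 and 3 because they share "e", so only row 4 remains. The keep='first' variant would still keep rows 0 and 1.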

Then you can, for example, give the rows that pass the filter an incremental uniqueID (the output below uses the keep='first' filter):

import numpy as np
# .ix was removed in pandas 1.0; .loc accepts the boolean mask directly
df.loc[myfilter, 'uniqueID'] = np.arange(myfilter.sum())
df


   ID varA varB varC  uniqueID
0   1    a    b    c       0.0
1   2    d    e    f       1.0
2   3    a    k    l       NaN
3   4    m    e  NaN       NaN
4   5    Z  NaN    t       2.0
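The chained filter can also be written for any list of identifying columns. This is a minimal sketch, assuming sample data reconstructed from the table above and an illustrative cols list. It also uses pandas' nullable Int64 dtype, so the IDs are not upcast to float by the unmatched rows (which is why the table shows 0.0, 1.0, 2.0).

```python
import numpy as np
import pandas as pd

# Sample data: assumed, reconstructed from the output table above
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "varA": ["a", "d", "a", "m", "Z"],
    "varB": ["b", "e", "k", "e", np.nan],
    "varC": ["c", "f", "l", np.nan, "t"],
})

cols = ["varA", "varB", "varC"]  # identifying columns (illustrative list)

# Flag repeats column by column, then keep a row only if no column flags it
dup = df[cols].apply(lambda s: s.duplicated(keep="first"))
myfilter = ~dup.any(axis=1)

# Nullable Int64 column: rows outside the filter hold <NA> and the IDs stay integers
df["uniqueID"] = pd.array([pd.NA] * len(df), dtype="Int64")
df.loc[myfilter, "uniqueID"] = np.arange(myfilter.sum())
print(df)
```

This is equivalent to the chained & filter with keep='first'; adding or removing an identifying column only means editing cols.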
