Pandas - Identify Unique Triplets From A Df
I have a dataframe which represents unique items. Each item is uniquely identified by a set of varA, varB, and varC (so each item has 0 to n values for varA, varB, or varC). My df
Solution 1:
You can build a boolean mask by combining the result of duplicated (pd.Series.duplicated) for each column.
If you want to keep the first occurrence of each duplicated value:
myfilter = ~df.varA.duplicated(keep='first') & \
           ~df.varB.duplicated(keep='first') & \
           ~df.varC.duplicated(keep='first')
If you don't want to keep any of the duplicated values:
myfilter = ~df.varA.duplicated(keep=False) & \
           ~df.varB.duplicated(keep=False) & \
           ~df.varC.duplicated(keep=False)
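To see how the two keep settings differ, here is a minimal sketch on a single column. The Series s is a made-up example (it happens to match the varA values in the output further down), not part of the original question:

```python
import pandas as pd

# Hypothetical single column with one repeated value ("a")
s = pd.Series(["a", "d", "a", "m", "Z"])

# keep='first': only the later occurrence is flagged as a duplicate
dup_first = s.duplicated(keep="first")

# keep=False: every occurrence of a repeated value is flagged
dup_all = s.duplicated(keep=False)

print(dup_first.tolist())  # [False, False, True, False, False]
print(dup_all.tolist())    # [True, False, True, False, False]
```

Negating these with ~ gives the rows to keep, which is what the filters above do for each of the three columns.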
Then you can, for example, give the rows that pass the filter an incremental uniqueID:
# requires: import numpy as np
df.loc[myfilter, 'uniqueID'] = np.arange(myfilter.sum(), dtype='int')
df
   ID varA varB varC  uniqueID
0   1    a    b    c       0.0
1   2    d    e    f       1.0
2   3    a    k    l       NaN
3   4    m    e  NaN       NaN
4   5    Z  NaN    t       2.0
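For reference, here is a self-contained sketch that should reproduce the output above. The input frame is reconstructed from that printed output (the question's original df is not shown in full), and it assumes the keep='first' variant of the filter, since that is the one that matches the uniqueID values shown:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the printed output (an assumption)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "varA": ["a", "d", "a", "m", "Z"],
    "varB": ["b", "e", "k", "e", np.nan],
    "varC": ["c", "f", "l", np.nan, "t"],
})

# True only for rows whose varA, varB and varC are all first occurrences
myfilter = (
    ~df["varA"].duplicated(keep="first")
    & ~df["varB"].duplicated(keep="first")
    & ~df["varC"].duplicated(keep="first")
)

# Number the selected rows 0, 1, 2, ...; unselected rows stay NaN
df.loc[myfilter, "uniqueID"] = np.arange(myfilter.sum(), dtype="int")

print(df)
```

The uniqueID column comes out as float because the rows that did not pass the filter hold NaN.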