Pandas - Identify Unique Triplets From A Df
I have a dataframe which represents unique items. Each item is uniquely identified by a set of varA, varB, and varC (so each item has 0 to n values for varA, varB, or varC). My df
Solution 1:
You can use chained boolean indexing using duplicated
(pd.Series.duplicated):
If you want to keep the first occurence of a duplicated:
myfilter = ~df.varA.duplicated(keep='first') & \
~df.varB.duplicated(keep='first') & \
~df.varC.duplicated(keep='first')
If you don't want to
myfilter = ~df.varA.duplicated(keep=False) & \
~df.varB.duplicated(keep=False) & \
~df.varC.duplicated(keep=False)
Then you can for example give these an incremental uniqueID:
df.ix[myfilter, 'uniqueID'] = np.arange(myfilter.sum(), dtype='int')
df
ID varA varB varC uniqueID
0 1 a b c 0.0
1 2 d e f 1.0
2 3 a k l NaN
3 4 m e NaN NaN
4 5 Z NaN t 2.0
Post a Comment for "Pandas - Identify Unique Triplets From A Df"