Compare Df1 Column 1 To All Columns In Df2 Returning The Index Of Df2
I'm new to pandas so likely overlooking something but I've been searching and haven't found anything helpful yet. What I'm trying to do is this. I have 2 dataframes. df1 has only
Solution 1:
I think you can use isin to test for matches between the Series created from df2 by stack and the Series created from the single column of df1 by squeeze. Finally, reshape the result back with unstack:
df3 = df2.stack().isin(df1.squeeze()).unstack()
print (df3)
                1      2      3      4      5      6      7
8302813476  False  False  False  False  False  False  False
8302813477  False  False  False  False  False  False  False
8302813478  False  False   True  False  False  False  False
Then find all rows where at least one value is True by using any:
a = df3.any(axis=1)
print (a)
8302813476    False
8302813477    False
8302813478     True
dtype: bool
Finally, use boolean indexing to keep only the matching index labels:
print (a[a].index)
Int64Index([8302813478], dtype='int64')
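The three steps above can be put together as one runnable sketch. The sample frames here are hypothetical, since the question's real data is not shown in full; only the column name col and the index labels are taken from the snippets above.

```python
import pandas as pd

# Hypothetical sample data standing in for the frames from the question.
df1 = pd.DataFrame({"col": [10, 20, 30]})
df2 = pd.DataFrame(
    {1: [1, 2, 3], 2: [4, 5, 6], 3: [7, 8, 30]},
    index=[8302813476, 8302813477, 8302813478],
)

# Flatten df2 into one long Series, test membership against df1's column,
# then restore the original 2-D shape.
mask = df2.stack().isin(df1.squeeze()).unstack()

# True for each df2 row that contains at least one value from df1.
row_has_match = mask.any(axis=1)

# Boolean indexing keeps only the labels of the matching rows.
matching_index = row_has_match[row_has_match].index
print(matching_index.tolist())  # [8302813478]
```

Only the last row of this sample df2 contains a value (30) that also appears in df1, so only its label is returned.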
Another solution is to use df1['col'].unique() instead of squeeze (thank you, Ted Petrou):
df3 = df2.stack().isin(df1['col'].unique()).unstack()
print (df3)
                1      2      3      4      5      6      7
8302813476  False  False  False  False  False  False  False
8302813477  False  False  False  False  False  False  False
8302813478  False  False   True  False  False  False  False
---
I like squeeze more, but simply selecting the column of df1 gives the same output:
df3 = df2.stack().isin(df1['col']).unstack()
print (df3)
                1      2      3      4      5      6      7
8302813476  False  False  False  False  False  False  False
8302813477  False  False  False  False  False  False  False
8302813478  False  False   True  False  False  False  False
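To check that the three variants really agree, here is a small sketch on hypothetical data (the frames and values are made up for illustration). It also shows one caveat that may matter: on a df1 with a single row and a single column, squeeze returns a scalar rather than a Series, and isin expects a list-like, so selecting the column is probably the safer choice in that case.

```python
import pandas as pd

# Hypothetical sample data; df1 contains a duplicate on purpose.
df1 = pd.DataFrame({"col": [30, 30, 99]})
df2 = pd.DataFrame({1: [1, 2], 2: [30, 4]}, index=[100, 200])

stacked = df2.stack()

via_squeeze = stacked.isin(df1.squeeze()).unstack()
via_unique = stacked.isin(df1["col"].unique()).unstack()
via_column = stacked.isin(df1["col"]).unstack()

# All three approaches produce the same boolean frame.
all_equal = via_squeeze.equals(via_unique) and via_unique.equals(via_column)
print(all_equal)

# Caveat: a 1x1 frame squeezes down to a scalar, not a Series.
one_row = pd.DataFrame({"col": [30]})
squeezed = one_row.squeeze()
print(isinstance(squeezed, pd.Series))        # scalar, so not a Series
print(isinstance(one_row["col"], pd.Series))  # column selection stays a Series
```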
Solution 2:
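As an interesting numpy alternative:
import numpy as np
import pandas as pd

# Flatten both frames, compare every df1 value with every df2 value,
# then collapse over the df1 axis and restore df2's shape.
l1 = df1.values.ravel()
l2 = df2.values.ravel()
pd.DataFrame(
    np.equal.outer(l1, l2).any(0).reshape(df2.values.shape),
    df2.index, df2.columns
)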
or using a set, a list, and a comprehension:
# A set gives fast membership tests for each flattened df2 value.
l1 = set(df1.values.ravel().tolist())
l2 = df2.values.ravel().tolist()
pd.DataFrame(
    np.array([bool(l1.intersection([d])) for d in l2]).reshape(df2.values.shape),
    df2.index, df2.columns
)
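Both snippets in this solution stop at the boolean frame, so getting the index of df2 still needs the any and boolean indexing steps from Solution 1. The sketch below runs both variants on hypothetical sample data and confirms they agree. The names outer_mask, set_mask and wanted are introduced here for illustration, and d in wanted is used as a plain equivalent of the intersection test.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the frames from the question.
df1 = pd.DataFrame({"col": [10, 20, 30]})
df2 = pd.DataFrame(
    {1: [1, 2, 3], 2: [4, 5, 6], 3: [7, 8, 30]},
    index=[8302813476, 8302813477, 8302813478],
)

l1 = df1.values.ravel()
l2 = df2.values.ravel()

# Variant 1: pairwise comparison. This builds a len(l1) x len(l2) array,
# so memory use grows with the product of the two sizes.
outer_mask = pd.DataFrame(
    np.equal.outer(l1, l2).any(axis=0).reshape(df2.shape),
    index=df2.index, columns=df2.columns,
)

# Variant 2: set membership, one lookup per df2 value.
wanted = set(l1.tolist())
set_mask = pd.DataFrame(
    np.array([d in wanted for d in l2.tolist()]).reshape(df2.shape),
    index=df2.index, columns=df2.columns,
)

# Same final steps as in Solution 1 to get the matching df2 index.
rows = outer_mask.any(axis=1)
matching_index = rows[rows].index
print(outer_mask.equals(set_mask), matching_index.tolist())
```

For large frames the set-based variant is likely to be lighter on memory, because it never materializes the full pairwise comparison.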