Efficiently Obtaining The Union Of Pandas Indices
I have two pandas dataframes df1 and df2 and I want their 'merged index'. By that I mean the index that is obtained when I do for instance df1.add(df2, fill_value=0).index (basical
Solution 1:
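I finally found out that the pandas Index object has an __or__ implementation, which older pandas versions treat as a set union (recent versions provide Index.union for this).
Hopefully the following version of associate_tag avoids superfluous operations: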
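from itertools import repeat
from functools import reduce

import pandas as pd

def associate_tag(dfs, tag):
    # Index.union is the method form of the union that Index.__or__ (|)
    # performed in older pandas; recent versions no longer treat | as a set union.
    idx = reduce(pd.Index.union, (df.index for df in dfs))
    # One row per label in the merged index, each carrying the same tag.
    return pd.DataFrame(list(zip(idx, repeat(tag)))).set_index(0)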
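As a quick check of the idea, the sketch below builds two small frames and confirms that the reduced index matches the one produced by df1.add(df2, fill_value=0). The data is made up for illustration, and the helper is restated in a slightly simplified variant (a named "tag" column instead of the zip/set_index construction) so the snippet runs on its own.

```python
from functools import reduce

import pandas as pd


def associate_tag(dfs, tag):
    # Union of all indices, then one constant 'tag' column on that index.
    idx = reduce(pd.Index.union, (df.index for df in dfs))
    return pd.DataFrame({"tag": tag}, index=idx)


# Made-up example data, not taken from the original question.
df1 = pd.DataFrame({"col1": [2, 1, 0]}, index=["idA_1", "idA_2", "idA_3"])
df2 = pd.DataFrame({"col1": [3, 2, 4]}, index=["idA_1", "idA_3", "idA_4"])

tags = associate_tag([df1, df2], "A")

# The 'merged index' from the question, obtained the expensive way.
merged_index = df1.add(df2, fill_value=0).index

same_index = tags.index.equals(merged_index)
print(tags)
print(same_index)
```

Computing the union directly avoids the arithmetic that add performs on every cell just to obtain the index.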
Solution 2:
Based on your comment, here is an amended solution. It has two parts.
Part 1, combining your dataframes: depending on your column names, you can simply pd.concat your whole list of dataframes once you've made sure the column names line up. So if dfA_1 is:
       col1  col2
index
idA_1     2     0
idA_2     1     0
idA_3     0     2
and dfA_2 is:
       col1  col2  col3
index
idA_1     3     2     1
idA_3     2     6     2
idA_4     4     0     2
then
final = pd.concat([dfA_1,dfA_2])
final
       col1  col2  col3
index
idA_1     2     0   NaN
idA_2     1     0   NaN
idA_3     0     2   NaN
idA_1     3     2   1.0
idA_3     2     6   2.0
idA_4     4     0   2.0
To fill those NaNs with zeros:
final.fillna(0, inplace=True)
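For anyone who wants to reproduce the steps above, here is a minimal runnable sketch that rebuilds the two example frames from the tables shown. The last step, groupby(level=0).sum(), is not part of the answer; it is an assumption about what one might do next, since concat keeps repeated labels as separate rows, and summing per label gives a frame indexed by the union of the two indices.

```python
import pandas as pd

# Frames rebuilt from the tables shown above.
dfA_1 = pd.DataFrame(
    {"col1": [2, 1, 0], "col2": [0, 0, 2]},
    index=pd.Index(["idA_1", "idA_2", "idA_3"], name="index"),
)
dfA_2 = pd.DataFrame(
    {"col1": [3, 2, 4], "col2": [2, 6, 0], "col3": [1, 2, 2]},
    index=pd.Index(["idA_1", "idA_3", "idA_4"], name="index"),
)

# concat stacks the rows; columns missing from a frame become NaN.
final = pd.concat([dfA_1, dfA_2])
final = final.fillna(0)  # non-inplace form of the fillna call above

# Assumed follow-up step: collapse repeated labels by summing per label,
# so that each label of the union appears exactly once.
summed = final.groupby(level=0).sum()
print(summed)
```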
Part 2, the tags: once you have that, creating the tags is as easy as defining a map for the index. You can either write a simple function, hardcode a dict, or use a lambda:
final['tag'] = final.index.map(lambda x: x[2])
final
       col1  col2  col3 tag
index
idA_1     2     0   0.0   A
idA_2     1     0   0.0   A
idA_3     0     2   0.0   A
idA_1     3     2   1.0   A
idA_3     2     6   2.0   A
idA_4     4     0   2.0   A
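The three mapping options mentioned above (lambda, function, dict) can be sketched side by side. The frame below is made up, and the idB_* labels and the tag_from_label helper are hypothetical; they are only there so that more than one tag value shows up.

```python
import pandas as pd

# Made-up frame; the 'idB_*' labels are hypothetical additions.
final = pd.DataFrame(
    {"col1": [2, 1, 3, 4]},
    index=["idA_1", "idA_2", "idB_1", "idB_2"],
)

# 1. Lambda, as in the answer: the third character of the label.
tag_lambda = final.index.map(lambda x: x[2])


# 2. Named function: whatever sits between 'id' and the underscore.
def tag_from_label(label):
    return label.split("_")[0][len("id"):]


tag_func = final.index.map(tag_from_label)

# 3. Hardcoded dict from label to tag.
tag_dict = {"idA_1": "A", "idA_2": "A", "idB_1": "B", "idB_2": "B"}
tag_mapped = final.index.map(tag_dict)

# Any of the three results can be assigned as the new column.
final["tag"] = tag_lambda
print(final)
```

The lambda is the shortest, but it relies on the tag being exactly one character at a fixed position; the function or dict is safer if the labels vary in shape.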