
Efficiently Obtaining The Union Of Pandas Indices

I have two pandas dataframes df1 and df2 and I want their 'merged index'. By that I mean the index that is obtained when I do for instance df1.add(df2, fill_value=0).index (basical
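Here is a minimal sketch of what the "merged index" is. It uses two small hypothetical frames, df1 and df2. Arithmetic such as df1.add(df2, fill_value=0) aligns the frames on an outer join of their indices. For unique labels, that produces the same labels as Index.union, so you can get the index without touching the data.

```python
import pandas as pd

# Two hypothetical frames with partially overlapping indices
df1 = pd.DataFrame({"x": [1, 2]}, index=["a", "b"])
df2 = pd.DataFrame({"x": [10, 20]}, index=["b", "c"])

# Index obtained through the arithmetic (outer alignment of both frames)
via_add = df1.add(df2, fill_value=0).index

# Same labels, computed directly from the two indices
via_union = df1.index.union(df2.index)

print(list(via_union))  # ['a', 'b', 'c']
print(list(via_add) == list(via_union))  # True
```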

Solution 1:

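I finally found out that the pandas Index object implements __or__, so idx1 | idx2 gives the union of two indices. Note that recent pandas versions no longer treat | between two Index objects as a set union, so the explicit Index.union method is the safer spelling.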

Hopefully the following version of associate_tag avoids superfluous operations:

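from functools import reduce
from itertools import repeat

import pandas as pd


def associate_tag(dfs, tag):
    # Union of all the indices; Index.union is the explicit form of the old `|` set union
    idx = reduce(lambda left, right: left.union(right), (df.index for df in dfs))
    # One row per label of the union, each paired with the same tag
    return pd.DataFrame(list(zip(idx, repeat(tag)))).set_index(0)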
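To see the reduce-based union on more than two frames, here is a self-contained sketch. The helper name union_of_indices and the three frames are hypothetical. It returns a Series of tags rather than a one-column DataFrame, which is a simpler alternative and not the answer's exact output shape.

```python
from functools import reduce

import pandas as pd


def union_of_indices(dfs):
    """Fold Index.union over the indices of all frames (hypothetical helper)."""
    return reduce(lambda left, right: left.union(right), (df.index for df in dfs))


# Three hypothetical frames with overlapping labels
frames = [
    pd.DataFrame({"v": [1, 2]}, index=["a", "b"]),
    pd.DataFrame({"v": [3]}, index=["c"]),
    pd.DataFrame({"v": [4, 5]}, index=["b", "d"]),
]

idx = union_of_indices(frames)
print(list(idx))  # ['a', 'b', 'c', 'd']

# One entry per label of the union, each carrying the same tag
tagged = pd.Series("A", index=idx, name="tag")
print(tagged.to_dict())  # {'a': 'A', 'b': 'A', 'c': 'A', 'd': 'A'}
```

The fold performs one pairwise union per frame. For a long list of frames, an alternative is to concatenate all the indices once with Index.append and call .unique(). Be aware that this keeps first-appearance order instead of sorting the labels.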

Solution 2:

Based on your comment, here is an amended solution:

There are two parts. Part 1, combining your dataframes: depending on your column names, you can simply pd.concat your whole list of dataframes once you've made sure the column names line up. So if dfA_1 is:

       col1  col2
index            
idA_1     2     0
idA_2     1     0
idA_3     0     2

and dfA_2 is:

       col1  col2  col3
index                  
idA_1     3     2     1
idA_3     2     6     2
idA_4     4     0     2

then

final = pd.concat([dfA_1, dfA_2])

final
       col1  col2  col3
index                  
idA_1     2     0   NaN
idA_2     1     0   NaN
idA_3     0     2   NaN
idA_1     3     2   1.0
idA_3     2     6   2.0
idA_4     4     0   2.0
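Here is a sketch that rebuilds the two example frames from the values in the tables above and checks what pd.concat does. It stacks rows and aligns columns by name, so col3 is NaN for the rows coming from dfA_1. Duplicate labels such as idA_1 are kept rather than merged. Calling .unique() on the result recovers the merged index the question asks about, in first-appearance order.

```python
import pandas as pd

# Rebuild the example frames from the answer
dfA_1 = pd.DataFrame(
    {"col1": [2, 1, 0], "col2": [0, 0, 2]},
    index=pd.Index(["idA_1", "idA_2", "idA_3"], name="index"),
)
dfA_2 = pd.DataFrame(
    {"col1": [3, 2, 4], "col2": [2, 6, 0], "col3": [1, 2, 2]},
    index=pd.Index(["idA_1", "idA_3", "idA_4"], name="index"),
)

# Stack the rows; columns are aligned by name, missing ones become NaN
final = pd.concat([dfA_1, dfA_2])
print(final)

# concat keeps duplicate labels; the distinct labels form the merged index
print(list(final.index.unique()))  # ['idA_1', 'idA_2', 'idA_3', 'idA_4']
```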

To fill those NaNs with zeros:

final.fillna(0, inplace=True)

Part 2, the tags: once you have that, creating the tags is as easy as mapping over the index. You can write a simple function, hardcode a dict, or use a lambda:

final['tag'] = final.index.map(lambda x: x[2])

final
       col1  col2  col3 tag
index                      
idA_1     2     0   0.0   A
idA_2     1     0   0.0   A
idA_3     0     2   0.0   A
idA_1     3     2   1.0   A
idA_3     2     6   2.0   A
idA_4     4     0   2.0   A
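The other two options mentioned, a function or a dict, can be sketched as follows. The labels are assumed to follow the pattern id<TAG>_<n>. The function name tag_from_label and the dict contents are hypothetical. Unlike the lambda x: x[2], the function also handles tags longer than one character. Index.map with a dict yields NaN for labels the dict does not cover.

```python
import pandas as pd

final = pd.DataFrame(
    {"col1": [2, 1, 0, 3, 2, 4]},
    index=pd.Index(["idA_1", "idA_2", "idA_3", "idA_1", "idA_3", "idA_4"], name="index"),
)


def tag_from_label(label):
    # Assumes labels look like "id<TAG>_<n>": drop "id", keep text before "_"
    return label[2:].split("_")[0]


# Variant 1: a named function
final["tag"] = final.index.map(tag_from_label)

# Variant 2: a hard-coded dict (hypothetical mapping); unmapped labels become NaN
tags = {"idA_1": "A", "idA_2": "A", "idA_3": "A", "idA_4": "A"}
final["tag_from_dict"] = final.index.map(tags)

print(final)
```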
