
Python Pandas: Replace Values Multiple Columns Matching Multiple Columns From Another Dataframe

I searched a lot for an answer, the closest question was Compare 2 columns of 2 different pandas dataframes, if the same insert 1 into the other in Python, but the answer to this p

Solution 1:

You can use the update function (requires setting the matching criteria to index). I've modified your sample data to allow some mismatch.

# your data
# =====================
# df1 pos is modified from 10020 to 10010
print(df1)

   chr      snp  x    pos a1 a2
0    1  1-10020  0  10010  G  A
1    1  1-10056  0  10056  C  G
2    1  1-10108  0  10108  C  G
3    1  1-10109  0  10109  C  G
4    1  1-10139  0  10139  C  T

print(df2)

            ID  CHR   STOP  OCHR  OSTOP
0  rs376643643    1  10040     1  10020
1  rs373328635    1  10066     1  10056
2   rs62651026    1  10208     1  10108
3  rs376007522    1  10209     1  10109
4  rs368469931    3  30247     1  10139

# processing
# ==========================
# set matching columns to multi-level index
x1 = df1.set_index(['chr', 'pos'])['snp']
x2 = df2.set_index(['OCHR', 'OSTOP'])['ID']
# call update function, this is inplace
x1.update(x2)
# replace the values in original df1
df1['snp'] = x1.values
print(df1)

   chr          snp  x    pos a1 a2
0    1      1-10020  0  10010  G  A
1    1  rs373328635  0  10056  C  G
2    1   rs62651026  0  10108  C  G
3    1  rs376007522  0  10109  C  G
4    1  rs368469931  0  10139  C  T
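For anyone who wants to run Solution 1 end to end, here is a minimal sketch that rebuilds the two frames from the printed sample data above (the constructor calls are my reconstruction, not part of the original answer) and applies the same index-then-update steps. One assumption worth noting: the `(OCHR, OSTOP)` pairs in df2 need to be unique, since `update` reindexes the second Series against the first.

```python
import pandas as pd

# Sample frames rebuilt from the printed output (row 0 of df1 deliberately mismatches).
df1 = pd.DataFrame({
    "chr": [1, 1, 1, 1, 1],
    "snp": ["1-10020", "1-10056", "1-10108", "1-10109", "1-10139"],
    "x": [0, 0, 0, 0, 0],
    "pos": [10010, 10056, 10108, 10109, 10139],
    "a1": ["G", "C", "C", "C", "C"],
    "a2": ["A", "G", "G", "G", "T"],
})
df2 = pd.DataFrame({
    "ID": ["rs376643643", "rs373328635", "rs62651026", "rs376007522", "rs368469931"],
    "CHR": [1, 1, 1, 1, 3],
    "STOP": [10040, 10066, 10208, 10209, 30247],
    "OCHR": [1, 1, 1, 1, 1],
    "OSTOP": [10020, 10056, 10108, 10109, 10139],
})

# Use the matching columns as a two-level index on both sides.
x1 = df1.set_index(["chr", "pos"])["snp"]
x2 = df2.set_index(["OCHR", "OSTOP"])["ID"]

# Overwrite x1 in place wherever x2 has a value for the same (chr, pos) key.
x1.update(x2)

# x1 keeps df1's row order, so the values can be written back positionally.
df1["snp"] = x1.values
print(df1)
```

Row 0 keeps its original snp because its (chr, pos) pair has no counterpart in df2; the other four rows pick up the rs IDs.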

Solution 2:

Start by renaming the columns you want to merge on in df2

df2.rename(columns={'OCHR':'chr','OSTOP':'pos'},inplace=True)

Now merge on these columns

df_merged = pd.merge(df1, df2, how='inner', on=['chr', 'pos']) # you might have to preserve the df1 index at this stage, not sure
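On the question of preserving the df1 index: a merge on columns returns a fresh 0..n-1 index, so the row labels of df1 are lost. A rough sketch of one way to keep them, using only the first three rows of the sample data (the names `df2_renamed`, `plain` and `kept` are just for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({
    "chr": [1, 1, 1],
    "snp": ["1-10020", "1-10056", "1-10108"],
    "pos": [10010, 10056, 10108],  # row 0 will not match anything in df2
})
df2 = pd.DataFrame({
    "ID": ["rs376643643", "rs373328635", "rs62651026"],
    "CHR": [1, 1, 1],
    "STOP": [10040, 10066, 10208],
    "OCHR": [1, 1, 1],
    "OSTOP": [10020, 10056, 10108],
})
df2_renamed = df2.rename(columns={"OCHR": "chr", "OSTOP": "pos"})

# Plain merge: the result is renumbered from 0, so df1's labels are gone.
plain = pd.merge(df1, df2_renamed, how="inner", on=["chr", "pos"])

# Move df1's index into a column first, then restore it after the merge.
kept = (
    df1.reset_index()  # the unnamed index becomes a column called "index"
       .merge(df2_renamed, how="inner", on=["chr", "pos"])
       .set_index("index")
)

print(plain.index.tolist())
print(kept.index.tolist())
```

With the labels restored, each merged row can be traced back to the df1 row it came from, which is what update needs later on.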

Next, you want to build the frame that holds the replacement values

updater = df_merged[['ID','CHR','STOP']] # this will be your update frame
updater = updater.rename(columns={'ID':'snp','CHR':'chr','STOP':'pos'}) # rename columns to match df1 (no inplace on a slice)

Finally update (see bottom of this link):

df1.update(updater) # updates in place
#  chr          snp  x    pos a1 a2
#0   1  rs376643643  0  10040  G  A
#1   1  rs373328635  0  10066  C  G
#2   1   rs62651026  0  10208  C  G
#3   1  rs376007522  0  10209  C  G
#4   3  rs368469931  0  30247  C  T

update works by matching on index and column labels, so you might have to carry the index of df1 through the entire process, then call updater.reindex(...) before df1.update(updater)
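Putting the steps of Solution 2 together with the index carried through, something along these lines should work. This is a sketch under my own assumptions: the frames are rebuilt from the sample data in Solution 1 (including the deliberate mismatch in row 0), and `df2_keys` and `merged` are names I made up.

```python
import pandas as pd

df1 = pd.DataFrame({
    "chr": [1, 1, 1, 1, 1],
    "snp": ["1-10020", "1-10056", "1-10108", "1-10109", "1-10139"],
    "x": [0, 0, 0, 0, 0],
    "pos": [10010, 10056, 10108, 10109, 10139],
    "a1": ["G", "C", "C", "C", "C"],
    "a2": ["A", "G", "G", "G", "T"],
})
df2 = pd.DataFrame({
    "ID": ["rs376643643", "rs373328635", "rs62651026", "rs376007522", "rs368469931"],
    "CHR": [1, 1, 1, 1, 3],
    "STOP": [10040, 10066, 10208, 10209, 30247],
    "OCHR": [1, 1, 1, 1, 1],
    "OSTOP": [10020, 10056, 10108, 10109, 10139],
})

# Rename the lookup columns on a copy so df2 itself is left untouched.
df2_keys = df2.rename(columns={"OCHR": "chr", "OSTOP": "pos"})

# Inner merge on the key columns while keeping df1's row labels.
merged = (
    df1.reset_index()
       .merge(df2_keys, how="inner", on=["chr", "pos"])
       .set_index("index")
)

# Replacement values, renamed to the df1 columns they should overwrite.
updater = merged[["ID", "CHR", "STOP"]].rename(
    columns={"ID": "snp", "CHR": "chr", "STOP": "pos"}
)

# update aligns on index and column labels; rows missing from updater are left as they are.
df1.update(updater)
print(df1)
```

Because `updater` only contains the rows that matched, the unmatched row 0 keeps its original snp, chr and pos, while the matched rows take all three values from df2.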
