Python Pandas: Replace Values Multiple Columns Matching Multiple Columns From Another Dataframe
I searched a lot for an answer, the closest question was Compare 2 columns of 2 different pandas dataframes, if the same insert 1 into the other in Python, but the answer to this p
Solution 1:
You can use the update function (this requires setting the matching columns as the index). I've modified your sample data to allow some mismatch.
# your data
# =====================
# df1 pos is modified from 10020 to 10010
print(df1)
   chr      snp  x    pos a1 a2
0    1  1-10020  0  10010  G  A
1    1  1-10056  0  10056  C  G
2    1  1-10108  0  10108  C  G
3    1  1-10109  0  10109  C  G
4    1  1-10139  0  10139  C  T
print(df2)
            ID  CHR   STOP  OCHR  OSTOP
0  rs376643643    1  10040     1  10020
1  rs373328635    1  10066     1  10056
2   rs62651026    1  10208     1  10108
3  rs376007522    1  10209     1  10109
4  rs368469931    3  30247     1  10139
# processing
# ==========================
# set matching columns to multi-level index
x1 = df1.set_index(['chr', 'pos'])['snp']
x2 = df2.set_index(['OCHR', 'OSTOP'])['ID']
# call update function, this is inplace
x1.update(x2)
# replace the values in original df1
df1['snp'] = x1.values
print(df1)
   chr          snp  x    pos a1 a2
0    1      1-10020  0  10010  G  A
1    1  rs373328635  0  10056  C  G
2    1   rs62651026  0  10108  C  G
3    1  rs376007522  0  10109  C  G
4    1  rs368469931  0  10139  C  T
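For anyone who wants to run Solution 1 end to end, here is a minimal self-contained sketch. The two frames are rebuilt by hand from the printed output above, so the column dtypes (integers for chr, x, pos and the df2 coordinates) are an assumption rather than something the original post states.

```python
import pandas as pd

# Sample frames rebuilt from the printed output above (dtypes are assumed).
# Row 0 of df1 is the deliberate mismatch: pos is 10010, not 10020.
df1 = pd.DataFrame({
    "chr": [1, 1, 1, 1, 1],
    "snp": ["1-10020", "1-10056", "1-10108", "1-10109", "1-10139"],
    "x": [0, 0, 0, 0, 0],
    "pos": [10010, 10056, 10108, 10109, 10139],
    "a1": ["G", "C", "C", "C", "C"],
    "a2": ["A", "G", "G", "G", "T"],
})
df2 = pd.DataFrame({
    "ID": ["rs376643643", "rs373328635", "rs62651026",
           "rs376007522", "rs368469931"],
    "CHR": [1, 1, 1, 1, 3],
    "STOP": [10040, 10066, 10208, 10209, 30247],
    "OCHR": [1, 1, 1, 1, 1],
    "OSTOP": [10020, 10056, 10108, 10109, 10139],
})

# Put the matching columns into a two-level index on both sides.
x1 = df1.set_index(["chr", "pos"])["snp"]
x2 = df2.set_index(["OCHR", "OSTOP"])["ID"]

# Series.update overwrites x1 in place wherever x2 has the same index key.
x1.update(x2)

# x1 keeps df1's row order, so its values can be written straight back.
df1["snp"] = x1.values
print(df1)
```

Note that this relies on the (OCHR, OSTOP) pairs in df2 being unique; duplicated keys would make the alignment ambiguous.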
Solution 2:
Start by renaming the columns you want to merge on in df2:
df2.rename(columns={'OCHR':'chr','OSTOP':'pos'},inplace=True)
Now merge on these columns
df_merged = pd.merge(df1, df2, how='inner', on=['chr', 'pos']) # you might have to preserve the df1 index at this stage, not sure
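The index concern in the comment above is real: an inner merge on columns returns a frame with a fresh 0..n-1 index, so the row labels of df1 are lost. The sketch below illustrates this on a trimmed copy of the sample data (three rows, dtypes assumed). It also uses left_on/right_on instead of renaming df2, which is a swapped-in alternative and not what the answer itself does.

```python
import pandas as pd

# Trimmed copy of the sample data; row 0 of df1 has no partner in df2.
df1 = pd.DataFrame({
    "chr": [1, 1, 1],
    "snp": ["1-10020", "1-10056", "1-10108"],
    "pos": [10010, 10056, 10108],
})
df2 = pd.DataFrame({
    "ID": ["rs376643643", "rs373328635", "rs62651026"],
    "OCHR": [1, 1, 1],
    "OSTOP": [10020, 10056, 10108],
})

# left_on/right_on pairs the differently named key columns without renaming.
df_merged = pd.merge(
    df1, df2, how="inner",
    left_on=["chr", "pos"], right_on=["OCHR", "OSTOP"],
)

# Only df1 rows 1 and 2 matched, but the merged frame is labelled 0 and 1.
print(df_merged.index.tolist())
print(df_merged["snp"].tolist())
```

Because the merged labels no longer line up with df1, anything built from df_merged needs the original labels restored before it is passed to update.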
Next, you want to build the frame that carries the replacement values:
df1_updater = df_merged[['ID', 'CHR', 'STOP']]  # this will be your update frame
df1_updater = df1_updater.rename(columns={'ID': 'snp', 'CHR': 'chr', 'STOP': 'pos'})  # rename columns to match the original
Finally update (see bottom of this link):
df1.update(df1_updater)  # updates in place
#    chr          snp  x    pos a1 a2
# 0    1  rs376643643  0  10040  G  A
# 1    1  rs373328635  0  10066  C  G
# 2    1   rs62651026  0  10208  C  G
# 3    1  rs376007522  0  10209  C  G
# 4    3  rs368469931  0  30247  C  T
update works by matching on index and column labels, so you might have to carry the index of df1 through the entire process, then do df1_updater.reindex(... before df1.update(df1_updater)
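One possible way to carry the df1 index through the whole process is sketched below. It is not the answer's exact code: it uses reset_index to turn the row labels into an ordinary column (named "index" by default, which assumes df1 has no column of that name) and set_index to restore them, in place of the reindex step hinted at above. The sample frames are rebuilt by hand with assumed dtypes and include the mismatching first row from Solution 1.

```python
import pandas as pd

# Sample frames rebuilt from the printed output (dtypes are assumed).
df1 = pd.DataFrame({
    "chr": [1, 1, 1, 1, 1],
    "snp": ["1-10020", "1-10056", "1-10108", "1-10109", "1-10139"],
    "x": [0, 0, 0, 0, 0],
    "pos": [10010, 10056, 10108, 10109, 10139],
    "a1": ["G", "C", "C", "C", "C"],
    "a2": ["A", "G", "G", "G", "T"],
})
df2 = pd.DataFrame({
    "ID": ["rs376643643", "rs373328635", "rs62651026",
           "rs376007522", "rs368469931"],
    "CHR": [1, 1, 1, 1, 3],
    "STOP": [10040, 10066, 10208, 10209, 30247],
    "OCHR": [1, 1, 1, 1, 1],
    "OSTOP": [10020, 10056, 10108, 10109, 10139],
})

# reset_index() stores df1's row labels in an "index" column,
# so they survive the merge.
df_merged = pd.merge(
    df1.reset_index(), df2, how="inner",
    left_on=["chr", "pos"], right_on=["OCHR", "OSTOP"],
)

# Restore the labels, keep only the replacement columns, and rename them
# to the df1 columns they should overwrite.
df1_updater = (
    df_merged.set_index("index")[["ID", "CHR", "STOP"]]
    .rename(columns={"ID": "snp", "CHR": "chr", "STOP": "pos"})
)

# update aligns on row labels and column names; unmatched row 0 is untouched.
df1.update(df1_updater)
print(df1)
```

Depending on the pandas version, update may leave chr and pos as floats, because aligning the update frame to df1 introduces missing values for unmatched rows; cast back with astype if integer columns matter.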