Pandas - Replace Multiple Column Values With Previous Column Value When Condition Is Met

I have a large dataframe that looks like this: Start End Alm_No1 Val_No1 Alm_No2 Val_No2 Alm_No3 Val_No3 1/1/19 0:00 1/2/19 0:00 1 0 2 1 3

Solution 1:

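Avoid looping over the rows of a dataframe; set each column in a single operation. Assuming the Val_No columns only ever hold 0 or 1, multiplying by the matching Alm_No column leaves 0 where the value was 0 and puts the Alm_No value where it was 1.

for i in range(1, 4):
    df[f'Val_No{i}'] *= df[f'Alm_No{i}']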
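
To see the multiplication trick on concrete data, here is a minimal sketch. The two sample rows are invented for illustration (the question's table is cut off), and it assumes every Val_No column contains only 0 or 1.

```python
import pandas as pd

# Hypothetical sample data; the values are made up for this illustration.
df = pd.DataFrame({
    "Start": ["1/1/19 0:00", "1/2/19 0:00"],
    "End": ["1/2/19 0:00", "1/3/19 0:00"],
    "Alm_No1": [1, 1],
    "Val_No1": [0, 1],
    "Alm_No2": [2, 2],
    "Val_No2": [1, 0],
    "Alm_No3": [3, 3],
    "Val_No3": [1, 1],
})

for i in range(1, 4):
    # Val_No is 0 or 1, so the product is the Alm_No value where
    # Val_No == 1 and stays 0 everywhere else.
    df[f"Val_No{i}"] *= df[f"Alm_No{i}"]

print(df[["Val_No1", "Val_No2", "Val_No3"]])
```

If a Val_No column could hold anything other than 0 or 1, the product would no longer equal the Alm_No value, so a mask-based assignment would be the safer choice.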

Solution 2:

I feel silly answering my own questions just a few minutes later but I think I found something that works:

# val_list is assumed to hold the integer positions of the Val_No columns
for x in val_list:
    df.loc[df.iloc[:, x] == 1, df.columns[x]] = df.iloc[:, x - 1]

Worked like a charm!

234 ms ± 15.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
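
The snippet above does not show how val_list was built. A minimal sketch, assuming it is meant to hold the integer positions of the Val_No columns and that each Val_No column sits immediately to the right of its Alm_No column (the sample data is invented for illustration):

```python
import pandas as pd

# Hypothetical sample data; the values are made up for this illustration.
df = pd.DataFrame({
    "Start": ["1/1/19 0:00", "1/2/19 0:00"],
    "End": ["1/2/19 0:00", "1/3/19 0:00"],
    "Alm_No1": [1, 1],
    "Val_No1": [0, 1],
    "Alm_No2": [2, 2],
    "Val_No2": [1, 0],
    "Alm_No3": [3, 3],
    "Val_No3": [1, 1],
})

# Assumed definition: positions of every column whose name starts with "Val_No".
val_list = [df.columns.get_loc(c) for c in df.columns if c.startswith("Val_No")]

for x in val_list:
    mask = df.iloc[:, x] == 1  # rows where the Val_No flag is set
    # Copy the value from the column just to the left (the matching Alm_No).
    df.loc[mask, df.columns[x]] = df.iloc[:, x - 1]

print(df)
```

The assignment aligns on the index, so only the rows selected by the mask receive a value from the previous column.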

Solution 3:

I came up with a solution that works for an arbitrary number of Alm_No... / Val_No... columns.

Let's start with a function to be applied to each row:

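def fn(row):
    row = row.copy()  # work on a copy so the caller's data is not modified in place
    # Alm_No columns sit at positions 2, 4, 6, ...; each Val_No column follows its Alm_No
    for i in range(2, row.size, 2):
        j = i + 1
        if row.iloc[j]:  # Val_No is non-zero
            row.iloc[j] = row.iloc[i]  # replace it with the Alm_No value
    return row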

Note the construction of the for loop: it starts at 2 (the position of the Alm_No1 column) and advances in steps of 2 (the distance to the Alm_No2 column).

j holds the position of the next column (the matching Val_No...).

If the "current" Val_No is non-zero, it is replaced with the value from the "current" Alm_No.

After the loop completes, the modified row is returned.

So the only thing left is to apply this function to each row. Note that apply returns a new DataFrame rather than changing df, so assign the result:

df = df.apply(fn, axis=1)
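
For a quick check of the row-wise approach, here is a self-contained sketch. The sample data is invented for illustration, and the defensive row.copy() is an addition of this sketch rather than something the approach strictly requires.

```python
import pandas as pd


def fn(row):
    row = row.copy()  # defensive copy (added in this sketch)
    # Positions 0 and 1 are Start and End; Alm_No columns are at 2, 4, 6, ...
    for i in range(2, row.size, 2):
        j = i + 1  # position of the matching Val_No column
        if row.iloc[j]:  # non-zero Val_No
            row.iloc[j] = row.iloc[i]
    return row


# Hypothetical sample data; the values are made up for this illustration.
df = pd.DataFrame({
    "Start": ["1/1/19 0:00", "1/2/19 0:00"],
    "End": ["1/2/19 0:00", "1/3/19 0:00"],
    "Alm_No1": [1, 1],
    "Val_No1": [0, 1],
    "Alm_No2": [2, 2],
    "Val_No2": [1, 0],
    "Alm_No3": [3, 3],
    "Val_No3": [1, 1],
})

# apply returns a new DataFrame; df itself is left untouched.
result = df.apply(fn, axis=1)
print(result)
```

The function relies purely on column positions, so it assumes exactly two leading columns followed by strictly alternating Alm_No / Val_No pairs.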

My timeit measurements indicated that my solution runs a little (about 7%) faster than yours and about 35 times faster than the one proposed by BallpointBen.

Apparently, the use of f-strings accounts for part of this (quite significant) difference.
