
Skip Specific Set Of Columns When Reading Excel Frame - Pandas

I know beforehand which columns I don't need from an Excel file, and I'd like to avoid reading them in order to improve performance. Something like this:

import pandas as pd

Solution 1:

If your version of pandas supports it (check first whether you can pass a callable to usecols; read_excel accepts one from pandas 0.24 onward), I would try something like:

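import pandas as pd
# Parse only the columns whose header does not contain 'Unnamed'
# (str() guards against non-string headers such as numbers)
df = pd.read_excel('large_excel_file.xlsx', usecols=lambda x: 'Unnamed' not in str(x))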

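This should skip every column without a header name, because pandas labels such columns 'Unnamed: 0', 'Unnamed: 1', and so on. To skip specific columns instead, change the test to check each name against a list of the columns you do not want, for example usecols=lambda x: x not in cols_to_skip.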
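A minimal runnable sketch of the named-column variant. So that it runs without an .xlsx file or an Excel engine such as openpyxl, it reads a small in-memory CSV with pd.read_csv, whose usecols parameter also accepts a callable evaluated against each column name. The sample data and the cols_to_skip list are hypothetical.

```python
import io

import pandas as pd

# Stand-in for the Excel sheet: a small CSV held in memory.
data = io.StringIO(
    "id,name,notes,price,comment\n"
    "1,apple,x,0.5,y\n"
    "2,pear,z,0.7,w\n"
)

# Hypothetical names of the columns we do not want to load.
cols_to_skip = {"notes", "comment"}

# The callable is called with each column name; only the columns for
# which it returns True are parsed.
df = pd.read_csv(data, usecols=lambda name: name not in cols_to_skip)
print(list(df.columns))  # ['id', 'name', 'price']
```

With a real workbook the equivalent call would be pd.read_excel('large_excel_file.xlsx', usecols=lambda name: name not in cols_to_skip).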

Solution 2:

You can use the following technique. Suppose the columns we want to skip are 2, 5 and 8 (0-based positions) and the sheet has 10 columns. First build the list cols of the remaining columns, the ones we want to keep:

In [7]: cols2skip = [2,5,8]  
In [8]: cols = [i for i in range(10) if i not in cols2skip]

In [9]: cols
Out[9]: [0, 1, 3, 4, 6, 7, 9]

Then pass those columns, the ones we want to keep, to usecols:

df = pd.read_excel(filename, usecols=cols)
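The range(10) above hard-codes the column count. A small sketch, under the assumption that reading the header row once is acceptable, discovers the count with nrows=0 and then computes the positions to keep. The helper keep_positions and the sample data are hypothetical, and pd.read_csv on an in-memory string stands in for read_excel so the example runs without an .xlsx file.

```python
import io

import pandas as pd


def keep_positions(n_cols, skip):
    """Return the 0-based positions in range(n_cols) that are not in skip."""
    skip = set(skip)
    return [i for i in range(n_cols) if i not in skip]


csv_text = "a,b,c,d,e,f\n1,2,3,4,5,6\n7,8,9,10,11,12\n"

# Read only the header row to learn how many columns there are.
n_cols = len(pd.read_csv(io.StringIO(csv_text), nrows=0).columns)

cols = keep_positions(n_cols, skip=[2, 4])  # [0, 1, 3, 5]
df = pd.read_csv(io.StringIO(csv_text), usecols=cols)
print(list(df.columns))  # ['a', 'b', 'd', 'f']
```

For an Excel file the header probe would be pd.read_excel(filename, nrows=0). Note that this opens the file twice, so it may cost more than it saves on small workbooks.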
