
Dataset Selective Picking And Transformation

I have a dataset in .xlsx format with hundreds of thousands of rows, as follows: slug symbol name date ranknow open high low close volume market close_ratio spread compa

Solution 1:

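In Python, using pandas, the following should work. It keeps only the symbol, name, date and close columns, then pivots the data so that each (symbol, name) pair becomes a column of closing prices indexed by date.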

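import pandas as pd

# Load the workbook and keep only the columns needed for the pivot.
df = pd.read_excel("/path/to/file/Book1.xlsx")
df = df.loc[:, ['symbol', 'name', 'date', 'close']]
# Index by (symbol, name, date), then move symbol and name into the columns.
df = df.set_index(['symbol', 'name', 'date'])
df = df.unstack(level=[0, 1])
# Drop the outer 'close' column level, leaving (symbol, name) columns by date.
df = df['close']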
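To see what the reshaping does without the real workbook, here is a minimal sketch on a toy frame. The rows, the prices and the names `raw` and `wide` are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Toy data standing in for the spreadsheet; all values are invented.
raw = pd.DataFrame({
    "symbol": ["LA", "LA", "YC", "YC"],
    "name": ["Lancer", "Lancer", "Yocomin", "Yocomin"],
    "date": pd.to_datetime(
        ["2018-09-01", "2018-10-01", "2018-09-01", "2018-10-01"]
    ),
    "close": [0.42, 0.48, 1.10, 1.25],
    "volume": [10, 20, 30, 40],
})

# Same steps as above: select columns, index, unstack, drop the 'close' level.
wide = (
    raw.loc[:, ["symbol", "name", "date", "close"]]
    .set_index(["symbol", "name", "date"])
    .unstack(level=[0, 1])["close"]
)

print(wide)
```

Each row of `wide` is a date and each column is a (symbol, name) pair. Note that `unstack` raises an error if the same (symbol, name, date) combination appears more than once, so duplicates would need to be removed or aggregated first.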

To read the symbols file and keep only the symbols that are present in the dataframe:

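# Read one symbol per line (no header row) and take the first column as a list.
symbols = pd.read_csv('/path/to/file/symbols.txt', sep=" ", header=None)
symbols = symbols[0].tolist()
# Drop duplicates, then keep only symbols that exist as columns in df.
symbols = pd.Index(symbols).unique()
symbols = symbols.intersection(df.columns.get_level_values(0))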
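The filtering step can be tried in isolation with an in-memory stand-in for symbols.txt. This is only a sketch: the file contents, the toy `wide` frame and the names `listed`, `wanted` and `selected` are assumptions made for the example.

```python
import io
import pandas as pd

# Toy wide frame with (symbol, name) columns, as produced by the pivot.
cols = pd.MultiIndex.from_tuples(
    [("AAA", "companyA"), ("LA", "Lancer"), ("YC", "Yocomin"), ("BTC", "Bitcoin")],
    names=["symbol", "name"],
)
wide = pd.DataFrame(
    [[1.0, 0.42, 2.0, 3.0]],
    index=pd.to_datetime(["2018-09-01"]),
    columns=cols,
)

# Stand-in for symbols.txt: one symbol per line, with a duplicate ("LA")
# and a symbol ("XYZ") that does not exist in the data.
symbols_file = io.StringIO("LA\nYC\nXYZ\nLA\n")
listed = pd.read_csv(symbols_file, sep=" ", header=None)[0].tolist()

# Deduplicate, then keep only symbols present in the first column level.
wanted = pd.Index(listed).unique().intersection(
    wide.columns.get_level_values(0)
)

# Selecting with a list of first-level labels keeps the matching columns.
selected = wide[wanted.tolist()]
print(selected)
```

Taking the intersection first means that a symbol missing from the data is silently skipped instead of raising a `KeyError` at selection time.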

And the output will look like:

print(df[symbols])


symbol                   AAA        LA        YC
name                companyA    Lancer   Yocomin
date                                            
2018-09-01 00:00:00     None  0,422736      None
2018-10-01 00:00:00     None  0,487106      None
2018-11-01 00:00:00     None  0,331977      None
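The values above are printed with a decimal comma (0,422736), which suggests that the close column was read as text and not as numbers. If that is the case, one possible way to convert it is sketched below. The series `close` is a made-up sample, and whether the real column needs this step is an assumption.

```python
import pandas as pd

# Made-up sample mimicking text prices with a decimal comma and a missing value.
close = pd.Series(["0,422736", None, "0,487106"])

# Swap the decimal comma for a point, then parse; unparseable entries become NaN.
numeric = pd.to_numeric(
    close.str.replace(",", ".", regex=False), errors="coerce"
)

print(numeric)
```

Applied to the real data, the same conversion would go on the close column before the pivot, so that the resulting table holds floats.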
