Dataset Selective Picking And Transformation
I have a dataset in .xlsx with hundreds of thousands of rows, as follows: slug symbol name date ranknow open high low close volume market close_ratio spread compa
Solution 1:
In Python, using pandas, this should work:
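import pandas as pd

# Load the workbook and keep only the columns needed for the reshape.
df = pd.read_excel("/path/to/file/Book1.xlsx")
df = df.loc[:, ['symbol', 'name', 'date', 'close']]
# Move symbol and name from the row index into the columns, leaving one row per date.
df = df.set_index(['symbol', 'name', 'date'])
df = df.unstack(level=[0, 1])
# Drop the outer 'close' column level so the columns are (symbol, name) pairs.
df = df['close']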
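To see what the reshape does without the spreadsheet, here is a minimal sketch on a small in-memory frame. The sample rows and the names `raw`, `wide`, and `alt` are made up for illustration. One caveat worth knowing: `unstack` raises a `ValueError` if the same (symbol, name, date) combination appears more than once, so the sketch also shows `pivot_table` as one possible alternative that aggregates duplicates instead (here it keeps the last value, which is an assumption about what you would want).

```python
import pandas as pd

# Hypothetical sample rows standing in for the spreadsheet.
raw = pd.DataFrame({
    "symbol": ["LA", "LA", "YC", "YC"],
    "name": ["Lancer", "Lancer", "Yocomin", "Yocomin"],
    "date": pd.to_datetime(["2018-09-01", "2018-10-01", "2018-09-01", "2018-10-01"]),
    "close": [0.42, 0.48, 1.5, 1.7],
})

# Same steps as the answer: index by (symbol, name, date), then move
# symbol and name into the columns and keep the 'close' values.
wide = raw.set_index(["symbol", "name", "date"]).unstack(level=[0, 1])["close"]

# Alternative that tolerates duplicate (symbol, name, date) rows by
# aggregating them; "last" is an illustrative choice, not a requirement.
alt = raw.pivot_table(index="date", columns=["symbol", "name"],
                      values="close", aggfunc="last")

print(wide)
```

Both `wide` and `alt` end up with one row per date and one column per (symbol, name) pair.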
To read the symbols file and then filter out symbols not in the dataframe:
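# Take the first space-separated field on each line as the symbol.
symbols = pd.read_csv('/path/to/file/symbols.txt', sep=" ", header=None)
symbols = symbols[0].tolist()
# Deduplicate, then keep only the symbols that actually appear in df's columns.
symbols = pd.Index(symbols).unique()
symbols = symbols.intersection(df.columns.get_level_values(0))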
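The filtering step can be tried on its own with a rough sketch like the one below. The wide table, the `io.StringIO` stand-in for symbols.txt, and the names `wanted`, `present`, and `missing` are all hypothetical. Listing the symbols that were requested but not found is an extra that the answer does not do, but it can help when checking the symbols file.

```python
import io
import pandas as pd

# Hypothetical wide table with (symbol, name) column pairs,
# shaped like the result of the reshape step.
columns = pd.MultiIndex.from_tuples(
    [("AAA", "companyA"), ("LA", "Lancer"), ("YC", "Yocomin")],
    names=["symbol", "name"],
)
wide = pd.DataFrame([[1.0, 2.0, 3.0]],
                    index=pd.to_datetime(["2018-09-01"]),
                    columns=columns)

# Stand-in for symbols.txt: one symbol per line, including a
# duplicate (LA) and a symbol that is not in the table (ZZZ).
symbols_file = io.StringIO("LA\nZZZ\nYC\nLA\n")
wanted = pd.read_csv(symbols_file, sep=" ", header=None)[0]

# Keep only the requested symbols that exist in the top column level.
present = pd.Index(wanted).unique().intersection(wide.columns.get_level_values(0))
# Requested symbols with no matching column.
missing = sorted(set(wanted) - set(present))

subset = wide[present]
print(subset)
print("not found:", missing)
```

Selecting with `wide[present]` works because indexing a frame with top-level labels picks every column under those labels.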
The output will look like this:
print(df[symbols])
symbol                    AAA        LA       YC
name                 companyA    Lancer  Yocomin
date
2018-09-01 00:00:00      None  0,422736     None
2018-10-01 00:00:00      None  0,487106     None
2018-11-01 00:00:00      None  0,331977     None
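The values above are printed with a decimal comma (0,422736), which suggests the close column may have been read as text rather than as numbers. If that is the case in your data (an assumption, since it depends on how the workbook stores the cells), a conversion along these lines might help before doing any arithmetic. The sample series is made up for illustration.

```python
import pandas as pd

# Hypothetical close values stored as text with a decimal comma.
close = pd.Series(["0,422736", "0,487106", None], name="close")

# Swap the comma for a point, then parse; missing values become NaN.
close_num = pd.to_numeric(close.str.replace(",", ".", regex=False))

print(close_num)
```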