Pandas: Need A Speedier Way Of Index Slicing
Anyone care to take a stab at speeding up this dataframe index slicing scheme? I'm trying to slice and dice some huge dataframes, so every bit counts. I need to somehow find a fast
Solution 1:
You can use a dictionary comprehension together with loc to do the dataframe indexing:
finDict = {pair: df.loc[pd.IndexSlice[:, pair[0], pair[1]], :]
           for pair in pd.unique(initFrame[['bar1', 'bar4']].values).tolist()}
>>> finDict
{(5, 1): bar1 bar2 bar3 bar4
ifoo1 ifoo2 ifoo3
LABEL1 515 a 11116 b 222,
(6, 2): bar1 bar2 bar3 bar4
ifoo1 ifoo2 ifoo3
LABEL2 625 c 331,
(6, 3): bar1 bar2 bar3 bar4
ifoo1 ifoo2 ifoo3
LABEL2 636 d 443}
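For anyone who wants to try the loc / IndexSlice idea in isolation, here is a minimal self-contained sketch. The frame below is a hypothetical stand-in (the real initFrame is not shown in the question), and the snake_case names are mine. It assumes index levels 1 and 2 hold the same values as columns bar1 and bar4.

```python
import pandas as pd

# Hypothetical stand-in for the question's frame: a three-level index
# whose levels 1 and 2 mirror the values in columns bar1 and bar4.
index = pd.MultiIndex.from_tuples(
    [("LABEL1", 5, 1), ("LABEL2", 6, 2), ("LABEL2", 6, 3)],
    names=["ifoo1", "ifoo2", "ifoo3"],
)
init_frame = pd.DataFrame(
    {"bar1": [5, 6, 6], "bar2": ["a", "c", "d"], "bar3": [1, 3, 4], "bar4": [1, 2, 3]},
    index=index,
).sort_index()  # a sorted index keeps MultiIndex slicing well defined

# Unique (bar1, bar4) pairs as hashable tuples.
pairs = map(tuple, init_frame[["bar1", "bar4"]].drop_duplicates().values.tolist())

# One sub-frame per pair: any value on level 0, exact match on levels 1 and 2.
fin_dict = {
    pair: init_frame.loc[pd.IndexSlice[:, pair[0], pair[1]], :]
    for pair in pairs
}
```

Each value in fin_dict is itself a DataFrame, so fin_dict[(6, 2)] gives the rows whose index levels 1 and 2 are 6 and 2.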
Solution 2:
I'm not sure exactly what you want to do, but here are some hints to speed up your code.
Change
uniqueList = list(pd.unique(initFrame[['bar1','bar4']].values))
to
uniqueList = initFrame[["bar1", "bar4"]].drop_duplicates().values.tolist()
and the for loop to:
g = initFrame.groupby(level=(1, 2))
uniqueSet = set(map(tuple, uniqueList))  # lists are unhashable, so turn each pair into a tuple
dict((key, df) for key, df in g if key in uniqueSet)
or:
g = initFrame.groupby(level=(1, 2))
dict((tuple(key), g.get_group(tuple(key))) for key in uniqueList)  # tuple keys are hashable
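To make the groupby variant concrete, here is a small runnable sketch. The sample frame is a hypothetical stand-in for initFrame and the snake_case names are mine, so treat it as an illustration of the idea rather than the asker's actual data.

```python
import pandas as pd

# Hypothetical stand-in frame: index levels 1 and 2 mirror bar1 and bar4.
index = pd.MultiIndex.from_tuples(
    [("LABEL1", 5, 1), ("LABEL2", 6, 2), ("LABEL2", 6, 3)],
    names=["ifoo1", "ifoo2", "ifoo3"],
)
init_frame = pd.DataFrame({"bar1": [5, 6, 6], "bar4": [1, 2, 3]}, index=index)

# Unique (bar1, bar4) pairs, converted to tuples so they can live in a set.
unique_pairs = set(
    map(tuple, init_frame[["bar1", "bar4"]].drop_duplicates().values.tolist())
)

# Group once on index levels 1 and 2; each group key is a (level1, level2) tuple.
grouped = init_frame.groupby(level=[1, 2])

# Keep only the groups whose key is one of the wanted pairs.
fin_dict = {key: sub for key, sub in grouped if key in unique_pairs}
```

Grouping once and iterating avoids a separate index lookup per pair, which is where the speedup over a slicing loop would be expected to come from.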
Here is the %timeit comparison:
import numpy as np
import pandas as pd
arr = np.random.randint(0, 10, (10000, 2))
df = pd.DataFrame(arr, columns=("A", "B"))
%timeit df.drop_duplicates().values.tolist()
%timeit list(pd.unique(arr))
outputs:
100 loops, best of 3: 3.51 ms per loop
10 loops, best of 3: 94.7 ms per loop
Solution 3:
This is not a full answer, just a way to visualise the thought from my comment. Since the rows of a multi-index are grouped, we can compare each row's ('bar1', 'bar4') value with the previous one, skip the loop body when they are equal, and only perform the dict update when the value changes.
It may not be speedier, but if your dataset is huge, it could save you from a memory consumption problem. Pseudo code:
# ...replace timer1...
prev, finDict = None, {}
for idx, _ in initFrame[['bar1', 'bar4']].iterrows():
    current = (idx[1], idx[2])  # index levels 1 and 2 of this row
    if current == prev:
        continue  # same pair as the previous row, nothing new to do
    prev = current
    # ... whatever faster way to solve your 2nd timer ...
Personally I think @Alexander answers your 2nd timer rather nicely.
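For what it's worth, the skip-if-same-as-previous idea can be turned into something runnable along these lines. This is only a sketch under assumptions: the frame is a hypothetical stand-in, the snake_case names are mine, and it assumes all rows sharing a (level 1, level 2) pair sit next to each other, so each pair forms a single contiguous run.

```python
import pandas as pd

# Hypothetical stand-in frame; rows with the same (level 1, level 2) pair are adjacent.
index = pd.MultiIndex.from_tuples(
    [("LABEL1", 5, 1), ("LABEL1", 5, 1), ("LABEL2", 6, 2), ("LABEL2", 6, 3)],
    names=["ifoo1", "ifoo2", "ifoo3"],
)
init_frame = pd.DataFrame({"bar2": ["a", "b", "c", "d"]}, index=index)

# Walk the index once and record where each run of identical pairs starts.
prev, starts = None, []
for pos, idx in enumerate(init_frame.index):
    current = (idx[1], idx[2])
    if current == prev:
        continue  # still inside the same run
    prev = current
    starts.append((current, pos))

# Each run ends where the next one starts; the last run ends at the frame's end.
stops = [pos for _, pos in starts[1:]] + [len(init_frame)]

# Positional slices pick out one sub-frame per run.
fin_dict = {
    pair: init_frame.iloc[start:stop]
    for (pair, start), stop in zip(starts, stops)
}
```

If the same pair can reappear later in the frame (for example under a different level-0 label), the later run would overwrite the earlier one in the dict, so the adjacency assumption matters.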