
Python Applymap Taking Time To Run

I have a matrix of count data (55K x 8.5K). Most of the values are zeros, but a few of them hold actual counts. Let's say something like this:

   a  b  c
0  4  3  3
1  1  2  1
2  ...
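For reference, a minimal sketch that builds a mostly-zero count DataFrame like the one described (the column names, size, and ~5% sparsity level here are assumptions chosen only to reproduce the problem at a smaller scale):

import numpy as np
import pandas as pd

# A mostly-zero integer count matrix, much smaller than the real 55K x 8.5K one.
data = np.random.randint(0, 5, size=(1000, 100))
data[np.random.random(data.shape) > 0.05] = 0   # keep roughly 5% non-zero counts
df = pd.DataFrame(data, columns=[f"c{i}" for i in range(data.shape[1])])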

Solution 1:

UPDATE:

Read this topic and this issue regarding your error.

Try saving your DF as HDF5 - it's much more convenient.
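A minimal sketch of the HDF5 round trip, assuming the DataFrame is called df (the file name and key are placeholders, and pandas needs the optional PyTables dependency installed for HDF5 support):

import pandas as pd

# Write the frame once in binary HDF5 format...
df.to_hdf("counts.h5", key="df", mode="w")

# ...then reload it later without re-parsing any text.
df = pd.read_hdf("counts.h5", key="df")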

You may also want to read this comparison...

OLD answer:

try this:

In [110]: (df > 0).astype(np.int8)
Out[110]:
   a  b  c
0  1  1  1
1  1  1  1
2  1  1  0
3  1  0  1
4  1  0  1

.applymap() - one of the slowest methods, because it visits every single cell (essentially it runs nested Python loops under the hood).

df > 0 - a vectorized operation, so it runs much faster.

.apply() - faster than .applymap() because it works column by column, but still much slower than df > 0.

UPDATE2: time comparison on a smaller DF (1000 x 1000), as applymap() would take ages on a (55K x 9K) DF:

In [5]: df = pd.DataFrame(np.random.randint(0, 10, size=(1000, 1000)))

In [6]: %timeit df.applymap(lambda x: np.where(x >0, 1, 0))
1 loop, best of 3: 3.75 s per loop

In [7]: %timeit df.apply(lambda x: np.where(x >0, 1, 0))
1 loop, best of 3: 256 ms per loop

In [8]: %timeit (df>0).astype(np.int8)
100 loops, best of 3: 2.95 ms per loop

Solution 2:

You could use a SciPy sparse matrix. That way the calculations only touch the data that is actually there, instead of operating on all the zeros.
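A minimal sketch of that approach, assuming the counts already live in a DataFrame called df (the int8 cast mirrors Solution 1, and scipy must be installed):

import numpy as np
import pandas as pd
from scipy import sparse

# Store only the non-zero counts in compressed sparse row (CSR) format.
sp = sparse.csr_matrix(df.values)

# Binarize: non-zero entries become 1, the (many) zeros stay implicit.
sp_binary = (sp > 0).astype(np.int8)

# Convert back to a dense DataFrame only if and when a dense view is needed.
df_binary = pd.DataFrame(sp_binary.toarray(), index=df.index, columns=df.columns)

Since only the non-zero entries are stored, both the memory footprint and the binarization work scale with the number of counts rather than with the full 55K x 8.5K shape.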
