
Python Applymap Taking Time To Run

I have a matrix of count data (55K x 8.5K). Most of the values are zeros, but a few of them hold actual counts. Let's say something like this:

   a  b  c
0  4  3  3
1  1  2  1
2  ...
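For reference, a minimal sketch that builds a mostly-zero count DataFrame like the one described (the column names, size, and ~5% sparsity level here are assumptions chosen only to reproduce the problem at a smaller scale):

import numpy as np
import pandas as pd

# A mostly-zero integer count matrix, much smaller than the real 55K x 8.5K one.
data = np.random.randint(0, 5, size=(1000, 100))
data[np.random.random(data.shape) > 0.05] = 0   # keep roughly 5% non-zero counts
df = pd.DataFrame(data, columns=[f"c{i}" for i in range(data.shape[1])])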

Solution 1:

UPDATE:

Read this topic and this issue regarding your error.

Try saving your DF as HDF5 - it's much more convenient.
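A minimal sketch of the HDF5 round trip, assuming the DataFrame is called df (the file name and key are placeholders, and pandas needs the optional PyTables dependency installed for HDF5 support):

import pandas as pd

# Write the frame once in binary HDF5 format...
df.to_hdf("counts.h5", key="df", mode="w")

# ...then reload it later without re-parsing any text.
df = pd.read_hdf("counts.h5", key="df")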

You may also want to read this comparison...

OLD answer:

try this:

In [110]: (df > 0).astype(np.int8)
Out[110]:
   a  b  c
0  1  1  1
1  1  1  1
2  1  1  0
3  1  0  1
4  1  0  1

.applymap() - one of the slowest methods, because it visits every single cell (essentially it runs nested Python loops under the hood).

df > 0 - a vectorized operation, so it runs much faster.

.apply() - faster than .applymap() because it works column by column, but still much slower than df > 0.

UPDATE2: time comparison on a smaller DF (1000 x 1000), as applymap() would take ages on a (55K x 9K) DF:

In [5]: df = pd.DataFrame(np.random.randint(0, 10, size=(1000, 1000)))

In [6]: %timeit df.applymap(lambda x: np.where(x >0, 1, 0))
1 loop, best of 3: 3.75 s per loop

In [7]: %timeit df.apply(lambda x: np.where(x >0, 1, 0))
1 loop, best of 3: 256 ms per loop

In [8]: %timeit (df>0).astype(np.int8)
100 loops, best of 3: 2.95 ms per loop

Solution 2:

You could use a SciPy sparse matrix. That way the calculations only touch the data that is actually there, instead of operating on all the zeros.
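A minimal sketch of that approach, assuming the counts already live in a DataFrame called df (the int8 cast mirrors Solution 1, and scipy must be installed):

import numpy as np
import pandas as pd
from scipy import sparse

# Store only the non-zero counts in compressed sparse row (CSR) format.
sp = sparse.csr_matrix(df.values)

# Binarize: non-zero entries become 1, the (many) zeros stay implicit.
sp_binary = (sp > 0).astype(np.int8)

# Convert back to a dense DataFrame only if and when a dense view is needed.
df_binary = pd.DataFrame(sp_binary.toarray(), index=df.index, columns=df.columns)

Since only the non-zero entries are stored, both the memory footprint and the binarization work scale with the number of counts rather than with the full 55K x 8.5K shape.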
