Python Applymap Taking Time To Run
I have a matrix of data (55K x 8.5K) with counts. Most of them are zeros, but a few of them hold some count. Let's say something like this:

   a  b  c
0  4  3  3
1  1  2  1
2  ...
Solution 1:
UPDATE:
read this topic and this issue regarding your error
Try to save your DF as HDF5 - it's much more convenient.
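A minimal sketch of that, assuming the tables (PyTables) package is installed; the file name 'counts.h5' and the key 'df' are just illustrative:

import numpy as np
import pandas as pd

# stand-in frame of counts
df = pd.DataFrame(np.random.randint(0, 10, size=(1000, 1000)))

# write the frame to an HDF5 file and read it back
df.to_hdf('counts.h5', key='df', mode='w')
df = pd.read_hdf('counts.h5', 'df')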
You may also want to read this comparison...
OLD answer:
try this:
In [110]: (df > 0).astype(np.int8)
Out[110]:
   a  b  c
0  1  1  1
1  1  1  1
2  1  1  0
3  1  0  1
4  1  0  1
.applymap() is one of the slowest methods, because it goes to each cell individually (basically it performs nested loops inside).
df > 0 works on vectorized data, so it does the job much faster.
.apply() will work faster than .applymap() as it works on whole columns, but it is still much slower compared to df > 0.
UPDATE2: time comparison on a smaller DF (1000 x 1000), as applymap() would take ages on a (55K x 9K) DF:
In [5]: df = pd.DataFrame(np.random.randint(0, 10, size=(1000, 1000)))
In [6]: %timeit df.applymap(lambda x: np.where(x >0, 1, 0))
1 loop, best of 3: 3.75 s per loop
In [7]: %timeit df.apply(lambda x: np.where(x >0, 1, 0))
1 loop, best of 3: 256 ms per loop
In [8]: %timeit (df>0).astype(np.int8)
100 loops, best of 3: 2.95 ms per loop
Solution 2:
You could use a SciPy sparse matrix. This would make the calculations touch only the data that is actually there instead of operating on all the zeros.
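A minimal sketch of that idea, assuming a CSR layout suits the counts; the conversion back to a DataFrame uses pd.DataFrame.sparse.from_spmatrix, which is available in recent pandas versions:

import numpy as np
import pandas as pd
from scipy import sparse

# small stand-in for the 55K x 8.5K count matrix
df = pd.DataFrame(np.random.randint(0, 3, size=(1000, 1000)))

m = sparse.csr_matrix(df.values)    # only the non-zero entries are stored
binary = (m > 0).astype(np.int8)    # the comparison touches only the stored values

# optionally wrap the result back into a (sparse) DataFrame
df_binary = pd.DataFrame.sparse.from_spmatrix(binary, index=df.index, columns=df.columns)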