
How To Speed Up Reading From Compressed Hdf5 Files

I have several big HDF5 files stored on an SSD (lzf compressed file size is 10–15 GB, uncompressed size would be 20–25 GB). Reading the contents from such a file into RAM for further processing is slow, and most of that time appears to be spent on decompression. How can I speed up reading these files?

Solution 1:

h5py handles decompression of LZF-compressed datasets via a filter. The source code of the filter, implemented in C, is available in the h5py GitHub repository. Looking at the implementation of lzf_decompress, which is the function causing your bottleneck, you can see it is not parallelized (no idea whether it's even parallelizable; I'll leave that judgement to people more familiar with LZF's inner workings).
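To confirm that the LZF filter is what your reads go through, you can inspect the dataset's compression settings. A minimal check, with the file and dataset names as placeholders:

```python
import h5py

# Hypothetical file and dataset names; this just confirms which filter is in play.
with h5py.File("data.h5", "r") as f:
    dset = f["mydata"]
    print(dset.compression)   # 'lzf' -> reads go through the single-threaded LZF filter
    print(dset.chunks)        # chunk shape; each chunk is compressed independently
```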

With that said, I'm afraid there's no way to just take your huge compressed file and multithread-decompress it. Your options, as far as I can tell, are:

  • Split the huge file into smaller individually-compressed chunks, parallel-decompress each chunk on a separate core (multiprocessing might help there, but you'll need to take care of inter-process shared memory) and join everything back together after it's decompressed (see the sketch after this list).
  • Just use uncompressed files, trading disk space for read speed.
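Here is a minimal sketch of the first option, assuming the data lives in a single dataset that can be split row-wise; the file name, dataset name, and split strategy are assumptions, not the asker's setup. Each worker opens the file independently and decompresses its own slab, so decompression runs on several cores at once:

```python
import multiprocessing as mp

import h5py
import numpy as np

FILENAME = "data.h5"   # assumed file name
DATASET = "mydata"     # assumed dataset name

def read_slab(bounds):
    """Open the file in this worker and decompress one row slab."""
    start, stop = bounds
    with h5py.File(FILENAME, "r") as f:
        return f[DATASET][start:stop]   # LZF decompression happens here, per process

if __name__ == "__main__":
    # Look up the dataset size before starting any workers.
    with h5py.File(FILENAME, "r") as f:
        n_rows = f[DATASET].shape[0]

    n_workers = mp.cpu_count()
    edges = np.linspace(0, n_rows, n_workers + 1, dtype=int)
    slabs = list(zip(edges[:-1], edges[1:]))

    with mp.Pool(n_workers) as pool:
        parts = pool.map(read_slab, slabs)

    data = np.concatenate(parts)   # returning slabs via pickle costs an extra copy
```

Note that returning the slabs to the parent process goes through pickling, which adds a copy; that's where shared memory (e.g. multiprocessing.shared_memory) would help if the copy overhead matters. For the second option, a one-time conversion to an uncompressed copy can be done with the HDF5 h5repack tool (removing all filters, if I recall the `-f NONE` syntax correctly) or by copying each dataset with h5py without passing `compression=`.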
