How To Store And Load Huge Images Dataset?
I have a large image dataset to store: 300,000 images, each a vector of 28,800 pixels, which gives me a matrix of shape (300000, 28800). I stored that as follow im
Solution 1:
If you're saving 300,000 x 28,800 values to CSV, then assuming a float representation you're looking at an output file in the hundreds of gigabytes, depending on the precision of the output (about 25 bytes per value at full double precision works out to roughly 216 GB). Even if you have that much disk space lying around, CSV is incredibly inefficient at this scale.
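As a rough check on the sizes involved, here is a back-of-the-envelope sketch. It assumes the text file is written with NumPy's default `savetxt` format (`%.18e`) plus one delimiter character per value; the question doesn't say how the CSV was written, so treat the CSV figure as an estimate only. The binary rows just multiply the value count by the item size of each dtype.

```python
# Back-of-the-envelope size estimate for a (300000, 28800) array.
N_IMAGES = 300_000
N_PIXELS = 28_800
n_values = N_IMAGES * N_PIXELS

# Assumption: each value is written as "%.18e" (NumPy's savetxt default)
# followed by a single delimiter or newline character.
chars_per_value = len("%.18e" % 1.0) + 1

sizes = {
    "csv_text": n_values * chars_per_value,  # text, full double precision
    "float64": n_values * 8,                 # raw binary, 8 bytes per value
    "float32": n_values * 4,                 # raw binary, 4 bytes per value
    "uint8": n_values * 1,                   # raw binary, 8-bit pixels
}

for name, size in sizes.items():
    print(f"{name:>8}: {size / 1e9:7.2f} GB")
```

If the pixels are really 8-bit values, the same data fits in under 9 GB of binary storage, which is why the choice of dtype matters as much as the choice of file format.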
I'd suggest a binary storage format in this case (e.g. HDF5). You might check out the xarray package: it's well suited to working with dense array data of this size, its API is very similar to NumPy's, and it can use Dask for transparent parallel and/or out-of-core computation.
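To make the HDF5 suggestion concrete, here is a minimal sketch using the h5py package (my choice for illustration; any HDF5 binding would do). The function names, the dataset name `"images"`, the `uint8` pixel type, and the chunk size are all assumptions, and the random batch stands in for however you actually load your images. The idea is to write the matrix in batches so it never has to fit in memory, and to read back only the rows you need.

```python
import h5py
import numpy as np


def write_images(path, n_images, n_pixels, batch_size, chunk_rows=16):
    """Write an (n_images, n_pixels) uint8 dataset one batch at a time."""
    rng = np.random.default_rng(0)
    with h5py.File(path, "w") as f:
        dset = f.create_dataset(
            "images",  # hypothetical dataset name
            shape=(n_images, n_pixels),
            dtype=np.uint8,  # assumption: 8-bit pixel values
            # Chunk by whole rows so reading a few images touches few chunks.
            chunks=(min(chunk_rows, n_images), n_pixels),
        )
        for start in range(0, n_images, batch_size):
            stop = min(start + batch_size, n_images)
            # Placeholder for loading a real batch of images.
            batch = rng.integers(
                0, 256, size=(stop - start, n_pixels), dtype=np.uint8
            )
            dset[start:stop] = batch


def read_rows(path, start, stop):
    """Read only rows [start, stop) from disk as a NumPy array."""
    with h5py.File(path, "r") as f:
        return f["images"][start:stop]
```

For the full dataset you would call something like `write_images("images.h5", 300_000, 28_800, batch_size=1_000)` and then `read_rows("images.h5", 0, 64)` to pull a mini-batch; only the requested slice is loaded into memory.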