
How To Force Zlib To Decompress More Than X Bytes?

I have a file that consists of compressed content plus a 32 byte header. The header contains info such as timestamp, compressed size, and uncompressed size. The file itself is about…
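For concreteness, here is a minimal sketch of reading such a file in Python (the zlib.error mentioned in Solution 2 suggests Python). Only the 32-byte header size and the three fields come from the question; the field order, widths, endianness, and filename are assumptions for illustration:

```python
import struct
import zlib

HEADER_SIZE = 32
# Hypothetical layout: little-endian timestamp, compressed size, and
# uncompressed size, padded out to 32 bytes. The question names the
# fields but not their order, width, or endianness.
HEADER_FMT = "<III20x"

with open("dump.bin", "rb") as f:  # hypothetical filename
    timestamp, comp_size, uncomp_size = struct.unpack(
        HEADER_FMT, f.read(HEADER_SIZE)
    )
    payload = f.read(comp_size)

data = zlib.decompress(payload)  # raises zlib.error on a corrupt stream
assert len(data) == uncomp_size
```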

Solution 1:

You clearly have a corrupted file.

You won't be able to force zlib to ignore the corruption—and, if you did, you'd most likely get either 700MB of garbage, or some random amount of garbage, or… well, it depends on what the corruption is and where. But the chances that you could get anything useful are pretty slim.
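What you can do is salvage everything up to the point of corruption. Here's a sketch using Python's zlib.decompressobj, feeding the stream in chunks and keeping whatever zlib produced before the error:

```python
import zlib

def salvage_prefix(payload, chunk_size=4096):
    """Decompress as much of `payload` as possible.

    zlib keeps everything it produced before hitting the corruption,
    so feeding the stream in chunks recovers the intact prefix.
    """
    d = zlib.decompressobj()
    pieces = []
    for i in range(0, len(payload), chunk_size):
        try:
            pieces.append(d.decompress(payload[i:i + chunk_size]))
        except zlib.error as exc:
            print(f"stream failed near input byte {i}: {exc}")
            break
    return b"".join(pieces)
```

That gets you the decompressed data that precedes the corruption, but nothing after it, for the reasons below.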

zlib's blocks aren't randomly accessible, delimited, or even byte-aligned; it's very hard to tell where the next block starts unless you were able to decode the previous one.

Plus, the trees grow from block to block, so even if you could skip to the next block, your trees would be wrong, and you'd be decompressing garbage unless you get very, very lucky and don't need the broken part of the tree. Even worse, any block can restart the trees (or even switch the compressor); if you miss that, you're decompressing garbage even if you do get very lucky. And it's not just a matter of "skip this string because I don't recognize it": you don't even know how many bits long a string is if you can't recognize it, so you can't skip it. Which brings us back to the first point: you can't even skip a single string, much less a whole block.

To understand this better, see RFC 1951, which describes the format used by zlib. Try manually working through a few trivial examples (just a couple strings in the first block, a couple new ones in the second block) to see how easy it is to corrupt them in a way that's hard to undo (unless you know exactly how they were corrupted). It's not impossible (after all, cracking encrypted messages isn't impossible), but I don't believe it could be fully automated, and it's not something you're likely to do for fun.
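To see the alignment problem concretely, here's a small Python snippet that reads the first deflate block header by hand. The 2-byte zlib wrapper (RFC 1950) is skipped; the three header bits are defined in RFC 1951, packed starting from the least significant bit:

```python
import zlib

# Compress something trivial, then inspect the first deflate block header.
raw = zlib.compress(b"hello hello hello", 9)

# Skip the 2-byte zlib wrapper; raw deflate data (RFC 1951) follows.
first = raw[2]
bfinal = first & 0b1           # bit 0: is this the last block?
btype = (first >> 1) & 0b11    # bits 1-2: 00=stored, 01=fixed, 10=dynamic

print(f"BFINAL={bfinal}, BTYPE={btype:02b}")
# Everything after these 3 bits is a Huffman-coded bit stream with no
# sync markers or byte alignment, so there's nothing you can scan
# forward for to resynchronize after corruption.
```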

If you've got critical data (and can't just re-download it, roll back to the previous version, restore from backup, etc.), some data recovery services claim to be able to recover corrupted zlib/gz/zip files. I'm guessing this costs an arm and a leg, but it may be the right answer for the right data.

And of course I could be wrong about this not being automatable. There are a bunch of zip recovery tools out there. As far as I know, all they can do with broken zlib streams is skip that file and recover the other files… but maybe some of them have some tricks that work in some cases with broken streams.

Solution 2:

You need to check the zlib.error exception to see why decompression stopped. What does the error message say?
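A minimal sketch of what that looks like, deliberately flipping a byte in a valid stream so the exception fires:

```python
import zlib

payload = bytearray(zlib.compress(b"some example data " * 200))
payload[len(payload) // 2] ^= 0xFF  # flip one byte to simulate corruption

try:
    zlib.decompress(bytes(payload))
except zlib.error as exc:
    # The message names the failure, e.g. "Error -3 while decompressing
    # data: invalid distance too far back" or "incorrect data check".
    print(f"decompression stopped: {exc}")
```

The specific message helps distinguish a truncated stream, corruption mid-stream, or data that decoded fully but failed its checksum.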
