zlib.error: Error -3 while decompressing data: incorrect data check #422

@zegrep

We have to deal with a huge number of broken PDF files. The creator is "jsPDF 1.x-master". These files are not totally corrupted, so it would be nice to get the readable content.
I found a solution on Stack Overflow and it works fine for our needs.

PyPDF2/filters.py

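    import zlib
    from io import BytesIO

    def decompress(data):
        # Fast path: a well-formed stream decompresses in one call.
        try:
            return zlib.decompress(data)
        except zlib.error:
            return decompress_corrupted(data)

    def decompress_corrupted(data):
        # MAX_WBITS | 32 enables automatic zlib/gzip header detection.
        d = zlib.decompressobj(zlib.MAX_WBITS | 32)
        # data is a byte string, so use BytesIO rather than StringIO.
        f = BytesIO(data)
        result_str = b''
        buffer = f.read(1)
        try:
            # Feed one byte at a time so that everything decoded before
            # the first error is kept.
            while buffer:
                result_str += d.decompress(buffer)
                buffer = f.read(1)
        except zlib.error:
            pass
        return result_str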
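
To show the effect outside of PyPDF2, here is a minimal standalone sketch. The sample content stream and the way the stream is damaged (flipping the last byte of the Adler-32 trailer) are my own assumptions for illustration; our real files may be broken in other ways, in which case only the part decoded before the error is returned.

```python
import zlib
from io import BytesIO


def decompress_corrupted(data):
    """Return whatever can be decoded before the first zlib error."""
    d = zlib.decompressobj(zlib.MAX_WBITS | 32)
    f = BytesIO(data)
    result = b""
    buffer = f.read(1)
    try:
        while buffer:
            result += d.decompress(buffer)
            buffer = f.read(1)
    except zlib.error:
        pass  # keep the bytes decoded so far
    return result


def decompress(data):
    try:
        return zlib.decompress(data)
    except zlib.error:
        return decompress_corrupted(data)


# Hypothetical sample data, not taken from a real jsPDF file.
original = b"BT /F1 12 Tf (Hello, PDF) Tj ET\n" * 20
good = zlib.compress(original)

# Simulate a bad checksum by flipping the last byte of the trailer.
broken = good[:-1] + bytes(bytearray([good[-1:][0] ^ 0xFF if isinstance(good[-1:][0], int) else ord(good[-1:]) ^ 0xFF]))

try:
    zlib.decompress(broken)
    strict_failed = False
except zlib.error:
    strict_failed = True

recovered = decompress(broken)
```

In this sketch the strict call fails on the checksum, while the fallback still returns the decoded content, because the data itself is intact and only the trailing check value is wrong.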

Labels: Has MCVE, is-bug, is-robustness-issue