Closed
Labels
Has MCVE, is-bug, is-robustness-issue
Description
We have to deal with a large number of broken PDF files, all created by "jsPDF 1.x-master". The files are not completely corrupted, so it would be nice to extract whatever content is still readable.
I found a solution on Stack Overflow, and it works fine for our needs:
PyPDF2/filters.py
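import zlib
from io import BytesIO

def decompress(data):
    try:
        return zlib.decompress(data)
    except zlib.error:
        # The stream is damaged: salvage whatever decompresses cleanly.
        return decompress_corrupted(data)

def decompress_corrupted(data):
    # MAX_WBITS | 32 makes zlib auto-detect a zlib or gzip header.
    d = zlib.decompressobj(zlib.MAX_WBITS | 32)
    # BytesIO, not StringIO: the stream data is bytes (StringIO fails on bytes in Python 3).
    f = BytesIO(data)
    result_str = b''
    buffer = f.read(1)
    try:
        # Feed one byte at a time so that output produced before the
        # corruption point is kept when zlib.error is raised.
        while buffer:
            result_str += d.decompress(buffer)
            buffer = f.read(1)
    except zlib.error:
        pass
    return result_str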
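A minimal, self-contained sketch, assuming Python 3, of how the byte-at-a-time fallback behaves on a damaged stream. The helper name `salvage_zlib` and the sample payload are hypothetical and not part of PyPDF2. The sketch compresses a sample stream, cuts it in half, shows that `zlib.decompress` rejects it, and recovers the readable prefix. It accumulates into a `bytearray` rather than repeatedly concatenating `bytes`, which avoids copying the whole output on every byte.

```python
import zlib


def salvage_zlib(data: bytes) -> bytes:
    """Hypothetical helper: return whatever decompresses before the stream breaks."""
    # MAX_WBITS | 32: accept either a zlib or a gzip header.
    d = zlib.decompressobj(zlib.MAX_WBITS | 32)
    out = bytearray()
    try:
        # One byte per call, so output already produced survives a later zlib.error.
        for i in range(len(data)):
            out += d.decompress(data[i:i + 1])
    except zlib.error:
        pass  # stop at the first bad byte and keep what was recovered
    return bytes(out)


# Sample data standing in for a page content stream.
payload = b"".join(b"line %d\n" % i for i in range(1000))
stream = zlib.compress(payload)
truncated = stream[: len(stream) // 2]  # simulate a cut-off stream

try:
    zlib.decompress(truncated)
except zlib.error as exc:
    print("zlib.decompress failed:", exc)

recovered = salvage_zlib(truncated)
print(len(recovered), "of", len(payload), "bytes recovered")
```

For a stream that is merely truncated, a single `decompressobj().decompress(data)` call already returns the partial output without raising. The byte-at-a-time loop matters when a corrupt byte in the middle raises `zlib.error`, because the output of the failing call is otherwise lost.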