pdfminer bug #2244
…ate variable names
…into helper function
Branch updated from e1e4ebb to 0d07c5a.
# Conflicts:
# unstructured/partition/pdf_image/pdf.py
@christinestraub, I'm still hitting the TypeError with these changes when partitioning via …
@Coniferish Do you get the TypeError along with the results, or does it only raise the TypeError?
@christinestraub Ah, I get both the TypeError and the results.
@Coniferish For clarity, I reverted the refactoring changes in the "fast" strategy workflow. I think it would be better to do that refactoring in a separate, future refactor PR.
…ed to process PDF page after repairing
Closes #2212.
Summary
This PR implements logic to fall back to "inferred_layout + OCR" if pdfminer fails in the hi_res pipeline (discussed in this Slack channel).
Testing
PDF: NASA-SNA-8-D-027III-Rev2-CsmLmSpacecraftOperationalDataBook-Volume3-MassProperties-pg856.pdf
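The fallback described in the Summary can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code in this PR: process_page_with_fallback, the Layout alias, and the callables standing in for pdfminer extraction and OCR are all invented placeholders, and the normal-path merge (inferred layout plus pdfminer-extracted elements, then OCR) is an assumption about how hi_res combines its sources.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger(__name__)

# Hypothetical stand-in for a page's layout: just a list of text elements.
Layout = List[str]


def process_page_with_fallback(
    inferred_layout: Layout,
    extract_with_pdfminer: Callable[[], Layout],
    run_ocr: Callable[[Layout], Layout],
) -> Tuple[Layout, bool]:
    """Return (final_layout, used_fallback) for one page.

    All inputs are hypothetical placeholders, not unstructured's real API.
    """
    try:
        extracted_layout = extract_with_pdfminer()
    except Exception as exc:  # any pdfminer failure triggers the fallback
        logger.warning(
            "pdfminer failed (%s); falling back to inferred layout + OCR", exc
        )
        # Fallback path: skip pdfminer output entirely, use inferred layout + OCR.
        return run_ocr(inferred_layout), True

    # Assumed normal path: add pdfminer elements not already inferred, then OCR.
    merged = inferred_layout + [e for e in extracted_layout if e not in inferred_layout]
    return run_ocr(merged), False


if __name__ == "__main__":
    def broken_pdfminer() -> Layout:
        # Simulated failure, standing in for the kind of error discussed above.
        raise TypeError("simulated pdfminer failure")

    def fake_ocr(layout: Layout) -> Layout:
        return layout + ["ocr text"]

    print(process_page_with_fallback(["title"], broken_pdfminer, fake_ocr))
    # prints (['title', 'ocr text'], True)
    print(process_page_with_fallback(["title"], lambda: ["title", "body"], fake_ocr))
    # prints (['title', 'body', 'ocr text'], False)
```

Catching a broad Exception mirrors the Summary's "if pdfminer fails" wording; a real implementation might narrow it to the specific errors pdfminer is known to raise, so that unrelated bugs are not silently masked by the fallback.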