Coniferish
Contributor

@Coniferish Coniferish commented Dec 11, 2023

Closes #2212.

Summary

This PR implements logic to fall back to "inferred_layout + OCR" when pdfminer fails in the hi_res pipeline (discussed in this Slack channel).
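
For illustration only, the general shape of such a fallback might look like the sketch below. This is not the code from this PR: `extract_with_fallback`, `failing_pdfminer_step`, and `inferred_layout_plus_ocr` are hypothetical stand-ins and are not functions from unstructured or pdfminer. It assumes the fallback logs the failure and still returns results.

```python
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)


def extract_with_fallback(
    primary: Callable[[str], List[str]],
    fallback: Callable[[str], List[str]],
    filename: str,
) -> List[str]:
    """Try the primary extractor; if it raises, log the error and use the fallback."""
    try:
        return primary(filename)
    except Exception as exc:  # broad on purpose: this sketch treats every failure alike
        logger.warning("primary extraction failed (%s); falling back", exc)
        return fallback(filename)


# Hypothetical stand-ins, NOT real unstructured or pdfminer functions.
def failing_pdfminer_step(filename: str) -> List[str]:
    raise TypeError("simulated pdfminer failure")


def inferred_layout_plus_ocr(filename: str) -> List[str]:
    return [f"OCR text from {filename}"]


result = extract_with_fallback(
    failing_pdfminer_step, inferred_layout_plus_ocr, "example.pdf"
)
```

With a pattern like this, the caller would see the error in the logs and still receive the fallback results.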

Testing

PDF: NASA-SNA-8-D-027III-Rev2-CsmLmSpacecraftOperationalDataBook-Volume3-MassProperties-pg856.pdf

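from unstructured.partition.pdf import partition_pdf

# Partition the failing NASA page with the hi_res strategy.
elements = partition_pdf(
    filename="NASA-SNA-8-D-027III-Rev2-CsmLmSpacecraftOperationalDataBook-Volume3-MassProperties-pg856.pdf",
    strategy="hi_res",
)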

@Coniferish
Contributor Author

@christinestraub, I'm still hitting the TypeError with these changes when partitioning via hi_res

@christinestraub
Contributor

christinestraub commented Dec 11, 2023

> @christinestraub, I'm still hitting the TypeError with these changes when partitioning via hi_res

@Coniferish Do you get the TypeError along with the results, or does it only raise the TypeError?

@Coniferish
Contributor Author

@christinestraub Ah, I get both the TypeError and the results.

@christinestraub
Contributor

christinestraub commented Dec 11, 2023

@Coniferish For clarity, I reverted the refactoring changes in the "fast" strategy workflow. I think it would be better to do that refactoring in a future, dedicated refactor PR.

@christinestraub christinestraub added this pull request to the merge queue Dec 13, 2023
Merged via the queue into main with commit d3a404c Dec 13, 2023
@christinestraub christinestraub deleted the jj/2212-pdfminer-bug branch December 13, 2023 01:30
Development

Successfully merging this pull request may close these issues.

bug: pdf partitioning fails for certain NASA docs

2 participants