Skip NameObject when building outline #1068
@MartinThoma Think I got it now. Thanks again for the help. |
Never mind, need to update some tests it seems. |
I tried numerous "bad" PDFs but couldn't get this error to raise. Will have to look into creating a PDF with garbage anchors and uploading to the samples repo. |
@LamerLink Could it be that your PR just solved the issue that the following test exercises? Could you add it in this PR?

    def test_unexpected_destination():
        url = "https://corpora.tika.apache.org/base/docs/govdocs1/913/913678.pdf"
        name = "tika-913678.pdf"
        reader = PdfReader(BytesIO(get_pdf_from_url(url, name=name)))
        merger = PdfMerger()
        merger.append(reader)
|
@MartinThoma Good call, updated. I thought about adding an [...] Looks like all checks have passed! |
Codecov Report
@@           Coverage Diff           @@
##             main    #1068   +/-   ##
=======================================
  Coverage   92.26%   92.26%
=======================================
  Files          24       24
  Lines        4794     4794
  Branches      990      990
=======================================
  Hits         4423     4423
  Misses        230      230
  Partials      141      141
|
@LamerLink Thank you for your effort. Another PR, #1076, also solved the same issue, and with that one it is clearer to me why it solves the issue. As the issue is now (hopefully) gone, this PR no longer seems necessary. Please let me know if I got something wrong. I can always re-open a PR. |
@MartinThoma No problem at all, I read over that PR and it makes sense to me! Thanks for all the effort. |
I've just noticed that your PR increases coverage quite a bit. That is unexpected to me. I want to understand that better. |
I'm not sure why it originally showed that the test coverage increased by that much. If we did it now with #1158, the coverage would decrease slightly (which is what I expected). |
Fixes #193. Uses @hannal's solution to resolve read/merge issues with PDFs generated by wkhtmltopdf by skipping NameObject entities. Related to #778.
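
To illustrate the general idea of skipping NameObject entities while walking an outline, here is a minimal sketch. It is not the actual change from this PR and does not use PyPDF2: the `NameObject` class below is a hypothetical stand-in (a plain `str` subclass), and `collect_outline_titles` and the sample outline are made up for illustration. It assumes outline entries are dictionaries with a `/Title` key, nested lists for children, or stray names where a destination was expected.

```python
from typing import Any, List


class NameObject(str):
    """Hypothetical stand-in for a PDF name object such as '/Fit'.

    This is NOT imported from PyPDF2; it only mimics a name being a
    string-like value rather than a destination dictionary.
    """


def collect_outline_titles(outline: List[Any]) -> List[str]:
    """Walk a (possibly nested) outline and return the entry titles.

    Bare NameObject entries carry no title or page information, so they
    are skipped instead of being treated as destinations.
    """
    titles: List[str] = []
    for entry in outline:
        if isinstance(entry, NameObject):
            # A stray name where a destination was expected: skip it.
            continue
        if isinstance(entry, list):
            # Nested list: children of the previous entry.
            titles.extend(collect_outline_titles(entry))
        else:
            titles.append(entry["/Title"])
    return titles


# Made-up outline containing two stray names.
sample_outline = [
    {"/Title": "Intro"},
    NameObject("/Fit"),
    [{"/Title": "Sub"}, NameObject("/XYZ")],
    {"/Title": "End"},
]

print(collect_outline_titles(sample_outline))
```

Without the `isinstance(entry, NameObject)` check, the dictionary-style lookup `entry["/Title"]` on a string-like name would raise a `TypeError`, which is the kind of failure that skipping such entries avoids in this sketch.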