Skip to content

gofpdi fails to correctly parse streams on some pdfs #53

@napalu

Description

@napalu

When reading some PDFs (seen this typically when importing scanned-in PDFs), gofpdi will fail to detect 'endstream', panicking with panic: Failed to get content: Failed to get page content: Failed to resolve object: Expected next token to be: endstream, got: dstream.

When reading a PDF stream the reader should start reading stream after the first CRLF sequence but instead skips all leading whitespace which can result in reading past the 'endstream' token.

Here's a test PDF with described behaviour.
BRW2C6FC94B5488_000827.pdf

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions