
[Bug]: Missing empty struct nodes via getStructTree #20324

@edoardocavazza

Attach (recommended) or Link to PDF file

getStructTree builds the tree starting from content references.
If a struct node (such as a TD) has no content, no StructElementNode is created for that branch.
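
To illustrate why a content-driven build drops empty nodes, here is a toy model. It is not PDF.js code: the `elements` map, the `contentRefs` list, the ids, and `buildFromContent` are all hypothetical names made up for this sketch, and the Table/TR/TD nesting is an assumption about the attached file.

```javascript
// Toy model, NOT PDF.js internals: every name and id here is hypothetical.
// Each struct element knows its parent; only some of them own marked content.
const elements = {
  table: { role: "Table", parent: null },
  tr: { role: "TR", parent: "table" },
  td1: { role: "TD", parent: "tr" },
  td2: { role: "TD", parent: "tr" },
  td3: { role: "TD", parent: "tr" }, // empty cell, no content
  td4: { role: "TD", parent: "tr" }, // empty cell, no content
};

// Content references found on the page, each owned by one struct element.
const contentRefs = [
  { id: "mc0", owner: "td1" },
  { id: "mc1", owner: "td2" },
];

// Build the tree bottom-up: start at each content ref and walk the parent chain.
function buildFromContent(elements, contentRefs) {
  const nodes = new Map(); // element key -> output node
  const root = { role: "Root", children: [] };

  function nodeFor(key) {
    if (nodes.has(key)) {
      return nodes.get(key);
    }
    const node = { role: elements[key].role, children: [] };
    nodes.set(key, node);
    const parentKey = elements[key].parent;
    const parent = parentKey === null ? root : nodeFor(parentKey);
    parent.children.push(node);
    return node;
  }

  for (const { id, owner } of contentRefs) {
    nodeFor(owner).children.push({ type: "content", id });
  }
  return root;
}

const tree = buildFromContent(elements, contentRefs);
const row = tree.children[0].children[0];
console.log(row.children.length); // 2: td3 and td4 were never visited
```

In this model, a node only appears if some content reference leads to it or to one of its descendants, so the two empty TD elements are never created. A top-down walk of the structure hierarchy would include them.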

I don't know whether this is a bug or the intended behavior. If you don't consider it a bug, would you accept a PR that makes this behavior configurable via a property, so that the full tree can be returned?

Web browser and its version

Operating system and its version

PDF.js version

v5.4.149

Is the bug present in the latest PDF.js version?

Yes

Is a browser extension

No

Steps to reproduce the problem

  1. Load the attached PDF
  2. Get page 1
  3. Get structured tree for page 1

table.pdf
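
A small helper along these lines can make the result of step 3 easy to check. It is only a sketch: `countRoles` is a hypothetical helper, and the `reported` object is a hand-written stand-in for the tree described in this issue (its exact nesting and content ids are assumptions, not output captured from PDF.js).

```javascript
// Count struct nodes by role in a tree shaped like the result of
// page.getStructTree(): nodes are { role, children }, and leaf entries
// are content items such as { type: "content", id } that have no role.
function countRoles(node, counts = {}) {
  if (!node || typeof node.role !== "string") {
    return counts; // content leaf or missing tree: nothing to count
  }
  counts[node.role] = (counts[node.role] || 0) + 1;
  for (const child of node.children || []) {
    countRoles(child, counts);
  }
  return counts;
}

// Hand-written stand-in for the tree reported for page 1 of table.pdf.
// The nesting (Root > Table > TR > TD) and the ids are assumptions.
const reported = {
  role: "Root",
  children: [
    {
      role: "Table",
      children: [
        {
          role: "TR",
          children: [
            { role: "TD", children: [{ type: "content", id: "mc0" }] },
            { role: "TD", children: [{ type: "content", id: "mc1" }] },
          ],
        },
      ],
    },
  ],
};

console.log(countRoles(reported).TD); // 2, where the issue expects 4
```

With PDF.js itself, the input would come from something like `const pdf = await pdfjsLib.getDocument(url).promise`, then `const page = await pdf.getPage(1)` and `const tree = await page.getStructTree()`, with `url` pointing at the attached file.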

What is the expected behavior?

The table branch should contain all 4 TD nodes.

What went wrong?

The table contains only the first two TD nodes. (The first contains the word "word", the second a whitespace character; the third and fourth are empty.)

Link to a viewer

No response

Additional context

No response
