Landscape tables result in jumbled text when using `extractor.start_document_analysis` with `TextractFeatures.TABLES` #420

❌ Tested on both v1.8.5 and v1.9.0; both fail.

Example Document:
Pages 8 and 9 of this document ([07432326.pdf](https://github.com/user-attachments/files/19191484/07432326.pdf)) contain tables in landscape orientation.

Expected:

Broxtowe Borough Council | Foster Avenue, Beeston, Nottingham, NG9 1AB
-- | --

Actual:

Council Borough Broxtowe | 1AB NG9 Nottingham, Beeston, Avenue, Foster
-- | --
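Comparing the two rows above, the words in each cell appear to come back in exactly reversed order, while each word itself is intact. A minimal sketch that checks this against the sample row (the helper name `reverse_words` is only illustrative and is not part of Textractor):

```python
def reverse_words(text):
    """Return the text with its whitespace-separated words in reverse order."""
    return " ".join(reversed(text.split()))


# Cell values copied from the Expected and Actual rows in this issue.
expected_row = ["Broxtowe Borough Council", "Foster Avenue, Beeston, Nottingham, NG9 1AB"]
actual_row = ["Council Borough Broxtowe", "1AB NG9 Nottingham, Beeston, Avenue, Foster"]

# Reversing the word order of each jumbled cell should restore the expected text.
restored_row = [reverse_words(cell) for cell in actual_row]
print(restored_row == expected_row)
```

This only characterises the symptom for this one row; it is not a suggested fix, and other cells in the document may be affected differently.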

> [!NOTE]  
> This issue does not occur with portrait tables.

Full extraction output:
[07432326_ocr.txt](https://github.com/user-attachments/files/19191898/07432326_ocr.txt)

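```py
from textractor import Textractor
from textractor.data.constants import TextractFeatures


def analyze(file_source, s3_upload_path):
    # Both arguments are redacted ("xxxx") in the original report.
    extractor = Textractor()

    # Asynchronous analysis with only the TABLES feature enabled.
    document = extractor.start_document_analysis(
        file_source=file_source,
        save_image=False,
        features=[TextractFeatures.TABLES],
        s3_upload_path=s3_upload_path,
    )

    # Raw Textract response for the analysed document.
    return document.response
```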
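To help narrow down whether the landscape pages are being reported as rotated text, one option is to inspect the word geometry in the returned response. The sketch below is a diagnostic aid only. It assumes the response is a dict with a `Blocks` list, and that each `WORD` block's `Geometry.Polygon` lists its points clockwise starting from the top-left corner of the text as read; both are assumptions about the Textract response shape rather than anything confirmed in this issue. The function names are hypothetical.

```python
import math


def estimate_text_angle(block):
    """Approximate angle (degrees) of the block's top edge, from its first two polygon points."""
    polygon = block["Geometry"]["Polygon"]
    first, second = polygon[0], polygon[1]
    # Coordinates are page-relative ratios, so the angle is approximate,
    # but it is enough to tell roughly horizontal from roughly vertical text.
    return math.degrees(math.atan2(second["Y"] - first["Y"], second["X"] - first["X"]))


def count_rotated_words(response, tolerance_degrees=45.0):
    """Return (rotated_word_count, total_word_count) for a Textract-style response dict."""
    words = [b for b in response.get("Blocks", []) if b.get("BlockType") == "WORD"]
    rotated = [b for b in words if abs(estimate_text_angle(b)) > tolerance_degrees]
    return len(rotated), len(words)
```

A possible use is `count_rotated_words(document.response)`: a high rotated count on pages 8 and 9 would suggest the words are detected with rotation and that the problem lies in how they are ordered afterwards.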

> [!IMPORTANT]  
> ✅ This used to work on **v1.4.5**. Here's the same document processed with that version:

Example extraction:
[v1.4.5.txt](https://github.com/user-attachments/files/19193251/v1.4.5.txt)

On v1.4.5 we expect:

Broxtowe Borough Council | Foster Avenue, Beeston, Nottingham, NG9 1AB
-- | --

and we actually get:

Broxtowe Borough Council | Foster Avenue, Beeston, Nottingham, NG9 1AB
-- | --

Here's a diff between `textractor.py` v1.4.5 (left) and v1.9.0 (right): https://www.diffchecker.com/4JsE2FLv/
