
Issue with Markdown output (textractprettyprinter) #274

@jpbalarini

Description


There's an issue when I request the text in Markdown format: every list is duplicated. Its text appears first as plain text and then again with proper Markdown list formatting.

Here's how I'm generating my Markdown file:

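```python
from textractcaller.t_call import call_textract, Textract_Features
from textractprettyprinter.t_pretty_print_layout import get_text_from_layout_json

input_document = 's3://.../MY_FILE.pdf'

# Run Textract with both the LAYOUT and TABLES features enabled
textract_json = call_textract(
    input_document=input_document, features=[Textract_Features.LAYOUT, Textract_Features.TABLES]
)
# Linearize the layout as Markdown, skipping page headers and footers
layout = get_text_from_layout_json(
    textract_json=textract_json,
    generate_markdown=True,
    exclude_page_header=True,
    exclude_page_footer=True,
    save_txt_path="./output"
)
```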

Example (screenshot from 2023-11-13 showing the output, not reproduced here).
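Until the duplication is fixed in the library, one possible stopgap is to post-process the generated Markdown. The sketch below is a hypothetical helper (`drop_plaintext_list_duplicates` is not part of textractprettyprinter). It assumes the plain-text copy of each list item matches the item's text exactly, and it drops such lines anywhere in the output, so a legitimate paragraph that happens to repeat a list item's text would also be removed.

```python
import re

# Matches a bulleted ("-", "*", "+") or numbered ("1.") Markdown list item
# and captures the item's text (leading/trailing whitespace excluded).
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+\.)\s+(.*\S)\s*$")


def drop_plaintext_list_duplicates(markdown: str) -> str:
    """Hypothetical workaround: remove plain-text lines that repeat a list item.

    Assumption: the duplicated plain-text line equals the list item's text
    exactly. Any non-list line matching a list item's text is dropped.
    """
    lines = markdown.splitlines()
    # Collect the text of every Markdown list item in the document.
    list_texts = {m.group(1) for line in lines if (m := LIST_ITEM.match(line))}
    kept = []
    for line in lines:
        text = line.strip()
        if text and not LIST_ITEM.match(line) and text in list_texts:
            continue  # plain-text copy of a list item: skip it
        kept.append(line)
    return "\n".join(kept)
```

For example, `drop_plaintext_list_duplicates("Intro\nApples\nPears\n- Apples\n- Pears\nOutro")` keeps only the Markdown list items plus the surrounding lines. Comparing exact text rather than line position keeps the helper simple, at the cost of the false positives noted above.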
