Migrate PDF Processing Core from Marker to Docling

**Description:**

We currently use **Marker** as the core PDF parsing module in our system. This issue aims to **replace Marker with Docling** as the main PDF parsing engine.

The updated implementation must **retain the same functionality** and **output structure** as the current Marker-based version to ensure backward compatibility with downstream processing components in Omniparse.

### 🧪 Requirements

* [ ] Replace Marker with Docling in the PDF parsing core.
* [ ] Ensure the output format is identical to what Marker currently produces (or provide a compatibility adapter).
* [ ] All existing test cases for Marker must pass with Docling.
* [ ] Provide a Google Colab notebook demonstrating the updated implementation and validating its output with test PDFs.
* [ ] Ensure performance is comparable to or better than Marker's in terms of speed and memory usage.
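
One way to check the speed and memory requirement is a small harness that runs each parser on the same PDF and records wall-clock time and peak memory. The sketch below is only illustrative: `benchmark` is a hypothetical helper, and `tracemalloc` tracks Python-level allocations only, so native or GPU memory used by model backends would need a separate tool.

```python
import time
import tracemalloc
from typing import Any, Callable, Tuple


def benchmark(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float, int]:
    """Run fn once and return (result, elapsed_seconds, peak_python_bytes).

    Hypothetical helper for this issue; not part of Marker or Docling.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


if __name__ == "__main__":
    # Stand-in workload; replace with the Marker and Docling parse calls.
    _, seconds, peak_bytes = benchmark(lambda: [str(i) for i in range(100_000)])
    print(f"{seconds:.4f} s, peak {peak_bytes / 1024:.1f} KiB")
```

Running both engines through the same helper on the same test PDFs gives numbers that can be compared directly in the Colab notebook.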

### 🛠️ Tips

* Check out Docling's segment and node extraction tools—they should map closely to Marker’s annotation and token-level representations.
* You may need to write a thin compatibility layer to normalize Docling outputs to Marker-style structures.
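
The compatibility layer mentioned above could look roughly like the following sketch. It assumes the current Marker-based code hands downstream components a text, images, and metadata triple; that shape, `MarkerStyleResult`, `to_marker_style`, and `parse_pdf_with_docling` are illustrative assumptions and should be checked against the actual Omniparse code. The Docling calls used are `DocumentConverter().convert(...)` and `document.export_to_markdown()`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class MarkerStyleResult:
    """Assumed Marker-style output: markdown text, images, metadata."""
    text: str
    images: Dict[str, Any] = field(default_factory=dict)
    metadata: Dict[str, Any] = field(default_factory=dict)


def to_marker_style(
    markdown: str,
    images: Optional[Dict[str, Any]] = None,
    metadata: Optional[Dict[str, Any]] = None,
) -> MarkerStyleResult:
    """Normalize parser output into the assumed Marker-style structure."""
    return MarkerStyleResult(
        text=markdown.strip() + "\n",  # single trailing newline
        images=dict(images or {}),
        metadata=dict(metadata or {}),
    )


def parse_pdf_with_docling(path: str) -> MarkerStyleResult:
    """Parse a PDF with Docling and return a Marker-style result."""
    # Imported lazily so the adapter can be unit-tested without Docling.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(path)
    markdown = result.document.export_to_markdown()
    return to_marker_style(markdown, metadata={"source": path, "engine": "docling"})
```

Keeping the normalization in `to_marker_style` separate from the Docling call means the output contract can be tested without loading any models.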

### ✅ Acceptance Criteria

* Functionality parity with Marker: same sections, headers, paragraphs, tokens.
* Tests green ✅ in CI.
* Colab notebook demo included and reproducible.
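
Parity on sections, headers, paragraphs, and tokens can be checked mechanically. The sketch below is one possible approach, assuming both engines emit Markdown with ATX (`#`) headers; `structure_signature` and `parity_diff` are hypothetical helpers, and whitespace-insensitive comparison is an assumption about what counts as "the same".

```python
import re
from typing import Dict, List

HEADER_RE = re.compile(r"^#{1,6}\s+(.*)$")


def structure_signature(markdown: str) -> Dict[str, List[str]]:
    """Extract headers, paragraphs, and whitespace-split tokens."""
    headers: List[str] = []
    paragraphs: List[str] = []
    current: List[str] = []

    def flush() -> None:
        # Close the paragraph being accumulated, if any.
        if current:
            paragraphs.append(" ".join(current))
            current.clear()

    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()
            headers.append(match.group(1).strip())
        elif line.strip():
            current.append(" ".join(line.split()))
        else:
            flush()
    flush()
    return {"headers": headers, "paragraphs": paragraphs, "tokens": markdown.split()}


def parity_diff(expected_md: str, actual_md: str) -> List[str]:
    """Return the names of the signature parts that differ."""
    expected = structure_signature(expected_md)
    actual = structure_signature(actual_md)
    return [key for key in expected if expected[key] != actual[key]]
```

An empty list from `parity_diff(marker_output, docling_output)` would indicate parity under these assumptions; a non-empty list names which part to inspect.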

