New VLM JSON to stylized HTML notebook draft #17

CodesLikeIcarus · 2025-09-18T01:07:07Z

No description provided.

review-notebook-app · 2025-09-18T01:07:12Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

MKhalusova

Thank you for adding the notebook! This is a really cool example for our collection. However, I do have some feedback we should address before we publish it.
Mainly, we need to add some narrative, more context, and examples, and guide the reader through the notebook.

Detailed feedback (not necessarily in order):

In the intro, you briefly mention why one would use it: “perfect for web apps, knowledge bases, or downstream AI workflows.” Start with the “why”: the reader has lots of unstructured data in various formats that they'd like to visually standardize for XYZ, and they can do it with Unstructured. Add two sentences about what Unstructured does and what the standard output of the VLM parser contains. Give a bit more context for a reader who's new to Unstructured. Give a TL;DR of what the reader will learn to do. Then move on to the steps.
List prerequisites: an Unstructured API key and a PDF uploaded to your Google Colab environment. Check an example of prerequisites here: https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Getting_Started_with_Unstructured_API_and_Redis.ipynb
Add a sentence or two on environment variables management.
Maybe explain why only certain file types are eligible for the VLM partitioner? Also, you’re not using VLM_ELIGIBLE_FILE_TYPES. Maybe replace it with markdown narrative about supported file types?
Please explain what types of partitioners Unstructured offers, what we recommend the VLM partitioner for, and why in this case it is the only option.
Mention that in this example for illustration purposes you’re partitioning just one file, but if they want to do this at scale for many files, they would need to use the Workflows Endpoint with connectors. Link to the docs for the partition endpoint: https://docs.unstructured.io/api-reference/partition/overview, and for the workflow endpoint: https://docs.unstructured.io/api-reference/workflow/overview
After you partition the file, print out an example of an element. This helps to make it more interactive. Reader can understand what we’re working with, and also when they try it for themselves, this gives them a checkpoint.
Explain what happens in every step, and guide the learner. E.g. Step 2 is a lot of code, so explain how you leverage the metadata. Link to the docs where we explain what our output JSON looks like. Ideally, show an example of an element. https://docs.unstructured.io/api-reference/partition/document-elements
Step 3: Stylize the outputs. Say a sentence or two about this Step.
Show a screenshot of the result in a markdown cell to give the reader an idea of what the output looks like. Add the wow factor.
Write a conclusion with what could be next steps (e.g. building a production workflow with connectors, and then applying the style to batches of processed documents).
Add a CTA: encourage the reader to try it with their own documents. Lead them to sign up for the platform, and mention the free trial.
Minor nitpicks: “# Platform Partition URL” -> “Unstructured Partition Endpoint URL”
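
For the environment-variable and element-checkpoint points above, a minimal sketch might look like the following. It is not taken from the notebook: the variable name `UNSTRUCTURED_API_KEY`, the helper names `get_api_key` and `preview_element`, and the sample element values are all assumptions made for illustration. The element shape (`type`, `element_id`, `text`, `metadata`) follows the document-elements docs linked above.

```python
import json
import os


def get_api_key(env=os.environ):
    """Read the API key from the environment instead of hard-coding it.

    The variable name UNSTRUCTURED_API_KEY is an assumption for this sketch.
    """
    key = env.get("UNSTRUCTURED_API_KEY")
    if not key:
        raise RuntimeError("Set UNSTRUCTURED_API_KEY before running the notebook.")
    return key


# Illustrative stand-in for the partitioned output; the values are made up.
sample_elements = [
    {
        "type": "Title",
        "element_id": "example-id-1",
        "text": "Quarterly Report",
        "metadata": {"filename": "example.pdf", "page_number": 1},
    },
    {
        "type": "NarrativeText",
        "element_id": "example-id-2",
        "text": "Revenue grew during the quarter.",
        "metadata": {"filename": "example.pdf", "page_number": 1},
    },
]


def preview_element(elements, index=0):
    """Return one element as pretty-printed JSON, as a checkpoint for the reader."""
    return json.dumps(elements[index], indent=2)


# Checkpoint: show the reader what a single element looks like.
print(preview_element(sample_elements))
```

Printing one element right after partitioning gives the reader something concrete to compare against when they run the notebook on their own file.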

CodesLikeIcarus · 2025-09-18T21:15:42Z

Great feedback. I'll address each of these and push an update.

cursor · 2025-09-19T23:22:18Z

notebooks/Convert_Documents_to_Stylized_HTML_using_the_Unstructured_API.ipynb

+      },
+      "outputs": [],
+      "source": [
+        "file_as_html = VlmJsonToHtmlConverter(title).convert(json_data)\n",


Bug: Undefined Variables Cause Conversion Error

The notebook attempts to convert JSON to HTML using VlmJsonToHtmlConverter(title).convert(json_data), but both title and json_data are undefined. The title variable is never declared, and json_data is likely a typo for file_as_json, which was defined earlier. This results in a NameError when the cell executes.
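
A minimal sketch of one possible fix, assuming `file_as_json` is the intended input as suggested above. The `VlmJsonToHtmlConverter` class below is a hypothetical stand-in so the snippet runs on its own, since the notebook's real converter is not shown in this thread. The title string and the sample element are likewise placeholders.

```python
from html import escape


class VlmJsonToHtmlConverter:
    """Hypothetical stand-in for the notebook's converter (real class not shown here)."""

    def __init__(self, title):
        self.title = title

    def convert(self, elements):
        # Wrap each element's text in a paragraph; the real converter likely does more.
        body = "\n".join(f"<p>{escape(el['text'])}</p>" for el in elements)
        return (
            f"<html><head><title>{escape(self.title)}</title></head>"
            f"<body>\n{body}\n</body></html>"
        )


# Placeholder for the partition output defined earlier in the notebook.
file_as_json = [{"type": "Title", "text": "Hello"}]

# Define the title before it is used, and pass the variable that actually exists.
title = "Example document"
file_as_html = VlmJsonToHtmlConverter(title).convert(file_as_json)
```

The two changes that matter are declaring `title` before the call and passing `file_as_json` instead of the undefined `json_data`.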

New notebook draft

c7bab36

CodesLikeIcarus requested a review from MKhalusova September 18, 2025 01:07

This comment was marked as outdated.


MKhalusova requested changes Sep 18, 2025


Daniel Schofield added 2 commits September 19, 2025 18:18

Progress - still missing CTA and migration away from partition endpoint

f3da8d5

Removed test doc

5d2e55d

cursor bot reviewed Sep 19, 2025
