HTML/XML to JSON Converter

A lightweight, zero-dependency library for bidirectional conversion between HTML/XML and JSON

Transform HTML/XML markup into clean JSON trees and render them back to markup with full fidelity. Perfect for parsing, manipulating, and generating HTML/XML programmatically.

Features

Zero Dependencies - Pure JavaScript, no external libraries required
TypeScript Support - Fully typed with comprehensive type definitions
Bidirectional - Parse HTML/XML to JSON and render JSON back to HTML/XML
High Fidelity - Preserves structure, attributes, text nodes, and comments
Lightweight - Minimal footprint, fast parsing
Flexible - Works with HTML and XML, supports namespaces
Sanitization Ready - Built-in option to ignore unwanted tags (script, style, etc.)
Pretty Printing - Optional formatted output with customizable indentation
Well Tested - 58 comprehensive tests covering all features

Installation

npm install @lemonadejs/html-to-json

Import Options

You can import both functions from the main package:

// Recommended: Import both from main package
import { parser, render } from '@lemonadejs/html-to-json';

TypeScript Usage

The library includes comprehensive type definitions:

import { parser, render, type Node, type ParserOptions, type RenderOptions } from '@lemonadejs/html-to-json';

// Fully typed parser with options
const options: ParserOptions = { ignore: ['script', 'style'] };
const tree: Node | undefined = parser('<div>Hello</div>', options);

// Fully typed renderer with options
const renderOpts: RenderOptions = { pretty: true, indent: '  ' };
const html: string = render(tree, renderOpts);

Quick Start

Parse HTML/XML to JSON

import { parser } from '@lemonadejs/html-to-json';

const html = '<div class="card"><h1>Title</h1><p>Content</p></div>';
const tree = parser(html);

console.log(JSON.stringify(tree, null, 2));

Output:

{
  "type": "div",
  "props": [
    { "name": "class", "value": "card" }
  ],
  "children": [
    {
      "type": "h1",
      "children": [
        {
          "type": "#text",
          "props": [{ "name": "textContent", "value": "Title" }]
        }
      ]
    },
    {
      "type": "p",
      "children": [
        {
          "type": "#text",
          "props": [{ "name": "textContent", "value": "Content" }]
        }
      ]
    }
  ]
}

Render JSON back to HTML/XML

import { parser, render } from '@lemonadejs/html-to-json';

const tree = parser('<div class="greeting">Hello World</div>');
const html = render(tree);

console.log(html);
// Output: <div class="greeting">Hello World</div>

Pretty Printing

import { render } from '@lemonadejs/html-to-json';

const tree = {
  type: 'article',
  props: [{ name: 'class', value: 'post' }],
  children: [
    {
      type: 'h2',
      children: [
        { type: '#text', props: [{ name: 'textContent', value: 'Article Title' }] }
      ]
    },
    {
      type: 'p',
      children: [
        { type: '#text', props: [{ name: 'textContent', value: 'Article content here.' }] }
      ]
    }
  ]
};

const html = render(tree, { pretty: true, indent: '  ' });

console.log(html);

Output:

<article class="post">
  <h2>
    Article Title
  </h2>
  <p>
    Article content here.
  </p>
</article>

📖 API Reference

`parser(html, options)`

Parses HTML or XML string into a JSON tree structure.

Parameters:

html (string) - The HTML or XML string to parse
options (Object, optional) - Parser options

Options:

Option	Type	Default	Description
`ignore`	string[]	`[]`	Array of tag names to ignore during parsing

Returns: Object - JSON tree representation

Examples:

// Basic parsing
const tree = parser('<div id="app">Hello</div>');

// Ignore script and style tags
const clean = parser(html, { ignore: ['script', 'style'] });

// Case-insensitive tag matching
const tree = parser('<div><SCRIPT>bad</SCRIPT></div>', { ignore: ['script'] });

`render(tree, options)`

Renders a JSON tree back into HTML or XML markup.

Parameters:

tree (Object|Array) - The JSON tree to render
options (Object, optional) - Rendering options

Options:

Option	Type	Default	Description
`pretty`	boolean	`false`	Format output with newlines and indentation
`indent`	string	`' '`	Indentation string (used when `pretty` is `true`)
`selfClosingTags`	string[]	See below*	Override default void elements list
`xmlMode`	boolean	`false`	Self-close all empty elements using `<tag />` syntax

*Default self-closing tags: area, base, br, col, embed, hr, img, input, link, meta, source, track, wbr

Returns: string - Rendered HTML/XML markup

Examples:

// Basic rendering
const html = render(tree);

// Pretty printing
const formatted = render(tree, { pretty: true });

// Custom indentation
const tabbed = render(tree, { pretty: true, indent: '\t' });

// XML mode
const xml = render(tree, { xmlMode: true });

// Custom self-closing tags
const custom = render(tree, {
  selfClosingTags: ['br', 'hr', 'img', 'custom-element']
});

🎯 JSON Tree Structure

Element Node

{
  "type": "tagName",
  "props": [
    { "name": "attributeName", "value": "attributeValue" }
  ],
  "children": [...]
}

Text Node

{
  "type": "#text",
  "props": [
    { "name": "textContent", "value": "text content here" }
  ]
}

Comment Node

{
  "type": "#comments",
  "props": [
    { "name": "text", "value": " comment text " }
  ]
}

Template Wrapper (Multiple Root Elements)

{
  "type": "template",
  "children": [
    { "type": "div", ... },
    { "type": "span", ... }
  ]
}

📦 TypeScript Types

The library exports the following TypeScript types:

Core Types

Node - Union type for all possible node types (ElementNode | TextNode | CommentNode | TemplateNode)
ElementNode - HTML/XML element with type, props, and children
TextNode - Text content node with type: '#text'
CommentNode - Comment node with type: '#comments'
TemplateNode - Wrapper for multiple root elements with type: 'template'
NodeProp - Property object with name and value

Options Types

ParserOptions - Options for the parser function
RenderOptions - Options for the render function

import type {
  Node,
  ElementNode,
  TextNode,
  CommentNode,
  TemplateNode,
  NodeProp,
  ParserOptions,
  RenderOptions
} from '@lemonadejs/html-to-json';

💡 Use Cases

1. HTML Sanitization

import { parser, render } from '@lemonadejs/html-to-json';

// Remove potentially dangerous tags using the ignore option
function sanitizeHTML(html) {
  const tree = parser(html, {
    ignore: ['script', 'style', 'iframe', 'object', 'embed']
  });
  return render(tree);
}

const dirty = '<div>Hello<script>alert("xss")</script><style>bad{}</style>World</div>';
const clean = sanitizeHTML(dirty);
console.log(clean); // <div>HelloWorld</div>

2. HTML Transformation

// Add class to all divs
function addClassToAllDivs(tree, className) {
  if (tree.type === 'div') {
    if (!tree.props) tree.props = [];
    const classAttr = tree.props.find(p => p.name === 'class');
    if (classAttr) {
      classAttr.value += ` ${className}`;
    } else {
      tree.props.push({ name: 'class', value: className });
    }
  }

  if (tree.children) {
    tree.children.forEach(child => addClassToAllDivs(child, className));
  }

  return tree;
}

const html = '<div><div>Nested</div></div>';
const tree = parser(html);
addClassToAllDivs(tree, 'highlight');
console.log(render(tree));
// <div class="highlight"><div class="highlight">Nested</div></div>

3. XML Processing

// Parse and extract data from XML
const xml = `
<catalog>
  <book isbn="978-0-123456-78-9">
    <title>Sample Book</title>
    <author>John Doe</author>
    <price>29.99</price>
  </book>
</catalog>`;

const tree = parser(xml);

function extractBooks(node) {
  if (node.type === 'book') {
    const isbn = node.props?.find(p => p.name === 'isbn')?.value;
    const title = node.children?.find(c => c.type === 'title')
      ?.children?.[0]?.props?.[0]?.value;
    const author = node.children?.find(c => c.type === 'author')
      ?.children?.[0]?.props?.[0]?.value;

    return { isbn, title, author };
  }

  if (node.children) {
    return node.children.map(extractBooks).filter(Boolean).flat();
  }

  return [];
}

const books = extractBooks(tree);
console.log(books);
// [{ isbn: '978-0-123456-78-9', title: 'Sample Book', author: 'John Doe' }]

4. Complex HTML with Inline CSS

const complexHTML = `
<div style="padding: 20px; background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);">
  <h1 style="color: white; margin: 0;">Welcome</h1>
  <p style="color: rgba(255,255,255,0.9);">Beautiful styled content</p>
</div>`;

const tree = parser(complexHTML);
const rendered = render(tree, { pretty: true });

console.log(rendered);
// Perfectly preserves all inline CSS with gradients, rgba colors, etc.

🔍 Advanced Features

XML Namespaces Support

const xml = '<root xmlns:custom="http://example.com"><custom:element>Value</custom:element></root>';
const tree = parser(xml);
const output = render(tree);
// Preserves namespace colons in tag names

Self-Closing Tags

const html = '<div><br /><img src="test.jpg" /><input type="text" /></div>';
const tree = parser(html);
const output = render(tree);
// Properly handles void elements

Comments Preservation

const html = '<div><!-- Important comment --><span>Content</span></div>';
const tree = parser(html);
const output = render(tree);
// Comments are preserved in the output

Multiple Root Elements

const html = '<div>First</div><span>Second</span>';
const tree = parser(html);
// Returns: { type: 'template', children: [...] }

🧪 Testing

Run the comprehensive test suite:

npm test

Test Coverage:

✅ Basic HTML elements (div, span, nested structures)
✅ Self-closing tags (br, img, input, hr, meta, link)
✅ Attributes (single, multiple, special characters, quotes)
✅ Text content with escaping
✅ HTML comments
✅ XML documents with namespaces
✅ Complex real-world examples (forms, navigation, tables)
✅ Edge cases (empty input, whitespace, consecutive tags)
✅ Parser behavior (no parent references, unclosed tags)
✅ Parser options (ignore tags - script, style, nested, case-insensitive)
✅ Renderer options (pretty printing, XML mode)
✅ Complex HTML with extensive inline CSS (11,000+ characters)

58 tests passing • 1 skipped

⚡ Performance

The parser is designed for speed and efficiency:

Streaming parser - Single-pass character-by-character parsing
No regex in main loop - Only simple character matching
Minimal allocations - Reuses objects where possible
Stack-based - Efficient memory usage for deeply nested structures

Typical performance:

Small HTML (< 1KB): < 1ms
Medium HTML (10KB): ~5ms
Large HTML (100KB+): ~50ms
Complex HTML with CSS (11KB): ~10ms

⚠️ Known Limitations

HTML Entities: Not decoded during parsing. They are stored as-is and escaped on render.
- Input: <p>&</p> → Stored: "&" → Output: <p>&amp;</p>
- Workaround: Use raw characters instead of entities in source
Whitespace: Fully preserved in text nodes, no normalization applied.
Doctype: <!DOCTYPE html> declarations are parsed as text nodes, not special nodes.
CDATA: <![CDATA[...]]> sections are not specially handled.
Processing Instructions: <?xml ...?> are not parsed.
Error Reporting: Parser is lenient and produces a tree even for malformed HTML. No detailed error messages.
Attribute Order: May differ from source in rendered output.
Quotes: Renderer always uses double quotes for attributes.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

# Clone the repository
git clone https://github.com/lemonadejs/html-to-json.git
cd html-to-json

# Install dependencies
npm install

# Run tests
npm test

# Run tests in watch mode
npm test -- --watch

📄 License

🔗 Links

Repository: https://github.com/lemonadejs/html-to-json
NPM Package: https://www.npmjs.com/package/@lemonadejs/html-to-json
Issues: https://github.com/lemonadejs/html-to-json/issues
Documentation: https://github.com/lemonadejs/html-to-json#readme

🙏 Acknowledgments

Built with ❤️ by the Jspreadsheet Team

Star this repo ⭐ if you find it useful!

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
src		src
test		test
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
package.json		package.json

License

lemonadejs/html-to-json

Folders and files

Latest commit

History

Repository files navigation