A lightweight, zero-dependency library for bidirectional conversion between HTML/XML and JSON
Transform HTML/XML markup into clean JSON trees and render them back to markup. Useful for parsing, manipulating, and generating HTML/XML programmatically.
- Zero Dependencies - Pure JavaScript, no external libraries required
- TypeScript Support - Fully typed with comprehensive type definitions
- Bidirectional - Parse HTML/XML to JSON and render JSON back to HTML/XML
- High Fidelity - Preserves structure, attributes, text nodes, and comments
- Lightweight - Minimal footprint, fast parsing
- Flexible - Works with HTML and XML, supports namespaces
- Sanitization Ready - Built-in option to ignore unwanted tags (script, style, etc.)
- Pretty Printing - Optional formatted output with customizable indentation
- Well Tested - 58 comprehensive tests covering all features
npm install @lemonadejs/html-to-json

You can import both functions from the main package:
// Recommended: Import both from main package
import { parser, render } from '@lemonadejs/html-to-json';

The library includes comprehensive type definitions:
import { parser, render, type Node, type ParserOptions, type RenderOptions } from '@lemonadejs/html-to-json';
// Fully typed parser with options
const options: ParserOptions = { ignore: ['script', 'style'] };
const tree: Node | undefined = parser('<div>Hello</div>', options);
// Fully typed renderer with options
const renderOpts: RenderOptions = { pretty: true, indent: ' ' };
const html: string = render(tree, renderOpts);

import { parser } from '@lemonadejs/html-to-json';
const html = '<div class="card"><h1>Title</h1><p>Content</p></div>';
const tree = parser(html);
console.log(JSON.stringify(tree, null, 2));

Output:
{
  "type": "div",
  "props": [
    { "name": "class", "value": "card" }
  ],
  "children": [
    {
      "type": "h1",
      "children": [
        {
          "type": "#text",
          "props": [{ "name": "textContent", "value": "Title" }]
        }
      ]
    },
    {
      "type": "p",
      "children": [
        {
          "type": "#text",
          "props": [{ "name": "textContent", "value": "Content" }]
        }
      ]
    }
  ]
}

import { parser, render } from '@lemonadejs/html-to-json';
const tree = parser('<div class="greeting">Hello World</div>');
const html = render(tree);
console.log(html);
// Output: <div class="greeting">Hello World</div>

import { render } from '@lemonadejs/html-to-json';
const tree = {
  type: 'article',
  props: [{ name: 'class', value: 'post' }],
  children: [
    {
      type: 'h2',
      children: [
        { type: '#text', props: [{ name: 'textContent', value: 'Article Title' }] }
      ]
    },
    {
      type: 'p',
      children: [
        { type: '#text', props: [{ name: 'textContent', value: 'Article content here.' }] }
      ]
    }
  ]
};
const html = render(tree, { pretty: true, indent: ' ' });
console.log(html);

Output:
<article class="post">
<h2>
Article Title
</h2>
<p>
Article content here.
</p>
</article>

parser(html, options) parses an HTML or XML string into a JSON tree structure.
Parameters:
- html (string) - The HTML or XML string to parse
- options (Object, optional) - Parser options
Options:
| Option | Type | Default | Description |
|---|---|---|---|
| ignore | string[] | [] | Array of tag names to ignore during parsing |
Returns: Object - JSON tree representation
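The returned tree stores attributes as an array of { name, value } pairs rather than as a keyed object, so reading a single attribute or the text of an element takes a small lookup. The following is a minimal sketch of two such lookups; `getAttr` and `textOf` are hypothetical helper names, not part of the library, and the sample tree is written by hand in the shape the parser output takes in this README.

```javascript
// Sample tree written by hand in the documented output shape (not produced by the library here).
const tree = {
  type: 'div',
  props: [{ name: 'class', value: 'card' }],
  children: [
    { type: 'h1', children: [{ type: '#text', props: [{ name: 'textContent', value: 'Title' }] }] },
    { type: 'p', children: [{ type: '#text', props: [{ name: 'textContent', value: 'Content' }] }] }
  ]
};

// Hypothetical helper: look up one attribute value in a node's props array.
// Returns undefined when the node has no props or no matching entry.
function getAttr(node, name) {
  const prop = (node.props || []).find(p => p.name === name);
  return prop ? prop.value : undefined;
}

// Hypothetical helper: concatenate the text of every '#text' descendant, in document order.
function textOf(node) {
  if (node.type === '#text') return getAttr(node, 'textContent') || '';
  return (node.children || []).map(textOf).join('');
}

console.log(getAttr(tree, 'class')); // card
console.log(textOf(tree));           // TitleContent
```

Both helpers tolerate nodes that omit props or children, which the sample output suggests is common for elements without attributes.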
Examples:
// Basic parsing
const tree = parser('<div id="app">Hello</div>');
// Ignore script and style tags
const clean = parser(html, { ignore: ['script', 'style'] });
// Tag matching is case-insensitive
const filtered = parser('<div><SCRIPT>bad</SCRIPT></div>', { ignore: ['script'] });

render(tree, options) renders a JSON tree back into HTML or XML markup.
Parameters:
- tree (Object|Array) - The JSON tree to render
- options (Object, optional) - Rendering options
Options:
| Option | Type | Default | Description |
|---|---|---|---|
| pretty | boolean | false | Format output with newlines and indentation |
| indent | string | ' ' | Indentation string (used when pretty is true) |
| selfClosingTags | string[] | See below* | Override default void elements list |
| xmlMode | boolean | false | Self-close all empty elements using <tag /> syntax |
*Default self-closing tags: area, base, br, col, embed, hr, img, input, link, meta, source, track, wbr
Returns: string - Rendered HTML/XML markup
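Since selfClosingTags is described as overriding the default list, a short custom array presumably drops any defaults it leaves out. The sketch below builds an options object that keeps the documented defaults and adds to them. It assumes the option replaces the list rather than merging with it, which this README does not state explicitly; `withExtraSelfClosingTags` is a hypothetical helper name.

```javascript
// Default list copied from the options table footnote in this README.
const DEFAULT_SELF_CLOSING_TAGS = [
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr'
];

// Hypothetical helper: documented defaults plus extra tag names, without duplicates.
function withExtraSelfClosingTags(extra) {
  return [...new Set([...DEFAULT_SELF_CLOSING_TAGS, ...extra])];
}

// Options object intended to be passed as the second argument to render().
const renderOptions = {
  selfClosingTags: withExtraSelfClosingTags(['custom-element'])
};

console.log(renderOptions.selfClosingTags.length); // 14
```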
Examples:
// Basic rendering
const html = render(tree);
// Pretty printing
const formatted = render(tree, { pretty: true });
// Custom indentation
const tabbed = render(tree, { pretty: true, indent: '\t' });
// XML mode
const xml = render(tree, { xmlMode: true });
// Custom self-closing tags
const custom = render(tree, {
  selfClosingTags: ['br', 'hr', 'img', 'custom-element']
});

Element node:
{
  "type": "tagName",
  "props": [
    { "name": "attributeName", "value": "attributeValue" }
  ],
  "children": [...]
}

Text node:
{
  "type": "#text",
  "props": [
    { "name": "textContent", "value": "text content here" }
  ]
}

Comment node:
{
  "type": "#comments",
  "props": [
    { "name": "text", "value": " comment text " }
  ]
}

Template node (multiple root elements):
{
  "type": "template",
  "children": [
    { "type": "div", ... },
    { "type": "span", ... }
  ]
}

The library exports the following TypeScript types:
- Node - Union type for all possible node types (ElementNode | TextNode | CommentNode | TemplateNode)
- ElementNode - HTML/XML element with type, props, and children
- TextNode - Text content node with type: '#text'
- CommentNode - Comment node with type: '#comments'
- TemplateNode - Wrapper for multiple root elements with type: 'template'
- NodeProp - Property object with name and value
- ParserOptions - Options for the parser function
- RenderOptions - Options for the render function
import type {
  Node,
  ElementNode,
  TextNode,
  CommentNode,
  TemplateNode,
  NodeProp,
  ParserOptions,
  RenderOptions
} from '@lemonadejs/html-to-json';

import { parser, render } from '@lemonadejs/html-to-json';
// Remove potentially dangerous tags using the ignore option
function sanitizeHTML(html) {
  const tree = parser(html, {
    ignore: ['script', 'style', 'iframe', 'object', 'embed']
  });
  return render(tree);
}
const dirty = '<div>Hello<script>alert("xss")</script><style>bad{}</style>World</div>';
const clean = sanitizeHTML(dirty);
console.log(clean); // <div>HelloWorld</div>

// Add class to all divs
function addClassToAllDivs(tree, className) {
  if (tree.type === 'div') {
    if (!tree.props) tree.props = [];
    const classAttr = tree.props.find(p => p.name === 'class');
    if (classAttr) {
      classAttr.value += ` ${className}`;
    } else {
      tree.props.push({ name: 'class', value: className });
    }
  }
  if (tree.children) {
    tree.children.forEach(child => addClassToAllDivs(child, className));
  }
  return tree;
}
const html = '<div><div>Nested</div></div>';
const tree = parser(html);
addClassToAllDivs(tree, 'highlight');
console.log(render(tree));
// <div class="highlight"><div class="highlight">Nested</div></div>

// Parse and extract data from XML
const xml = `
<catalog>
  <book isbn="978-0-123456-78-9">
    <title>Sample Book</title>
    <author>John Doe</author>
    <price>29.99</price>
  </book>
</catalog>`;
const tree = parser(xml);
// Always returns an array, so results from nested nodes can be flattened uniformly
function extractBooks(node) {
  if (node.type === 'book') {
    const isbn = node.props?.find(p => p.name === 'isbn')?.value;
    const title = node.children?.find(c => c.type === 'title')
      ?.children?.[0]?.props?.[0]?.value;
    const author = node.children?.find(c => c.type === 'author')
      ?.children?.[0]?.props?.[0]?.value;
    return [{ isbn, title, author }];
  }
  if (node.children) {
    return node.children.flatMap(extractBooks);
  }
  return [];
}
const books = extractBooks(tree);
console.log(books);
// [{ isbn: '978-0-123456-78-9', title: 'Sample Book', author: 'John Doe' }]

const complexHTML = `
<div style="padding: 20px; background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);">
  <h1 style="color: white; margin: 0;">Welcome</h1>
  <p style="color: rgba(255,255,255,0.9);">Beautiful styled content</p>
</div>`;
const tree = parser(complexHTML);
const rendered = render(tree, { pretty: true });
console.log(rendered);
// Inline CSS, including gradients and rgba colors, is preserved.

const xml = '<root xmlns:custom="http://example.com"><custom:element>Value</custom:element></root>';
const tree = parser(xml);
const output = render(tree);
// Preserves namespace colons in tag names

const html = '<div><br /><img src="test.jpg" /><input type="text" /></div>';
const tree = parser(html);
const output = render(tree);
// Properly handles void elements

const html = '<div><!-- Important comment --><span>Content</span></div>';
const tree = parser(html);
const output = render(tree);
// Comments are preserved in the output

const html = '<div>First</div><span>Second</span>';
const tree = parser(html);
// Returns: { type: 'template', children: [...] }

Run the comprehensive test suite:
npm test

Test Coverage:
- ✅ Basic HTML elements (div, span, nested structures)
- ✅ Self-closing tags (br, img, input, hr, meta, link)
- ✅ Attributes (single, multiple, special characters, quotes)
- ✅ Text content with escaping
- ✅ HTML comments
- ✅ XML documents with namespaces
- ✅ Complex real-world examples (forms, navigation, tables)
- ✅ Edge cases (empty input, whitespace, consecutive tags)
- ✅ Parser behavior (no parent references, unclosed tags)
- ✅ Parser options (ignore tags - script, style, nested, case-insensitive)
- ✅ Renderer options (pretty printing, XML mode)
- ✅ Complex HTML with extensive inline CSS (11,000+ characters)
58 tests passing • 1 skipped
The parser is designed for speed and efficiency:
- Streaming parser - Single-pass character-by-character parsing
- No regex in main loop - Only simple character matching
- Minimal allocations - Reuses objects where possible
- Stack-based - Efficient memory usage for deeply nested structures
Typical performance:
- Small HTML (< 1KB): < 1ms
- Medium HTML (10KB): ~5ms
- Large HTML (100KB+): ~50ms
- Complex HTML with CSS (11KB): ~10ms
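To make the single-pass, stack-based design concrete, here is a deliberately tiny sketch of the idea. It is not the library's implementation: `parseSimple` is a hypothetical name, and the toy only understands opening tags without attributes, closing tags, and text. It emits nodes in the same general shape as the documented tree.

```javascript
// Toy single-pass parser. Handles only <tag>, </tag>, and text runs
// (no attributes, comments, void elements, or name matching on close).
function parseSimple(markup) {
  const root = { type: 'template', children: [] };
  const stack = [root]; // open elements; the last entry is the current parent
  let i = 0;
  while (i < markup.length) {
    const parent = stack[stack.length - 1];
    if (markup[i] === '<') {
      const end = markup.indexOf('>', i);
      if (end === -1) break; // unterminated tag: stop and keep what was built
      const name = markup.slice(i + 1, end);
      if (name[0] === '/') {
        // Closing tag: pop the current element, but never pop the root
        if (stack.length > 1) stack.pop();
      } else {
        // Opening tag: attach to the current parent and make it the new parent
        const node = { type: name, children: [] };
        parent.children.push(node);
        stack.push(node);
      }
      i = end + 1;
    } else {
      // Text run: consume everything up to the next '<' (or end of input)
      let next = markup.indexOf('<', i);
      if (next === -1) next = markup.length;
      parent.children.push({
        type: '#text',
        props: [{ name: 'textContent', value: markup.slice(i, next) }]
      });
      i = next;
    }
  }
  // A single root element is returned directly; otherwise the template wrapper is kept
  return root.children.length === 1 ? root.children[0] : root;
}

const demo = parseSimple('<div><p>Hi</p>there</div>');
console.log(demo.type);            // div
console.log(demo.children.length); // 2
```

The input is scanned once from left to right, and the stack depth tracks nesting depth, so no recursion is needed for deeply nested markup.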
- HTML Entities: Not decoded during parsing. They are stored as-is and escaped on render.
  - Input: <p>&amp;</p> → Stored: "&amp;" → Output: <p>&amp;amp;</p>
  - Workaround: Use raw characters instead of entities in source
- Whitespace: Fully preserved in text nodes, no normalization applied.
- Doctype: <!DOCTYPE html> declarations are parsed as text nodes, not special nodes.
- CDATA: <![CDATA[...]]> sections are not specially handled.
- Processing Instructions: <?xml ...?> are not parsed.
- Error Reporting: Parser is lenient and produces a tree even for malformed HTML. No detailed error messages.
- Attribute Order: May differ from source in rendered output.
- Quotes: Renderer always uses double quotes for attributes.
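Because entities are stored as-is and escaped again on render, one possible way to avoid double escaping is to decode a few common entities in text nodes before rendering. The sketch below is an assumption-laden illustration, not a library feature: `decodeTextEntities` is a hypothetical helper, it covers only five entities, and it relies on text living in a textContent prop as described in this README.

```javascript
// Small, deliberately incomplete entity table (assumption: these five cover the common cases).
const ENTITIES = { '&amp;': '&', '&lt;': '<', '&gt;': '>', '&quot;': '"', '&#39;': "'" };

// Hypothetical helper: decode the entities above in every '#text' node, in place.
// A single regex pass means '&amp;lt;' becomes '&lt;' and is not decoded a second time.
function decodeTextEntities(node) {
  if (node.type === '#text' && node.props) {
    for (const prop of node.props) {
      if (prop.name === 'textContent') {
        prop.value = prop.value.replace(/&(?:amp|lt|gt|quot|#39);/g, match => ENTITIES[match]);
      }
    }
  }
  (node.children || []).forEach(decodeTextEntities);
  return node;
}

// Hand-written tree in the documented shape, standing in for parser output.
const paragraph = {
  type: 'p',
  children: [{ type: '#text', props: [{ name: 'textContent', value: 'Fish &amp; chips' }] }]
};
decodeTextEntities(paragraph);
console.log(paragraph.children[0].props[0].value); // Fish & chips
```

Applied to a tree between parser and render, this would leave the renderer's own escaping as the only escaping step, assuming the renderer escapes text content as described above.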
Contributions are welcome! Please feel free to submit a Pull Request.
# Clone the repository
git clone https://github.com/lemonadejs/html-to-json.git
cd html-to-json
# Install dependencies
npm install
# Run tests
npm test
# Run tests in watch mode
npm test -- --watch

MIT © Jspreadsheet Team
- Repository: https://github.com/lemonadejs/html-to-json
- NPM Package: https://www.npmjs.com/package/@lemonadejs/html-to-json
- Issues: https://github.com/lemonadejs/html-to-json/issues
- Documentation: https://github.com/lemonadejs/html-to-json#readme
Built with ❤️ by the Jspreadsheet Team
Star this repo ⭐ if you find it useful!