Commit 91f29ea

HTML Tidy now parses HTML non-recursively.
Instead of making a recursive call for each nested level of HTML, the parser pushes the next level onto a stack on the heap and returns to the main loop. This prevents the call-stack overflow that occurred at a nesting depth of _n_ (where _n_ depends on the operating system). It's probably still possible to use up all of the heap memory, but Tidy's allocators already fail gracefully in that case. Please report any regressions you find with your own HTML! NOTE: the XML parser is not affected, and is probably still highly recursive.
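
As a rough illustration of the technique described in this commit message (this is not Tidy's actual code), the sketch below walks a node tree with an explicit, heap-allocated stack instead of recursive calls. The names `Node`, `Frame`, `FrameStack`, `push_frame`, and `max_depth` are hypothetical and exist only for this example; Tidy's real parser works on its own node and lexer types.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tree node for illustration; not Tidy's Node struct. */
typedef struct Node {
    struct Node *first_child;
    struct Node *next_sibling;
} Node;

/* One unit of pending work: a node plus the depth it sits at. */
typedef struct {
    const Node *node;
    size_t depth;
} Frame;

/* Growable stack of pending frames, stored on the heap. */
typedef struct {
    Frame *items;
    size_t len;
    size_t cap;
} FrameStack;

/* Push a frame, growing the buffer as needed. Returns 0 if allocation fails. */
static int push_frame(FrameStack *s, const Node *node, size_t depth)
{
    assert(s != NULL && node != NULL);
    if (s->len == s->cap) {
        size_t new_cap = s->cap ? s->cap * 2 : 16;
        Frame *grown = realloc(s->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return 0; /* old buffer is still valid; caller cleans up */
        s->items = grown;
        s->cap = new_cap;
    }
    s->items[s->len].node = node;
    s->items[s->len].depth = depth;
    s->len++;
    return 1;
}

/* Returns the maximum nesting depth of the tree (0 for an empty tree),
 * or -1 if memory runs out. Call-stack usage stays constant no matter
 * how deeply the tree is nested. */
long max_depth(const Node *root)
{
    FrameStack st = { NULL, 0, 0 };
    size_t deepest = 0;

    if (root != NULL && !push_frame(&st, root, 1))
        return -1;

    /* Main loop: pop one frame, then push the work it leads to. */
    while (st.len > 0) {
        Frame f = st.items[--st.len];
        if (f.depth > deepest)
            deepest = f.depth;
        if ((f.node->next_sibling &&
             !push_frame(&st, f.node->next_sibling, f.depth)) ||
            (f.node->first_child &&
             !push_frame(&st, f.node->first_child, f.depth + 1))) {
            free(st.items);
            return -1; /* heap exhausted: fail cleanly instead of crashing */
        }
    }

    free(st.items);
    return (long)deepest;
}
```

In this sketch, running out of memory shows up as a failed `realloc` that the caller can handle, whereas deep recursion would overflow the call stack with no chance to recover.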
1 parent b6f7e43 commit 91f29ea

22 files changed, with 4088 additions and 4234 deletions.
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
# Config for test case.
2+
tidy-mark: no
3+
indent: yes
4+
wrap: 999
@@ -0,0 +1,26 @@
1+
<!--
2+
This test case represents HTML…
3+
-->
4+
<!DOCTYPE html>
5+
<html>
6+
<head>
7+
<title>This is a title</title>
8+
</head>
9+
10+
<body>
11+
<div>
12+
<p>This is the first paragraph.</p>
13+
<p>Now now, second paragraph?</p>
14+
<div>
15+
<p>I'm nested in a div.</p>
16+
<ul>
17+
<li>List item one.
18+
<li>List item two. There isn't a third. Hahaha.</li>
19+
</ul>
20+
<p>Because, you know, lists should have a minimum of three items.</p>
21+
</div>
22+
<p>Penultimate paragraphs are sometimes the best.</p>
23+
</div>
24+
<p>Don't Cray; Buy Amiga!</p>
25+
</body>
26+
</html>
@@ -0,0 +1,4 @@
1+
# Config for test case.
2+
tidy-mark: no
3+
indent: yes
4+
wrap: 999
@@ -0,0 +1,33 @@
1+
<!--
2+
This test case tests the datalist element and the datalist parser.
3+
Oddly, there's not an existing test case that has the datalist element.
4+
-->
5+
<!DOCTYPE html>
6+
<html>
7+
<head>
8+
<title>This is a title</title>
9+
</head>
10+
11+
<body>
12+
<label for="ice-cream-choice">Choose a flavor:</label>
13+
<input list="ice-cream-flavors" id="ice-cream-choice" name="ice-cream-choice" />
14+
15+
<datalist id="ice-cream-flavors">
16+
<option value="Chocolate">
17+
<option value="Coconut">
18+
<option value="Mint">
19+
<option value="Strawberry">
20+
<option value="Vanilla">
21+
</datalist>
22+
23+
<label for="myBrowser">Choose a browser from this list:</label>
24+
<input list="browsers" id="myBrowser" name="myBrowser" />
25+
<datalist id="browsers">
26+
<option value="Chrome">
27+
<option value="Firefox">
28+
<option value="Internet Explorer">
29+
<option value="Opera">
30+
<option value="Safari">
31+
<option value="Microsoft Edge">
32+
</body>
33+
</html>
@@ -0,0 +1,4 @@
1+
# Config for test case.
2+
tidy-mark: no
3+
indent: yes
4+
wrap: 999
@@ -0,0 +1,27 @@
1+
<!--
2+
This test case tests the definition list element and parser.
3+
-->
4+
<!DOCTYPE html>
5+
<html>
6+
<head><title>case-003</title></head>
7+
<body>
8+
9+
<dl>
10+
<dd>
11+
<div>
12+
<table summary="">
13+
<tr>
14+
<center>
15+
<td>What is up?</td>
16+
</tr>
17+
</table>
18+
</div>
19+
<dd>
20+
</dd>
21+
<center>Hello</center>
22+
</dl>
23+
24+
</body>
25+
</html>
26+
27+
@@ -0,0 +1,4 @@
1+
# Config for test case.
2+
tidy-mark: no
3+
indent: yes
4+
wrap: 999
@@ -0,0 +1,41 @@
1+
<!--
2+
This test case tests the optgroup element and parser.
3+
-->
4+
<!DOCTYPE html>
5+
<html>
6+
<head><title>case-004</title></head>
7+
<body>
8+
9+
<label for="dino-select">Choose a dinosaur:</label>
10+
<select id="dino-select">
11+
<optgroup label="Theropods">
12+
<option>Tyrannosaurus</option>
13+
<option>Velociraptor</option>
14+
<option>Deinonychus</option>
15+
</optgroup>
16+
<optgroup label="Sauropods">
17+
<option>Diplodocus</option>
18+
<option>Saltasaurus</option>
19+
<option>Apatosaurus</option>
20+
</optgroup>
21+
</select>
22+
23+
<optgroup label="Body Parts">
24+
<option>Claws</option>
25+
<option>Teeth</option>
26+
<option>Tail Spikes</option>
27+
</optgroup>
28+
29+
<optgroup label="Movies">
30+
<optgroup label="Scifi">
31+
<option>Jurassic Park</option>
32+
</optgroup>
33+
<option>The Good Dinosaur</option>
34+
<option>The Land Before Time</option>
35+
</optgroup>
36+
37+
38+
</body>
39+
</html>
40+
41+
@@ -0,0 +1,41 @@
1+
<!--
2+
This test case represents HTML…
3+
-->
4+
<!DOCTYPE html>
5+
<html>
6+
<head>
7+
<title>
8+
This is a title
9+
</title>
10+
</head>
11+
<body>
12+
<div>
13+
<p>
14+
This is the first paragraph.
15+
</p>
16+
<p>
17+
Now now, second paragraph?
18+
</p>
19+
<div>
20+
<p>
21+
I'm nested in a div.
22+
</p>
23+
<ul>
24+
<li>List item one.
25+
</li>
26+
<li>List item two. There isn't a third. Hahaha.
27+
</li>
28+
</ul>
29+
<p>
30+
Because, you know, lists should have a minimum of three items.
31+
</p>
32+
</div>
33+
<p>
34+
Penultimate paragraphs are sometimes the best.
35+
</p>
36+
</div>
37+
<p>
38+
Don't Cray; Buy Amiga!
39+
</p>
40+
</body>
41+
</html>
@@ -0,0 +1,14 @@
1+
line 17 column 13 - Info: missing optional end tag </li>
2+
Info: Document content looks like HTML5
3+
No warnings or errors were found.
4+
5+
About HTML Tidy: https://github.com/htacg/tidy-html5
6+
Bug reports and comments: https://github.com/htacg/tidy-html5/issues
7+
Official mailing list: https://lists.w3.org/Archives/Public/public-htacg/
8+
Latest HTML specification: https://html.spec.whatwg.org/multipage/
9+
Validate your HTML documents: https://validator.w3.org/nu/
10+
Lobby your company to join the W3C: https://www.w3.org/Consortium
11+
12+
Do you speak a language other than English, or a different variant of
13+
English? Consider helping us to localize HTML Tidy. For details please see
14+
https://github.com/htacg/tidy-html5/blob/master/README/LOCALIZE.md
@@ -0,0 +1,39 @@
1+
<!--
2+
This test case tests the datalist element and the datalist parser.
3+
Oddly, there's not an existing test case that has the datalist element.
4+
-->
5+
<!DOCTYPE html>
6+
<html>
7+
<head>
8+
<title>
9+
This is a title
10+
</title>
11+
</head>
12+
<body>
13+
<label for="ice-cream-choice">Choose a flavor:</label> <input list="ice-cream-flavors" id="ice-cream-choice" name="ice-cream-choice"> <datalist id="ice-cream-flavors">
14+
<option value="Chocolate">
15+
</option>
16+
<option value="Coconut">
17+
</option>
18+
<option value="Mint">
19+
</option>
20+
<option value="Strawberry">
21+
</option>
22+
<option value="Vanilla">
23+
</option>
24+
</datalist> <label for="myBrowser">Choose a browser from this list:</label> <input list="browsers" id="myBrowser" name="myBrowser"> <datalist id="browsers">
25+
<option value="Chrome">
26+
</option>
27+
<option value="Firefox">
28+
</option>
29+
<option value="Internet Explorer">
30+
</option>
31+
<option value="Opera">
32+
</option>
33+
<option value="Safari">
34+
</option>
35+
<option value="Microsoft Edge">
36+
</option>
37+
</datalist>
38+
</body>
39+
</html>
@@ -0,0 +1,16 @@
1+
line 32 column 1 - Warning: discarding unexpected </body>
2+
line 33 column 1 - Warning: discarding unexpected </html>
3+
line 25 column 5 - Warning: missing </datalist>
4+
Info: Document content looks like HTML5
5+
Tidy found 3 warnings and 0 errors!
6+
7+
About HTML Tidy: https://github.com/htacg/tidy-html5
8+
Bug reports and comments: https://github.com/htacg/tidy-html5/issues
9+
Official mailing list: https://lists.w3.org/Archives/Public/public-htacg/
10+
Latest HTML specification: https://html.spec.whatwg.org/multipage/
11+
Validate your HTML documents: https://validator.w3.org/nu/
12+
Lobby your company to join the W3C: https://www.w3.org/Consortium
13+
14+
Do you speak a language other than English, or a different variant of
15+
English? Consider helping us to localize HTML Tidy. For details please see
16+
https://github.com/htacg/tidy-html5/blob/master/README/LOCALIZE.md
@@ -0,0 +1,30 @@
1+
<!--
2+
This test case tests the definition list element and parser.
3+
-->
4+
<!DOCTYPE html>
5+
<html>
6+
<head>
7+
<title>
8+
case-003
9+
</title>
10+
</head>
11+
<body>
12+
<dl>
13+
<dd>
14+
<div>
15+
<table summary="">
16+
<tr>
17+
<td>
18+
What is up?
19+
</td>
20+
</tr>
21+
</table>
22+
</div>
23+
</dd>
24+
<dd></dd>
25+
</dl>
26+
<center>
27+
Hello
28+
</center>
29+
</body>
30+
</html>
@@ -0,0 +1,26 @@
1+
line 14 column 7 - Warning: <center> isn't allowed in <tr> elements
2+
line 13 column 5 - Info: <tr> previously mentioned
3+
line 14 column 7 - Warning: missing </center> before <td>
4+
line 10 column 3 - Info: missing optional end tag </dd>
5+
line 12 column 5 - Warning: The summary attribute on the <table> element is obsolete in HTML5
6+
line 14 column 7 - Warning: trimming empty <center>
7+
line 21 column 3 - Warning: <center> element removed from HTML5
8+
line 12 column 5 - Warning: <table> attribute "summary" not allowed for HTML5
9+
Info: Document content looks like HTML5
10+
Tidy found 6 warnings and 0 errors!
11+
12+
One or more empty elements were present in the source document but
13+
dropped on output. If these elements are necessary or you don't want
14+
this behavior, then consider setting the option "drop-empty-elements"
15+
to no.
16+
17+
About HTML Tidy: https://github.com/htacg/tidy-html5
18+
Bug reports and comments: https://github.com/htacg/tidy-html5/issues
19+
Official mailing list: https://lists.w3.org/Archives/Public/public-htacg/
20+
Latest HTML specification: https://html.spec.whatwg.org/multipage/
21+
Validate your HTML documents: https://validator.w3.org/nu/
22+
Lobby your company to join the W3C: https://www.w3.org/Consortium
23+
24+
Do you speak a language other than English, or a different variant of
25+
English? Consider helping us to localize HTML Tidy. For details please see
26+
https://github.com/htacg/tidy-html5/blob/master/README/LOCALIZE.md
@@ -0,0 +1,61 @@
1+
<!--
2+
This test case tests the optgroup element and parser.
3+
-->
4+
<!DOCTYPE html>
5+
<html>
6+
<head>
7+
<title>
8+
case-004
9+
</title>
10+
</head>
11+
<body>
12+
<label for="dino-select">Choose a dinosaur:</label> <select id="dino-select">
13+
<optgroup label="Theropods">
14+
<option>
15+
Tyrannosaurus
16+
</option>
17+
<option>
18+
Velociraptor
19+
</option>
20+
<option>
21+
Deinonychus
22+
</option>
23+
</optgroup>
24+
<optgroup label="Sauropods">
25+
<option>
26+
Diplodocus
27+
</option>
28+
<option>
29+
Saltasaurus
30+
</option>
31+
<option>
32+
Apatosaurus
33+
</option>
34+
</optgroup>
35+
</select>
36+
<optgroup label="Body Parts">
37+
<option>
38+
Claws
39+
</option>
40+
<option>
41+
Teeth
42+
</option>
43+
<option>
44+
Tail Spikes
45+
</option>
46+
</optgroup>
47+
<optgroup label="Movies">
48+
<optgroup label="Scifi">
49+
<option>
50+
Jurassic Park
51+
</option>
52+
</optgroup>
53+
<option>
54+
The Good Dinosaur
55+
</option>
56+
<option>
57+
The Land Before Time
58+
</option>
59+
</optgroup>
60+
</body>
61+
</html>
@@ -0,0 +1,14 @@
1+
line 30 column 5 - Warning: <optgroup> can't be nested
2+
Info: Document content looks like HTML5
3+
Tidy found 1 warning and 0 errors!
4+
5+
About HTML Tidy: https://github.com/htacg/tidy-html5
6+
Bug reports and comments: https://github.com/htacg/tidy-html5/issues
7+
Official mailing list: https://lists.w3.org/Archives/Public/public-htacg/
8+
Latest HTML specification: https://html.spec.whatwg.org/multipage/
9+
Validate your HTML documents: https://validator.w3.org/nu/
10+
Lobby your company to join the W3C: https://www.w3.org/Consortium
11+
12+
Do you speak a language other than English, or a different variant of
13+
English? Consider helping us to localize HTML Tidy. For details please see
14+
https://github.com/htacg/tidy-html5/blob/master/README/LOCALIZE.md
