(Possibly) invalid output of tidy -ashtml #767

Open
hosiet opened this issue Nov 3, 2018 · 1 comment

@hosiet
hosiet commented Nov 3, 2018

I'm forwarding some longstanding downstream issues here, one of which is about -ashtml. Previous reports:

The test case can be found at https://bugs.debian.org/562004 as the email attachment tidy.crashtest.zip (downloadable on that page); I'm also attaching a copy here:
tidy.crashtest.zip

The error information

Content of test0.html:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="Docutils 0.6: http://docutils.sourceforge.net/" />
<title>Test</title>
<link rel="stylesheet" href="/usr/lib/pymodules/python2.5/docutils/writers/html4css1/html4css1.css" type="text/css" />
</head>
<body>
<div class="document" id="test">
<h1 class="title">Test</h1>


<p>Some text</p>
</div>
</body>
</html>

Tidy output:

$ tidy -ashtml -m test0.html
Info: Doctype given is "-//W3C//DTD XHTML 1.0 Transitional//EN"
Info: Document content looks like XHTML 1.0 Strict
No warnings or errors were found.

$ tidy -ashtml -m test0.html
Info: Doctype given is "-//W3C//DTD XHTML 1.0 Strict//EN"
Info: Document content looks like HTML 4.01 Strict
No warnings or errors were found.

$ tidy -ashtml -m test0.html
line 2 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 1 - Warning: An XML declaration was detected. Did you mean to use input-xml?
Info: Document content looks like HTML5
Tidy found 2 warnings and 0 errors!

$ tidy -ashtml -m test0.html
line 1 column 1 - Warning: An XML declaration was detected. Did you mean to use input-xml?
Info: Document content looks like HTML5
Tidy found 1 warning and 0 errors!

$ tidy -ashtml -m test0.html'
> ^C
$ tidy -ashtml -m test0.html
line 1 column 1 - Warning: An XML declaration was detected. Did you mean to use input-xml?
Info: Document content looks like HTML5
Tidy found 1 warning and 0 errors!

$ tidy -ashtml -m test0.html
line 1 column 1 - Warning: An XML declaration was detected. Did you mean to use input-xml?
Info: Document content looks like HTML5
Tidy found 1 warning and 0 errors!

The problem is that when converting an XHTML document with an <?xml ...?> header to HTML, the <?xml ...?> line is never stripped. Besides, on the third invocation the DOCTYPE goes missing. Is that expected, or is it a bug?
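For anyone who wants to reproduce this without reading the files by hand, here is a minimal Python sketch that checks each pass for a leftover XML declaration and reports which DOCTYPE (if any) is present. The helper names `summarize` and `repeated_passes` are hypothetical and not part of tidy; the only tidy flags used are the `-ashtml` and `-m` from the transcript above, and the script assumes a `tidy` binary is on the PATH.

```python
import re
import shutil
import subprocess
import tempfile
from pathlib import Path

# An XML declaration is only meaningful at the very start of the file.
XML_DECL = re.compile(r"^\s*<\?xml\b[^>]*\?>")
# First DOCTYPE declaration anywhere in the text.
DOCTYPE = re.compile(r"<!DOCTYPE\b[^>]*>", re.IGNORECASE)


def summarize(html):
    """Report whether an XML declaration and a DOCTYPE are present."""
    doctype = DOCTYPE.search(html)
    return {
        "xml_declaration": bool(XML_DECL.match(html)),
        "doctype": doctype.group(0) if doctype else None,
    }


def repeated_passes(source, passes=4):
    """Run `tidy -ashtml -m` repeatedly on a scratch copy of `source`.

    Returns one summary dict per pass. The original file is left untouched.
    """
    source = Path(source)
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / source.name
        shutil.copy(source, work)
        for _ in range(passes):
            # tidy exits non-zero when it emits warnings, so no check=True.
            subprocess.run(
                ["tidy", "-ashtml", "-m", str(work)],
                capture_output=True,
            )
            results.append(summarize(work.read_text(encoding="utf-8")))
    return results
```

Calling `repeated_passes("test0.html")` and printing the returned list should show, pass by pass, whether the `<?xml ...?>` line survives and how the DOCTYPE changes. What it prints will depend on the tidy version installed.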

@geoffmcl
Contributor

geoffmcl commented Nov 6, 2018

@hosiet thanks for re-opening this long-standing bug... hopefully it will get some attention here...

Aside from not removing the <?xml ...?> header, tidy also messes with the DOCTYPE, on repeated invocation... a big NO NO ;=))

Given a full DOCTYPE, <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> it will choose to only output <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN">... why clip this? ... seems wrong... not required, desired... or some other reasoning...

And then if you run tidy on this result, it chooses to drop it altogether!!! No doctype... ugh... Why? And, of course, running tidy on that result will only add an HTML5 DOCTYPE - <!DOCTYPE html>... so the damage has been done, and needs correction...
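To make the stages described above easier to tell apart in a test, here is a rough Python sketch that splits a DOCTYPE into its public and system identifiers and labels it as full, clipped, or lacking a public identifier. `parse_doctype` and `classify` are hypothetical helpers written for illustration, not anything tidy provides, and the regular expression only handles the double-quoted `PUBLIC` form seen in this issue.

```python
import re

# Matches: <!DOCTYPE root PUBLIC "public-id" ["system-id"]>
DOCTYPE_PUBLIC = re.compile(
    r'<!DOCTYPE\s+(?P<root>\S+)\s+PUBLIC\s+'
    r'"(?P<public>[^"]*)"(?:\s+"(?P<system>[^"]*)")?\s*>',
    re.IGNORECASE,
)


def parse_doctype(decl):
    """Return root/public/system parts of a PUBLIC doctype, or None."""
    match = DOCTYPE_PUBLIC.search(decl)
    return match.groupdict() if match else None


def classify(decl):
    """Label a doctype string (or None) with the stage it corresponds to."""
    if decl is None:
        return "missing"
    parts = parse_doctype(decl)
    if parts is None:
        # For example the bare HTML5 form <!DOCTYPE html>.
        return "no public identifier"
    return "full" if parts["system"] else "clipped"
```

With this, the input DOCTYPE quoted above classifies as `full`, the first tidy result as `clipped`, the second as `missing`, and the final `<!DOCTYPE html>` as `no public identifier`.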

Added the embedded sample given... thanks for that... to my collection in_767.html - seems the same as that in tidy.crashtest.zip - and a 2nd in_767-1.html to better illustrate the DOCTYPE problem(s)...

Look forward to further feedback, comments, patches, PR, to fix this BIG BAD BUG, a situation that has been around since at least 2004... thanks...
