```
$ tidy -ashtml -m test0.html
Info: Doctype given is "-//W3C//DTD XHTML 1.0 Transitional//EN"
Info: Document content looks like XHTML 1.0 Strict
No warnings or errors were found.
$ tidy -ashtml -m test0.html
Info: Doctype given is "-//W3C//DTD XHTML 1.0 Strict//EN"
Info: Document content looks like HTML 4.01 Strict
No warnings or errors were found.
$ tidy -ashtml -m test0.html
line 2 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 1 - Warning: An XML declaration was detected. Did you mean to use input-xml?
Info: Document content looks like HTML5
Tidy found 2 warnings and 0 errors!
$ tidy -ashtml -m test0.html
line 1 column 1 - Warning: An XML declaration was detected. Did you mean to use input-xml?
Info: Document content looks like HTML5
Tidy found 1 warning and 0 errors!
$ tidy -ashtml -m test0.html'
> ^C
$ tidy -ashtml -m test0.html
line 1 column 1 - Warning: An XML declaration was detected. Did you mean to use input-xml?
Info: Document content looks like HTML5
Tidy found 1 warning and 0 errors!
$ tidy -ashtml -m test0.html
line 1 column 1 - Warning: An XML declaration was detected. Did you mean to use input-xml?
Info: Document content looks like HTML5
Tidy found 1 warning and 0 errors!
```
The problem is that when converting an XHTML document with an `<?xml ...?>` header to HTML, the `<?xml ...?>` line is never stripped. Besides, on the third invocation the DOCTYPE was missing. Was that expected, or is that a bug?
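Until this is fixed, one possible user-side workaround is to strip the XML declaration yourself before running `tidy -ashtml`. The Python sketch below is hypothetical (the helper name `strip_xml_declaration` is ours, not part of tidy). It assumes the file has already been decoded to text, and it only removes a declaration at the very start of the document. Dropping the declaration also drops any `encoding="..."` it carried, so this assumes the encoding is known by other means (for example, UTF-8).

```python
import re

# Hypothetical workaround helper, not part of tidy.
# Matches an XML declaration at the very start of the text, optionally
# preceded by a byte-order mark (kept in group 1). "<?xml" must be followed
# by whitespace, so processing instructions like <?xml-stylesheet ...?>
# are left alone. The trailing newline after the declaration is consumed.
_XML_DECL = re.compile(r"\A(\ufeff?)<\?xml\s.*?\?>[ \t]*\r?\n?", re.DOTALL)


def strip_xml_declaration(text: str) -> str:
    """Return text without a leading XML declaration (a BOM, if any, is kept)."""
    return _XML_DECL.sub(r"\1", text, count=1)
```

The stripped text can then be written to a file and passed to `tidy -ashtml` as before.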
@hosiet, thanks for re-opening this long-standing bug... hopefully it will get some attention here...
Aside from not removing the `<?xml ...?>` header, tidy also messes with the DOCTYPE on repeated invocations... a big NO NO ;=))
Given a full DOCTYPE, `<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">`, it chooses to output only `<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN">`, dropping the system identifier. Why clip this? It seems wrong... Is the system identifier considered not required, is the clipping desired, or is there some other reasoning?
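To check whether the system identifier survives a tidy run, the DOCTYPE can be split into its root name, public identifier (FPI) and system identifier and compared before and after. A minimal Python sketch, assuming only double-quoted literals and no internal subset; `parse_doctype` and `system_id_dropped` are hypothetical helpers, not tidy APIs:

```python
import re
from typing import NamedTuple, Optional


class Doctype(NamedTuple):
    root: str
    public_id: Optional[str]
    system_id: Optional[str]


# Simplified pattern for <!DOCTYPE root [PUBLIC "fpi" ["uri"] | SYSTEM "uri"]>.
# Internal subsets ([...]) and single-quoted literals are not handled.
_DOCTYPE = re.compile(
    r'<!DOCTYPE\s+(\w+)'
    r'(?:\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?'
    r'|\s+SYSTEM\s+"([^"]*)")?\s*>',
    re.IGNORECASE,
)


def parse_doctype(text: str) -> Optional[Doctype]:
    """Return the first DOCTYPE in text, or None if there is none."""
    m = _DOCTYPE.search(text)
    if m is None:
        return None
    root, public_id, public_system, system_only = m.groups()
    return Doctype(root, public_id, public_system or system_only)


def system_id_dropped(before: str, after: str) -> bool:
    """True if `before` has a system identifier and `after` has none
    (or has no DOCTYPE at all)."""
    b, a = parse_doctype(before), parse_doctype(after)
    return bool(b and b.system_id) and not (a and a.system_id)
```

With the two DOCTYPEs quoted in this comment as `full` and `clipped`, `system_id_dropped(full, clipped)` is `True`.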
And then, if you run tidy on this result, it drops the DOCTYPE altogether!!! No doctype... ugh. Why? And of course, running tidy on that result only adds an HTML5 DOCTYPE, `<!DOCTYPE html>`, so the damage has been done and needs correction...
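The repeated-invocation behaviour can be tracked by running `tidy -ashtml -m` (the same command as in the original report) several times on a throwaway copy of the input and recording the XML declaration and DOCTYPE after each run. This Python sketch assumes `tidy` is on `PATH` and that its output is UTF-8; `prolog_summary` and `run_repeatedly` are hypothetical names. An idempotent conversion would report the same DOCTYPE on every run.

```python
import re
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple

_XML_DECL = re.compile(r"\A\ufeff?<\?xml\s")
_DOCTYPE = re.compile(r"<!DOCTYPE[^>]*>", re.IGNORECASE)


def prolog_summary(text: str) -> Tuple[bool, Optional[str]]:
    """Return (starts with an XML declaration, first DOCTYPE or None)."""
    m = _DOCTYPE.search(text)
    return bool(_XML_DECL.match(text)), (m.group(0) if m else None)


def run_repeatedly(src: Path, runs: int = 4) -> List[Tuple[bool, Optional[str]]]:
    """Run `tidy -ashtml -m` on a temporary copy of src `runs` times and
    record the prolog after each run. The original file is not modified."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / src.name
        shutil.copyfile(src, work)
        for _ in range(runs):
            # tidy exits non-zero when it reports warnings or errors, so the
            # return code is deliberately not checked here.
            subprocess.run(["tidy", "-ashtml", "-m", str(work)],
                           capture_output=True, text=True)
            # Assumes tidy wrote UTF-8 output.
            results.append(prolog_summary(work.read_text(encoding="utf-8")))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    for i, (has_decl, doctype) in enumerate(run_repeatedly(Path(sys.argv[1])), 1):
        print(f"run {i}: xml-declaration={has_decl} doctype={doctype}")
```

Saved as a script and given `test0.html` as its only argument, it prints one line per run, which makes any DOCTYPE change between runs easy to spot.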
I added the embedded sample given (thanks for that) to my collection as in_767.html; it seems the same as the one in tidy.crashtest.zip. I also added a second sample, in_767-1.html, to better illustrate the DOCTYPE problem(s)...
I look forward to further feedback, comments, patches, or PRs to fix this BIG BAD BUG, a situation that has been around since at least 2004... thanks...
I'm forwarding some longstanding downstream issues here, one of which is about `-ashtml`.

Previous reports: the test case can be found at https://bugs.debian.org/562004 as the email attachment tidy.crashtest.zip (downloadable on that page), but I'm attaching a copy here anyway:

tidy.crashtest.zip

The error information: the content of `test0.html` is in the attachment, and the Tidy output from repeated runs is shown in the log above.