Skip to content

Tidy fails on style block with comment block #280

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
CMB opened this issue Oct 1, 2015 · 9 comments
Closed

Tidy fails on style block with comment block #280

CMB opened this issue Oct 1, 2015 · 9 comments
Labels
Milestone

Comments

@CMB
Copy link
Contributor

CMB commented Oct 1, 2015

When Tidy encounters a style block with a comment block, it does not
parse the comment block correctly.
In fact, it will be confused if the comment block contains HTML, such
as an opening style tag.

Example:
http://the-brannons.com/styletest1.html

This also affects script tags, apparently.

Using a comment inside style or script was a way to hide the
contents from old browsers that don't understand the style or script tags,
so that they didn't try to render their contents as text.
It's apparently still being used in some HTML 5 pages. How should it
be handled today, by Tidy?

@geoffmcl
Copy link
Contributor

geoffmcl commented Oct 1, 2015

@cmd thank you for this issue, and yes the sample html you pointed to, now imbedded here, is a perfect example of the big problem in tidy ;=((

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Nested style tag in comment block</title>
<style>
<!--
.stuff { color: red }
<style>
-->
</style>
</head>
<body>
<p class="stuff">para 1</p>
</body>
</html>

This sample passes the W3C validator with no problem...

In all browser tried shows the text para 1 in red, so adding the html comment block <!-- ... --> in the <style> does not 'hide' the CSS from a browser... it still reads and uses it...

Nor does it seem the browsers tested open a second level of <stype>, on seeing the second <style> and thus expect two closing tags </style>, like tidy does...

This tripping over a second <style> causes tidy to render rubbish!

This tidy counting of the opens of the container, in the GetCDATA(...) service, was introduced a long time ago into the code. And this increment of a nested variable causes tidy to continue to seek a second close </style>, thus, in this case, treating the balance of the document as part of the <style> tag text... with very bad results...

I think the nested variable was added to sf cvs rev 1.85, Apr 9 2003 by @hoehrmann, mentioning bug 443678... there is a test case input/in_443678.html - https://sourceforge.net/p/tidy/bugs/98/ - all now closed...

BUT, this is a different bug, like supporting <script src="my.js"> with no close element... a very good and important change...

So that does not show, or explain the reason for the nested increment... but I hesitate changing this very old code... Maybe it is needed for other tags that pass through here like <script>, not sure...

My suggestion is to add another option TidySkipComments.

When this is enabled, on getting a < from the stream, would test if this is within a c-start <!-- ... --> c-end comment block. If yes, ignore and continue, waiting for a </sytle> close...

This option is similar to the TidySkipQuotes recently inroduced, #65, to get over problems in the <script> tag, tripping over what seems like new tags...

This new option, TidySkipComments, could default to off thus have no effect on the default behaviour of tidy, although there may be a case for this option to default to on?

The W3C seems to allow this comment block addition to the <style> tag. See here http://www.w3.org/TR/html5/document-metadata.html#the-style-element, and maybe in other references...

The relevant quote is "The textContent of a style element must match the style production in the following ABNF, the character set for which is Unicode."

style         = no-c-start *( c-start no-c-end c-end no-c-start )
no-c-start    = < any string that doesn't contain a substring that matches c-start >
c-start       = "<!--"
no-c-end      = < any string that doesn't contain a substring that matches c-end >
c-end         = "-->"

This seems to indicate that when there is a comment block open <!--, any string is allowed until c-end, -->... the TidySkipComments options would do this...

As usual, with changes in such old established code are considered, look for to comments from others before implementing... thanks...

@geoffmcl
Copy link
Contributor

geoffmcl commented Oct 6, 2015

@foxinushka opened a similar issue, Tidy fails with following HTML #283, now transferred here..

Test html:
https://www.dropbox.com/s/d4me1itots9j8sm/test.html?dl=0

If I remove <style> ... </style> tag and its content, tidy works well.
Otherwise it adds next text:

/*]]>*/
</style>
</head>
<body>
</body>
</html>

after </html> tag.

In this case the in container counter is incremented on finding a comment block with another style open - /* <style> ... */ and tidy will not exit the style until it finds a second close </style> with catastrophic consequences...

geoffmcl added a commit that referenced this issue Oct 7, 2015
This is in the GetCDATA function. If the container is script or style and
this option is on, avoid bumping nested.

This addresses issues #65 (1642186) and #280.

All attempts at parsing script data are now abandoned as a bad direction.
geoffmcl added a commit that referenced this issue Oct 7, 2015
@geoffmcl
Copy link
Contributor

geoffmcl commented Oct 7, 2015

@cmd @foxinushka see #65 for a possible fix pushed to the issue-65 branch...

I hope you will get a chance to do $ git checkout issue-65, pull, build and test again with this 5.1.14.EXP1 version of tidy... thanks...

@foxinushka
Copy link

I've checked the fix. The behaviour is the same as I described in #283.

@geoffmcl
Copy link
Contributor

geoffmcl commented Oct 8, 2015

@foxinushka well I have re-tested your dropbox test.html and find no problem now...

  1. are you sure your checkout of issue-65 branch went ok?
  2. did you then do a git pull to update that branch?
  3. did you delete the CMakeCache.txt?
  4. when you run ./tidy -v what version do you see?
  5. are you adding --skip-quotes yes to the config?

You must use tidy version 5.1.14.EXP1, with special emphasis that the version must end in .EXP1!

For this sample you must use the --skip-quotes yes configuration option, otherwise you will see very bad results...

Without that additional config option set to on the whole file becomes just one style block, with just 2 warnings. And using -i, the output ends with the very bad -

  <!-- end footer-->
  </tbody>
  </table>
  </body>
  </html>
  </style>
</head>
<body>
</body>
</html>

With that option on you will get some 27 warnings, and using -i the output ends correctly with -

      </tr><!-- end footer-->
    </tbody>
  </table>
</body>
</html>

Very puzzled... Please check again and advise...

@foxinushka
Copy link

Sorry, I've forgotten to enable TidySkipQuotes option. Now it works. Thanks!

@geoffmcl
Copy link
Contributor

geoffmcl commented Oct 8, 2015

@foxinushka no problem...

And make a clear note the name of this option TidySkipQuotes will probably change to something like TidySkipNested, or ??? in the final version... after some testing time...

And of course skip-quotes to say skip-nested...

@geoffmcl geoffmcl mentioned this issue Oct 8, 2015
@CMB
Copy link
Contributor Author

CMB commented Oct 18, 2015

This seems to be fixed for me.

@balthisar
Copy link
Member

Two confirmations of a fix, and a check of the cases using master branch incline me toward closing this issue. Thanks, everyone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants