SVG elements and attributes are being lowercased #365

Closed
mojavelinux opened this issue Feb 4, 2016 · 17 comments

@mojavelinux

SVG elements such as "clipPath" and attributes such as "viewBox" (when included in the document via the <svg> element) are being lowercased. This could cause SVGs to stop working (though in my tests it doesn't seem to break inside the browser). At the very least, it's annoying because it causes unnecessary diffs.

Input:

<svg xmlns="http://www.w3.org/2000/svg" version="1.1" viewBox="0 0 16 9"></svg>

Output:

<svg xmlns="http://www.w3.org/2000/svg" version="1.1" viewbox="0 0 16 9"></svg>

There are many such elements and attributes, as you can see here: https://developer.mozilla.org/en-US/docs/Web/SVG/Attribute

Is there any way we could disable the automatic lowercasing within SVG regions?

@balthisar
Member

Will input-xml or output-xml work for your use case? They both preserve attribute case.

@mojavelinux
Author

It's true that there are certain HTML documents that are able to masquerade as XML going in, which does result in the element/attribute case being preserved (regardless of output setting). However, the first time the parser hits a perfectly valid short element (e.g., <img src="..." alt="...">), it blows up. I also notice that enabling input-xml has other side effects. For instance, the newlines around content in a <pre> element show up again (fixed earlier in 5.1 for HTML).

But I think we're missing the point. An inline SVG is an XML document inside of an HTML document. Therefore, it should be handled as XML, and that, in part, means preserving the case of elements and attributes.

The way I've been able to work around this is to wrap the SVG in CDATA before tidy touches it, then unwrap it afterwards. That seems to keep tidy from touching the SVG data. Not pretty, but it gets the job done.
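
For readers who want to try the CDATA workaround described above, here is a minimal sketch in Python. It is not the script used in this thread: the helper names `wrap_svg` and `unwrap_svg` are hypothetical, the regular expression assumes SVG regions are not nested, and it assumes the SVG content itself never contains the sequence `]]>`. The tidy invocation would happen between the two calls.

```python
import re

# Matches each inline <svg>...</svg> region (non-greedy, spanning newlines).
# Assumption: SVG regions are not nested inside one another.
SVG_REGION = re.compile(r"<svg\b.*?</svg\s*>", re.IGNORECASE | re.DOTALL)

CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"

# Matches an SVG region that wrap_svg() previously enclosed in CDATA markers.
WRAPPED_SVG = re.compile(
    re.escape(CDATA_OPEN) + r"(<svg\b.*?</svg\s*>)" + re.escape(CDATA_CLOSE),
    re.IGNORECASE | re.DOTALL,
)


def wrap_svg(html: str) -> str:
    """Enclose every inline SVG region in a CDATA section."""
    return SVG_REGION.sub(
        lambda m: CDATA_OPEN + m.group(0) + CDATA_CLOSE, html
    )


def unwrap_svg(html: str) -> str:
    """Strip the CDATA markers that wrap_svg() added around SVG regions."""
    return WRAPPED_SVG.sub(lambda m: m.group(1), html)
```

A pipeline would call `wrap_svg` on the document, run tidy on the result, and then call `unwrap_svg` on tidy's output.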

@mojavelinux
Author

Interesting to note that there is a related discussion in Jsoup.

jhy/jsoup#272

@mojavelinux
Author

It seems like this behavior might actually be desired. In a reply on Twitter to $subject, Sara Soueidan said:

I always lowercase [because] some attributes had problems w/ uppercase in IE a while back.

The concern I had, as well as others, was that by changing the case, it would break SVG rendering. However, it seems like that might not be true (and in the case of older IEs, it might actually help the situation).

I'm curious to hear what others think.

@balthisar
Member

I see your points. Due to the followup questions you raised, I won't tag this as a bug or feature request, but I'll leave it open for discussion.

@mojavelinux
Author

👍 I agree we're definitely at the discussion phase. I'll try to collect feedback using various channels.

@mojavelinux
Author

So here's the official word from the W3C, courtesy of Romain Deltour.

In the HTML syntax, tag names, even those for foreign elements, may be written with any mix of lower- and uppercase letters that, when converted to all-lowercase, matches the element's tag name; tag names are case-insensitive.
https://www.w3.org/TR/html51/syntax.html#syntax-tag-name

It's the "even those for foreign elements" that confirms it does not matter that the elements and attributes in SVG are lowercased.

What that leaves us with is a style preference. Would tidy be willing to consider having a "match" setting for case sensitivity in HTML, so that the output matches whatever the input used? (basically the same as what input-xml does).

@balthisar
Member

> Would tidy be willing to consider having a "match" setting for case sensitivity in HTML, so that the output matches whatever the input used?

For all tags? Or only for SVG? Style is a valid use case; Tidy is also a pretty printer, after all.

@geoffmcl
Contributor

geoffmcl commented Feb 4, 2016

Keep in mind we already have the Boolean options uppercase-attributes and uppercase-tags, both of which default to no. This conversion is done during low-level stream reading, so the original case of tags and attributes is lost very early in the process... so something like a match-user-case option would need to be implemented at that same low-level collection phase...
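
To illustrate why the decision has to be made at the collection phase, here is a toy sketch in Python. It is not Tidy's lexer; the function `read_tag_name` and its `mode` parameter are hypothetical. The point is only that once the name has been folded while it is being read, later stages have no way to recover the original spelling.

```python
from typing import Tuple


def read_tag_name(stream: str, pos: int, mode: str = "lower") -> Tuple[str, int]:
    """Collect a tag name starting at stream[pos].

    Returns the (possibly case-folded) name and the position just past it.
    mode is a hypothetical setting: "lower", "upper", or "preserve".
    """
    end = pos
    # Consume name characters; anything else (space, '>', '/') ends the name.
    while end < len(stream) and (stream[end].isalnum() or stream[end] in "-:"):
        end += 1
    raw = stream[pos:end]

    if mode == "lower":
        name = raw.lower()      # original case is discarded here
    elif mode == "upper":
        name = raw.upper()      # likewise discarded
    elif mode == "preserve":
        name = raw              # keep exactly what the author typed
    else:
        raise ValueError(f"unknown mode: {mode}")
    return name, end
```

For example, `read_tag_name("<clipPath id='c'>", 1)` yields `"clippath"`, while passing `mode="preserve"` yields `"clipPath"`.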

@balthisar
Member

As I go through these old issues, I'd still like to clarify some things, and possibly close this issue. I'll explain my current understanding below, and so I expect to be corrected when I'm wrong.

First, the svg tag is an HTML tag, not an XML tag. Presumably the actual XML document is the content between the <svg> and </svg> tags, right? Meaning that the case of the attribute should not be important because it's not part of the XML; it only brackets it. If this is the case, I don't see a reason to treat svg as special, or to introduce a new option to pass through attribute names as given.

Now, I don't know what happens to the content of svg tags; should we branch processing to Tidy's XML processor? There might be a case for this; I'd have to investigate all the ramifications/possibilities. Are there other tags (such as math) that have XML content?

So, @geoffmcl and @mojavelinux, I'm sorry we've not acted on this, but is any action required? If not, we can close this. If there's a case for using the XML parser instead of the HTML parser for the content of certain tags, then perhaps we can look into it.

Some additional guidance is appreciated.

@mojavelinux
Author

In my mind, this is about reproducibility. The use case many parsers miss is to open a document, make a correction and get out without causing the whole document to be rewritten. I call it the polite factor. When the parser rewrites stuff unrelated to the operation (or unrelated to what you want to style, in this case), it introduces side effects. There are very good reasons why you sometimes have to modify a document without making unrelated changes...even when tidying it. It would be nice if tidy could be polite in this regard. Just because a renderer doesn't care about casing doesn't mean that it isn't important to the humans reading it.

@mojavelinux
Author

...and it could very well be that something downstream from tidy is dependent on the tag case (software that simply cannot be changed), and tidy inadvertently breaks such a toolchain with the current behavior.

@balthisar
Member

@mojavelinux, I understand that. I'm trying to get clarity on whether you would expect the new behavior on the svg tag itself (which is an HTML tag, not XML), or only on the XML elements contained within the bounds of the svg tag.

If the former, then the question really becomes -- because we prefer not to have a new option applicable to only a single tag -- do we want to add an option such as preserve to the uppercase-tags configuration option, to enable this desired behavior for all HTML tags? Implementation would be simple, but is there a valid use case for it? I would suggest that enabling someone's preference is a good use case, but is it good enough?

@mojavelinux
Author

I wouldn't make a special exception. I think my point holds regardless of what part of the document we're talking about. In addition to SVG attributes, certain web frameworks rely on custom attributes in the HTML that leverage mixed casing.

This was referenced May 12, 2017
@balthisar
Member

@mojavelinux, if you have a chance, have a look at #554, which potentially solves this issue.

@geoffmcl
Contributor

@balthisar sorry for the delay in reviewing this... for some silly reason this issue did not get into my Issue Database... it is now...

I found that even back in the HTML4 legacy documents there is a clear statement, "Attribute names are always case-insensitive", so any browser that does not honor this should be thrown in the dust bin... and I begin to wonder why Tidy was so strict in this regard... so I'm glad this was made global...

And glad you were able to leverage the existing option uppercase-tags, adding a preserve option, and as you point out, works well with the new PickListItems, PR #553, now merged...

I have now tested the PR #554 on at least the simple sample case given, in_365.html, and it works fine...

And have run the regression tests using Tidy 5.5.24.I365, and no problem...

Of course if you add the new option --uppercase-attributes preserve, there will be a lot of diffs, but this just shows the efficiency of the new option... and all those diffs disappear if you add diff -i... so this is a 100% PASS!
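
The `diff -i` check mentioned above can be approximated in Python for anyone without a suitable `diff` at hand. This is a rough sketch, not part of Tidy's regression suite; the function name `diff_lines` is hypothetical, and it folds case with `str.lower()` before comparing, which is close to, but not guaranteed identical to, what `diff -i` does.

```python
import difflib
from typing import List


def diff_lines(expected: str, actual: str, ignore_case: bool = False) -> List[str]:
    """Return unified-diff lines between two texts.

    With ignore_case=True, both sides are lowercased first, so differences
    that are purely a matter of letter case produce no diff at all.
    """
    a = expected.splitlines()
    b = actual.splitlines()
    if ignore_case:
        a = [line.lower() for line in a]
        b = [line.lower() for line in b]
    # unified_diff yields nothing when the two sequences are identical.
    return list(difflib.unified_diff(a, b, lineterm=""))
```

Comparing the lowercased and case-preserved outputs from the original report gives a non-empty diff by default and an empty list when `ignore_case=True`.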

I would certainly vote for merging this #554 Feature Request... thanks...

@geoffmcl geoffmcl added this to the 5.5 milestone May 20, 2017
@balthisar
Member

Great. Merged!
