
Feature request: add meta tag that matches actual output character encoding #456

Closed
AthanasiusOfAlex opened this issue Sep 27, 2016 · 30 comments


@AthanasiusOfAlex

Tidy currently allows you to output a document in 14 character encodings. The problem is, unless you use ascii, the web browser generally gets confused by it. It would be nice if Tidy had the option of adding the "meta" tag that matches the encoding, e.g.,

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

for UTF-8 (which is the default). For the moment, I have to add this line manually.

@geoffmcl
Contributor

geoffmcl commented Oct 2, 2016

@AthanasiusOfAlex thanks for the Feature Request

It seems a reasonable feature request, as a new option, to add this, styled depending on the document type detected, like <meta charset="utf-8"> for html5, especially since it seems the W3C validator barks at you if it is missing...

Maybe the option could be say --add-meta-charset, or something... Boolean, default no...

There is an OPTIONS.md which details the adding of an option... quite simple...

Would certainly consider this addition, if you or others presented code... thanks...

@geoffmcl geoffmcl added this to the 5.3 milestone Oct 2, 2016
@marcoscaceres
Contributor

I'm currently learning c, so I wouldn't mind taking this on (as it seems like a simple enough feature request).

@geoffmcl, could you mentor me on it? (i.e., give me some pointers as to where it might be a good place to insert this in the code tree, etc.).

@geoffmcl
Contributor

geoffmcl commented Oct 3, 2016

@marcoscaceres, thanks for your offer to take this on, and for sure I will help where I can...

As mentioned OPTIONS.md gives the 4 steps to do this... and we should discuss and decide here the values for 1, 2, and 3... my first suggestion is as follows, but am open to other ideas -

  1. ID - TidyMetaCharset
  2. table - MS, "add-meta-charset", BL, no, ParseBool, boolPicks
  3. description - "This option adds a charset meta tag, appropriate to the document type, in the document head."

As stated, these are just quick, first thoughts, suggestions, and look forward to them being massaged...

As to part 4., using it in the code, take a look at AddGenerator, in lexer.c. You could add a new say TY_(AddMetaCharset)(doc), which does a similar thing - FindHEAD, and if found, search for, and maybe fix any existing meta charset entry, and return, or continue down and add a new one if none, inserting it in the node tree...

Note the different styles, based on whether this is html5, xhtml, or a legacy document... and like the TidyMark, this could be called from tidyDocCleanAndRepair() in tidylib.c, if the option is on...

Look forward to your code, either as a patch, or make a fork, add say a branch, and present it as a PR... thanks!

@marcoscaceres
Contributor

@geoffmcl thanks for all the pointers, this is super helpful!

Only additional requirement is to make sure that the meta charset appears in the first 1024 bytes, as per HTML5. I'll try to send this in parts over the next week.
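A minimal sketch of the "declared early enough" check, assuming the serialized document is available as a byte buffer. The helper name `demo_charset_within_limit` is hypothetical and not part of Tidy. The byte limit is passed in as a parameter rather than hard-coded, and the match on `charset` is deliberately simplistic (case-sensitive, no real parsing).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Tidy. Returns 1 if the text "charset"
   starts, and the '>' closing its tag appears, within the first `limit`
   bytes of `doc`; otherwise returns 0. */
int demo_charset_within_limit(const char *doc, size_t len, size_t limit)
{
    static const char key[] = "charset";
    const size_t klen = sizeof key - 1;
    size_t scan, i, j;

    assert(doc != NULL);
    scan = len < limit ? len : limit;   /* never look past the limit */
    if (scan < klen)
        return 0;
    for (i = 0; i + klen <= scan; i++) {
        if (memcmp(doc + i, key, klen) != 0)
            continue;
        /* the declaration only counts if its tag also closes in range */
        for (j = i + klen; j < scan; j++)
            if (doc[j] == '>')
                return 1;
        return 0;
    }
    return 0;
}
```

A caller would pass the start of the output document and its length. A result of 0 would suggest moving the meta element closer to the start of the head.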

@marcoscaceres
Contributor

Regarding

  1. TidyMetaCharset, SGTM.
  2. add-meta-charset, SGTM.
  3. Description: /**< Adds/checks/fixes meta charset in the head, based on document type */

Let's see how I go with 4 now :)

@geoffmcl
Contributor

geoffmcl commented Oct 6, 2016

See WIP PR #458

@geoffmcl
Contributor

geoffmcl commented Mar 1, 2017

Since PR #458 has already been moved to milestone 5.5, doing likewise here...

I hope you get the time to finish this great WIP soon... please ask if you need further help... thanks...

@geoffmcl geoffmcl modified the milestones: 5.5, 5.3 Mar 1, 2017
@marcoscaceres
Contributor

Thanks @geoffmcl - doing my best to find time. If anyone wants to jump in and do a bit more hacking on it, I would certainly be supportive. It's nearly there!

@geoffmcl
Contributor

geoffmcl commented Mar 1, 2017

Just noted we have an older #361 which also has some good notes on this... the effort should be combined...

I will look deeper into this in the coming days, since as stated, always prefer that tidy does not output invalid html...

@geoffmcl
Contributor

@AthanasiusOfAlex, @marcoscaceres, @balthisar as agreed, further discussion on this new option --add-meta-charset should be moved from PR #458, back to here...

I have continued with some testing, using tidy-tests and find a few problems...

  1. Tidy 5.5.20.I458 crashes on case 1117013. You will note here that the attr.value can be NULL.
  2. And @marcoscaceres did ask can one dereference a TidyBuffer to get a pointer. No, one must use the buf.bp member - a pointer to contents, but it can also be a NULL.

But these are not big problems, and easily addressed...
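A minimal sketch of the kind of NULL guard that avoids a crash of this sort. The `DemoAttr` struct and `demo_attr_value_equals` are hypothetical stand-ins for illustration only; the real Tidy attribute structure has more fields and its own accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for an attribute node; the real Tidy structure differs. */
typedef struct {
    const char *name;
    const char *value;   /* may be NULL */
} DemoAttr;

/* Returns 1 only when the attribute exists, has a value, and that value
   equals `expected`. A NULL attribute or NULL value is treated as
   "no match" instead of being dereferenced. */
int demo_attr_value_equals(const DemoAttr *attr, const char *expected)
{
    assert(expected != NULL);
    if (attr == NULL || attr->value == NULL)
        return 0;
    return strcmp(attr->value, expected) == 0;
}
```

The same pattern applies to a buffer's content pointer: test it for NULL before passing it to any string function.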

But this regression testing also showed that I think, at least initially, this option should default to no. Otherwise, when this option is eventually merged a considerable number of test cases would need to be reset... There might be a day when we do default this to yes, but not on first introduction... but am willing to be swayed on this... feedback welcome...

And @balthisar, yes I will get around to setting up an issue-456 branch, containing the current state of this WIP feature, so we can all work on it... I have already tested this, no problem, but have one quick question...

@marcoscaceres what is the purpose of the two new files, attrask.c and attrget.c? They do not seem in any way included in the cmake build, or any other files, but maybe I have missed something... Simply, can they be deleted? Thanks...

@balthisar
Member

@geoffmcl, I think those are legacy files, given that he based off of master instead of next. I think they were all deprecations early on.

geoffmcl added a commit that referenced this issue May 13, 2017
This pulls the work done by @marcoscaceres WIP #458 into the issue-456
branch, to complete the new add-meta-charset option.
@geoffmcl
Contributor

@balthisar yes, thanks for legacy info... have dealt with that and merged @marcoscaceres branch meta-charset from his fork, to an issue-456 branch, but no fixes applied at this stage... but did default the option to no...

The open issues identified at this stage are the 2 mentioned above, the addition of the two messages, replacing the printf, and some other decisions on what to do in case of a clash of encoding... at present there is only the warning. Maybe tidy should also fix the encoding...

Also at present I think if it finds a meta charset it does not warn, or change the type, when it conflicts with the current output doctype... but is this wrong anyway... need to research that...

So still some work to be done perhaps even before a PR is created...

Any help appreciated... thanks...

@balthisar
Member

@geoffmcl, I'll be happy to look at it, too. I'm working through our outstanding issues by oldest first, so I think this one will come up in the queue soon. If you beat me to it, then you're welcome to implement the fixes, otherwise I'm hoping I'll be allowed some time this weekend.

@geoffmcl
Contributor

@balthisar well I started to look at it and do some fixing, but then noted some more small problems...

Although I had suggested using a TidyBuffer, it might not be appropriate since the string in the buf.bp is not guaranteed to be a zero terminated C string, so even code like TY_(AddAttribute)( doc, metaTag, "content", (char*)buf.bp); can not be used...

It may be possible to ensure it is zero terminated with something like tidyBufAppend(&buf, "\0", 1); but need to check that... seems possible...
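A minimal sketch of that idea, using a stand-in buffer type so it compiles without libtidy. `DemoBuf` and the `demo_*` functions are hypothetical and only loosely modelled on TidyBuffer. The point is that the terminating zero byte is appended explicitly, as one more byte of content.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a growable byte buffer; not the real TidyBuffer. */
typedef struct {
    unsigned char *bp;   /* contents, NULL until the first append */
    size_t size;         /* bytes in use */
    size_t allocated;    /* bytes reserved */
} DemoBuf;

/* Appends n raw bytes; returns 0 on success, -1 on allocation failure. */
int demo_buf_append(DemoBuf *buf, const void *data, size_t n)
{
    assert(buf != NULL && data != NULL);
    if (n == 0)
        return 0;
    if (buf->size + n > buf->allocated) {
        size_t want = (buf->size + n) * 2;
        unsigned char *p = realloc(buf->bp, want);
        if (p == NULL)
            return -1;
        buf->bp = p;
        buf->allocated = want;
    }
    memcpy(buf->bp + buf->size, data, n);
    buf->size += n;
    return 0;
}

/* Builds "charset=<enc>" and then appends the zero byte, so that
   buf->bp can afterwards be used where a C string is expected. */
int demo_build_charset_string(DemoBuf *buf, const char *enc)
{
    assert(enc != NULL);
    if (demo_buf_append(buf, "charset=", 8) != 0) return -1;
    if (demo_buf_append(buf, enc, strlen(enc)) != 0) return -1;
    return demo_buf_append(buf, "\0", 1);   /* zero terminate */
}

/* Releases the storage and resets the buffer to its empty state. */
void demo_buf_free(DemoBuf *buf)
{
    assert(buf != NULL);
    free(buf->bp);
    buf->bp = NULL;
    buf->size = buf->allocated = 0;
}
```

Note that in this sketch the terminator counts toward `size`, so any later append would land after the zero byte. That is one reason to add it only as the final step.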

Anyway, ended up pushing nothing....

So if you get a chance to look at it, at least removing the printf by creating appropriate messages, that would be most appreciated... thanks... not sure I can get back to it before next week...

And then there is the question if an encoding mis-match is detected, should tidy fix it... or just warn... I would like a fix I think...

Quite an interesting service as you get into it, with many choices... ;=))

@geoffmcl
Contributor

@balthisar just a quick ping to say I did find time to work on this, and think I have solved all the problems... and maybe we do not need any new messages...

Have prepared some tests, and am preparing a report, with still some questions to decide, but hope to push my results later today... just a heads up in case you also started looking at it...

geoffmcl added a commit that referenced this issue May 14, 2017
@geoffmcl
Contributor

@balthisar, @marcoscaceres first it seems a TidyBuffer can be used using the suggested tidyBufAppend(&buf, "\0", 1);... that is good... and it would be trivial to always ensure the buffer append service added 1 extra byte, and always was zero terminated, but that is a separate issue...

As I prepared some test files, the first thing I noted is that Tidy has been silently correcting the output encoding of the <meta http-equiv...> tag, probably for years - did not check when the TY_(VerifyHTTPEquiv)(doc, head) service was added - and imagine my surprise that by the time we reached the new service TidyMetaCharset(doc) the encoding on this particular tag was always correct...

Of course we still have to correct the <meta charset="value"> tag, but like the http-equiv change, maybe we do not need to warn. But on the other hand do not see why not! Tidy is modifying a tag, and as other comments here and there, maybe it should be a warning, or maybe at least an Info.

But if added here then should also be added if VerifyHTTPEquiv makes such a modification. Seek feedback on this...

So now I am thinking this new service should always be called, to fix the <meta charset="value"> tag, and only if --add-meta-charset yes would we add this tag if none existed in the document. And this could mean we in fact eliminate the VerifyHTTPEquiv service, and do it all here. Again seek feedback on this?

Just 6 test files added at this time, covering most situations...

  1. in_456-1.html - html5, no charset meta, add if option yes - no warn
  2. in_456-2.html - html4, no charset meta, add if option yes - no warn
  3. in_456-3.html - xhtml5, with meta, nothing changed - no warn
  4. in_456-4.html - html4, 2 charset meta, correct one and discard one - warn
  5. in_456-5.html - html5, correct meta to utf-8 - no warn
  6. in_456-6.html - html5, 2 meta, correct one, discard one - warn

Test 4 is the same as case 1117013, but now no crash...

At this stage have coded for no warning on meta addition, if option on, nor on modification, as before, but warn on discards, and deprecated VerifyHTTPEquiv just with #if 0 ... #endif at this stage...
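The "correct one, discard one, warn" behaviour of tests 4 and 6 can be sketched as a single pass that keeps the first charset declaration and flags the rest. This is only an illustration over a flat array: `DemoMeta` and `demo_discard_duplicates` are hypothetical, whereas the real code walks the children of the head node.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a child of <head>; the real node type is much richer. */
typedef struct {
    int is_charset_meta;  /* 1 if this element declares a charset */
    int discarded;        /* set to 1 by the pass below */
} DemoMeta;

/* Keeps the first charset declaration and marks every later one as
   discarded. Returns the number discarded, i.e. how many discard
   warnings such a pass would issue. */
size_t demo_discard_duplicates(DemoMeta *items, size_t count)
{
    size_t i, discarded = 0;
    int found = 0;

    assert(items != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (!items[i].is_charset_meta)
            continue;              /* unrelated element, leave it alone */
        if (found) {
            items[i].discarded = 1;
            discarded++;
        } else {
            found = 1;             /* first declaration wins */
        }
    }
    return discarded;
}
```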

Now set to always calling this new service TidyMetaCharset... to potentially silently correct each meta type, as before, and only add a missing meta charset if requested, again silently fixing the document...

Also maybe this new service should be moved out of lexer.c, and added to clean.c, where the old service resided...

Hope others will get the chance to checkout this issue-456 branch, and take it for a spin... it feels quite complete, but needs testing... thanks...

@balthisar
Member

@geoffmcl, afraid I won't get to it for a few days, unfortunately, in all likelihood. Will try to sneak a peek...

@geoffmcl
Contributor

With latest commits, moved the new TidyMetaCharset to clean.c|h, out of lexer.c|h, and avoid doing any head cleaning if just showing body only...

@geoffmcl
Contributor

Arrgh! just got around to regression tests... number of problems detected...

Immediate issue is the use of in-place tmbstrtolower, which permanently alters the attr string... must either copy, or avoid...

Working on it... all look solvable...
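One way to "avoid" is to compare without lowercasing anything in place. The sketch below is a hypothetical stand-alone helper, not Tidy's own tmbstrcasecmp. It folds each character while comparing, so the attribute value keeps the case the author wrote.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: compares two strings ignoring ASCII case without
   writing to either of them. Returns 0 when equal, a negative or
   positive value otherwise, like strcmp. */
int demo_casecmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (;; a++, b++) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == 0)
            return 0;   /* both strings ended together */
    }
}
```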

@balthisar
Member

@geoffmcl, I've still not reviewed this, but this sounds familiar with #554, where I'm playing with the case of attributes selectively... maybe have a look.

@geoffmcl
Contributor

@balthisar thanks for the #554 reminder, and indeed one of the remaining issues is case related... but this is the case of the =value, not in the actual attribute name...

Have effected some code changes, but there remain three (3) outstanding regression issues -

  1. 378b - But this seems related to a recent change of --fix-uri no
  2. 586562 - missing space - expects content="text/html; charset=iso-8859-1" - results "text/html;charset=iso-8859-1"!
  3. 676205 - expects iso-8859-1, results ISO-8859-1 - ie changed case of content =value only.

Case 1: 378b: This issue is about a utf-8, e acute, 0xC3 0xA9, in a HREF. The expects has No warnings or errors were found., but now 5.5.21.I456-2 does issue a warning...

Ok, that change was very recent, #378, so maybe I need to rebase/merge next into issue-456... that is bring it up to 5.5.24, then add all I456 on top... BUT somehow that always gets me in a big mess, with the final issue-456, containing first all @marcoscaceres commits, and my later commits repeated multiple times in the git log... seems/feels very bad!

How can I avoid this? What should I do? @balthisar I sometimes see you creating a new branch?

It would be really great if someone could checkout issue-456 branch, and kick it forward to 5.5.24, or later if that, push it, and advise the exact steps taken, so I can do this next time... thanks...

Case 2: 586562: This space is missing in the testbase\case-586562.html, but is present in the testbase-expects\case-586562.html. It seems the old retired service VerifyHTTPEquiv made sure there was a space, while the new TidyMetaCharset preserves the input...

Now this case appears generated by the old Microsoft FrontPage 4.0, which obviously felt there was no need for this space... and I too think it is optional, but would be nice to have...

As stated, need to recheck what VerifyHTTPEquiv was doing, and see if it is easy to get TidyMetaCharset to do the same... or else we just update the expects?

Would appreciate feedback on this...
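If the decision goes toward restoring the space, the normalization itself is small. The following is a sketch under that assumption only; `demo_space_after_semicolon` is a hypothetical name, and this is not what VerifyHTTPEquiv actually did internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `in` to `out`, making sure each ';' that is
   followed by more text is followed by exactly one space.
   Returns 0 on success, -1 if `out` is too small. */
int demo_space_after_semicolon(const char *in, char *out, size_t outsize)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (o + 1 >= outsize)
            return -1;                 /* keep room for the terminator */
        out[o++] = *in;
        if (*in++ == ';') {
            while (*in == ' ' || *in == '\t')
                in++;                  /* drop whatever spacing was there */
            if (*in != '\0') {
                if (o + 1 >= outsize)
                    return -1;
                out[o++] = ' ';        /* and emit a single space */
            }
        }
    }
    if (o >= outsize)
        return -1;
    out[o] = '\0';
    return 0;
}
```

Applied to the FrontPage-style value `text/html;charset=iso-8859-1`, this yields the form the expects file contains.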

Case 3: 676205: What is the correct case for the meta http-equiv contents value?

Looking at say an IANA REF most are in upper case, but some are mixed. Tidy's internal enc2iana[] table is effectively all lowercase.

Testbase input is uppercase - ISO-8859-1, testbase-expects is lowercase - iso-8859-1, our new result preserves the uppercase of the input...

This would not be a trivial change in TidyMetaCharset, to not use tmbsubstr, which in effect then uses tmbstrcasecmp, so does not flag the uppercase input as a mismatch...

There used to be two, tmbsubstrncase and tmbsubstr, but at some point the former was commented out, and the latter made case-insensitive...
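To make the effect concrete, here is a sketch of a case-insensitive substring search. It is a hypothetical stand-in written for illustration, not Tidy's tmbsubstr. With a search like this, an input of `ISO-8859-1` is found by a lowercase `charset=iso-8859-1` needle, so no mismatch is flagged and the uppercase input is left as it is.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for a case-insensitive substring search.
   Returns a pointer to the first match of `needle` in `haystack`,
   or NULL if there is none. */
const char *demo_substr_nocase(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    if (*needle == '\0')
        return haystack;
    for (; *haystack != '\0'; haystack++) {
        const char *h = haystack;
        const char *n = needle;
        /* advance while the folded characters agree */
        while (*n != '\0' &&
               tolower((unsigned char)*h) == tolower((unsigned char)*n)) {
            h++;
            n++;
        }
        if (*n == '\0')
            return haystack;   /* whole needle matched here */
    }
    return NULL;
}
```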

The IANA reference clearly states - no distinction is made between use of upper and lower case letters. - so again we could just update the expects...

I tend towards using the users input, since it is not wrong, but on the other hand we could use tidy's lowercase table for general consistency...

Again seek feedback on this...

So far I have not pushed my fixes, in clean.c - they are quite messy at this point, and need to be cleaned up... but below attach the messy diff, just to see my thinking, and some questions raised in comments...

But would really like Case 1: 378b: to drop off my radar first... any help getting there with git magic help would be most appreciated... thanks...

diff --git a/src/clean.c b/src/clean.c
index b4e9a38..026bfe4 100644
--- a/src/clean.c
+++ b/src/clean.c
@@ -2309,8 +2309,8 @@ Bool TY_(TidyMetaCharset)(TidyDocImpl* doc)
     Node *prevNode;
     TidyBuffer buf;
     TidyBuffer charsetString;
-    tmbstr httpEquivAttrValue;
-    tmbstr lcontent;
+    /* tmbstr httpEquivAttrValue; */
+    /* tmbstr lcontent; */
     tmbstr newValue;
     /* We can't do anything we don't have a head or encoding is NULL */
     if (!head || !enc || !TY_(tmbstrlen)(enc))
@@ -2330,7 +2330,7 @@ Bool TY_(TidyMetaCharset)(TidyDocImpl* doc)
     tidyBufAppend(&charsetString, "charset=", 8);
     tidyBufAppend(&charsetString, (char*)enc, TY_(tmbstrlen)(enc));
     tidyBufAppend(&charsetString, "\0", 1); /* zero terminate the buffer */
-                                            /* process the children of the head */
+    /* process the children of the head */
     for (currentNode = head->content; currentNode; currentNode = currentNode->next)
     {
         if (!nodeIsMETA(currentNode))
@@ -2339,10 +2339,10 @@ Bool TY_(TidyMetaCharset)(TidyDocImpl* doc)
         httpEquivAttr = attrGetHTTP_EQUIV(currentNode);
         if (!charsetAttr && !httpEquivAttr)
             continue;   /* has no charset attribute */
-                        /*
-                        Meta charset comes in quite a few flavors:
-                        1. <meta charset="value"> - expected for (X)HTML5.
-                        */
+        /*
+            Meta charset comes in quite a few flavors:
+            1. <meta charset="value"> - expected for (X)HTML5.
+         */
         if (charsetAttr && !httpEquivAttr)
         {
             /* we already found one, so remove the rest. */
@@ -2355,8 +2355,8 @@ Bool TY_(TidyMetaCharset)(TidyDocImpl* doc)
                 continue;
             }
             charsetFound = yes;
-            /* Fix mismatched attribute value */
-            if (TY_(tmbstrcmp)(TY_(tmbstrtolower)(charsetAttr->value), enc) != 0)
+            /* Fix mismatched attribute value - note case insensitive match */
+            if (TY_(tmbstrcasecmp)(charsetAttr->value, enc) != 0)
             {
                 newValue = (tmbstr)TidyDocAlloc(doc, TY_(tmbstrlen)(enc) + 1);   /* allocate + 1 for 0 */
                 TY_(tmbstrcpy)(newValue, enc);
@@ -2367,11 +2367,16 @@ Bool TY_(TidyMetaCharset)(TidyDocImpl* doc)
                 TidyDocFree(doc, charsetAttr->value);   /* free current value */
                 charsetAttr->value = newValue;
             }
+#if 0 /* 000000000000000000000000000000000 not sure about this 
+                but have read certain documents where the charset
+                should be present in the first 1024 bytes of the 
+                doc... and what about likewise for the html4 form? */
             /* Make sure it's the first element. */
             if (currentNode != head->content->next) {
                 TY_(RemoveNode)(currentNode);
                 TY_(InsertNodeAtStart)(head, currentNode);
             }
+#endif /* 00000000000000000000000000000000 */
             continue;
         }
         /*
@@ -2391,24 +2396,30 @@ Bool TY_(TidyMetaCharset)(TidyDocImpl* doc)
                 currentNode = prevNode;
                 continue;
             }
-            httpEquivAttrValue = TY_(tmbstrtolower)(httpEquivAttr->value);
-            if (TY_(tmbstrcmp)(httpEquivAttr->value, (tmbstr) "content-type") != 0)
-                continue;   /* is not 'content-type' */
+            /* httpEquivAttrValue = TY_(tmbstrtolower)(httpEquivAttr->value); */
+            if (TY_(tmbstrcasecmp)(httpEquivAttr->value, (tmbstr) "Content-Type") != 0)
+                continue;   /* is not 'Content-Type' */
             if (!contentAttr->value)
             {
+#if 0 /* 00000000000000000000000000000 
+                case 1117013 (in_456-4) keeps this 'content=""'! But WHY? */
                 prevNode = currentNode->prev;
                 /* maybe need better message here */
                 TY_(ReportError)(doc, head, currentNode, DISCARDING_UNEXPECTED);
                 TY_(DiscardElement)(doc, currentNode);
                 currentNode = prevNode;
+#endif /* 0000000000000000000000000000 */
                 continue;
             }
             /* check encoding matches
             If a miss-match found here, fix it. previous silently done
             in void TY_(VerifyHTTPEquiv)(TidyDocImpl* doc, Node *head)
             */
-            lcontent = TY_(tmbstrtolower)(contentAttr->value);
-            if (TY_(tmbsubstr)(lcontent, charsetString.bp))
+            /* lcontent = TY_(tmbstrtolower)(contentAttr->value); 
+               Note: 'tmbsubstr' uses 'tmbstrcasecmp`, so 'ISO-' 
+               will match 'iso-'! Is this desired?
+               see cases 586562 and 676205 */
+            if (TY_(tmbsubstr)(contentAttr->value, charsetString.bp))
             {
                 /* we already found one, so remove the rest. */
                 if (charsetFound)
@@ -2446,7 +2457,8 @@ Bool TY_(TidyMetaCharset)(TidyDocImpl* doc)
         }
         /*
         3. <meta charset="utf-8" http-equiv="Content-Type" content="...">
-        This is generally bad. Discard and warn.
+           This is generally bad. Discard and warn.
+           Not so sure about this, but seems a reasonable idea?
         */
         if (httpEquivAttr && charsetAttr)
         {

Any and all feedback very welcome... this seems a good option, but important to get it right... thanks

@geoffmcl
Contributor

Updated the issue-456 branch, ie merged next, and created a WIP PR #565...

But this is still a WIP!

Somehow, I am getting stale on this... I just can not seem to get it right...

With the last desperate commit, I am reusing the old service if the option is off, which it is by default, so I still use that service, but should not have to...

But still about 3 regression tests fail in the diff... the diff shows a change in some message encoding when showing an attribute value... cases 427664-1, 427664 and 427672, where a value 1/2 is now output as 122... so maybe this difference is nothing to do with what this issue is about... maybe a previous commit... more checking needed...

Any help appreciated... maybe a new pair of eyes will quickly see what I am missing ... thanks...

@geoffmcl
Contributor

Updated the issue-456 branch, ie merged next, to pick up the fix for #395, PR #564... now at least the issue-456 branch passes the regression tests - PHEW - which it should with this option's default of no...

Will try to find the energy to continue with this PR #565... it does seem so close... but as always, any help appreciated... at least reading the code comments, and commenting... thanks...

@balthisar
Member

@geoffmcl, sorry, I've been incommunicado in the great forests of Canada. I'll try to have a look over the weekend.

geoffmcl added a commit that referenced this issue Jun 4, 2017
@geoffmcl
Contributor

geoffmcl commented Jun 4, 2017

@balthisar no problem... hope it was a great adventure...

It seems I was shooting myself in the foot with a stupid reversed logic error... BAH!

Have fixed that, and now with the default no for this option, it 100% passes the regression tests...

Of course, with this option yes, lots of tests now get a new meta charset, so there is a lot of difference, as expected, since many tests did not have one before... but still to fully check that output - like seems I saw one case where the wrong type of meta was added... but need to study more...

So this has taken a good step forward... have now 100% replaced the previous service, with the new single service, which can be discussed and refined...

As suggested have put a number of comments in the code, and will try to find time to enumerate and discuss them here... trying to make the right choices... so when you get the chance... thanks...

@geoffmcl
Copy link
Contributor

geoffmcl commented Jun 4, 2017

This issue is moving forward, but there are a number of QUESTIONS which I seek feedback on...

There seems no problem when there is no existing meta charset in the document. Simply add one of the appropriate type, and encoding value, namely -

  • html5 <meta charset="utf-8">
  • others <meta content="text/html; charset=iso-8859-1">

Oops, but that does raise a question. I now believe the others should be -

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

But that's an easy fix in the new single service...
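As a sketch of the two forms, the selection could look like the following. `DemoDocKind` and `demo_format_meta` are hypothetical names; the real service builds nodes and attributes in the document tree rather than formatting a string, and XHTML serialization is left out here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified document classification. */
typedef enum { DEMO_HTML5, DEMO_LEGACY } DemoDocKind;

/* Writes the charset declaration suited to the document kind into `out`.
   Returns 0 on success, -1 if the encoding is empty or `out` is too small. */
int demo_format_meta(DemoDocKind kind, const char *enc,
                     char *out, size_t outsize)
{
    int n;

    assert(enc != NULL && out != NULL);
    if (strlen(enc) == 0)
        return -1;                      /* nothing sensible to declare */
    if (kind == DEMO_HTML5)
        n = snprintf(out, outsize, "<meta charset=\"%s\">", enc);
    else
        n = snprintf(out, outsize,
                     "<meta http-equiv=\"Content-Type\" "
                     "content=\"text/html; charset=%s\">", enc);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```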

The questions pile up when there is an existing meta charset, of one type or the other, or indeed more than one... so -

  1. Multiple charset declarations

While I have never actually seen this in RL, it could happen, and Tidy needs to choose what to do about it.

At present it just discards the 2nd or later meta... just keeping the first...

But if we do choose to discard, and warn about the discard, a further improvement would be to discard those that do not match the doctype...

  2. Charset with wrong encoding

Obviously Tidy should fix this, but this could lead to the next case...

  3. Charset encoding of wrong type for doctype

At present nothing is done about this...

But to be perfectly correct, Tidy should change the meta to a declaration that matches the doctype...

  4. To warn or not, or at least info

Presently tidy does what it does in other cases - silently fixes problems - As has been addressed, discussed, in several other issues, maybe there should be at least a suppressible Info: message if it adds or modifies a charset meta...

Then there is the strange test case 1117013.html which has what is really a mal-formed meta -

<META HTTP-EQUIV="Content-Type" CONTENT="">

The original new service was coded to discard this, but have commented it out for now, due to conforming to the test expects... But as I think about this more, I do think this should be discarded, and the expects adjusted accordingly...

Probably some of these issues could be decided re-reading the W3C specs, and maybe by presenting a sample to the W3C validator, and choose an action accordingly...

As mentioned above I have already pushed 6 test cases - in_456-[1..6].html - to my test repo. Maybe these could be the basis of tests eventually added to the tidy-tests repo...

With this option set yes, each needs an expects created, and that output validated... tedious work... any help in this process would be most appreciated... thanks...

Maybe more tests need to be added to specifically address the above questions, again matching with W3C specs, and validated... again help and feedback appreciated... thanks...

PS: One unrelated minor quibble. Believing that this meta should be declared as early as possible in the document, I now do not like that the <meta name="generator" content="...Tidy..."> gets put above this... but maybe a later small issue...

geoffmcl added a commit that referenced this issue Jun 4, 2017
It also fixes the addition of the constant 'http-equiv="Content-Type"
attribute.
@geoffmcl
Contributor

geoffmcl commented Jun 4, 2017

@balthisar the above push adds an Info: type message when Tidy adds an appropriate <meta charset=... node to the document, because this option was yes, and none were found...

It also fixes the addition of the http-equiv="Content-Type" attribute noted above...

Moving forward slowly, hopefully... but look forward to feedback... thanks...

@geoffmcl
Contributor

geoffmcl commented Jun 5, 2017

@balthisar the above push adds an Info: type message when Tidy modifies, corrects, replaces the charset value with the current output encoding...

As mentioned in #561, leveraged an existing BAD_ATTRIBUTE_VALUE_REPLACED, creating a new ATTRIBUTE_VALUE_REPLACED message, but in the ReportAttrError function, switch (code) tumble created a message with TidyInfo... This is manageable but messy...

And while it gives good info to the user, it does not say what it replaced it with... but is ok... I suppose... tests in_456-5.html and in_456-6.html exercise this new message... which can be suppressed with --show-info no option...

Testing and feedback welcome... thanks...

geoffmcl added a commit that referenced this issue Jun 9, 2017
@geoffmcl
Contributor

geoffmcl commented Jun 9, 2017

To fully maintain backward compatibility, add new option --show-meta-change Bool...

In the past Tidy fixed the <meta http-equiv="content-type" content="text/html; charset=UTF-8"> without showing this fix... a silent change... not always liked, but a fact...

This option, default to no, maintains that compatibility, but not for the HTML5 charset changes, or addition, which are always reported as an Info: type message... which can be suppressed by --show-info no...
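Read literally, the reporting rule described here can be sketched as a small decision function. Everything in it is hypothetical: the enum, the function name, and in particular the assumption that `--show-info no` also silences the http-equiv message when `--show-meta-change` is yes, which the comment above does not state.

```c
#include <assert.h>

/* Hypothetical classification of what the service did to the document. */
typedef enum {
    DEMO_HTTP_EQUIV_FIXED,   /* legacy http-equiv content value corrected */
    DEMO_CHARSET_FIXED,      /* HTML5 <meta charset> value corrected */
    DEMO_META_ADDED          /* a missing declaration was added */
} DemoChange;

/* Returns 1 if an Info message would be shown, 0 if the change stays
   silent. `show_meta_change` and `show_info` mirror the two options. */
int demo_should_report(DemoChange change, int show_meta_change, int show_info)
{
    assert(change >= DEMO_HTTP_EQUIV_FIXED && change <= DEMO_META_ADDED);
    if (!show_info)
        return 0;   /* assumption: Info messages are suppressed as a whole */
    if (change == DEMO_HTTP_EQUIV_FIXED)
        return show_meta_change != 0;   /* silent unless opted in */
    return 1;       /* HTML5 fixes and additions are reported */
}
```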

With this default option no, no regression test is changed... all pass, 100%...

This new option --add-meta-charset now feels complete, see PR #565

As stated, testing and feedback welcome on this issue-456 branch... thanks...

@balthisar
Member

It's working quite nicely for me. I pushed back changes to address some merge conflicts in light of other PR's that were merged. If no objections or you don't beat me to it, this is a nice fix.
