Skip to content

Message Handling "cleanup" #561

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
balthisar opened this issue May 21, 2017 · 6 comments
Closed

Message Handling "cleanup" #561

balthisar opened this issue May 21, 2017 · 6 comments

Comments

@balthisar
Copy link
Member

balthisar commented May 21, 2017

I started to write this in response to #412, but it's actually a broader issue, and I'd like to spur some discussion on a bit of refactoring to simplify Tidy, as well as address #412.

In #412, we're looking for a way to allow Tidy to show additional information whenever it performs any type of fix, including the fixes that are currently "quiet" fixes. The original proposal was to add another new configuration option to the mix. Instead, I'd like to step back and see if we can simplify this completely.

Currently we have messages categorized per this enum:

typedef enum
{
    TidyInfo = 350,       /**< Report: Information about markup usage */
    TidyWarning,          /**< Report: Warning message */
    TidyConfig,           /**< Report: Configuration error */
    TidyAccess,           /**< Report: Accessibility message */
    TidyError,            /**< Report: Error message - output suppressed */
    TidyBadDocument,      /**< Report: I/O or file system error */
    TidyFatal,            /**< Report: Crash! */
    TidyDialogueInfo,     /**< Dialogue: Non-document related information */
    TidyDialogueSummary,  /**< Dialogue: Summary-related information */
    TidyDialogueDoc,      /**< Dialogue: Document-related information */
} TidyReportLevel;

Nominally only TidyWarning and TidyError result in a non-zero exit code from the CLI, although file errors and the like can change the exit code, too.

The TidyInfo isn't used very much; perhaps it should be used a little bit more for cases like this, i.e., "silent changes" become TidyInfo messages that we can handle appropriately. Or maybe we insert a new category between TidyInfo and TidyWarning, e.g., `TidyTouchup" or something.

Before considering yet another new option, though, what do we already have?

  • show-info will squelch any TidyInfo messages, but it also squelches some of the "dialogue-like" output that Tidy delivers (the new TidyDialogueInfo) category.
  • show-warnings can be used to hide the warnings
  • show-errors isn't a boolean, but places a limit to the amount of errors that will be output, and setting this to zero does squelch all errors. Now the thing about show-errors is that it only limits errors, not warnings or info. So if you set show-errors to, say, 10, but have 100 warnings, you'll still see all of them.

If you're using a callback filter, none of these options have any effect, as it becomes the responsibility of the host program to implement all of this functionality.

All of the document-related messages are generated and output in the order they are detected during Tidying, so the message-table output is a mix of warnings and errors, and not necessarily in document line number order (e.g., from the lexer, from the parser, from the post-processing that Accessibility produces).

Tidy's output is also mixed in that document-related messages can be interspersed with operational messages. For example, missing config file isn't related to a document issue but precedes the document information table. The Accessibility message interrupts the document message table for no good reason, and finally summary information and other dialogue-type information is added at the end.

So, what would we do if we were starting from scratch?

What I would do is eliminate all of these filtering and limited functions from LibTidy completely, and move them to the CLI, and let the CLI handle this stuff just like any other LibTidy user is expected to do. I tend to think that this would be an unpopular move, though, and it would complicate configuration files, and we have a rich history of allowing these options in the config file. So, I won't go there.

I think we need to carefully define the granularity that is expected from document messages versus dialogue messages, and attack them separately.

To handle the document messages, I suggest that we eliminate show-warnings and show-info entirely, and replace show-errors with something else. Instead, we have a new show-document-messages option with the values possible values of "None" or "Quiet", "Errors", "Warnings", and "Info", each of which show progressively more information.

  • "None" or "Quiet" shows no document messages at all.
  • "Errors" shows only errors.
  • "Warnings" shows errors and warnings. This would be default.
  • "Info" shows everything.

I would also buffer all of the document message output, and emit it in line and column number order instead of the current order, which is line and column order for each piece of Tidy handling the code. I don't think we support any embedded systems with such little memory that we can't spare a few K to hold document messages in memory until we're done Tidying, so that we can sort them. However, is a bit tricky because tidyParseFile() (or stdin), tidyCleanAndRepair(), and tidyRunDiagnostics() each produce their own messages. Maybe for a "phase 2" down the line...

show-errors would also be replaced with limit-document-messages which limits the number of messages, not just the number of errors.

Moving on, there are also several other types of non-document messages that are one of the "dialogue" message types. Things such as reporting missing config files, giving helpful information, giving the error count summary, showing information about Tidy, are all "dialogue" type messages. Right now both show-info and quiet affect these, but it's not currently clearly, logically defined what and why certain pieces of this information is hidden.

Maybe we replace quiet and show-info with another enum type option, e.g., "show-other-messages" with values that bring some sanity to this.

  • "None" or "Quiet" shows nothing, except for errors (such as missing config files) that we must report.
  • "Summary" shows only the summary.
  • "Messages" shows the summary and other helpful messages.
  • "All" shows the summary, helpful messages, and the Tidy information. This would be default.
@geoffmcl
Copy link
Contributor

@balthisar thanks for this issue... sorry for the small delay on this...

I must have read, and re-read this message umpteen times, or more!, but still a little unclear on exactly what you are proposing... sorry...

Does it come down to a Feature Request to add an new enumeration option show-other-messages?

But some other things you touched on -

buffer all of the document message output

As you point out, an app using libtidy does, or can, do this, and could then sort them, if it felt necessary...

But should our console sample use of libtidy do this? I think not! Why are we, as libtidy maintainers making our console app even more complicated? Just to show it can be done, or what?

As you do understand, the current order is the result of the phases tidy goes through... As you get to know tidy this in itself is a clear indication of which phase the warning, error, came from... It works great for debug mode, where I even output more messages...

And take an extreem example of a crash, a segfault, now we would see nothing... the user would not be able to give us some indication of what output was already given before the event... and neither would I see them in debug mode...

So, no I do not think the messages should be buffered, or sorted... by libtidy or console tidy... and as you agree a 3rdParty app using libTidy can already do this... Let them offer a TidyPro with all the bells and whistles possible...

Where is the gain for HTML Tidy? More maintenance...

And the new suggested show-other-messages enum could be in addition to the option already in existence, most of which have been there since the beginning, for 17 years plus, except show-info, which I added much, much later, mainly as a debug aid... As I run tidy thousands of times in testing, debugging, I got sick of that final blah-blah-blah, and added this option... it does not even suppress all that I want...

As an additive option, the users have the choice of what to use... we should not be the dictator, or arbiter, of what is best in this regards... no matter how more logical you think this would be...

And so far none of this really touches what is started in #412... and we have had other requests to improve the message content, especially when libTidy has applied a fix. And I agree some messages should perhaps be much more explicit about what libtidy is doing...

But that does not seem to suggest we should restructure the message system... maybe it does mean improving message wording... but on the other hand, after a few days working with tidy, it should become very clear that just about every warning/error means tidy did something... touched your html... fixed it...

And I, like others, get confused by some message, like say <p>Para 1<p> will output Warning: trimming empty <p>. Huh? Takes some reading to understand the error fixed... I am sure everyone could point out many of these... but none of these are solved by, for some reason, re-enumerating the message types...

Then there are the so called silent changes... these need to be singularly identified, and message constructed, and maybe we could use Info: for these... but I would need to see and think about them on an individual case by case... and I certainly do not see the need to establish yet another message type...

You do make a good point that maybe show-error should perhaps cover all output, not just errors, but again that's probably a separate issue, and does not mean revamping the message system itself...

Did I miss something in your comment? Please bring it up again if I did...

But this at least gives you some of my perspective on this -

  1. How can we simplify console tidy?
  2. You, or others can fork it, and build a TidyPro
  3. What specific messages do you want to change?
  4. What messages do you want to add, or remove?
  5. Do you want to add a new option show-other-messages?
  6. Are you suggesting the show-error should apply to all output?
  7. What messages do you want to change the type on?
  8. Other items?

In fact maybe each of these could be a separate issue... discussed, decided... certainly not all lumped together under the catch-all title Message Handling "cleanup"...

As stated, still quite confused... what cleanup do you want to do? Thanks...

@geoffmcl
Copy link
Contributor

@balthisar maybe your messages_experiment branch is a WIP, and that can be understood...

But a rough review of the 130K diff file, shows some alarming things... If this is truely an experiment, and will be deleted, then skip this quick review... no problem... sorry for the noise...

RESEARCH:
New Messages:
+    FN(ENCODING_IO_CONFLICT)          \
+    FN(UNESCAPED_ELEMENT)             \
+    FN(COERCE_TO_ENDTAG_WARN)         \
-    FN(UNEXPECTED_ENDTAG)             \
-    FN(UNEXPECTED_ENDTAG_XML)         \
+    FN(UNEXPECTED_ENDTAG)
Maybe all good, but could we **not** discuss these, one by one?
New enumeration:
Essentially replaced TidyBadFile with TidyBadDocument, but then lots of spaces changes...
and
-    { TidyBadFile,            0,   "File: "          },
+    { TidyBadDocument,        0,   "Document: "      },
Could this not be discussed?
Name Change:
-            TY_(Report)(doc, NULL, node, PROPRIETARY_ELEMENT);
+            TY_(ReportError)(doc, NULL, node, PROPRIETARY_ELEMENT);
and
-        TY_(ReportFileError)( doc, filnam, FILE_CANT_OPEN );
+        TY_(FileError)( doc, filnam, TidyError, FILE_CANT_OPEN );
which might be all ok, but WHY? Again, discussion...
Changes in FR language file:
Maybe correct, but should be a separate issue...
Function Deleted:
-static ctmbstr HTMLVersion( TidyDocImpl* doc )
Again maybe correct, but should be a separate issue...
Possible change in severity:
-                            TY_(Report)( doc, NULL, lexer->token, UNKNOWN_ELEMENT_LOOKS_CUSTOM );
+                            TY_(ReportFatal)( doc, NULL, lexer->token, UNKNOWN_ELEMENT_LOOKS_CUSTOM );
and
-                TY_(Report)(doc, element, node, NESTED_EMPHASIS);
+                TY_(ReportWarning)(doc, element, node, NESTED_EMPHASIS);
Several, many, of these...
As always, maybe **right**, **correct**, **perfect**, but should be discussed, reviewed, first...
Message.c changes:
It is almost **impossible** to review what has been done here... Again **maybe** everything is **great**, but how can a reviewer tell? Especially when mixed with all the rest...

Now, as expressed, maybe everything read is good for tidy, but there is no way such a melange of changes can be accepted without some discussion, justification, item by item... sorry...

As my first post indicated, split this up... show the new messages, and the reason, discuss the changes in the severity level of certain messages, if this is what is intended, and so on, and on...

Wow, we love the enthusiasm, and the effort put into this, but this is granddaddy tidy we are talking about... please take it easy... step by step... incremental...

Please respect the collaborative process... I could also open an all inclusive issue HTML Tidy "fixes", and put everything I can think of into that... and they might all be good... but would expect such a broad, inclusive issue/PR to be rejected... without an item by item discussion...

Maybe just a voice in the wilderness, previously accused of carmudgeony... perhaps I am, but I care for tidy to the best of my ability... please help... thanks...

@geoffmcl
Copy link
Contributor

@balthisar I cycle back to this... given that in every proposed change there must be an element of truth, and with that, maybe benefit... so I seek to find that... embrace that... accept that...

Ok, the biggest change here is to rename Report to ReportError... seems innocuous, except, in some cases, it is changed to ReportFatal, or ReportWarning... Oops...

So you want to move the classification of the Report type into many lower level parts of the code! How can that be beneficial? Each part of the code should just Report... as it does now...

To me, everywhere, and anywhere, tidy wants to Report, bad html, a problem fixed, a minor change, a big bad problem, etc, etc, ... it should call Report, and then we can 100% control, in the single service, Report, how that is handled... What it sets...

Does this not make sense? Seem very logical? Or have I got something real wrong... please tell me... explain why such a distributed system is better, benefits a libTidy developer, than exactly one Report service...

Explain why sprinkle, throughout tidy library code, such various categories of reports, is some how better, more logical, easier to understand, to a developer... I do not see it! And I suppose oppose it...

And this has nothing to do with the introduction of some new Reports... They could be welcomed, very necessary, but that should be the subject of a separate issue... discussed, decided, agreed...

Really, just seek understanding here... not opposition... thanks

@balthisar
Copy link
Member Author

@geoffmcl, thanks for the many walls of text. I won't address all of them right now, as I'm back from camping. The main point of the feature request is to start a public discussion, and you've provided a lot, which complements my huge wall of text.

Does it come down to a Feature Request to add an new enumeration option show-other-messages?

Possibly, but only considering the overall messages and deprecating the redundant options.

buffer all of the document message output

Comment made in passing. I have no intention of doing this, but it would be user-friendly.

It works great for debug mode, where I even output more messages...

Windows only, unfortunately. I made a commitment to you to make this cross platform, and I'll do that eventually, as it's clearly useful. However that's invisible to most users.

Where is the gain for HTML Tidy? More maintenance...

Generally, I'm not afraid of maintenance, as that's the commitment that we made for Tidy. I'm referring to maintenance in general, not in sorting output (as I said, not on the radar for me right now). It also means that I'm not afraid of things the potentially reduce future maintenance, even if they're large merges that fundamentally change internals. External API stability is important, but as we're working with next I see things like removing deprecations, adjusting options, etc., as simply par for the course.

and does not mean revamping the message system itself...

Actually the message system was massively revamped when we added the new extensible callback API. What I failed to do then was categorize every message, and determine the reason for its output. Most of the errors/warnings are obvious, but there's other seemingly random output that's beyond the users' control. When I say "revamp" now, I mean inventory every bit of output, categorize it, and allow the user to control what he or she sees, while at the same time, improving the flexibility of what we can output, without being so rigid. The messaging system is an example of something that's been patched and patched and patched over and over again, and rather than adding another patch, I'm trying to understand how it can be generalized, what's involved, and even if it's possible.

most of which have been there since the beginning, for 17 years plus

If something's been around for 17 means means that it's good enough, then we should simply stop maintaining Tidy! It's obviously good enough!

To me, everywhere, and anywhere, tidy wants to Report, bad html, a problem fixed, a minor change, a big bad problem, etc, etc, ... it should call Report, and then we can 100% control, in the single service, Report, how that is handled... What it sets...

Yes! That's the ultimate goal. This branch was pushed here instead of my repo. It's not a proposed PR, and it's mostly a learning exercise to see what's possible. It's definitely a WIP, and probably out of place for discussion in this issue thread. However, what I'm trying to do is:

  • Each string is associated with a single error level. Right now, some strings have duplicates, and some strings can be used for multiple error levels. The maintenance advantage is predictability of what a string will produce, but also flexibility if we want to change wording.
  • The ability to pass arbitrary, extra information for format strings.
  • A single "Report" type function that can be used anywhere. This appears possible, but not convenient. Right now during my little exercise, there are some helper functions that take only the required parameters instead of every parameter (e.g., we don't always need an element or a node). I prefer functions to macros, but it would be easy to define convenience macros instead of functions to use the one-Report-to-rule-them-all function.
  • Inventory all of the output. There are several strings that never have the potential to be called.
  • Assigned meaningful metadata to each piece of output, so that later changes can suppress them if wanted, in a generalized way.

I keep using the word "generalized." As a concept, it's not even that modern, but generalization is an important model that, while sometimes tough to implement, makes future maintenance simpler, reduces future API churn, and makes implementation of new or updated features that much easier. Infrastructure is a pain in the ass, but sometimes it's good for the long-term health of a project.

For the most part, ignore this branch for now. It's completely experimental while I continue to learn why and when Tidy spouts what it does, and develop a possibly generic approach to handling messages. I'll get there, or fail. Localization started off as an experiment, and the new callback API started as an experiment, and they were both huge changes that helped Tidy rather than hurting it. Have some confidence that I won't propose something that hurts Tidy.

@geoffmcl
Copy link
Contributor

geoffmcl commented Jun 5, 2017

@balthisar it was only recently that I realized I had done the diff backwards - new-to-old, instead of old-to-new - but you can see I clearly liked the one-Report-to-rule-them-all function ;=)) and even thought for a moment that was what we had!

For issue #456 I was just looking at how can I add a new message like Info: Modified meta charset from '%s' to '%s'., or something, but am struggling. It seems we have no handler to pass two such string, although we do have handlers that sort of do that...

We have say the BAD_ATTRIBUTE_VALUE, which might work, but would need to test it to see what I get... but would still be faced with the fact that this is a warning, and I want this only as info...

So I agree the problem with a single Report function is to be able to pass arbitrary, extra information for format strings.!

But again I would really suggest you separate what you want to do into different, completely separate, issues, like - just taken from the big diff -

  1. Retire strings that are not used. Which? Good idea...
  2. Deal with the TidyBadFile = TidyBadDocument separately
  3. Change in FILE_CANT_OPEN_CONFIG, but can't sort out what exactly...
  4. Changes in status of some messages... maybe... which?
  5. Maybe other small things I missed...

These have nothing to do with this idea of single-report-function. If presented separately, each can be easily discussed, reviewed, agreed, and merged... all done... no wall of words...

I do understand the single-report-function change would be massive... many files... and so far what I read looked very good... this is not about the size of the change per se... I just whine a little when there seems unrelated changes, that could, should, each be dealt with separately...

Look forward to further experiments... thanks...

@balthisar
Copy link
Member Author

I'll close this in favor of individual feature requests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants