Skip to content

Tidy should word wrap its message output #519

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
balthisar opened this issue Mar 28, 2017 · 5 comments
Open

Tidy should word wrap its message output #519

balthisar opened this issue Mar 28, 2017 · 5 comments
Assignees
Milestone

Comments

@balthisar
Copy link
Member

Tidy should word wrap its message output when used in a console of arbitrary sizes. Currently all Tidy output is limited to 80 characters regardless of console dimensions.

  • Sometimes when working with multiple consoles, in a crowded window, the size is fewer than 80 columns.
    • While many consoles wrap, some do not.
    • Consoles that wrap do no break on whitespace; they break at the end of the line.
  • Often, though, the console far exceeds 80 columns.
    • Tidy wraps at 80 columns despite having additional columns available.
    • Users want to use the full width of their console.
  • All well-behaved console applications use the entire given console width.
    • Can you imagine reading a man page forced to 80 columns?
    • What if vi or nano didn’t use the full console width?
    • Tidy is currently not “well behaved” in this respect.
  • When outputting to a file, we often prefer soft-wrap rather than hard wraps, because editor size is like terminal size, above.
  • When fetching messages from the callback API, the current forced wrapping makes adaption to GUI rather ugly, in that there’s no fixed number of columns, and fonts are more often proportional in size than not.

Proposal

  • Remove line breaks (other than those present for formatting or as paragraph breaks) from Tidy’s strings.
  • Modify the current output functions to wrap text at an arbitrary number of columns.
  • Have the console application use the width of the terminal by default as the word wrap point.
  • If the file option is used, or any output is piped to a file, or Tidy is run in a non-interactive shell, it should default to 80 characters.

Optional

  • We should add a new configuration option (for example, console-width) to allow users to specify their own preference.
    • For example, some people are perfectly happy with 80 characters of output all the time.
    • Other uses may want to specify no wrapping at all for redirected output (so that text editors works with whole paragraphs rather than bits and fragments interrupted by carriage returns).

Alternative proposal

  • Do not include the new option.
  • Do not detect the tty width.
  • Do eliminate newlines in the source text.
  • Fix output to 80 columns as it is today, but let the word-wrap algorithm perform the wrapping rather than code it by hand.
    • This still addresses the callback API issues by providing sentences and paragraphs instead of broken fragments.

Summary:

  • Users that don’t want modern console behavior can use console-width to keep things statically frozen in time for console use
  • Others get to enjoy standard behavior in the console.
  • No change at all when used with file option, or pipes/redirection, or run in a login-only shell.
  • API delivers real sentences instead of fragments.
@geoffmcl
Copy link
Contributor

@balthisar thank you for this FR, and even a PR #518 implementing this, but unfortunately much more, which I do not 100% agree with...

As stated that PR has some very good Doxygen additions... and these should move forward... nothing to do with this...

In a simple reading of this FR I suppose I could add a +1 tag... but...

But, sorry, there are some issues... that I see... maybe I am real wrong! And only look for clarification...

Like While many consoles wrap, some do not.? Which consoles do not wrap? Yes, they for sure wrap at the console width, and do not respect a space as a wrap criteria, but this is the essence of a tty output...

So why should libtidy, get involved in this? Simply, IMO, tidy should not be involved in this... for many reasons...

If you have an app that uses libtidy, then you can deal with re-lining these messages, in a GUI, for instance... like libtidy deals with newline characters in the html input stream...

So I ask, what are we gaining here?

Yes, a new option, --console-width nn, but what is added? What is gained?

So, sorry, still on the negative side here, but can be convinced...

I think, for sure, Tidy messages should wrap at about 80 chars... any user app that uses libtidy can deal with that...

Should we re-set our message output, globally, removing these \n? I think NO!

But it seems just a voice in the wilderness here...

@balthisar
Copy link
Member Author

So why should libtidy, get involved in this? Simply, IMO, tidy should not be involved in this... for many reasons...

Actually, that's an excellent point! There are several more so called "configuration options" that I'd like to gut from LibTidy completely.

From the pure library standpoint, Tidy should be delivering strings, unbroken, without any assumptions about formatting. The console application -- as a LibTidy client -- should be responsible for all output, including wrapping the text, but other things such as displaying messages and errors. The Library shouldn't be doing any of this.

Simple task to move the word-wrapping from LibTidy into the console, leave the strings in LibTidy without stupid newlines, and purify the library a little bit.

Other things that should be removed are things like show-info, show-errors, etc., that are very definitely client-related and not library related, but they're on my longer-term to-do list. The new message callback API will make this a LOT easier by not counting on an output sink that thinks it know better about the environment than the user.

In any case, moving the functionality to client is something I'm willing to do, but we absolutely cannot have hard breaks in the language strings. That's just stupid: it's bad data management; they're not needed; and it's committing the cardinal sin of mixing presentation and content, something that Tidy itself tries to prevent by, for example, creating CSS classes where necessary.

But, yeah, this change would definitely permit removing the console-width option. This is a good change. Thanks for setting me straight.

@geoffmcl
Copy link
Contributor

@balthisar wow, how many more ways can you say these stupid newlines ;=))

But how this can be classified as a cardinal sin, which I thought were pride, greed, lust, envy, gluttony, wrath, and sloth is quite beyond me... or even informally, an unforgivable error or misjudgment...

If we really wanted a console Word Wrap option, and I obviously do not think we do, then the wrapping service could just as easily remove them, before applying the new wrapping... no need to change the strings themselves... but just an idea...

And even if a 3rd party app uses the new message callback API, which I hope you are not suggesting we use in tidy.c, it can handle the data however it wants... newline characters are not blocks of stone...

OT: Then you jump all over config option that you suggest are client, not library related? Yes, one of the library API actions is to decode, and store a few client only, options, like even -quiet... What is so bad about this? I do not think these should be gutted, especially with some not well understood or defined aim of purifying the library! Another religious reference?

And then I started to wonder again what are all these other changes in tidy.c, which I thought appeared on Doxygen formatting, like @file, @para, etc. But what has Doxygen got to do with tidy.c? I suppose maybe we could have a http://console.html-tidy.org/ site, but I do not think it should be blended with the current http://api.html-tidy.org/... or am I really missing something here?

There is no doubt we could improve and increase the tidy.c internal code documentation, like you have done with say the print2Columns - that is great, and helps understand the service, but that does not need to be Doxygen formatting, like /**< blah blah */, does it? Would not just /* blah blah */ be ok? But as stated, maybe way out of my tree here...

At this moment I still feel most of #518 is not needed, but can be swayed... thanks...

And a small point. If you do suggest a change in the base language_en_gb.h file, then you should just present that for consideration, not go on and change the other language files immediately, and certainly not auto-regenerate the *.po, and tidy.pot in the same pass...

If the change is agreed, then these additional, very necessary steps can come later... This would certainly reduce the diff review process... all such changes stem from the one language_en_gb.h I think, but again maybe I am wrong...

@balthisar
Copy link
Member Author

balthisar commented Mar 29, 2017

Do you have a valid argument against removing new lines? You're seem to be rejecting change without a valid reason why it's a bad thing.

@LWillms
Copy link

LWillms commented Mar 30, 2017

I agree with Balthisar to remove all presentation oriented stuff from the library.

Edit on next day:
With "presentation" I do not mean the formatting of the HTML document processed by Tidy, but the presentation of messages to the client, be it tidy.exe or some other program (like e.g. UltraEdit).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants