Strip and chomp #385

ivan-pi opened this issue Apr 10, 2021 · 8 comments
Labels
topic: strings String processing

Comments

ivan-pi (Member) commented Apr 10, 2021

@sgeard commented in #343 (comment):

I'm not convinced this is the right approach. String handling is really the realm of scripting languages and I think that the way Tcl handles these issues is better.

Strip: why not trim, trimleft and trimright?

Chomp: in over 30 years of writing scripts (cross platform) I've never needed such a function and Tcl doesn't have one - and the examples just don't seem real. I'd like to understand what real problem is being solved with this function. The name is terrible as well - reminds me of Perl to which I seem to be allergic!
One thing Tcl does do is interpret \n as a newline in a platform independent way so when you read a file you always get '\n' irrespective of platform. Embedding platform-dependent line-endings in strings is poor practice - they are better handled in the i/o layer.

ivan-pi (Member, Author) commented Apr 10, 2021

Thanks for bringing up the issue @sgeard.

The strip and chomp functions are based upon the functions from Ruby (a scripting language similar to Python and Perl).

Python also has a strip function, complemented by rstrip and lstrip variants.

Personally, I'm also not a fan of the name chomp. The procedures are currently under the "experimental" namespace, so we can discuss and propose changes, both in terms of functionality and in terms of suggested usage patterns.

We can't use the name trim because it is already an intrinsic Fortran function, and it removes only trailing blanks.

I believe one typical use case for these functions (in Python at least) is line-based processing of a file. One uses strip to remove the trailing newline character, and again to remove whitespace after splitting the string into chunks.
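
For illustration only, here is a minimal Python sketch of that pattern. The helper name `parse_lines`, the comma delimiter, and the sample data are assumptions made for the example and do not come from stdlib or the linked pull request:

```python
import io

def parse_lines(stream):
    """Yield the cleaned fields of each non-empty line (hypothetical helper)."""
    for line in stream:
        # First strip: drops the trailing newline and any surrounding whitespace.
        line = line.strip()
        if not line:
            continue
        # Second strip: cleans each chunk after splitting on the delimiter.
        yield [field.strip() for field in line.split(",")]

# An in-memory stream stands in for a file opened in text mode.
sample = io.StringIO("alpha , beta,gamma \n\n  1,\t2 , 3\n")
rows = list(parse_lines(sample))
print(rows)
```

The same two-step cleanup is what a Fortran caller would need if a line is obtained by some means other than record-based reads.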

cc @zbeekman


sgeard commented Apr 10, 2021

Thanks for moving this into an issue - I should have done that in the first place.

I'd still prefer trimright, trimleft, trimboth as names. Fortran programmers know what trim does so these functions would behave as expected.

In general, I think we should have a design philosophy so that we get a cohesive whole. My suggestion would be to follow an existing API such as std::string from C++ or the string command from Tcl.

As I said originally I've never needed a chomp function despite decades of writing cross-platform scripts that read and write text. That some languages seem to need it suggests a design deficiency in them which it would be nice not to replicate here.

awvwgk (Member) commented Apr 10, 2021

My own exposure to Tcl is limited to writing environment modules, so I can't comment on Tcl here.

As for Fortran, the trim function is already quite useful when dealing with spaces, and the trailing-newline case hardly ever occurs in record-based I/O. I usually use trim(adjustl(line)) to remove leading and trailing spaces, but this can fail for other whitespace characters such as tabs.
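
The tab problem can be reproduced with a rough Python analogue. This is only a sketch: `trim_adjustl` is a hypothetical stand-in that mimics the net effect of `trim(adjustl(line))` by removing the space character alone, and it is not a translation of any Fortran code:

```python
def trim_adjustl(line):
    """Hypothetical analogue of Fortran's trim(adjustl(line)): only spaces are removed."""
    return line.strip(" ")

padded = "\t  value \t "

# Only the final space is removed; the tabs block any further stripping.
print(repr(trim_adjustl(padded)))

# Stripping all whitespace removes tabs as well.
print(repr(padded.strip()))
```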

The implemented function chomp is a generalization of trim that removes all trailing whitespace, a custom set of characters, or a substring. It currently uses the name of Ruby's chomp function. The chosen name might not be optimal, and I'm open to suggestions. As for trimright as an alternative name, I find it confusing because its capabilities would differ from those of trim.

ivan-pi (Member, Author) commented Apr 10, 2021

That's a good point. The fact that strip removes all characters from the set " "//TAB//VT//CR//LF//FF and not just the space character justifies a different name than trim.

One alternative might be to merge the behavior of chomp and strip with a triplet strip, lstrip and rstrip, which default to all whitespace characters, but optionally accept a set or substring argument.
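
To make the idea concrete, here is a minimal Python sketch of how such a triplet might behave. The keyword names `chars` and `substring`, and the choice to remove at most one occurrence of a substring, are assumptions made for illustration and are not part of any agreed design:

```python
# The whitespace set named above: space, TAB, VT, CR, LF, FF.
WHITESPACE = " \t\v\r\n\f"

def rstrip(s, chars=None, substring=None):
    """Strip trailing characters from a set, or one trailing substring (hypothetical API)."""
    if substring is not None:
        # Assumption: remove a single trailing occurrence, if present.
        return s[: -len(substring)] if substring and s.endswith(substring) else s
    return s.rstrip(WHITESPACE if chars is None else chars)

def lstrip(s, chars=None, substring=None):
    """Strip leading characters from a set, or one leading substring (hypothetical API)."""
    if substring is not None:
        return s[len(substring):] if substring and s.startswith(substring) else s
    return s.lstrip(WHITESPACE if chars is None else chars)

def strip(s, chars=None, substring=None):
    """Apply lstrip and rstrip with the same arguments."""
    return rstrip(lstrip(s, chars, substring), chars, substring)

print(repr(rstrip("hello \t\n")))                         # default: all whitespace
print(repr(rstrip("file.txt.txt", substring=".txt")))     # one substring occurrence
print(repr(strip("xxhixx", chars="x")))                   # custom character set
```

Whether a set argument and a substring argument should share one procedure name is exactly the kind of question the naming discussion would need to settle.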


sgeard commented Apr 11, 2021

I simply suggested trimright because trim already exists, but it would make much more sense to use trim if possible.

I understand what chomp is doing but I don't understand why it's ever useful. I'd have thought the use-case is so rare it would be better left to the regex parser. Adding functions to libraries always has downstream maintenance costs so there needs to be some justification for its inclusion.

As @awvwgk says, record-based I/O doesn't return newline characters, and that's true in Fortran, C++, C# and Tcl. The only case I think it could be good to support is when the line itself contains non-printing characters, as in tab-separated files. For that we need a tokenizer such as a split(string,set) function/method (which we might have already; I haven't looked).

awvwgk (Member) commented Apr 11, 2021

Split is currently discussed at #241.

ivan-pi (Member, Author) commented Apr 23, 2021

Octave has a function called deblank(s) that removes trailing whitespace and nulls from s. The function strtrim(s) removes leading and trailing whitespace.
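
As a rough illustration of the two behaviours described above, here is a Python sketch. These functions only borrow the Octave names; they are approximations written for this thread, not ports of Octave's implementations:

```python
WHITESPACE = " \t\v\r\n\f"

def deblank(s):
    """Approximation: remove trailing whitespace and NUL characters."""
    return s.rstrip(WHITESPACE + "\0")

def strtrim(s):
    """Approximation: remove leading and trailing whitespace."""
    return s.strip(WHITESPACE)

print(repr(deblank("  name \0\0 \n")))
print(repr(strtrim("  name \n")))
```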

FWIW, I find the name deblank more meaningful than chomp, even if I have yet to meet a Fortran application which requires this function. On the other hand, the is_blank function counts newlines and tabs as "blank" characters, which makes the names somewhat confusing.

ivan-pi (Member, Author) commented Apr 26, 2021

Two variations on the theme of getting rid of spaces, from the Fortran String Utilities by George Benthien:

  • subroutine compact(str): converts multiple spaces and tabs to single spaces and deletes control characters.
  • subroutine remove_sp(str): removes spaces, tabs, and control characters from the string str.
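
For comparison, a minimal Python sketch of the two behaviours as described in the bullets above. It follows only those one-line descriptions; the treatment of "control character" as ASCII codes below 32 plus DEL is an assumption, and the original Fortran routines may differ in details such as the handling of leading spaces:

```python
import re

def is_control(ch):
    """Assumption: control characters are ASCII codes below 32, plus DEL (127)."""
    return ord(ch) < 32 or ord(ch) == 127

def compact(s):
    """Delete control characters (tabs are kept), then collapse space/tab runs to one space."""
    cleaned = "".join(ch for ch in s if ch == "\t" or not is_control(ch))
    return re.sub(r"[ \t]+", " ", cleaned)

def remove_sp(s):
    """Remove spaces, tabs, and control characters entirely."""
    return "".join(ch for ch in s if ch != " " and not is_control(ch))

text = "a \t  b\x07c\n"
print(repr(compact(text)))
print(repr(remove_sp(text)))
```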

awvwgk added the "topic: strings" (String processing) label Sep 18, 2021