Corner cases where getopt behavior is not mimicked: -- or --help as string values #601

Open
@rmunn

The goal of this library is to mimic the behavior of getopt, but there are a few corner cases where this library behaves differently than getopt would: in the handling of -- or --help when they are the value of a string parameter.

How getopt behaves

First, an illustration of how getopt works with the particular corner case I'm demonstrating. Let's look at the standard gzip and gunzip tools found with any Linux distribution. They take many options, but one of them is --suffix (or -S for short); this lets you specify a different suffix than the standard .gz for the compressed file. E.g. if you have a README.md file in the current directory, then gzip -S .compressed README.md will create a README.md.compressed file instead of README.md.gz.

Now, what do you think will happen if I run this command?

gzip -S -- README.md

The correct answer is that it will create a compressed file named README.md-- in the current directory. Because the string -- was specified immediately after an option that takes a string value, it was processed as the value for that option (the --suffix option), and so gzip created a file with a -- suffix instead of .gz. Now look at these three examples:

1. gzip -- --help
2. gzip -S -- --help
3. gzip -S -- -- --help

What do you think these will do? Answer:

  1. This will compress a file named --help in the current directory, and create a file named --help.gz.
  2. This will print the help text, and do nothing else.
  3. This will compress a file named --help in the current directory, and create a file named --help--.

Why did gzip -S -- --help print the help text? Because -- was the value for the -S option, and so it was not treated as the "stop processing options now" marker. Then after the -S option was fully processed, the only remaining argument was --help. Since --help was encountered as an option, gzip displayed the help screen and did nothing else.

With the gzip -S -- -- --help line, OTOH, the first -- became the value for the -S option. Then the second -- was processed as an option, and had the "stop processing options now" meaning. So the --help text was treated as a value, and gzip looked for a file named --help to compress. And since I specified that the compressed suffix should be --, the compressed file was named --help--.
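The three cases above can be reproduced without gzip. The following is a minimal sketch using Python's standard getopt module, which is documented as following the conventions of the Unix getopt() function. The option spec (SHORT, LONG) and the parse helper are assumptions modelled on gzip's -S/--suffix and --help; this is not gzip's own code.

```python
import getopt

# Hypothetical option spec modelled on gzip:
# -S / --suffix takes a value, -h / --help does not.
SHORT = "S:h"
LONG = ["suffix=", "help"]

def parse(argv):
    """Return (options, remaining_values) as a getopt-style parser sees argv."""
    return getopt.getopt(argv, SHORT, LONG)

# 1. A bare -- ends option processing, so --help is a plain value (a file name).
case1 = parse(["--", "--help"])

# 2. -S swallows -- as its value, so --help is still parsed as an option.
case2 = parse(["-S", "--", "--help"])

# 3. -S swallows the first --; the second -- ends option processing.
case3 = parse(["-S", "--", "--", "--help"])

for case in (case1, case2, case3):
    print(case)
```

In case 2 the parsed options contain both -S (with value --) and --help, which is why gzip prints its help text there.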

What CommandLine does

The current way CommandLine works is to call a preprocessor function to look for any -- options and, if found, mark anything found after them as a value. This means that in the gzip -S -- --help example, where the correct getopt-mimicking behavior would be to print the help text, CommandLine will instead return an error saying that -S needed a value and didn't get one.

This corner case actually shows a fundamental difference between the behavior of CommandLine and the behavior of getopt. CommandLine uses a tokenizer to parse the command-line arguments and decide, based on the presence of - or -- at the front, to treat them as Name tokens or Value tokens. But if you read the getopt source code and figure out what it's actually doing, it's parsing one argument at a time, deciding whether that argument needs a value, and then if a value is needed, it swallows the next argument without further processing. Which is why you can pass -- as the suffix in gzip, and it will happily accept that.
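To make the difference concrete, here is a rough sketch of the two strategies as described in this issue. Both functions, their names, and the token tuples are hypothetical simplifications written for illustration; neither is CommandLine's actual code nor getopt's.

```python
def preprocess_first(args, needs_value):
    """Strategy A (simplified model): find the first bare -- up front and
    force everything after it to be a value, then classify by leading dash."""
    cut = args.index("--") if "--" in args else len(args)
    head, forced = args[:cut], args[cut + 1:]
    tokens = []
    for pos, arg in enumerate(head):
        if arg.startswith("-"):
            name = arg.lstrip("-")
            nxt = head[pos + 1] if pos + 1 < len(head) else None
            if name in needs_value and (nxt is None or nxt.startswith("-")):
                return ("error", f"option '{name}' requires a value")
            tokens.append(("name", name))
        else:
            tokens.append(("value", arg))
    tokens += [("value", v) for v in forced]
    return ("ok", tokens)

def one_at_a_time(args, needs_value):
    """Strategy B (getopt-like): walk left to right; an option that needs a
    value swallows the next argument without examining it."""
    tokens, i = [], 0
    while i < len(args):
        arg = args[i]
        if arg == "--":
            tokens += [("value", v) for v in args[i + 1:]]
            break
        if arg.startswith("-"):
            name = arg.lstrip("-")
            tokens.append(("name", name))
            if name in needs_value:
                if i + 1 >= len(args):
                    return ("error", f"option '{name}' requires a value")
                tokens.append(("value", args[i + 1]))
                i += 1  # skip the swallowed argument
        else:
            tokens.append(("value", arg))
        i += 1
    return ("ok", tokens)

argv = ["-S", "--", "--help"]
result_a = preprocess_first(argv, {"S"})
result_b = one_at_a_time(argv, {"S"})
print(result_a)
print(result_b)
```

Under these assumptions, strategy A reports a missing value for S, while strategy B yields a name token for S, the value --, and a name token for help.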

What CommandLine should do

The tokenizer, instead of processing all the arguments at once and deciding whether they're names or values, should process each argument one at a time. Then the decision tree should look like:

  • Is this option exactly -- and EnableDashDash is true? Then stop processing; the rest of the arguments are all values.
  • Is this option exactly -- and EnableDashDash is false? Then it is the value --; continue processing the next argument.
  • Does this option start with -- and contain an equals sign? Then split it into two tokens, the part before the = is the name, and the part after the equals is the value. (Split at the first equals sign; any equals signs after that point would become part of the value).
  • Does this option start with -- and not contain an equals sign? Then we look at the list of option longnames that the tokenizer was given:
    • Name matches a boolean option: this is a name token. Resume tokenizing with the next argument (it is NOT swallowed).
    • NEW FEATURE: Name matches an int option and the option attribute has AllowMultiple=true: this is a name token. Resume tokenizing with the next argument (it is NOT swallowed). (This allows for things like -v or --verbose to be passed multiple times, like -vvv, which the parser will turn into Verbose=3 in the final options instance.)
    • Name matches an option that's neither of the two cases above (boolean or int with AllowMultiple): this is a name token, and the next argument is a value token no matter what it is. "Swallow" the next argument, and resume tokenizing with the argument after next.
  • Does the option start with - and contain only letters that match shortnames? Split it into multiple shortnames. (I.e., -lR would become Name("l"), Name("R") if there are both -l and -R options).
  • Does the option start with - and its first letter matches a shortname, but the rest does not? Split it into first letter & rest, and that's two tokens: Name(first letter) and Value(rest).
  • Does the option start with - and have only one letter? Then it's a shortname, and we look at the type of the option with that shortname:
    • As above, if boolean, then don't swallow the next argument.
    • NEW FEATURE: As above, if int with AllowMultiple, then don't swallow the next argument.
    • As above, if other type, then swallow the next argument (WHATEVER it is) and treat it as a value.
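The decision tree above could be sketched roughly as follows. This is an illustration in Python rather than the library's C#, and every name in it (Opt, swallows_next, tokenize, the token tuples) is hypothetical. Cases the list does not cover are handled by assumption and marked as such in the comments.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Opt:
    kind: str                     # e.g. "bool", "int", "string"
    allow_multiple: bool = False  # stands in for AllowMultiple=true

    def swallows_next(self):
        # Booleans and int counters never take the next argument.
        is_counter = self.kind == "int" and self.allow_multiple
        return not (self.kind == "bool" or is_counter)

def tokenize(args, long_opts, short_opts, enable_dash_dash=True):
    tokens, i = [], 0
    while i < len(args):
        arg = args[i]
        i += 1
        if arg == "--":
            if enable_dash_dash:
                # Stop processing: everything left is a value.
                tokens += [("value", a) for a in args[i:]]
                break
            tokens.append(("value", "--"))
        elif arg.startswith("--"):
            # Split at the first equals sign only.
            name, eq, value = arg[2:].partition("=")
            tokens.append(("name", name))
            if eq:
                tokens.append(("value", value))
            elif name in long_opts and long_opts[name].swallows_next():
                # Assumption: a missing value at the end is left for the
                # parser to report, and unknown names swallow nothing.
                if i < len(args):
                    tokens.append(("value", args[i]))
                    i += 1
        elif arg.startswith("-") and len(arg) > 1:
            letters = arg[1:]
            if len(letters) == 1:
                tokens.append(("name", letters))
                opt = short_opts.get(letters)
                if opt is not None and opt.swallows_next() and i < len(args):
                    tokens.append(("value", args[i]))
                    i += 1
            elif all(c in short_opts for c in letters):
                tokens += [("name", c) for c in letters]
            else:
                # Assumption: also used when the first letter is unknown.
                tokens.append(("name", letters[0]))
                tokens.append(("value", letters[1:]))
        else:
            tokens.append(("value", arg))
    return tokens

LONG = {"suffix": Opt("string"), "help": Opt("bool"),
        "verbose": Opt("int", allow_multiple=True)}
SHORT = {"S": Opt("string"), "h": Opt("bool"),
         "v": Opt("int", allow_multiple=True)}

print(tokenize(["-S", "--", "--help"], LONG, SHORT))
print(tokenize(["-S", "--", "--", "--help"], LONG, SHORT))
print(tokenize(["-vvv", "--suffix=.z=1", "-S.gz"], LONG, SHORT))
```

With this sketch, -S -- --help tokenizes to the name S, the value --, and the name help, matching the getopt behavior described earlier.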

Conclusion

Unfortunately, achieving getopt compatibility will need a big rewrite of the guts of CommandLine's tokenizer and parser, so this is a big job. But the behavior I described above is how getopt works, so if we want to mimic getopt, that's what will be needed.

Also unfortunately, this is probably going to be a breaking change, so it might end up requiring a 3.0 version number. Some people might be very surprised when --stringoption --booloption ends up being parsed with --booloption as the string value of --stringoption; they have probably come to expect that to produce a MissingValueOptionError for --stringoption. But surprise or not, the correct way to handle that is for --booloption to be the string value of --stringoption in that example.
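For comparison, Python's standard getopt module (documented as following the conventions of the Unix getopt() function) handles this last example the same way. The option names below are the placeholders from this issue, not real options of any tool.

```python
import getopt

# "stringoption=" declares a long option that takes a value;
# "booloption" declares a long option that takes none.
opts, rest = getopt.getopt(
    ["--stringoption", "--booloption"],
    "",
    ["stringoption=", "booloption"],
)

# --booloption is consumed as the value of --stringoption,
# so it never appears as an option in its own right.
print(opts)
print(rest)
```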
