-
Notifications
You must be signed in to change notification settings - Fork 214
Add digit-separators specification v1.0 #3846
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,199 @@ | ||
# Digit Separators | ||
|
||
Author: Lasse Nielsen, Sam Rawlins | ||
|
||
Status: In-progress | ||
|
||
Version 1.0 | ||
|
||
## Motivation | ||
|
||
To make long number literals more readable, allow authors to inject [digit | ||
group separators][] inside numbers. Examples with different possible separators: | ||
|
||
```none | ||
100 000 000 000 000 000 000 // space | ||
100,000,000,000,000,000,000 // comma | ||
100.000.000.000.000.000.000 // period | ||
100'000'000'000'000'000'000 // apostrophe (C++) | ||
100_000_000_000_000_000_000 // underscore (many programming languages). | ||
``` | ||
|
||
## Proposal | ||
|
||
### Digit separators in number literals | ||
|
||
Allow one or more `_`s between any two otherwise adjacent _digits_ of a NUMBER | ||
or HEX\_NUMBER token. The following are not digits: The leading `0x` or `0X` in | ||
HEX\_NUMBER, and any `.`, `e`, `E`, `+` or `-` in NUMBER. | ||
|
||
That means only allowing `_`s between two `0-9` digits in NUMBER and between | ||
two `0-9`,`a-f`,`A-F` digits in HEX\_NUMBER. | ||
|
||
The grammar would be changing `<DIGIT>+` to `<DIGITS>` which is then `<DIGIT>`s | ||
with optional `_`s between, and same for hex digits: | ||
|
||
```bnf | ||
<NUMBER> ::= <DIGITS> (`.' <DIGITS>)? <EXPONENT>? | ||
\alt `.' <DIGITS> <EXPONENT>? | ||
|
||
<EXPONENT> ::= (`e' | `E') (`+' | `-')? <DIGITS> | ||
|
||
<DIGITS> ::= <DIGIT> (`_'* <DIGIT>)* | ||
|
||
<HEX\_NUMBER> ::= `0x' <HEX\_DIGITS> | ||
\alt `0X' <HEX\_DIGITS> | ||
|
||
<HEX\_DIGIT> ::= `a' .. `f' | ||
\alt `A' .. `F' | ||
\alt <DIGIT> | ||
|
||
<HEX\_DIGITS> ::= <HEX\_DIGIT> (`_'* <HEX\_DIGIT>)* | ||
``` | ||
|
||
### Examples | ||
|
||
```none | ||
100__000_000__000_000__000_000 // one hundred million million millions! | ||
0x4000_0000_0000_0000 | ||
0.000_000_000_01 | ||
0x00_14_22_01_23_45 // MAC address | ||
555_123_4567 // US Phone number | ||
``` | ||
|
||
**Invalid** literals: | ||
|
||
```none | ||
100_ | ||
0x_00_14_22_01_23_45 | ||
0._000_000_000_1 | ||
100_.1 | ||
1.2e_3 | ||
``` | ||
|
||
An identifier like `_100` is a valid identifier, and `_100._100` is a valid | ||
member access. If users learn the "separator only between digits" rule quickly, | ||
this will likely not be an issue. | ||
|
||
### Why choose underscores | ||
|
||
The syntax must work even with just a single separator, so it can't be anything | ||
that can already validly seperate two expressions (excludes all infix operators | ||
and comma) and should already be part of a number literal (excludes decimal | ||
point). | ||
|
||
So, the comma and decimal point are probably never going to work, even if they | ||
are already the standard "thousands separator" in text in different parts of | ||
the world. | ||
|
||
Space separation is dangerous because it's hard to see whether it's just space, | ||
or it's an accidental tab character. If we allow spacing, should we allow | ||
arbitrary whitespace, including line terminators? If so, then this suddenly | ||
become quite dangerous. Forget a comma at the end of a line in a multiline | ||
list, and two adjacent integers are automatically combined (we already have | ||
that problem with strings). So, probably not a good choice, even if it is the | ||
preferred formatting for print text. | ||
|
||
The apostrope is also the string single-quote character. We don't currently | ||
allow adjacent numbers and strings, but if we ever do, then this syntax becomes | ||
ambiguous. It's still possible (we disambiguate by assuming it's a digit | ||
separator). It is currently used by C++ 14 as a digit group separator, so it is | ||
definitely possible. | ||
|
||
That leaves underscore, which could be the start of an identifier. Currently | ||
`100_000` would be tokenized as "integer literal 100" followed by "identifier | ||
`_000`". However, users would never write an identifier adjacent to another | ||
token that contains identifier-valid characters (unlike strings, which have | ||
clear delimiters that do not occur anywher else), so this is unlikely to happen | ||
in practice. Underscore is already used by a large number of programming | ||
languages including Java, Swift, and Python. | ||
|
||
We also want to allow multiple separators for higher-level grouping, e.g.,: | ||
|
||
```none | ||
100__000_000_000__000_000_000 | ||
``` | ||
|
||
For this purpose, the underscore extends gracefully. So does space, but has the | ||
disadvantage that it collapses when inserted into HTML, whereas `''` looks odd. | ||
|
||
### Related work | ||
|
||
* [Java digit separators](https://docs.oracle.com/javase/8/docs/technotes/guides/language/underscores-literals.html) | ||
* [Python PEP 515 - underscores in numeric literals](https://peps.python.org/pep-0515/) | ||
|
||
### Possible new lint rules | ||
|
||
There are some possible new lint rule considerations, but none of these are | ||
considered vital to the usability or general success of the feature. | ||
|
||
The feature is designed to help the readability of long numbers. But a | ||
developer can still make a mistake about where to place separators. For example: | ||
|
||
``` | ||
var one = 1_000_000; | ||
var two = 2_000_000; | ||
var three = 3_000_000; | ||
var four = 4_0000_000; // Whoops! | ||
``` | ||
|
||
If a developer uses the Dart formatter to format their code, they cannot try to | ||
vertically align the numbers with whitespace (extra space characters are | ||
removed by the formatter). So we could offer a lint rule to only place | ||
separators every three digits of a decimal number. Also possibly a similar rule | ||
for hexadecimal numbers. If a developer ever uses digit separators for a | ||
different purpose (as in separating the digits of a phone number), the rule may | ||
not prove useful. | ||
|
||
A separate lint rule could encourage _consistent_ digit separators, which | ||
triggers if the digit groups do not have the same size (except the most | ||
significant one, which can be shorter). If there are any `__` separators, the | ||
number of `_`-separated groups between them should also be the same, and | ||
repeatedly for higher numbers of `_`s. | ||
|
||
### Possible new quick fixes | ||
|
||
There are some possible new automated fix ("quick fix") considerations, but | ||
none of these are considered vital to the usability or general success of the | ||
feature. | ||
|
||
#### Unexpected underscores | ||
|
||
With the digit-separators feature, separators can be added between _digits_ of | ||
a number literal, but nowhere else. In most error cases, the unexpected | ||
underscore can be detected as such, and we can offer quick fixes to remove | ||
unexpected errors (for example, `100_`, `100_e1.2`, `100._00`). In a few cases, | ||
the intention is not as straightforward, such as `100._100`, where `_100` can | ||
be a legal name of an extension member (though the presense of such a private | ||
extension member can be detected). | ||
|
||
#### Unexpected commas | ||
|
||
The only legal digit separator that is introduced with this feature is the | ||
underscore character. If a developer attempts to use another character, for | ||
example commas, as a separator, we may be able to detect this, and offer a | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wouldn't this fall into the same problem that we have when we consider There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can't. Many automated fixes are offered with the understanding that they are not the only fix, and may not be the desired fix. I personally doubt we will offer this fix. But there might be some cases, like if the user pastes 100 lines of: var x1 = 1,000,000;
var x2 = 2,000,000;
var x3 = 3,000,000; into their editor, from another source text, we could offer a fix to convert the commas, because it looks more likely that they meant to write There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If the same person pasted 100 lines of One thing that worries me is whether we can do the same fixes after the formatter has run. |
||
quick fix to convert the commas to underscores. | ||
|
||
### Non-breaking change | ||
|
||
This change is strictly non-breaking. The feature can be thought of as a single | ||
change from previous Dart syntax: some syntax which was previously illegal | ||
(producing compile-time errors) becomes legal. | ||
|
||
(The feature is still introduced with a [Dart language version][], so that | ||
packages that start using the feature declare that they require some new lower | ||
bound of the Dart SDK.) | ||
|
||
### Formatting | ||
|
||
As any number literal remains a single token, there are no formatting | ||
considerations. | ||
|
||
## Changelog | ||
|
||
### 1.0 | ||
|
||
- Initial version | ||
|
||
[digit group separators]: https://en.wikipedia.org/wiki/Decimal_separator#Digit_grouping | ||
[Dart language version]: https://github.com/dart-lang/language/blob/main/accepted/2.8/language-versioning/feature-specification.md |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another lint option could be "consistent digit separators", which triggers if the digit groups do not have the same size (except the most significant I one, which can be shorter).
If there are any
__
separators, the number of_
separated groups between them should also be the same, and repeatedly for higher numbers of_
s.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it, added!