Commit 73c0786

RFC: Number value literal lookahead restrictions
This RFC proposes adding a lookahead restriction to the IntValue and FloatValue lexical grammars so that a number cannot be immediately followed by a letter.

**Problem:**

Currently there are some language ambiguities and underspecification for lexing numbers, which each implementation has handled slightly differently.

Because commas are optional and white space isn't required between tokens, these two snippets are equivalent: `[123, abc]`, `[123abc]`. This may be confusing to read, but it should parse correctly. However, the opposite is not true: since digits may belong in a Name, the following two are *not* equivalent: `[abc, 123]`, `[abc123]`. This could lead to mistakes.

Ambiguity and underspecification enter when the Name starts with "e", since "e" indicates the beginning of an exponent in a FloatValue. `123efg` is a lexical error in GraphQL.js, which greedily starts to lex a FloatValue when it encounters the "e"; however, you might also expect it to validly lex as (`123`, `efg`), and some implementations might do this.

Further, other languages offer variations of numeric literals which GraphQL does not support, such as hexadecimal literals. The input `0x1F` properly lexes as (`0`, `x1F`), since digits may belong in a Name; however, this is very likely a confusing syntax error. A similar issue exists for some languages which allow underscores in numbers for readability: `1_000` lexes as (`1`, `_000`), since `_000` is a valid Name, rather than as a single number.

**Proposed Solution:**

Add a lookahead restriction to IntValue and FloatValue to disallow any NameStart character (letters and `_`) from following.

This makes it clear that `1e5` can only possibly be one FloatValue and not three tokens, specifies lexer errors clearly to remove ambiguity, and provides clear errors for mistaken input.

**Precedent**

JavaScript applies this same restriction for similar reasons, I believe originally to produce an early error if C-style typed literals were used in a JavaScript program.

https://www.ecma-international.org/ecma-262/10.0/index.html#sec-literals-numeric-literals

**Cost of change**

While this is *technically* a breaking change to the language grammar, it seeks to restrict cases that are almost certainly already producing either syntax or validation errors.

This is different from the current implementation of GraphQL.js and I believe other parsers, and will require minor implementation updates.
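To make the before/after behaviour concrete, here is a minimal regex-based tokenizer sketch in Python. It is not taken from GraphQL.js or any other implementation: the names `tokenize` and `build_token_re`, the token labels, and the reduced punctuator set are all assumptions for illustration, and strings and comments are left out. The `restrict` flag switches between the old lookahead sets and the ones this RFC proposes.

```python
import re

# Hypothetical token patterns for illustration only.
NAME = r"[_A-Za-z][_0-9A-Za-z]*"
INTEGER_PART = r"-?(?:0|[1-9][0-9]*)"
# FractionalPart ExponentPart? | ExponentPart
FLOAT = INTEGER_PART + r"(?:\.[0-9]+(?:[eE][+-]?[0-9]+)?|[eE][+-]?[0-9]+)"
PUNCT = r"\.\.\.|[\[\]{}():=!$@|&]"
IGNORED = r"[ \t\r\n,]+"  # white space and commas are insignificant


def build_token_re(restrict):
    """Build the token regex with either the old or the proposed lookahead sets."""
    if restrict:
        int_lookahead = r"(?![0-9A-Za-z_.])"   # {Digit, NameStart, `.`}
        float_lookahead = r"(?![0-9A-Za-z_])"  # {Digit, NameStart}
    else:
        int_lookahead = r"(?![0-9.])"          # {Digit, `.`}
        float_lookahead = r"(?![0-9])"         # Digit
    parts = [
        ("ignored", IGNORED),
        ("float", FLOAT + float_lookahead),
        ("int", INTEGER_PART + int_lookahead),
        ("name", NAME),
        ("punct", PUNCT),
    ]
    return re.compile("|".join("(?P<%s>%s)" % part for part in parts))


def tokenize(source, restrict=True):
    """Return a list of (kind, text) pairs, or raise SyntaxError if no token matches."""
    token_re = build_token_re(restrict)
    tokens = []
    pos = 0
    while pos < len(source):
        match = token_re.match(source, pos)
        if match is None:
            raise SyntaxError("cannot lex %r at offset %d" % (source[pos:], pos))
        if match.lastgroup != "ignored":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

In this sketch, `tokenize("[123abc]", restrict=False)` produces an int `123` followed by a name `abc`, and `tokenize("123efg", restrict=False)` produces (`123`, `efg`), which is one of the interpretations described above. With `restrict=True`, both inputs, as well as `0x1F` and `1_000`, raise `SyntaxError`, while `1e5` still lexes as a single float.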
1 parent 27c2602 commit 73c0786

2 files changed: +15 -10 lines changed

spec/Appendix B -- Grammar Summary.md

Lines changed: 4 additions & 4 deletions
@@ -68,7 +68,7 @@ Letter :: one of
 Digit :: one of
   `0` `1` `2` `3` `4` `5` `6` `7` `8` `9`

-IntValue :: IntegerPart [lookahead != {Digit, `.`}]
+IntValue :: IntegerPart [lookahead != {Digit, NameStart, `.`}]

 IntegerPart ::
   - NegativeSign? 0
@@ -79,9 +79,9 @@ NegativeSign :: -
 NonZeroDigit :: Digit but not `0`

 FloatValue ::
-  - IntegerPart FractionalPart ExponentPart [lookahead != Digit]
-  - IntegerPart FractionalPart [lookahead != Digit]
-  - IntegerPart ExponentPart [lookahead != Digit]
+  - IntegerPart FractionalPart ExponentPart [lookahead != {Digit, NameStart}]
+  - IntegerPart FractionalPart [lookahead != {Digit, NameStart}]
+  - IntegerPart ExponentPart [lookahead != {Digit, NameStart}]

 FractionalPart :: . Digit+

spec/Section 2 -- Language.md

Lines changed: 11 additions & 6 deletions
@@ -720,7 +720,7 @@ specified as a variable. List and inputs objects may also contain variables (unl

 ### Int Value

-IntValue :: IntegerPart [lookahead != {Digit, `.`}]
+IntValue :: IntegerPart [lookahead != {Digit, NameStart, `.`}]

 IntegerPart ::
   - NegativeSign? 0
@@ -732,16 +732,18 @@ NonZeroDigit :: Digit but not `0`

 An Int number is specified without a decimal point or exponent (ex. `1`).

-An {IntValue} must not be followed by a {`.`}. If a {`.`} follows the token must
-only be interpreted as a {FloatValue}.
+An {IntValue} must not be followed by a {`.`} or {NameStart}. If a {`.`}, {`e`},
+or {`E`} follows the token must only be interpreted as a {FloatValue}.
+No other letter can follow. For example the sequence `0x123` has no valid
+lexical representation.


 ### Float Value

 FloatValue ::
-  - IntegerPart FractionalPart ExponentPart [lookahead != Digit]
-  - IntegerPart FractionalPart [lookahead != Digit]
-  - IntegerPart ExponentPart [lookahead != Digit]
+  - IntegerPart FractionalPart ExponentPart [lookahead != {Digit, NameStart}]
+  - IntegerPart FractionalPart [lookahead != {Digit, NameStart}]
+  - IntegerPart ExponentPart [lookahead != {Digit, NameStart}]

 FractionalPart :: . Digit+

@@ -754,6 +756,9 @@ Sign :: one of + -
 A Float number includes either a decimal point (ex. `1.0`) or an exponent
 (ex. `1e50`) or both (ex. `6.0221413e23`).

+A {FloatValue} must not be followed by a {NameStart}. For example the sequence
+`0x1.2p3` has no valid lexical representation.
+

 ### Boolean Value
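The "minor implementation updates" for a hand-written lexer could look roughly like the following Python sketch of a greedy number reader. It is a hypothetical illustration, not the GraphQL.js code: the function names (`read_number`, `read_digits`, `peek`, `is_name_start`) and the return shape are assumptions. The only part that corresponds to this change is the final NameStart check; the rest follows the IntValue and FloatValue productions in the diff above.

```python
def peek(body, pos):
    """Return the character at pos, or "" at end of input."""
    return body[pos] if pos < len(body) else ""


def is_digit(ch):
    return "0" <= ch <= "9"


def is_name_start(ch):
    return ch == "_" or "a" <= ch <= "z" or "A" <= ch <= "Z"


def read_digits(body, pos):
    """Consume one or more digits starting at pos and return the new position."""
    if not is_digit(peek(body, pos)):
        raise SyntaxError("expected digit at offset %d" % pos)
    while is_digit(peek(body, pos)):
        pos += 1
    return pos


def read_number(body, start):
    """Greedily read an IntValue or FloatValue; return (kind, text, end)."""
    pos = start
    is_float = False
    if peek(body, pos) == "-":
        pos += 1
    if peek(body, pos) == "0":
        pos += 1
        if is_digit(peek(body, pos)):
            raise SyntaxError("unexpected digit after 0 at offset %d" % pos)
    else:
        pos = read_digits(body, pos)
    if peek(body, pos) == ".":
        is_float = True
        pos = read_digits(body, pos + 1)
    if peek(body, pos) in ("e", "E"):
        is_float = True
        pos += 1
        if peek(body, pos) in ("+", "-"):
            pos += 1
        pos = read_digits(body, pos)
    # Proposed restriction: a number must not be followed by a NameStart.
    if is_name_start(peek(body, pos)):
        raise SyntaxError("invalid number, unexpected %r at offset %d"
                          % (peek(body, pos), pos))
    return ("Float" if is_float else "Int", body[start:pos], pos)
```

With this sketch, the examples from the spec text, `0x123` and `0x1.2p3`, are rejected at the `x`, and `1.2p3` is rejected at the `p`. Because the reader commits to an exponent as soon as it sees `e` or `E`, `123efg` fails while looking for exponent digits, matching the greedy behaviour the commit message attributes to GraphQL.js.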
