Commit ab527ce

SI-6810 Spec reflects literal parsing literally
Emphasize that literal parsing accepts Unicode escapes as if they were escaped. In particular, a newline represented by its Unicode escape does not terminate the line in the middle of a literal.
1 parent aad7c67 commit ab527ce
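
To make the commit message concrete, here is a minimal Scala sketch. It assumes a compiler that implements the amended spec text; the handling of Unicode escapes has varied between Scala releases, so treat it as illustrative. The object and value names are hypothetical and do not come from the commit.

```scala
// Hypothetical demo object; the names are illustrative and not part of the spec.
object UnicodeEscapeDemo {
  // The Unicode escape for LF is accepted inside a character literal
  // and denotes the same character as the newline escape sequence.
  val lineFeed: Char = '\u000A'

  // Inside a single-line string literal the same escape yields an
  // embedded LF instead of terminating the source line.
  val twoLines: String = "a\u000Ab"

  def main(args: Array[String]): Unit = {
    println(lineFeed == '\n')
    println(twoLines == "a\nb")
    println(twoLines.length)
  }
}
```

Under that assumption, both comparisons should hold and the string should contain three characters. A compiler that follows the old wording would instead reject the character literal.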

2 files changed: 30 additions and 24 deletions

spec/01-lexical-syntax.md

Lines changed: 27 additions & 22 deletions
@@ -398,40 +398,46 @@ members of type `Boolean`.
 ### Character Literals
 
 ```ebnf
-characterLiteral ::= ‘'’ (printableChar | charEscapeSeq) ‘'’
+characterLiteral ::= ‘'’ (charNoQuoteOrNewline | UnicodeEscape | charEscapeSeq) ‘'’
 ```
 
 A character literal is a single character enclosed in quotes.
-The character is either a printable unicode character or is described
-by an [escape sequence](#escape-sequences).
+The character can be any Unicode character except the single quote
+delimiter or `\u000A` (LF) or `\u000D` (CR);
+or any Unicode character represented by either a
+[Unicode escape](01-lexical-syntax.html) or by an [escape sequence](#escape-sequences).
 
 > ```scala
 > 'a' '\u0041' '\n' '\t'
 > ```
 
-Note that `'\u000A'` is _not_ a valid character literal because
-Unicode conversion is done before literal parsing and the Unicode
-character `\u000A` (line feed) is not a printable
-character. One can use instead the escape sequence `'\n'` or
-the octal escape `'\12'` ([see here](#escape-sequences)).
+Note that although Unicode conversion is done early during parsing,
+so that Unicode characters are generally equivalent to their escaped
+expansion in the source text, literal parsing accepts arbitrary
+Unicode escapes, including the character literal `'\u000A'`,
+which can also be written using the escape sequence `'\n'`.
 
 ### String Literals
 
 ```ebnf
 stringLiteral ::= ‘"’ {stringElement} ‘"’
-stringElement ::= printableCharNoDoubleQuote | charEscapeSeq
+stringElement ::= charNoDoubleQuoteOrNewline | UnicodeEscape | charEscapeSeq
 ```
 
-A string literal is a sequence of characters in double quotes. The
-characters are either printable unicode character or are described by
-[escape sequences](#escape-sequences). If the string literal
-contains a double quote character, it must be escaped,
-i.e. `"\""`. The value of a string literal is an instance of
-class `String`.
+A string literal is a sequence of characters in double quotes.
+The characters can be any Unicode character except the double quote
+delimiter or `\u000A` (LF) or `\u000D` (CR);
+or any Unicode character represented by either a
+[Unicode escape](01-lexical-syntax.html) or by an [escape sequence](#escape-sequences).
+
+If the string literal contains a double quote character, it must be escaped using
+`"\""`.
+
+The value of a string literal is an instance of class `String`.
 
 > ```scala
-> "Hello,\nWorld!"
-> "This string contains a \" character."
+> "Hello, world!\n"
+> "\"Hello,\" replied the world."
 > ```
 
 #### Multi-Line String Literals
@@ -443,11 +449,10 @@ multiLineChars ::= {[‘"’] [‘"’] charNoDoubleQuote} {‘"’}
 
 A multi-line string literal is a sequence of characters enclosed in
 triple quotes `""" ... """`. The sequence of characters is
-arbitrary, except that it may contain three or more consuctive quote characters
-only at the very end. Characters
-must not necessarily be printable; newlines or other
-control characters are also permitted. Unicode escapes work as everywhere else, but none
-of the escape sequences [here](#escape-sequences) are interpreted.
+arbitrary, except that it may contain three or more consecutive quote characters
+only at the very end. In particular, embedded newlines
+are permitted. Unicode escapes work as everywhere else, but none
+of the [escape sequences](#escape-sequences) are interpreted.
 
 > ```scala
 > """the present string

spec/13-syntax-summary.md

Lines changed: 3 additions & 2 deletions
@@ -57,11 +57,12 @@ floatType ::= ‘F’ | ‘f’ | ‘D’ | ‘d’
 
 booleanLiteral ::= ‘true’ | ‘false’
 
-characterLiteral ::= ‘'’ (printableChar | charEscapeSeq) ‘'’
+characterLiteral ::= ‘'’ (charNoQuoteOrNewline | UnicodeEscape | charEscapeSeq) ‘'’
 
 stringLiteral ::= ‘"’ {stringElement} ‘"’
 | ‘"""’ multiLineChars ‘"""’
-stringElement ::= (printableChar except ‘"’)
+stringElement ::= charNoDoubleQuoteOrNewline
+| UnicodeEscape
 | charEscapeSeq
 multiLineChars ::= {[‘"’] [‘"’] charNoDoubleQuote} {‘"’}
 