Commit 3d62009

Merge pull request scala#4590 from som-snytt/issue/6810
SI-6810 Disallow EOL in char literal
2 parents 3a543d6 + ab527ce commit 3d62009

5 files changed: 89 additions & 27 deletions

spec/01-lexical-syntax.md

Lines changed: 27 additions & 22 deletions
@@ -398,40 +398,46 @@ members of type `Boolean`.
 ### Character Literals
 
 ```ebnf
-characterLiteral ::= ‘'’ (printableChar | charEscapeSeq) ‘'’
+characterLiteral ::= ‘'’ (charNoQuoteOrNewline | UnicodeEscape | charEscapeSeq) ‘'’
 ```
 
 A character literal is a single character enclosed in quotes.
-The character is either a printable unicode character or is described
-by an [escape sequence](#escape-sequences).
+The character can be any Unicode character except the single quote
+delimiter or `\u000A` (LF) or `\u000D` (CR);
+or any Unicode character represented by either a
+[Unicode escape](01-lexical-syntax.html) or by an [escape sequence](#escape-sequences).
 
 > ```scala
 > 'a' '\u0041' '\n' '\t'
 > ```
 
-Note that `'\u000A'` is _not_ a valid character literal because
-Unicode conversion is done before literal parsing and the Unicode
-character `\u000A` (line feed) is not a printable
-character. One can use instead the escape sequence `'\n'` or
-the octal escape `'\12'` ([see here](#escape-sequences)).
+Note that although Unicode conversion is done early during parsing,
+so that Unicode characters are generally equivalent to their escaped
+expansion in the source text, literal parsing accepts arbitrary
+Unicode escapes, including the character literal `'\u000A'`,
+which can also be written using the escape sequence `'\n'`.
 
 ### String Literals
 
 ```ebnf
 stringLiteral ::= ‘"’ {stringElement} ‘"’
-stringElement ::= printableCharNoDoubleQuote | charEscapeSeq
+stringElement ::= charNoDoubleQuoteOrNewline | UnicodeEscape | charEscapeSeq
 ```
 
-A string literal is a sequence of characters in double quotes. The
-characters are either printable unicode character or are described by
-[escape sequences](#escape-sequences). If the string literal
-contains a double quote character, it must be escaped,
-i.e. `"\""`. The value of a string literal is an instance of
-class `String`.
+A string literal is a sequence of characters in double quotes.
+The characters can be any Unicode character except the double quote
+delimiter or `\u000A` (LF) or `\u000D` (CR);
+or any Unicode character represented by either a
+[Unicode escape](01-lexical-syntax.html) or by an [escape sequence](#escape-sequences).
+
+If the string literal contains a double quote character, it must be escaped using
+`"\""`.
+
+The value of a string literal is an instance of class `String`.
 
 > ```scala
-> "Hello,\nWorld!"
-> "This string contains a \" character."
+> "Hello, world!\n"
+> "\"Hello,\" replied the world."
 > ```
 
 #### Multi-Line String Literals
@@ -443,11 +449,10 @@ multiLineChars ::= {[‘"’] [‘"’] charNoDoubleQuote} {‘"’}
 
 A multi-line string literal is a sequence of characters enclosed in
 triple quotes `""" ... """`. The sequence of characters is
-arbitrary, except that it may contain three or more consuctive quote characters
-only at the very end. Characters
-must not necessarily be printable; newlines or other
-control characters are also permitted. Unicode escapes work as everywhere else, but none
-of the escape sequences [here](#escape-sequences) are interpreted.
+arbitrary, except that it may contain three or more consecutive quote characters
+only at the very end. In particular, embedded newlines
+are permitted. Unicode escapes work as everywhere else, but none
+of the [escape sequences](#escape-sequences) are interpreted.
 
 > ```scala
 > """the present string

spec/13-syntax-summary.md

Lines changed: 3 additions & 2 deletions
@@ -57,11 +57,12 @@ floatType ::= ‘F’ | ‘f’ | ‘D’ | ‘d’
 
 booleanLiteral ::= ‘true’ | ‘false’
 
-characterLiteral ::= ‘'’ (printableChar | charEscapeSeq) ‘'’
+characterLiteral ::= ‘'’ (charNoQuoteOrNewline | UnicodeEscape | charEscapeSeq) ‘'’
 
 stringLiteral ::= ‘"’ {stringElement} ‘"’
                 | ‘"""’ multiLineChars ‘"""’
-stringElement ::= (printableChar except ‘"’)
+stringElement ::= charNoDoubleQuoteOrNewline
+                | UnicodeEscape
                 | charEscapeSeq
 multiLineChars ::= {[‘"’] [‘"’] charNoDoubleQuote} {‘"’}
 

src/compiler/scala/tools/nsc/ast/parser/Scanners.scala

Lines changed: 5 additions & 3 deletions
@@ -515,7 +515,7 @@ trait Scanners extends ScannersCommon {
             charLitOr(getIdentRest)
           else if (isOperatorPart(ch) && (ch != '\\'))
             charLitOr(getOperatorRest)
-          else {
+          else if (!isAtEnd && (ch != SU && ch != CR && ch != LF || isUnicodeEscape)) {
             getLitChar()
             if (ch == '\'') {
               nextChar()
@@ -525,6 +525,8 @@ trait Scanners extends ScannersCommon {
               syntaxError("unclosed character literal")
             }
           }
+          else
+            syntaxError("unclosed character literal")
         }
         fetchSingleQuote()
       case '.' =>
@@ -690,7 +692,7 @@ trait Scanners extends ScannersCommon {
 
     private def unclosedStringLit(): Unit = syntaxError("unclosed string literal")
 
-    private def getRawStringLit(): Unit = {
+    @tailrec private def getRawStringLit(): Unit = {
       if (ch == '\"') {
         nextRawChar()
         if (isTripleQuote()) {
@@ -707,7 +709,7 @@ trait Scanners extends ScannersCommon {
       }
     }
 
-    @scala.annotation.tailrec private def getStringPart(multiLine: Boolean): Unit = {
+    @tailrec private def getStringPart(multiLine: Boolean): Unit = {
       def finishStringPart() = {
         setStrVal()
         token = STRINGPART
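
The new guard in the character-literal branch of `Scanners.scala` can be read in isolation as a small predicate. The following is only a sketch for illustration: `CharLitGuard`, `mayBeginCharLit`, and its parameters are hypothetical names, whereas the real scanner reads `ch`, `isAtEnd`, and `isUnicodeEscape` from its own reader state. The constants are assumed to match the usual LF, CR, and SU (end-of-input marker, code point 26) values.

```scala
// Hypothetical standalone model of the guard added in this commit;
// not the compiler's actual API.
object CharLitGuard {
  final val LF: Char = '\n'
  final val CR: Char = '\r'
  final val SU: Char = 26.toChar // assumed end-of-input marker

  // A character may be taken as the body of a char literal when the
  // input is not exhausted and the character is either not a raw
  // end-of-line/end-of-input marker, or was written as a Unicode escape.
  def mayBeginCharLit(ch: Char, atEnd: Boolean, viaUnicodeEscape: Boolean): Boolean =
    !atEnd && ((ch != SU && ch != CR && ch != LF) || viaUnicodeEscape)
}
```

Under this reading, a raw line feed after the opening quote is rejected (the scanner then reports "unclosed character literal"), while the same code point written as a Unicode escape is accepted.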

test/files/neg/t6810.check

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+t6810.scala:4: error: unclosed character literal
+val y = '
+^
+t6810.scala:5: error: unclosed character literal
+' // but not embedded EOL sequences not represented as escapes
+^
+t6810.scala:9: error: unclosed string literal
+val Y = "
+^
+t6810.scala:10: error: unclosed string literal
+" // obviously not
+^
+t6810.scala:20: error: unclosed quoted identifier
+val `
+^
+t6810.scala:21: error: unclosed quoted identifier
+` = EOL // not raw string literals aka triple-quoted, multiline strings
+^
+t6810.scala:24: error: unclosed character literal
+val b = '
+^
+t6810.scala:25: error: unclosed character literal
+' // CR seen as EOL by scanner
+^
+t6810.scala:24: error: '=' expected but ';' found.
+val b = '
+^
+9 errors found

test/files/neg/t6810.scala

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+
+trait t6810 {
+  val x = '\u000A' // char literals accept arbitrary unicode escapes
+  val y = '
+' // but not embedded EOL sequences not represented as escapes
+  val z = '\n' // normally, expect this escape
+
+  val X = "\u000A" // it's the same as ordinary string literals
+  val Y = "
+" // obviously not
+  val Z = "\n" // normally, expect this escape
+
+  val A = """
+""" // which is what these are for
+  val B = s"""
+""" // or the same for interpolated strings
+
+  import scala.compat.Platform.EOL
+  val `\u000A` = EOL // backquoted identifiers are arbitrary string literals
+  val `
+` = EOL // not raw string literals aka triple-quoted, multiline strings
+
+  val a = '\u000D' // similar treatment of CR
+  val b = '' // CR seen as EOL by scanner
+  val c = '\r' // traditionally
+}
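
The forms that the negative test leaves error-free can also be checked positively. Below is a minimal sketch, assuming a compiler that includes this change; `EolLiteralDemo` and its member names are hypothetical and are not part of the commit.

```scala
// Hypothetical positive counterpart to the neg test: every literal
// below is expected to compile once this change is in place.
object EolLiteralDemo {
  val lfEscaped: Char = '\u000A' // Unicode escape for line feed
  val lfShort: Char = '\n'       // conventional escape sequence
  val crEscaped: Char = '\u000D' // Unicode escape for carriage return
  val crShort: Char = '\r'
  val lfString: String = "\u000A"

  // A raw newline is only permitted inside a triple-quoted literal.
  val multiLine: String = """first
second"""

  def main(args: Array[String]): Unit = {
    assert(lfEscaped == lfShort)
    assert(crEscaped == crShort)
    assert(lfString == "\n")
    assert(multiLine.contains("\n"))
  }
}
```

The sketch keeps Unicode escapes out of comments on purpose, since the spec text above notes that Unicode conversion is done early during parsing.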
