
Should multi-line raw strings preserve line encoding? #23562


Closed
DartBot opened this issue Jun 1, 2015 · 4 comments
Assignees
Labels
area-language Dart language related items (some items might be better tracked at github.com/dart-lang/language). type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)

Comments

@DartBot

DartBot commented Jun 1, 2015

This issue was originally filed by [email protected]


Running:

main() {
  String s1 = 'foo\r\n';
  String s2 = r'''foo^M
''';
  print(s1.codeUnits);
  print(s2.codeUnits);
}

Will output:
[102, 111, 111, 13, 10]
[102, 111, 111, 10]

Before running, replace ^M with an actual carriage return. In Emacs, insert a carriage return with C-q RET.
Compiling with dart2js and running with d8 produces the expected output:
[102, 111, 111, 13, 10]
[102, 111, 111, 13, 10]

@DartBot added the Type-Defect and area-vm labels on Jun 1, 2015
@iposva-google added the Triaged label and removed the New label on Jun 3, 2015
@mhausner
Contributor

mhausner commented Jun 8, 2015

That would mean that a raw string can change depending on the platform on which you save the source code. Save it on Windows and you get \r\n at every line ending. Save the same file on Linux and you'll get a string with \n at the end of every line. That seems less than ideal.
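
For code that needs a specific terminator regardless of how the file was saved, one option is to spell the terminator with escapes instead of relying on a physical line break. The following is only a sketch of that idea; the constant names are made up for illustration, and it relies on Dart concatenating adjacent string literals.

```dart
// Hypothetical names, used only for this illustration.

// The terminator is written with escapes, so it does not depend on the
// line endings of the source file.
const String crlfLine = 'foo\r\n';

// Adjacent string literals are concatenated, so a raw body (backslashes kept
// as-is) can still be followed by an explicit, escaped terminator.
const String rawThenCrlf = r'C:\temp' '\r\n';

void main() {
  print(crlfLine.codeUnits); // [102, 111, 111, 13, 10]
  print(rawThenCrlf.codeUnits.sublist(rawThenCrlf.length - 2)); // [13, 10]
}
```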

@mhausner added the area-language label and removed the area-vm label on Jun 8, 2015
@mhausner assigned gbracha and unassigned mhausner on Jun 8, 2015
@mhausner
Contributor

mhausner commented Jun 8, 2015

Gilad, could you clarify whether raw strings have to preserve the platform-specific line ending encoding? The spec does not mention this issue other than stating that source is in UTF-8, which leaves line endings platform-dependent. For raw strings, the spec just says that escape sequences and interpolations are not processed.

Thank you.

@kevmoo added the type-bug label and removed the triaged label on Mar 1, 2016
@munificent changed the title from "VM strips carriage returns from raw strings" to "Should multi-line raw strings preserve line encoding?" on Dec 14, 2016
@peter-ahe-google
Contributor

Would it be possible to get a resolution on this? The VM and dart2js disagree, so it's hard for Fasta to produce the correct behavior. Currently, we translate CRLF to LF, which I agree is the least surprising behavior.

@lrhn
Member

lrhn commented Sep 28, 2017

I have updated the specification in the repository (CL: https://codereview.chromium.org/2665613003).

Multi-line strings must not preserve the physical line ending. Any line terminating sequence (CR+LF, CR-not-followed-by-LF or LF-not-preceded-by-CR) represents a line-break in the multi-line string, and a line-break adds a single newline character to the resulting string (unless it's the first one and it's only preceded by whitespace characters - I haven't gotten rid of the "backslash" exception, but I still want to).

A CR+BACKSLASH+LF is not a CR+LF sequence. I'm not sure exactly what BACKSLASH+CR or BACKSLASH+LF means, though (or BACKSLASH+CR+LF, for that matter). It might need looking into, especially if it's at the end of a first whitespace-only line. I'd be happy to make it a compile-time error to put a backslash before a CR or LF.
As written, it's not allowed; the only thing that is allowed is backslashes before whitespace before the line terminator.
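
The line-break rule described above (CR+LF, a lone CR, or a lone LF each contributing a single newline) can be sketched as a plain string transformation. This is only an illustration of the rule, not how any Dart implementation actually scans source; `normalizeLineBreaks` is a hypothetical helper, and it does not model the first-line whitespace exception or the backslash cases discussed above.

```dart
/// Hypothetical helper for illustration; not part of any Dart library.
/// Replaces every line terminator (CR+LF, lone CR, lone LF) with one LF.
String normalizeLineBreaks(String source) {
  // `\r\n?` matches a CR together with an optional following LF, so a CR+LF
  // pair collapses to one newline and a lone CR becomes a newline too.
  // A lone LF is already the target character and is left untouched.
  return source.replaceAll(RegExp(r'\r\n?'), '\n');
}

void main() {
  print(normalizeLineBreaks('foo\r\n').codeUnits); // [102, 111, 111, 10]
  print(normalizeLineBreaks('a\rb\nc\r\nd').codeUnits); // [97, 10, 98, 10, 99, 10, 100]
}
```

Under this rule, the `s2` string from the original report has the code units `[102, 111, 111, 10]` no matter which line endings the source file uses.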


7 participants