Treat include_str! like it produces a raw string. #143077

Closed


nnethercote
Contributor

There are two reasons to do this:

  • It just makes sense. The contents of an included file are just like the contents of a raw string literal, because string escapes like \" and \x61 don't get special treatment.

  • We can avoid escaping it when putting it into token::Lit::StrRaw, unlike token::Lit::Str. On a tiny test program that included an 80 MiB file, this reduced compile time from 2.2s to 1.0s.

The change is detectable from proc macros that use to_string on tokens, as the change to the expand-expr.rs test indicates. But this kind of change is allowable, and it seems very unlikely to cause problems in practice.

r? @petrochenkov
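
To illustrate the two points above, here is a minimal sketch of the difference between the cooked and raw forms. The helper names `cooked_literal` and `raw_literal_no_hashes` are hypothetical, and `str::escape_default` is only a stand-in for whatever escaping the compiler performs; this is not the code in the PR.

```rust
/// Render `contents` as a cooked (non-raw) string literal.
/// Every character is visited and possibly escaped, so the cost grows
/// with the size of the included file. `escape_default` is a std method
/// used here as a stand-in, not the compiler's actual routine.
fn cooked_literal(contents: &str) -> String {
    format!("\"{}\"", contents.escape_default())
}

/// Render `contents` as a raw string literal with zero `#`s.
/// No per-character escaping is done. This is only well formed when
/// `contents` has no `"` and no `\r` (an assumption of this sketch).
fn raw_literal_no_hashes(contents: &str) -> String {
    format!("r\"{}\"", contents)
}
```

With file contents consisting of the four characters `\x61`, the cooked form has to double the backslash, while the raw form copies the text through unchanged.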

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 27, 2025
@nnethercote
Contributor Author

If we introduce ByteSymbol (#141875) then we could do the same thing for include_bytes!: treat it as producing a raw byte string instead of a byte string. And then ExprKind::IncludedBytes could be removed, which would be a nice simplicity win. Plus several differences between the handling of strings and byte strings would be eliminated.

@nnethercote
Contributor Author

> If we introduce ByteSymbol (#141875) then we could do the same thing for include_bytes!: treat it as producing a raw byte string instead of a byte string.

Actually, that doesn't work, because token::Lit::symbol is a Symbol, i.e. UTF-8, and the output of include_bytes! can be non-UTF-8. So either we need to escape the output (slow) or have a special representation (ExprKind::IncludedBytes). Annoying.
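
A small sketch of the constraint described above: arbitrary bytes can only be stored as UTF-8 text if they happen to be valid UTF-8, otherwise they must be escaped. The function name `bytes_as_symbol_text` is hypothetical and the `String` here merely stands in for a `Symbol`; it only uses `std::str::from_utf8` and `<[u8]>::escape_ascii` from the standard library.

```rust
/// Try to store included bytes as UTF-8 text (a stand-in for `Symbol`).
/// Returns the text plus a flag saying whether escaping was required.
fn bytes_as_symbol_text(bytes: &[u8]) -> (String, bool) {
    match std::str::from_utf8(bytes) {
        // Valid UTF-8: the bytes can be used verbatim.
        Ok(s) => (s.to_owned(), false),
        // Not UTF-8: fall back to an escaped form, which touches every byte.
        Err(_) => (bytes.escape_ascii().to_string(), true),
    }
}
```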

Comment on lines +448 to +451
pub fn expr_str_raw(&self, span: Span, s: Symbol) -> P<ast::Expr> {
    let lit = token::Lit::new(token::StrRaw(0), s, None);
    self.expr(span, ast::ExprKind::Lit(lit))
}
Member

I can only assume 0 means "with 0 #s". But that would produce an invalid token, right? I.e. if the file being included contains ", this would break. Moreover for any number of #s you can construct a file which would break it ("####...)

Contributor

fn desugar_doc_comments has some logic that counts how many hashes need to be added to keep the literal in #[doc = r"my arbitrary string from a sugared doc comment"] well formed.

Contributor

Ah, forgot to mention, for doc comments the hash counter can overflow too.
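
The hash-counting problem raised in this thread can be sketched as follows. This is an illustrative, standalone function, not the logic in `desugar_doc_comments`; the name `raw_hashes_needed` is hypothetical, and the `u8` limit is assumed from the `StrRaw(0)` payload discussed above.

```rust
/// Smallest number of `#`s that lets `s` be wrapped as a raw string
/// literal without the contents terminating it early, or `None` if that
/// number does not fit in a `u8` (the overflow case mentioned above).
fn raw_hashes_needed(s: &str) -> Option<u8> {
    let mut needed: usize = 0;
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '"' {
            // Count the `#`s that directly follow this quote.
            let mut run: usize = 0;
            while chars.peek() == Some(&'#') {
                chars.next();
                run += 1;
            }
            // A quote followed by `run` hashes would close a literal that
            // uses `run` or fewer hashes, so one more is required.
            needed = needed.max(run + 1);
        }
    }
    u8::try_from(needed).ok()
}
```

A file with no `"` needs zero hashes, so `StrRaw(0)` is only correct in that case; a file containing `"` followed by 255 `#`s cannot be represented at all under the `u8` assumption.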

@petrochenkov
Contributor

Technically, token kind changes are user-observable.

macro_rules! expect_nonraw {
    ("a") => {}
}

macro_rules! expect_raw {
    (r"a") => {}
}

expect_nonraw!("a");
expect_nonraw!(r"a"); // ERROR no rules expected `r"a"`

expect_raw!(r"a");
expect_raw!("a"); // ERROR no rules expected `"a"`

fn main() {}

@petrochenkov
Contributor

petrochenkov commented Jun 27, 2025

Apparently my memory is failing me.
I clearly remember some PR from @dtolnay (?) doing a similar change from a regular string token to raw string token (or vice versa) somewhere in doc comment logic for macros.
However, I cannot find anything like this in git history.

@petrochenkov
Contributor

include_str has one more interesting detail: unlike raw string literals, it doesn't normalize \r\n into \n (#63681).
So there are other issues with pretty-printing it and then reading it back, besides the number of #s.

@petrochenkov
Contributor

Neither normal nor raw strings are a perfect fit for representing included strings and doc comments.

I'd rather introduce a new literal kind "undelimited raw string" for all this stuff, if not for the concerns about token matching and compatibility (#143077 (comment)).
Or maybe it's ok if such literals cannot match anything user-written.

If we distinguish the undelimited literals from regular raw strings (with StrRaw(Option<u8> = None) or StrRawUndelimited), then we can make a best effort to preserve the contents when pretty-printing them and emit the cooked forms for literals with \r\n or too many #s.
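
A rough sketch of that best-effort pretty-printing, under stated assumptions: `Printed` and `print_included` are hypothetical names, the hash limit is taken to be `u8::MAX`, any `\r` (not only `\r\n`) forces the cooked form, and `str::escape_default` stands in for the real escaping. None of this is existing compiler code.

```rust
/// Which surface form the printer chose for an undelimited literal.
#[derive(Debug, PartialEq)]
enum Printed {
    Raw(String),
    Cooked(String),
}

/// Prefer a raw literal that preserves the contents verbatim; fall back
/// to a cooked literal when the contents contain `\r` or would need more
/// `#`s than fit in a `u8`.
fn print_included(s: &str) -> Printed {
    if !s.contains('\r') {
        for n in 0..=usize::from(u8::MAX) {
            let hashes = "#".repeat(n);
            // A quote followed by `n` hashes inside `s` would end the
            // literal early, so this `n` is usable only if it is absent.
            let closer = format!("\"{hashes}");
            if !s.contains(&closer) {
                return Printed::Raw(format!("r{hashes}\"{s}\"{hashes}"));
            }
        }
    }
    Printed::Cooked(format!("\"{}\"", s.escape_default()))
}
```

The repeated `contains` scan is deliberately simple rather than fast; a single pass that tracks the longest quote-plus-hashes run would avoid rescanning large inputs.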

@petrochenkov
Contributor

> Or maybe it's ok if such literals cannot match anything user-written.

We can perhaps crater a change like this.
Doc comments, include_str, include_bytes -> new literal kinds.

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 27, 2025
@nnethercote
Contributor Author

This is clearly more complicated than I realised. The alternative suggestions might be worthwhile, but they don't have to happen in this PR.

Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
4 participants