Treat `include_str!` like it produces a raw string. #143077

nnethercote · 2025-06-27T06:05:50Z

There are two reasons to do this:

- It just makes sense. The contents of an included file are just like the contents of a raw string literal, because string escapes like `\"` and `\x61` don't get special treatment.
- We can avoid escaping it when putting it into `token::Lit::StrRaw`, unlike `token::Lit::Str`. On a tiny test program that included an 80 MiB file, this reduced compile time from 2.2s to 1.0s.

The change is detectable from proc macros that use `to_string` on tokens, as the change to the `expand-expr.rs` test indicates. But this kind of change is allowable, and it seems very unlikely to cause problems in practice.
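To illustrate the two reasons above, here is a minimal sketch of the difference between cooked and raw literals, and of the escaping step a cooked literal would need. The helper name `escape_for_cooked_literal` is hypothetical, and `str::escape_default` only stands in for whatever escaping the compiler performs internally.

```rust
/// Hypothetical helper: escapes arbitrary text so that it could be placed
/// inside a normal ("cooked") string literal. `str::escape_default` is used
/// here as a stand-in; it is not claimed to be the compiler's exact routine.
fn escape_for_cooked_literal(contents: &str) -> String {
    contents.escape_default().to_string()
}

fn main() {
    // In a cooked literal, escapes are interpreted: this is `a"`.
    let cooked = "\x61\"";
    // In a raw literal, the same characters are kept verbatim, which is
    // how the contents of an included file behave.
    let raw = r#"\x61\""#;

    assert_eq!(cooked.len(), 2);
    assert_eq!(raw.len(), 6);

    // Putting the raw contents into a cooked literal needs an extra pass
    // over the text, which is the cost the PR tries to avoid.
    assert_eq!(escape_for_cooked_literal(raw), r#"\\x61\\\""#);
}
```

The escaping pass touches every character of the input, which is why it shows up in compile time for very large included files.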

r? @petrochenkov


nnethercote · 2025-06-27T06:08:00Z

If we introduce `ByteSymbol` (#141875) then we could do the same thing for `include_bytes!`: treat it as producing a raw byte string instead of a byte string. And then `ExprKind::IncludedBytes` could be removed, which would be a nice simplicity win. Plus several differences between the handling of strings and byte strings would be eliminated.

nnethercote · 2025-06-27T06:47:15Z

> If we introduce `ByteSymbol` (#141875) then we could do the same thing for `include_bytes!`: treat it as producing a raw byte string instead of a byte string.

Actually, that doesn't work, because `token::Lit::symbol` is a `Symbol`, i.e. UTF-8, and the output of `include_bytes!` can be non-UTF-8. So either we need to escape the output (slow) or have a special representation (`ExprKind::IncludedBytes`). Annoying.
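A small sketch of the problem described here: arbitrary bytes are not always valid UTF-8, so they cannot be stored directly as text, while an escaped form always can. The function name `escape_bytes` is hypothetical, and `<[u8]>::escape_ascii` from the standard library is used only as an illustration of escaping, not as the compiler's own routine.

```rust
/// Hypothetical helper: turns arbitrary bytes into an ASCII-only string
/// that could be stored as UTF-8 text. This is the "escape the output"
/// option, which has to visit every byte.
fn escape_bytes(bytes: &[u8]) -> String {
    bytes.escape_ascii().to_string()
}

fn main() {
    // 0xff can never appear in valid UTF-8.
    let included: &[u8] = &[0xff, b'a'];

    // The raw bytes cannot be viewed as a `&str` ...
    assert!(std::str::from_utf8(included).is_err());

    // ... but the escaped form is plain ASCII text.
    assert_eq!(escape_bytes(included), "\\xffa");
}
```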

WaffleLapkin · 2025-06-27T12:32:02Z

compiler/rustc_expand/src/build.rs

+    pub fn expr_str_raw(&self, span: Span, s: Symbol) -> P<ast::Expr> {
+        let lit = token::Lit::new(token::StrRaw(0), s, None);
+        self.expr(span, ast::ExprKind::Lit(lit))
+    }


I can only assume 0 means "with 0 `#`s". But that would produce an invalid token, right? I.e. if the file being included contains `"`, this would break. Moreover, for any number of `#`s you can construct a file that would break it (`"####...`).

`fn desugar_doc_comments` has some logic that counts how many hashes need to be added to keep the literal in `#[doc = r"my arbitrary string from a sugared doc comment"]` well formed.

Ah, forgot to mention, for doc comments the hash counter can overflow too.
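The hash-counting concern can be sketched as follows. This is not the code from `desugar_doc_comments`; the function name `raw_hashes_needed` is hypothetical, and the 255 limit is assumed from the hash count being stored in a `u8`.

```rust
/// Hypothetical helper: returns how many `#`s are needed to wrap `s` in a
/// raw string literal, or `None` if the count does not fit in a `u8`.
fn raw_hashes_needed(s: &str) -> Option<u8> {
    let bytes = s.as_bytes();
    let mut needed: usize = 0;
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'"' {
            // A quote followed by `n` hashes would end a literal that is
            // delimited by `n` or fewer hashes, so `n + 1` are required.
            let run = bytes[i + 1..].iter().take_while(|&&b| b == b'#').count();
            needed = needed.max(run + 1);
            i += run + 1;
        } else {
            i += 1;
        }
    }
    if needed <= u8::MAX as usize {
        Some(needed as u8)
    } else {
        None
    }
}

fn main() {
    // No quote in the contents: `r"..."` with zero hashes is enough.
    assert_eq!(raw_hashes_needed("plain text"), Some(0));
    // A bare quote needs one hash: `r#"a"b"#`.
    assert_eq!(raw_hashes_needed("a\"b"), Some(1));
    // A quote followed by 255 hashes would need 256, which overflows.
    let hostile = format!("\"{}", "#".repeat(255));
    assert_eq!(raw_hashes_needed(&hostile), None);
}
```

Passing a fixed `0` is only correct when this function would return `Some(0)`, i.e. when the contents contain no `"` at all.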

petrochenkov · 2025-06-27T15:08:49Z

Technically, token kind changes are user-observable.

macro_rules! expect_nonraw {
    ("a") => {}
}

macro_rules! expect_raw {
    (r"a") => {}
}

expect_nonraw!("a");
expect_nonraw!(r"a"); // ERROR no rules expected `r"a"`

expect_raw!(r"a");
expect_raw!("a"); // ERROR no rules expected `"a"`

fn main() {}

petrochenkov · 2025-06-27T16:14:16Z

Apparently my memory is failing me.
I clearly remember some PR from @dtolnay (?) doing a similar change from a regular string token to raw string token (or vice versa) somewhere in doc comment logic for macros.
However, I cannot find anything like this in git history.

petrochenkov · 2025-06-27T16:29:17Z

`include_str` has one more interesting detail: unlike raw string literals, it doesn't normalize `\r\n` into `\n` (#63681).
So there are other issues with pretty-printing it and then reading it back, besides the number of `#`s.

petrochenkov · 2025-06-27T16:48:49Z

Neither normal nor raw strings are a perfect fit for representing included strings and doc comments.

I'd rather introduce a new literal kind "undelimited raw string" for all this stuff, if not for the concerns about token matching and compatibility (#143077 (comment)).
Or maybe it's ok if such literals cannot match anything user-written.

If we distinguish the undelimited literals from regular raw strings (with `StrRaw(Option<u8> = None)` or `StrRawUndelimited`), then we can make a best effort to preserve the contents when pretty-printing them, and emit the cooked forms for literals with `\r\n` or too many `#`s.
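The pretty-printing fallback described here could look roughly like the sketch below. It is only an illustration of the decision, not compiler code: the function name `print_included` is hypothetical, `str::escape_default` stands in for the cooked-form escaping, and the 255 limit is assumed from the `u8` hash count.

```rust
/// Hypothetical sketch: prints included text as a raw literal when that
/// round-trips, and falls back to a cooked (escaped) literal otherwise.
fn print_included(s: &str) -> String {
    // Each quote followed by `n` hashes forces at least `n + 1` delimiters.
    let hashes = s
        .split('"')
        .skip(1)
        .map(|rest| rest.bytes().take_while(|&b| b == b'#').count() + 1)
        .max()
        .unwrap_or(0);

    // A carriage return would not survive being read back from a raw
    // literal, and the hash count is assumed to be limited to a `u8`.
    if s.contains('\r') || hashes > u8::MAX as usize {
        format!("\"{}\"", s.escape_default())
    } else {
        let h = "#".repeat(hashes);
        format!("r{h}\"{s}\"{h}", h = h, s = s)
    }
}

fn main() {
    // Backslashes are preserved verbatim in the raw form.
    assert_eq!(print_included("a\\b"), "r\"a\\b\"");
    // Embedded quotes add one hash.
    assert_eq!(print_included("say \"hi\""), "r#\"say \"hi\"\"#");
    // CRLF forces the cooked form with explicit escapes.
    assert_eq!(print_included("a\r\nb"), "\"a\\r\\nb\"");
}
```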

petrochenkov · 2025-06-27T16:50:42Z

> Or maybe it's ok if such literals cannot match anything user-written.

We can perhaps crater a change like this.
Doc comments, include_str, include_bytes -> new literal kinds.

nnethercote · 2025-06-29T22:10:26Z

This is clearly more complicated than I realised. The alternative suggestions might be worthwhile, but they don't have to happen in this PR.

rustbot assigned petrochenkov Jun 27, 2025

rustbot added the S-waiting-on-review and T-compiler labels Jun 27, 2025

WaffleLapkin reviewed Jun 27, 2025


petrochenkov added the S-waiting-on-author label and removed the S-waiting-on-review label Jun 27, 2025

nnethercote closed this Jun 29, 2025
