- Feature Name: `c_str_literal`
- Start Date: 2022-11-15
- RFC PR: [rust-lang/rfcs#3348](https://github.com/rust-lang/rfcs/pull/3348)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

Add `c"…"` string literals for nul-terminated C strings.

# Motivation
[motivation]: #motivation

Judging by the [number of `cstr!()` invocations on GitHub alone](https://cs.github.com/?scopeName=All+repos&scope=&q=cstr%21+lang%3Arust), C string literals
are a widely used feature. Implementing `cstr!()` as a `macro_rules` or `proc_macro` macro requires non-trivial code to get completely right (e.g. rejecting embedded nul bytes),
and the result is still less flexible than it should be (e.g. in terms of accepted escape codes).

Rust 2021 reserved prefixes for (string) literals, so let's make use of that.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

`c"abc"` is a [`&CStr`](https://doc.rust-lang.org/stable/core/ffi/struct.CStr.html): a nul byte (`b'\0'`) is appended to the string data in memory, and the literal evaluates to a `&CStr` pointing to the result.
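
Here is a minimal sketch of what that looks like through the existing `CStr` accessors. It assumes a toolchain that implements this proposal (C string literals were stabilized in Rust 1.77), and the constant name `ABC` is illustrative only.

```rust
use std::ffi::CStr;

// The literal's bytes are stored with a trailing nul terminator.
const ABC: &CStr = c"abc";

fn main() {
    // to_bytes() excludes the terminator; to_bytes_with_nul() includes it.
    assert_eq!(ABC.to_bytes(), b"abc");
    assert_eq!(ABC.to_bytes_with_nul(), b"abc\0");
    println!("{} bytes plus terminator", ABC.to_bytes().len());
}
```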

All escape codes and characters accepted by `""` and `b""` literals are accepted, except the nul byte, however it is written (`\0`, `\x00`, `\u{0}`).
So, both UTF-8 and non-UTF-8 data can coexist in a C string, e.g. `c"hello\x80我叫\u{1F980}"`.
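
The following sketch breaks the mixed example apart under the same toolchain assumption. `MIXED` and `UTF8_ONLY` are hypothetical names. The byte count assumes the UTF-8 encodings of 我 (3 bytes), 叫 (3 bytes), and 🦀 (4 bytes).

```rust
use std::ffi::CStr;

// A raw non-UTF-8 byte (\x80), literal UTF-8 characters, and a Unicode escape.
const MIXED: &CStr = c"hello\x80我叫\u{1F980}";
// The same text without the stray byte: valid UTF-8 throughout.
const UTF8_ONLY: &CStr = c"我叫\u{1F980}";

fn main() {
    // 5 ASCII bytes + 1 raw byte + 3 + 3 (two CJK characters) + 4 (the crab).
    assert_eq!(MIXED.to_bytes().len(), 16);
    assert_eq!(MIXED.to_bytes()[5], 0x80);
    // The lone 0x80 byte makes the whole string invalid UTF-8...
    assert!(MIXED.to_str().is_err());
    // ...while a purely UTF-8 C string converts back to &str.
    assert_eq!(UTF8_ONLY.to_str(), Ok("我叫🦀"));
    // Lossy conversion replaces the invalid byte with U+FFFD.
    println!("{}", MIXED.to_string_lossy());
}
```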

The raw string literal variant is prefixed with `cr`. For example, `cr"\"` and `cr##"Hello "world"!"##`. (Just like `r""` and `br""`.)
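
Here is a short sketch of the raw form, assuming `cr` behaves like `r` and `br` as described: backslashes are not escapes, and `#` delimiters allow embedded double quotes. The constant names are hypothetical.

```rust
use std::ffi::CStr;

// In a raw C string, a backslash is just a backslash.
const BACKSLASH: &CStr = cr"\";
// Hash delimiters allow embedded quotes, as with r#"…"# and br#"…"#.
const QUOTED: &CStr = cr##"Hello "world"!"##;

fn main() {
    assert_eq!(BACKSLASH.to_bytes(), b"\\");
    assert_eq!(QUOTED.to_bytes(), br#"Hello "world"!"#);
    // The nul terminator is still appended to raw C strings.
    assert_eq!(QUOTED.to_bytes_with_nul().last(), Some(&0u8));
}
```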

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

Two new [string literal types](https://doc.rust-lang.org/reference/tokens.html#characters-and-strings): `c"…"` and `cr#"…"#`.

Accepted escape codes: [Quote](https://doc.rust-lang.org/reference/tokens.html#quote-escapes), [Unicode](https://doc.rust-lang.org/reference/tokens.html#unicode-escapes), and [Byte](https://doc.rust-lang.org/reference/tokens.html#byte-escapes) escapes.
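
To show the three escape categories together, the sketch below compares a C string literal with the equivalent byte string. The constant name is hypothetical. Under the nul-byte rule from the guide-level explanation, adding `\0` to this literal would be rejected at compile time.

```rust
use std::ffi::CStr;

// Quote escapes (\" and \'), byte escapes (\t, \\, \x41, \n), and a Unicode escape (\u{42}).
const ESCAPED: &CStr = c"\"q\" \'a\'\t\\\x41\u{42}\n";

fn main() {
    // \x41 is 'A' and \u{42} is 'B'; everything else maps to one byte each.
    assert_eq!(ESCAPED.to_bytes(), b"\"q\" 'a'\t\\AB\n");
    assert_eq!(ESCAPED.to_bytes().len(), 12);
}
```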

Unicode characters are accepted and encoded as UTF-8. That is, `c"🦀"`, `c"\u{1F980}"`, and `c"\xf0\x9f\xa6\x80"` are all accepted and equivalent.
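
Because `CStr` implements `PartialEq`, this equivalence can be checked directly. Here is a sketch assuming a compiler that supports the proposed syntax. The constant names are illustrative.

```rust
use std::ffi::CStr;

const LITERAL: &CStr = c"🦀";
const UNICODE_ESCAPE: &CStr = c"\u{1F980}";
const BYTE_ESCAPES: &CStr = c"\xf0\x9f\xa6\x80";

fn main() {
    // All three spellings produce the same four UTF-8 bytes (plus the terminator).
    assert_eq!(LITERAL, UNICODE_ESCAPE);
    assert_eq!(LITERAL, BYTE_ESCAPES);
    assert_eq!(LITERAL.to_bytes(), "🦀".as_bytes());
}
```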

The type of the expression is [`&core::ffi::CStr`](https://doc.rust-lang.org/stable/core/ffi/struct.CStr.html). Therefore, `CStr` will have to become a lang item.
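
Because the literal is an ordinary `&'static CStr`, it works anywhere existing `CStr` APIs do, including `const` items and `as_ptr()` for FFI. The sketch below uses the hypothetical names `GREETING` and `byte_len`. To stay self-contained, it round-trips the pointer through `CStr::from_ptr` rather than calling a real C function.

```rust
use core::ffi::{c_char, CStr};

// The literal can initialize a const of type &CStr directly.
const GREETING: &CStr = c"hello";

// An ordinary Rust function taking a C string: no allocation or runtime check needed.
fn byte_len(s: &CStr) -> usize {
    s.to_bytes().len()
}

fn main() {
    // as_ptr() yields the *const c_char that C APIs expect.
    let ptr: *const c_char = GREETING.as_ptr();
    // SAFETY: ptr points into a 'static, nul-terminated literal.
    let round_trip = unsafe { CStr::from_ptr(ptr) };
    assert_eq!(round_trip, GREETING);
    assert_eq!(byte_len(GREETING), 5);
}
```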

Interactions with string-related macros:

- The [`concat` macro](https://doc.rust-lang.org/stable/std/macro.concat.html) will _not_ accept these literals, just like it doesn't accept byte string literals.
- The [`format_args` macro](https://doc.rust-lang.org/stable/std/macro.format_args.html) will _not_ accept such a literal as the format string, just like it doesn't accept a byte string literal.

(This might change in the future. For example, `format_args!(c"…")` would be useful, but it would require generalizing the macro and `fmt::Arguments` to work with other kinds of strings, ideally including `b"…"`.)

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

* No `c""` literal, just a `cstr!()` macro (possibly as part of the standard library).

  This requires [complicated machinery](https://github.com/rust-lang/rust/pull/101607/files) to implement correctly.

  The trivial implementation using `concat!($s, "\0")` is problematic for several reasons, including how it handles non-string input and embedded nul bytes.
  (The unstable `concat_bytes!()` solves some of these problems.)

  The popular [`cstr` crate](https://crates.io/crates/cstr) is a proc macro that works around the limitations of a `macro_rules` implementation, but it also has many downsides.

  Even if we had the right language features for a trivial, correct implementation, many code bases use C strings as their primary form of string,
  which makes the `cstr!("..")` syntax quite cumbersome.

* Allowing only valid UTF-8 and Unicode-oriented escape codes (like in `"…"`, e.g. `螃蟹` or `\u{1F980}` but not `\xff`).

  For regular string literals, we have this restriction because `&str` is required to be valid UTF-8.
  However, C string literals (and values of our `&CStr` type) aren't necessarily valid UTF-8.

* Allowing only ASCII characters and byte-oriented escape codes (like in `b"…"`, e.g. `\xff` but not `螃蟹` or `\u{1F980}`).

  While C string literals (and `&CStr`) aren't necessarily valid UTF-8, they often do contain UTF-8 data.
  Refusing UTF-8 in them would make the feature less useful and would make it unnecessarily hard to use Unicode in programs that mainly use C strings.

* Having separate `c"…"` and `bc"…"` string literal prefixes for UTF-8 and non-UTF-8 data.

  Both would have the same type (`&CStr`). Unless we add a special "always valid UTF-8 C string" type, there's not much use in separating them.

* Using `z` instead of `c` (`z"…"`), for "zero-terminated" instead of "C string".

  We already have a type called `CStr` for this, so `c` seems consistent.
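
To make the objection to the trivial `concat!($s, "\0")` approach in the first alternative concrete, here is a hypothetical `naive_cstr!` macro. It is not the `cstr` crate's implementation and not a proposed API. It silently accepts non-string literals and detects an embedded nul only at run time. Under this proposal, `c"a\0b"` would be rejected at compile time.

```rust
use std::ffi::CStr;

// Hypothetical naive macro: append "\0" with concat! and validate at run time.
macro_rules! naive_cstr {
    ($s:literal) => {
        ::std::ffi::CStr::from_bytes_with_nul(concat!($s, "\0").as_bytes())
    };
}

fn main() {
    // Works for a plain string literal.
    let hello: &CStr = naive_cstr!("hello").unwrap();
    assert_eq!(hello.to_bytes(), b"hello");
    // Non-string input is silently stringified by concat!: 42 becomes "42".
    assert_eq!(naive_cstr!(42).unwrap().to_bytes(), b"42");
    // An embedded nul surfaces only as a runtime Err, not a compile error.
    assert!(naive_cstr!("a\0b").is_err());
}
```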

# Drawbacks
[drawbacks]: #drawbacks

- The `CStr` type needs some work. `&CStr` is currently a wide pointer (an address plus a length), but it's supposed to be a thin pointer (just an address, like C's `char *`). See https://doc.rust-lang.org/1.65.0/src/core/ffi/c_str.rs.html#87

  It's not a blocker, but we might want to try to fix that before stabilizing `c"…"`.
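
The wide-pointer representation can be observed with `size_of`. This sketch reflects the current standard library, where `CStr` wraps an unsized `[c_char]`. The assertion would stop holding if `&CStr` became thin.

```rust
use std::ffi::{c_char, CStr};
use std::mem::size_of;

fn main() {
    // &CStr currently carries a length next to the address (a wide pointer),
    // so it is twice the size of a plain C `const char *`.
    assert_eq!(size_of::<&CStr>(), 2 * size_of::<*const c_char>());
    // Thanks to that stored length, to_bytes() does not scan for the terminator today.
    let s: &CStr = c"abc";
    assert_eq!(s.to_bytes().len(), 3);
}
```

Keeping the length is what lets `to_bytes()` run in constant time today. A thin `&CStr` would instead need to scan for the terminator, as C's `strlen` does.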

# Prior art
[prior-art]: #prior-art

- Nim has `cstring"…"`.
- COBOL has `Z"…"`.
- There are probably many more languages, but they're hard to search for. :)

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- Should we make `&CStr` a thin pointer before stabilizing this? (If so, how?)
- Should the (unstable) [`concat_bytes` macro](https://github.com/rust-lang/rust/issues/87555) accept C string literals? (If so, should it evaluate to a C string or a byte string?)

# Future possibilities
[future-possibilities]: #future-possibilities

- Make `concat!()` or `concat_bytes!()` work with `c"…"`.
- Make `format_args!(c"…")` (and `format_args!(b"…")`) work.
- Improve the `&CStr` type, and make it FFI-safe.