Skip to content

Segfault in release mode when reading an enum with a smartstring in one of the variants #13

@cormacrelf

Description

@cormacrelf
  • rustc 1.48.0-nightly (a1947b3f9 2020-09-10) on macOS 10.14.6 64-bit
  • smartstring 0.2.5`

I'm in the process of refactoring a medium codebase to use smartstring. I'm getting a segfault when reading an enum containing a smarstring as a variant, only in release mode. The invalid address I've seen is is 0x0, which is discovered in my libc's _platform_memmove when cloning the enum (or trying to fmt::Debug it). Weirdly enough, I have observed writing other enum variants, and then reading the one with a smartstring in it instead.

Backtrace:

* thread #2, name = 'csl_test_suite::bugreports_GreekStyleProblems.txt', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00007fff6db1d650 libsystem_platform.dylib`_platform_memmove$VARIANT$Nehalem + 112
    frame #1: 0x0000000100014d3f suite-c6ec4dbd521a69d6`_$LT$alloc..string..String$u20$as$u20$core..clone..Clone$GT$::clone::h5d1f698cccbb1a5a [inlined] core::intrinsics::copy_nonoverlapping::h6dcf6051f7b35d2e at intrinsics.rs:1858:14 [opt]
    frame #2: 0x0000000100014d31 suite-c6ec4dbd521a69d6`_$LT$alloc..string..String$u20$as$u20$core..clone..Clone$GT$::clone::h5d1f698cccbb1a5a [inlined] alloc::vec::Vec$LT$T$GT$::append_elements::h9acd358ad7c793de at vec.rs:1268 [opt]
    frame #3: 0x0000000100014d1f suite-c6ec4dbd521a69d6`_$LT$alloc..string..String$u20$as$u20$core..clone..Clone$GT$::clone::h5d1f698cccbb1a5a [inlined] _$LT$alloc..vec..Vec$LT$T$GT$$u20$as$u20$alloc..vec..SpecExtend$LT$$RF$T$C$core..slice..Iter$LT$T$GT$$GT$$GT$::spec_extend::h0c2714dd53677232 at vec.rs:2410 [opt]
    frame #4: 0x0000000100014d1f suite-c6ec4dbd521a69d6`_$LT$alloc..string..String$u20$as$u20$core..clone..Clone$GT$::clone::h5d1f698cccbb1a5a [inlined] alloc::vec::Vec$LT$T$GT$::extend_from_slice::hc9db780e179c9492 at vec.rs:1592 [opt]
    frame #5: 0x0000000100014d1f suite-c6ec4dbd521a69d6`_$LT$alloc..string..String$u20$as$u20$core..clone..Clone$GT$::clone::h5d1f698cccbb1a5a [inlined] alloc::slice::hack::to_vec::h4416f72647053294 at slice.rs:156 [opt]
    frame #6: 0x0000000100014cb8 suite-c6ec4dbd521a69d6`_$LT$alloc..string..String$u20$as$u20$core..clone..Clone$GT$::clone::h5d1f698cccbb1a5a [inlined] alloc::slice::_$LT$impl$u20$$u5b$T$u5d$$GT$::to_vec::h20b2a9edbcf66ddc at slice.rs:391 [opt]
    frame #7: 0x0000000100014cb8 suite-c6ec4dbd521a69d6`_$LT$alloc..string..String$u20$as$u20$core..clone..Clone$GT$::clone::h5d1f698cccbb1a5a [inlined] _$LT$alloc..vec..Vec$LT$T$GT$$u20$as$u20$core..clone..Clone$GT$::clone::hb8bb0739bf3e72a0 at vec.rs:1938 [opt]
    frame #8: 0x0000000100014cb1 suite-c6ec4dbd521a69d6`_$LT$alloc..string..String$u20$as$u20$core..clone..Clone$GT$::clone::h5d1f698cccbb1a5a at string.rs:1679 [opt]
    frame #9: 0x0000000100234f5e suite-c6ec4dbd521a69d6`citeproc_proc::disamb::add_to_graph::h2bf47a2062ed62b4 + 2126
<snip, user code>

Here's frame 9, showing that cast() gave us Boxed:

frame #9: 0x000000010023684e suite-e27366cf4d0ebb7f`citeproc_proc::disamb::add_to_graph::hdd46c801cda0b1e8 at lib.rs:235:59 [opt]
   232      /// [Copy]: https://doc.rust-lang.org/std/marker/trait.Copy.html
   233      fn clone(&self) -> Self {
   234          match self.cast() {
-> 235              StringCast::Boxed(string) => Self::from_boxed(string.string().clone().into()),
   236              StringCast::Inline(string) => Self::from_inline(*string),
   237          }
   238      }

It is called from the derived clone impl of this enum:

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum EdgeData {
    Output(SmartString<LazyCompact>),
    Locator,
    LocatorLabel,
    YearSuffix,
    YearSuffixExplicit,
    YearSuffixPlain,
    CitationNumber,
    CitationNumberLabel,
    Frnn,
    FrnnLabel,
    Accessed,
}

I would note, out of ~900 tests that my code is running, all of which exercise this code, only two of them segfault, and they segfault every time in the same way. These tests are over here; a piece of EdgeData is created for a single element like <label variable="locator"> appearing in one of those files. I thought it was something to do with the content (both having a lot of Greek unicode), but I condensed the test case into a bare minimum repro that didn't produce any SmartStrings at all! The code segfaults when the only EdgeData instance in the entire execution an EdgeData::LocatorLabel, which I have verified with logging in debug mode. And this program pretty directly creates these edgedata, and then immediately reads them, there is no opportunity for them to be corrupted in between.

So I don't lose it, the minimal repro, down from bugreports_GreekStyleProblems.txt, is:

<style class="note" version="1.0">
  <macro name="a"> <label variable="locator"/> </macro>
  <citation><layout> <text macro="a"/> </layout></citation>
</style>

I added some logging just after calling the function where EdgeData::LocatorLabel is created, and had to change it to write!(a_logfile, "{:?}", refir) because the debug statement caused a segfault. (Progress!) Here's the output:

Edge(None)
Edge(None)
Edge(Some(⏎ 

That carriage return symbol is my terminal saying the file ends without a newline, because it segfaulted in the same way while formatting the EdgeData::LocatorLabel. I modified my minimal case to try the other variants instead, and couldn't get them to segfault, but I also found LocatorLabel wouldn't segfault if it appeared in the test case in a different context... it's all pretty fragile.

Here's the code where the edgedata is returned as part of a larger type:

    let mut contents = Vec::with_capacity(els.len());
    let mut overall_gv = GroupVars::new();

    for el in els {
        // the EdgeData is constructed in ref_ir(...)
        let (ir, gv) =
            Disambiguation::<Markup>::ref_ir(el, db, ctx, state, formatting.unwrap_or_default());
        // <snip> manual logging code that segfaults goes here
        match ir {
            RefIR::Edge(None) => {
                overall_gv = overall_gv.neighbour(gv);
            }
            _ => {
                contents.push(ir);
                overall_gv = overall_gv.neighbour(gv)
            }
        }
    }

The Upshot

The upshot is, sometimes a SmartString in an enum will somehow overlap with other discriminants, such that an enum written with one variant will be read as a the SmartString variant. Then, you will be reading uninitialised memory and interpreting it as a SmartString (and hence trying to dereference an uninitialised pointer, or, I can imagine it reading uninitialised inline content). I have viewed the contents of the smartstring, and the data is pretty garbagey, but I need to look closer at it. I think this happens when Rust's release mode optimisation figures out a way to avoid copying the enum around by finding a slot for it, but I'm having a hard time coming up with a minimal code reproduction. I'm going to leave this report as is for now, but I'll try to get a commit + build instructions to repro tomorrow.

The big puzzles are how:

  • This program does not segfault with std::string::String but does with smartstring. No smartstring code gets run until it is erroneously read.
  • An enum discriminant is being written or read wrong (is this perhaps a rustc bug?) I checked and size_of::<EdgeData>() is 32, as one would expect. How can the computer mess up accessing a discriminant field? Isn't this what they're best at? Perhaps rustc/llvm is trying to cheat and getting mixed up because smartstring hasn't quite described itself properly to the compiler... if anything, one would think that the MaybeUninit would rule out anything like that.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions