[bug] clang incorrectly uses coroutine frame for scratch space after suspending (take 3) #72006

jacobsa · 2023-11-11T00:25:31Z

This issue is related to #65018 and #65054, where clang incorrectly uses a coroutine frame for scratch space after the coroutine suspended. To recap: clang must not insert code that accesses the coroutine frame after it suspends, because this may introduce a data race. Another thread may already have resumed or destroyed the coroutine by the time this thread gets around to accessing its frame.

The immediate symptoms in those two issues were fixed by @ChuanqiXu9, by manipulating whether clang will inline std::coroutine_handle::address. But as far as I'm aware there was no systematic fix to teach clang that it's not allowed to access the frame after suspending. I think the new issue here is a consequence of this.

With a large piece of internal Google code we've found that AddressSanitizer sometimes reports an incorrect use-after-free error in coroutine code, claiming that the frame is being written to after it was already destroyed by another thread that ran concurrently after it suspended.

I don't have a nicely packaged version of the original code to share, but I believe the reproducer from #65018 also shows this issue with the appropriate build settings. Here is the code again, containing a function that uses a foreign await_suspend method with symmetric transfer of control to await each element of an array of 32 integers:

#include <coroutine>

// A simple awaiter type with an await_suspend method that can't be
// inlined.
struct Awaiter {
  const int& x;

  bool await_ready() { return false; }
  std::coroutine_handle<> await_suspend(const std::coroutine_handle<> h);
  void await_resume() {}
};

struct MyTask {
  // A lazy promise with an await_transform method that supports awaiting
  // integer references using the Awaiter struct above.
  struct promise_type {
    MyTask get_return_object() { return {}; }
    std::suspend_always initial_suspend() { return {}; }
    std::suspend_always final_suspend() noexcept { return {}; }
    void unhandled_exception();

    auto await_transform(const int& x) { return Awaiter{x}; }
  };
};

// A global array of integers.
int g_array[32];

// A coroutine that awaits each integer in the global array.
MyTask FooBar() {
  for (const int& x : g_array) {
    co_await x;
  }
}

If you build this on armv8-a clang trunk with -std=c++20 -fno-exceptions -fsanitize=address -fsanitize=undefined -fno-sanitize=nonnull-attribute -fno-sanitize=vptr -fno-sanitize=function -fno-sanitize=enum -fno-sanitize-trap=all -O0 -emit-llvm (Compiler Explorer), you get this assembly as part of the resume function for FooBar (full dump):

        bl      Awaiter::await_suspend(std::__n4861::coroutine_handle<void>)
        ldr     x9, [x19, #536]                 // 8-byte Folded Reload
        str     x0, [x19, #240]                 // 8-byte Folded Spill
        str     x9, [x19, #248]                 // 8-byte Folded Spill
        mov     x8, #68719476736                // =0x1000000000
        add     x8, x8, x9, lsr #3
        ldrb    w8, [x8]
        cbz     w8, .LBB16_52
        b       .LBB16_51
.LBB16_51:
        ldr     x0, [x19, #248]                 // 8-byte Folded Reload
        bl      __asan_report_store8

This shows that after suspension asan is being told that there is a store to an address loaded from x19+536, which is something earlier spilled to the stack. What's going on is clearer with -emit-llvm (full dump, just FooBar.resume):

  %call29 = call i64 @Awaiter::await_suspend(std::__n4861::coroutine_handle<void>)(ptr noundef nonnull align 8 dereferenceable(8) %ref.tmp11.reload.addr, i64 %coerce.val.pi28), !dbg !329
  %coerce.dive30 = getelementptr inbounds %"struct.std::__n4861::coroutine_handle", ptr %ref.tmp18.reload.addr, i32 0, i32 0, !dbg !329
  %coerce.val.ip31 = inttoptr i64 %call29 to ptr, !dbg !329
  %144 = ptrtoint ptr %coerce.dive30 to i64, !dbg !329
  %145 = lshr i64 %144, 3, !dbg !329
  %146 = add i64 %145, 68719476736, !dbg !329
  %147 = inttoptr i64 %146 to ptr, !dbg !329
  %148 = load i8, ptr %147, align 1, !dbg !329
  %149 = icmp ne i8 %148, 0, !dbg !329
  br i1 %149, label %150, label %151, !dbg !329

150:
  call void @__asan_report_store8(i64 %144) #10, !dbg !329
  unreachable, !dbg !329

151:
  store ptr %coerce.val.ip31, ptr %coerce.dive30, align 8, !dbg !329

There is a store to %coerce.dive30, which is a pointer to a std::coroutine_handle on the coroutine frame:

%FooBar().Frame = type { ptr, ptr, %"struct.MyTask::promise_type", i2, %"struct.std::__n4861::suspend_always", %"struct.std::__n4861::suspend_always", ptr, ptr, ptr, ptr, %struct.Awaiter, %"struct.std::__n4861::coroutine_handle", %"struct.std::__n4861::coroutine_handle.0" }
[...]
  %ref.tmp18.reload.addr = getelementptr inbounds %FooBar().Frame, ptr %0, i32 0, i32 11, !dbg !320

In other words, with these build settings clang still incorrectly dumps a coroutine handle to the coroutine frame after it suspends.

The text was updated successfully, but these errors were encountered:

llvmbot · 2023-11-11T00:42:48Z

@llvm/issue-subscribers-coroutines

Author: Aaron Jacobs (jacobsa)

This issue is related to #65018 and #65054, where clang incorrectly uses a coroutine frame for scratch space after the coroutine suspended. To recap: **clang must not insert code that accesses the coroutine frame after it suspends, because this may introduce a data race**. Another thread may already have resumed or destroyed the coroutine by the time this thread gets around to accessing its frame.

The immediate symptoms in those two issues were fixed by @ChuanqiXu9, by manipulating whether clang will inline std::coroutine_handle::address. But as far as I'm aware there was no systematic fix to teach clang that it's not allowed to access the frame after suspending. I think the new issue here is a consequence of this.

With a large piece of internal Google code we've found that AddressSanitizer sometimes reports an incorrect use-after-free error in coroutine code, claiming that the frame is being written to after it was already destroyed by another thread that ran concurrently after it suspended.

I don't have a nicely packaged version of the original code to share, but I believe the reproducer from #65018 also shows this issue with the appropriate build settings. Here is the code again, containing a function that uses a foreign await_suspend method with symmetric transfer of control to await each element of an array of 32 integers:

#include &lt;coroutine&gt;

// A simple awaiter type with an await_suspend method that can't be
// inlined.
struct Awaiter {
  const int&amp; x;

  bool await_ready() { return false; }
  std::coroutine_handle&lt;&gt; await_suspend(const std::coroutine_handle&lt;&gt; h);
  void await_resume() {}
};

struct MyTask {
  // A lazy promise with an await_transform method that supports awaiting
  // integer references using the Awaiter struct above.
  struct promise_type {
    MyTask get_return_object() { return {}; }
    std::suspend_always initial_suspend() { return {}; }
    std::suspend_always final_suspend() noexcept { return {}; }
    void unhandled_exception();

    auto await_transform(const int&amp; x) { return Awaiter{x}; }
  };
};

// A global array of integers.
int g_array[32];

// A coroutine that awaits each integer in the global array.
MyTask FooBar() {
  for (const int&amp; x : g_array) {
    co_await x;
  }
}

If you build this on armv8-a clang trunk with -std=c++20 -fno-exceptions -fsanitize=address -fsanitize=undefined -fno-sanitize=nonnull-attribute -fno-sanitize=vptr -fno-sanitize=function -fno-sanitize=enum -fno-sanitize-trap=all -O0 -emit-llvm (Compiler Explorer), you get this assembly as part of the resume function for FooBar (full dump):

        bl      Awaiter::await_suspend(std::__n4861::coroutine_handle&lt;void&gt;)
        ldr     x9, [x19, #<!-- -->536]                 // 8-byte Folded Reload
        str     x0, [x19, #<!-- -->240]                 // 8-byte Folded Spill
        str     x9, [x19, #<!-- -->248]                 // 8-byte Folded Spill
        mov     x8, #<!-- -->68719476736                // =0x1000000000
        add     x8, x8, x9, lsr #<!-- -->3
        ldrb    w8, [x8]
        cbz     w8, .LBB16_52
        b       .LBB16_51
.LBB16_51:
        ldr     x0, [x19, #<!-- -->248]                 // 8-byte Folded Reload
        bl      __asan_report_store8

This shows that after suspension asan is being told that there is a store to an address loaded from x19+536, which is something earlier spilled to the stack. What's going on is clearer with -emit-llvm (full dump, just FooBar.resume):

  %call29 = call i64 @<!-- -->Awaiter::await_suspend(std::__n4861::coroutine_handle&lt;void&gt;)(ptr noundef nonnull align 8 dereferenceable(8) %ref.tmp11.reload.addr, i64 %coerce.val.pi28), !dbg !329
  %coerce.dive30 = getelementptr inbounds %"struct.std::__n4861::coroutine_handle", ptr %ref.tmp18.reload.addr, i32 0, i32 0, !dbg !329
  %coerce.val.ip31 = inttoptr i64 %call29 to ptr, !dbg !329
  %144 = ptrtoint ptr %coerce.dive30 to i64, !dbg !329
  %145 = lshr i64 %144, 3, !dbg !329
  %146 = add i64 %145, 68719476736, !dbg !329
  %147 = inttoptr i64 %146 to ptr, !dbg !329
  %148 = load i8, ptr %147, align 1, !dbg !329
  %149 = icmp ne i8 %148, 0, !dbg !329
  br i1 %149, label %150, label %151, !dbg !329

150:
  call void @<!-- -->__asan_report_store8(i64 %144) #<!-- -->10, !dbg !329
  unreachable, !dbg !329

151:
  store ptr %coerce.val.ip31, ptr %coerce.dive30, align 8, !dbg !329

There is a store to %coerce.dive30, which is a pointer to a std::coroutine_handle on the coroutine frame:

%FooBar().Frame = type { ptr, ptr, %"struct.MyTask::promise_type", i2, %"struct.std::__n4861::suspend_always", %"struct.std::__n4861::suspend_always", ptr, ptr, ptr, ptr, %struct.Awaiter, %"struct.std::__n4861::coroutine_handle", %"struct.std::__n4861::coroutine_handle.0" }
[...]
  %ref.tmp18.reload.addr = getelementptr inbounds %FooBar().Frame, ptr %0, i32 0, i32 11, !dbg !320

In other words, with these build settings clang still incorrectly dumps a coroutine handle to the coroutine frame after it suspends.

alanzhao1 · 2024-02-23T22:33:54Z

So my understanding here is that Clang should not write anything to the stack/execution frame that called await_suspend, which includes spilling registers. Is that right?

If so, I'm not sure how (or even if its possible) to achieve this.

jacobsa · 2024-02-23T22:39:47Z

No, it can freely use the stack frame, which belongs to a single thread that is still devoted to dealing with cleanup. It can't write anything to the coroutine frame once suspended, because it may no longer even exist anymore.

rnk · 2024-02-23T23:04:00Z

It sounds like the invariant is that the compiler is not supposed to touch the coroutine frame after the suspend point, and somewhere in the optimization pipeline, this invariant is broken. I think the next step is to dump the IR after every pass using -mllvm -print-after-all and try to figure out which pass breaks this invariant, or if it is maintained at all in the first place.

I think ASan is detecting a true positive issue, the store of the handle into the frame exists before ASan instrumentation runs.

ChuanqiXu9 · 2024-02-26T01:59:29Z

IIUC, this may be fixed fundamentally in #79712.

alanzhao1 · 2024-02-28T00:11:40Z

I'm a little confused by the issue description. You state that:

This shows that after suspension asan is being told that there is a store to an address loaded from x19+536, which is something earlier spilled to the stack. What's going on is clearer with -emit-llvm (full dump, just FooBar.resume):

and

There is a store to %coerce.dive30, which is a pointer to a std::coroutine_handle on the coroutine frame:

However, neither the ARM assembly nor the LLVM IR in your example has a store instruction. Do you mean there's a store elsewhere that we don't see here?

jacobsa · 2024-02-28T00:33:18Z

This is subtle, sorry. I'll highlight the important part:

This shows that after suspension asan is being told that there is a store to an address loaded from x19+536, which is something earlier spilled to the stack.

I'm not good enough at reading ARM assembly to say for sure whether there is an actual store. If there was, it would be a correctness problem too. But the important part here is that a store is being reported to asan, after the coroutine has been suspended and therefore after the frame potentially being destroyed on another thread. Even if there is no real store, this causes a use-after-free report from asan due to clang's actions, not the user's.

In the llvm IR shown after that it seems that there is an incorrect store introduced in the IR at some stage in the pipeline. I don't know if it made it into the ARM assembly or not. I'm also not 100% sure of this, because I find IR hard to read.

Either way I'm fairly confident that the report of a store is wrong. There is no use-after-free in the user code, so one of the two following things happened:

Clang introduced a store after suspending. This is a correctness problem, and asan correctly reported it.
Clang didn't introduce a store, and hallucinated a report to asan that causes it to give an incorrect use-after-free report.

Either way this is a bug from the user's point of view. The only part I'm unsure about is where the bug lies.

jacobsa · 2024-02-28T00:36:29Z

That is to say, I think all of the following are true:

It's a bug for clang to introduce a store to the coroutine frame after the coroutine suspends.
It's a bug for clang to report a real store that happens before suspension to asan, but delay the report until after the coroutine suspends.
It's a bug for clang to report a store that doesn't exist to asan, especially if it reports it after the coroutine suspends.
One of the above is happening here. I'm hoping you can help me figure out which.

alanzhao1 · 2024-02-28T23:44:05Z

I think this was actually fixed sometime since the bug was reported.

The latest LLVM IR is:

  %call20 = call ptr @Awaiter::await_suspend(std::__1::coroutine_handle<void>)(ptr noundef nonnull align 8 dereferenceable(8) %ref.tmp8.reload.addr, ptr %117)
  %coerce.dive21 = getelementptr inbounds %"struct.std::__1::coroutine_handle", ptr %15, i32 0, i32 0
  store ptr %call20, ptr %coerce.dive21, align 8

The store is now to %coerce.dive21, which now points to an area of memory in the stack of FooBar.Resume. Before, the store was to %ref.tmp18.reload.addr, which points to the frame object passed to await_suspend (WTF???).

I'm doing a git bisect to figure out which commit changed this.

See https://godbolt.org/z/z5oqs1fjW

alanzhao1 · 2024-02-29T00:04:09Z

git bisect result:

20e6515d5c5ff155a54e10f64caef1c76d8d5976 is the first bad commit
commit 20e6515d5c5ff155a54e10f64caef1c76d8d5976
Author: Chuanqi Xu <[email protected]>
Date:   Tue Aug 29 14:19:11 2023 +0800

    [Coroutines] Mark 'coroutine_handle<>::address' as always-inline

    Close https://github.com/llvm/llvm-project/issues/65054

    The direct issue is still the call to coroutine_handle<>::address()
    after await_suspend(). Without optimizations, the current logic will put
    the temporary result of await_suspend() to the coroutine frame since the
    middle end feel the temporary is escaped from
    coroutine_handle<>::address. To fix this fundamentally, we should wrap
    the whole logic about await-suspend into a standalone function. See
    https://github.com/llvm/llvm-project/issues/64945

    And as a short-term workaround, we probably can mark
    coroutine_handle<>::address() as always-inline so that the temporary
    result may not be thought to be escaped then it won't be put on the
    coroutine frame. Although it looks dirty, it is probably do-able since
    the compiler are allowed to do special tricks to standard library
    components.

 clang/lib/Sema/SemaCoroutine.cpp         | 22 ++++++++++++
 clang/test/CodeGenCoroutines/pr65054.cpp | 60 ++++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+)
 create mode 100644 clang/test/CodeGenCoroutines/pr65054.cpp
bisect found first bad commit

alanzhao1 · 2024-02-29T00:07:30Z

Actually, nvm - it turns out that this is broken if you add all the other sanitizer flags: https://godbolt.org/z/zaTa6jKfr

alanzhao1 · 2024-02-29T00:20:04Z

By adding and removing -fsanitize=undefined, we see that UBSan is causing Clang to generate code that reuses the frame object.

alanzhao1 · 2024-03-07T22:38:39Z

I can confirm that #79712 will resolve this bug.

Per the LLVM IR generated with -fsanitize=address and -fsanitize-undefined, we do not store anything to the frame object after await_suspend:

AfterCoroSave37:                                  ; preds = %86
  %87 = call ptr @__await_suspend_wrapper__Z6FooBarv_await(ptr %ref.tmp5.reload.addr, ptr %0)
  %88 = getelementptr inbounds { ptr, ptr }, ptr %87, i32 0, i32 0
  %89 = ptrtoint ptr %88 to i64
  %90 = lshr i64 %89, 3
  %91 = add i64 %90, 2147450880
  %92 = inttoptr i64 %91 to ptr
  %93 = load i8, ptr %92, align 1
  %94 = icmp ne i8 %93, 0
  br i1 %94, label %95, label %96

95:                                               ; preds = %AfterCoroSave37
  call void @__asan_report_load8(i64 %89) #10
  unreachable

96:                                               ; preds = %AfterCoroSave37
  %97 = load ptr, ptr %88, align 8
  musttail call fastcc void %97(ptr %87)
  ret void

ChuanqiXu9 · 2024-03-11T02:02:41Z

Given #79712 get landed, I'll close this.

Feel free to send similar issue reports founded. Thanks a lot.

github-actions bot added the clang Clang issues not falling into any other category label Nov 11, 2023

EugeneZelenko added coroutines C++20 coroutines and removed clang Clang issues not falling into any other category labels Nov 11, 2023

usx95 added this to C++ 20 in Clang Nov 28, 2023

tzcnt mentioned this issue Jan 25, 2024

perf: default-initialize thread local variables tzcnt/TooManyCooks#6

Merged

fpasserby mentioned this issue Jan 29, 2024

[coroutine] Implement llvm.coro.await.suspend intrinsic #79712

Merged

ChuanqiXu9 closed this as completed Mar 11, 2024

github-project-automation bot moved this to Done in C++ 20 in Clang Mar 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[bug] clang incorrectly uses coroutine frame for scratch space after suspending (take 3) #72006

[bug] clang incorrectly uses coroutine frame for scratch space after suspending (take 3) #72006

jacobsa commented Nov 11, 2023

llvmbot commented Nov 11, 2023

alanzhao1 commented Feb 23, 2024

jacobsa commented Feb 23, 2024

rnk commented Feb 23, 2024

ChuanqiXu9 commented Feb 26, 2024

alanzhao1 commented Feb 28, 2024

jacobsa commented Feb 28, 2024

jacobsa commented Feb 28, 2024 •

edited

Loading

alanzhao1 commented Feb 28, 2024

alanzhao1 commented Feb 29, 2024

alanzhao1 commented Feb 29, 2024

alanzhao1 commented Feb 29, 2024

alanzhao1 commented Mar 7, 2024

ChuanqiXu9 commented Mar 11, 2024

[bug] clang incorrectly uses coroutine frame for scratch space after suspending (take 3) #72006

[bug] clang incorrectly uses coroutine frame for scratch space after suspending (take 3) #72006

Comments

jacobsa commented Nov 11, 2023

llvmbot commented Nov 11, 2023

alanzhao1 commented Feb 23, 2024

jacobsa commented Feb 23, 2024

rnk commented Feb 23, 2024

ChuanqiXu9 commented Feb 26, 2024

alanzhao1 commented Feb 28, 2024

jacobsa commented Feb 28, 2024

jacobsa commented Feb 28, 2024 • edited Loading

alanzhao1 commented Feb 28, 2024

alanzhao1 commented Feb 29, 2024

alanzhao1 commented Feb 29, 2024

alanzhao1 commented Feb 29, 2024

alanzhao1 commented Mar 7, 2024

ChuanqiXu9 commented Mar 11, 2024

jacobsa commented Feb 28, 2024 •

edited

Loading