Skip to content

[bug] clang incorrectly uses coroutine frame for scratch space after suspending (take 3) #72006

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jacobsa opened this issue Nov 11, 2023 · 14 comments
Labels
coroutines C++20 coroutines

Comments

@jacobsa
Copy link
Contributor

jacobsa commented Nov 11, 2023

This issue is related to #65018 and #65054, where clang incorrectly uses a coroutine frame for scratch space after the coroutine suspended. To recap: clang must not insert code that accesses the coroutine frame after it suspends, because this may introduce a data race. Another thread may already have resumed or destroyed the coroutine by the time this thread gets around to accessing its frame.

The immediate symptoms in those two issues were fixed by @ChuanqiXu9, by manipulating whether clang will inline std::coroutine_handle::address. But as far as I'm aware there was no systematic fix to teach clang that it's not allowed to access the frame after suspending. I think the new issue here is a consequence of this.


With a large piece of internal Google code we've found that AddressSanitizer sometimes reports an incorrect use-after-free error in coroutine code, claiming that the frame is being written to after it was already destroyed by another thread that ran concurrently after it suspended.

I don't have a nicely packaged version of the original code to share, but I believe the reproducer from #65018 also shows this issue with the appropriate build settings. Here is the code again, containing a function that uses a foreign await_suspend method with symmetric transfer of control to await each element of an array of 32 integers:

#include <coroutine>

// A simple awaiter type with an await_suspend method that can't be
// inlined.
struct Awaiter {
  const int& x;

  bool await_ready() { return false; }
  std::coroutine_handle<> await_suspend(const std::coroutine_handle<> h);
  void await_resume() {}
};

struct MyTask {
  // A lazy promise with an await_transform method that supports awaiting
  // integer references using the Awaiter struct above.
  struct promise_type {
    MyTask get_return_object() { return {}; }
    std::suspend_always initial_suspend() { return {}; }
    std::suspend_always final_suspend() noexcept { return {}; }
    void unhandled_exception();

    auto await_transform(const int& x) { return Awaiter{x}; }
  };
};

// A global array of integers.
int g_array[32];

// A coroutine that awaits each integer in the global array.
MyTask FooBar() {
  for (const int& x : g_array) {
    co_await x;
  }
}

If you build this on armv8-a clang trunk with -std=c++20 -fno-exceptions -fsanitize=address -fsanitize=undefined -fno-sanitize=nonnull-attribute -fno-sanitize=vptr -fno-sanitize=function -fno-sanitize=enum -fno-sanitize-trap=all -O0 -emit-llvm (Compiler Explorer), you get this assembly as part of the resume function for FooBar (full dump):

        bl      Awaiter::await_suspend(std::__n4861::coroutine_handle<void>)
        ldr     x9, [x19, #536]                 // 8-byte Folded Reload
        str     x0, [x19, #240]                 // 8-byte Folded Spill
        str     x9, [x19, #248]                 // 8-byte Folded Spill
        mov     x8, #68719476736                // =0x1000000000
        add     x8, x8, x9, lsr #3
        ldrb    w8, [x8]
        cbz     w8, .LBB16_52
        b       .LBB16_51
.LBB16_51:
        ldr     x0, [x19, #248]                 // 8-byte Folded Reload
        bl      __asan_report_store8

This shows that after suspension asan is being told that there is a store to an address loaded from x19+536, which is something earlier spilled to the stack. What's going on is clearer with -emit-llvm (full dump, just FooBar.resume):

  %call29 = call i64 @Awaiter::await_suspend(std::__n4861::coroutine_handle<void>)(ptr noundef nonnull align 8 dereferenceable(8) %ref.tmp11.reload.addr, i64 %coerce.val.pi28), !dbg !329
  %coerce.dive30 = getelementptr inbounds %"struct.std::__n4861::coroutine_handle", ptr %ref.tmp18.reload.addr, i32 0, i32 0, !dbg !329
  %coerce.val.ip31 = inttoptr i64 %call29 to ptr, !dbg !329
  %144 = ptrtoint ptr %coerce.dive30 to i64, !dbg !329
  %145 = lshr i64 %144, 3, !dbg !329
  %146 = add i64 %145, 68719476736, !dbg !329
  %147 = inttoptr i64 %146 to ptr, !dbg !329
  %148 = load i8, ptr %147, align 1, !dbg !329
  %149 = icmp ne i8 %148, 0, !dbg !329
  br i1 %149, label %150, label %151, !dbg !329

150:
  call void @__asan_report_store8(i64 %144) #10, !dbg !329
  unreachable, !dbg !329

151:
  store ptr %coerce.val.ip31, ptr %coerce.dive30, align 8, !dbg !329

There is a store to %coerce.dive30, which is a pointer to a std::coroutine_handle on the coroutine frame:

%FooBar().Frame = type { ptr, ptr, %"struct.MyTask::promise_type", i2, %"struct.std::__n4861::suspend_always", %"struct.std::__n4861::suspend_always", ptr, ptr, ptr, ptr, %struct.Awaiter, %"struct.std::__n4861::coroutine_handle", %"struct.std::__n4861::coroutine_handle.0" }
[...]
  %ref.tmp18.reload.addr = getelementptr inbounds %FooBar().Frame, ptr %0, i32 0, i32 11, !dbg !320

In other words, with these build settings clang still incorrectly dumps a coroutine handle to the coroutine frame after it suspends.

@github-actions github-actions bot added the clang Clang issues not falling into any other category label Nov 11, 2023
@EugeneZelenko EugeneZelenko added coroutines C++20 coroutines and removed clang Clang issues not falling into any other category labels Nov 11, 2023
@llvmbot
Copy link
Member

llvmbot commented Nov 11, 2023

@llvm/issue-subscribers-coroutines

Author: Aaron Jacobs (jacobsa)

This issue is related to #65018 and #65054, where clang incorrectly uses a coroutine frame for scratch space after the coroutine suspended. To recap: **clang must not insert code that accesses the coroutine frame after it suspends, because this may introduce a data race**. Another thread may already have resumed or destroyed the coroutine by the time this thread gets around to accessing its frame.

The immediate symptoms in those two issues were fixed by @ChuanqiXu9, by manipulating whether clang will inline std::coroutine_handle::address. But as far as I'm aware there was no systematic fix to teach clang that it's not allowed to access the frame after suspending. I think the new issue here is a consequence of this.


With a large piece of internal Google code we've found that AddressSanitizer sometimes reports an incorrect use-after-free error in coroutine code, claiming that the frame is being written to after it was already destroyed by another thread that ran concurrently after it suspended.

I don't have a nicely packaged version of the original code to share, but I believe the reproducer from #65018 also shows this issue with the appropriate build settings. Here is the code again, containing a function that uses a foreign await_suspend method with symmetric transfer of control to await each element of an array of 32 integers:

#include &lt;coroutine&gt;

// A simple awaiter type with an await_suspend method that can't be
// inlined.
struct Awaiter {
  const int&amp; x;

  bool await_ready() { return false; }
  std::coroutine_handle&lt;&gt; await_suspend(const std::coroutine_handle&lt;&gt; h);
  void await_resume() {}
};

struct MyTask {
  // A lazy promise with an await_transform method that supports awaiting
  // integer references using the Awaiter struct above.
  struct promise_type {
    MyTask get_return_object() { return {}; }
    std::suspend_always initial_suspend() { return {}; }
    std::suspend_always final_suspend() noexcept { return {}; }
    void unhandled_exception();

    auto await_transform(const int&amp; x) { return Awaiter{x}; }
  };
};

// A global array of integers.
int g_array[32];

// A coroutine that awaits each integer in the global array.
MyTask FooBar() {
  for (const int&amp; x : g_array) {
    co_await x;
  }
}

If you build this on armv8-a clang trunk with -std=c++20 -fno-exceptions -fsanitize=address -fsanitize=undefined -fno-sanitize=nonnull-attribute -fno-sanitize=vptr -fno-sanitize=function -fno-sanitize=enum -fno-sanitize-trap=all -O0 -emit-llvm (Compiler Explorer), you get this assembly as part of the resume function for FooBar (full dump):

        bl      Awaiter::await_suspend(std::__n4861::coroutine_handle&lt;void&gt;)
        ldr     x9, [x19, #<!-- -->536]                 // 8-byte Folded Reload
        str     x0, [x19, #<!-- -->240]                 // 8-byte Folded Spill
        str     x9, [x19, #<!-- -->248]                 // 8-byte Folded Spill
        mov     x8, #<!-- -->68719476736                // =0x1000000000
        add     x8, x8, x9, lsr #<!-- -->3
        ldrb    w8, [x8]
        cbz     w8, .LBB16_52
        b       .LBB16_51
.LBB16_51:
        ldr     x0, [x19, #<!-- -->248]                 // 8-byte Folded Reload
        bl      __asan_report_store8

This shows that after suspension asan is being told that there is a store to an address loaded from x19+536, which is something earlier spilled to the stack. What's going on is clearer with -emit-llvm (full dump, just FooBar.resume):

  %call29 = call i64 @<!-- -->Awaiter::await_suspend(std::__n4861::coroutine_handle&lt;void&gt;)(ptr noundef nonnull align 8 dereferenceable(8) %ref.tmp11.reload.addr, i64 %coerce.val.pi28), !dbg !329
  %coerce.dive30 = getelementptr inbounds %"struct.std::__n4861::coroutine_handle", ptr %ref.tmp18.reload.addr, i32 0, i32 0, !dbg !329
  %coerce.val.ip31 = inttoptr i64 %call29 to ptr, !dbg !329
  %144 = ptrtoint ptr %coerce.dive30 to i64, !dbg !329
  %145 = lshr i64 %144, 3, !dbg !329
  %146 = add i64 %145, 68719476736, !dbg !329
  %147 = inttoptr i64 %146 to ptr, !dbg !329
  %148 = load i8, ptr %147, align 1, !dbg !329
  %149 = icmp ne i8 %148, 0, !dbg !329
  br i1 %149, label %150, label %151, !dbg !329

150:
  call void @<!-- -->__asan_report_store8(i64 %144) #<!-- -->10, !dbg !329
  unreachable, !dbg !329

151:
  store ptr %coerce.val.ip31, ptr %coerce.dive30, align 8, !dbg !329

There is a store to %coerce.dive30, which is a pointer to a std::coroutine_handle on the coroutine frame:

%FooBar().Frame = type { ptr, ptr, %"struct.MyTask::promise_type", i2, %"struct.std::__n4861::suspend_always", %"struct.std::__n4861::suspend_always", ptr, ptr, ptr, ptr, %struct.Awaiter, %"struct.std::__n4861::coroutine_handle", %"struct.std::__n4861::coroutine_handle.0" }
[...]
  %ref.tmp18.reload.addr = getelementptr inbounds %FooBar().Frame, ptr %0, i32 0, i32 11, !dbg !320

In other words, with these build settings clang still incorrectly dumps a coroutine handle to the coroutine frame after it suspends.

@alanzhao1
Copy link
Contributor

So my understanding here is that Clang should not write anything to the stack/execution frame that called await_suspend, which includes spilling registers. Is that right?

If so, I'm not sure how (or even if its possible) to achieve this.

@jacobsa
Copy link
Contributor Author

jacobsa commented Feb 23, 2024

No, it can freely use the stack frame, which belongs to a single thread that is still devoted to dealing with cleanup. It can't write anything to the coroutine frame once suspended, because it may no longer even exist anymore.

@rnk
Copy link
Collaborator

rnk commented Feb 23, 2024

It sounds like the invariant is that the compiler is not supposed to touch the coroutine frame after the suspend point, and somewhere in the optimization pipeline, this invariant is broken. I think the next step is to dump the IR after every pass using -mllvm -print-after-all and try to figure out which pass breaks this invariant, or if it is maintained at all in the first place.

I think ASan is detecting a true positive issue, the store of the handle into the frame exists before ASan instrumentation runs.

@ChuanqiXu9
Copy link
Member

IIUC, this may be fixed fundamentally in #79712.

@alanzhao1
Copy link
Contributor

I'm a little confused by the issue description. You state that:

This shows that after suspension asan is being told that there is a store to an address loaded from x19+536, which is something earlier spilled to the stack. What's going on is clearer with -emit-llvm (full dump, just FooBar.resume):

and

There is a store to %coerce.dive30, which is a pointer to a std::coroutine_handle on the coroutine frame:

However, neither the ARM assembly nor the LLVM IR in your example has a store instruction. Do you mean there's a store elsewhere that we don't see here?

@jacobsa
Copy link
Contributor Author

jacobsa commented Feb 28, 2024

This is subtle, sorry. I'll highlight the important part:

This shows that after suspension asan is being told that there is a store to an address loaded from x19+536, which is something earlier spilled to the stack.

I'm not good enough at reading ARM assembly to say for sure whether there is an actual store. If there was, it would be a correctness problem too. But the important part here is that a store is being reported to asan, after the coroutine has been suspended and therefore after the frame potentially being destroyed on another thread. Even if there is no real store, this causes a use-after-free report from asan due to clang's actions, not the user's.

In the llvm IR shown after that it seems that there is an incorrect store introduced in the IR at some stage in the pipeline. I don't know if it made it into the ARM assembly or not. I'm also not 100% sure of this, because I find IR hard to read.

Either way I'm fairly confident that the report of a store is wrong. There is no use-after-free in the user code, so one of the two following things happened:

  • Clang introduced a store after suspending. This is a correctness problem, and asan correctly reported it.

  • Clang didn't introduce a store, and hallucinated a report to asan that causes it to give an incorrect use-after-free report.

Either way this is a bug from the user's point of view. The only part I'm unsure about is where the bug lies.

@jacobsa
Copy link
Contributor Author

jacobsa commented Feb 28, 2024

That is to say, I think all of the following are true:

  • It's a bug for clang to introduce a store to the coroutine frame after the coroutine suspends.
  • It's a bug for clang to report a real store that happens before suspension to asan, but delay the report until after the coroutine suspends.
  • It's a bug for clang to report a store that doesn't exist to asan, especially if it reports it after the coroutine suspends.
  • One of the above is happening here. I'm hoping you can help me figure out which.

@alanzhao1
Copy link
Contributor

I think this was actually fixed sometime since the bug was reported.

The latest LLVM IR is:

  %call20 = call ptr @Awaiter::await_suspend(std::__1::coroutine_handle<void>)(ptr noundef nonnull align 8 dereferenceable(8) %ref.tmp8.reload.addr, ptr %117)
  %coerce.dive21 = getelementptr inbounds %"struct.std::__1::coroutine_handle", ptr %15, i32 0, i32 0
  store ptr %call20, ptr %coerce.dive21, align 8

The store is now to %coerce.dive21, which now points to an area of memory in the stack of FooBar.Resume. Before, the store was to %ref.tmp18.reload.addr, which points to the frame object passed to await_suspend (WTF???).

I'm doing a git bisect to figure out which commit changed this.

See https://godbolt.org/z/z5oqs1fjW

@alanzhao1
Copy link
Contributor

git bisect result:

20e6515d5c5ff155a54e10f64caef1c76d8d5976 is the first bad commit
commit 20e6515d5c5ff155a54e10f64caef1c76d8d5976
Author: Chuanqi Xu <[email protected]>
Date:   Tue Aug 29 14:19:11 2023 +0800

    [Coroutines] Mark 'coroutine_handle<>::address' as always-inline

    Close https://github.com/llvm/llvm-project/issues/65054

    The direct issue is still the call to coroutine_handle<>::address()
    after await_suspend(). Without optimizations, the current logic will put
    the temporary result of await_suspend() to the coroutine frame since the
    middle end feel the temporary is escaped from
    coroutine_handle<>::address. To fix this fundamentally, we should wrap
    the whole logic about await-suspend into a standalone function. See
    https://github.com/llvm/llvm-project/issues/64945

    And as a short-term workaround, we probably can mark
    coroutine_handle<>::address() as always-inline so that the temporary
    result may not be thought to be escaped then it won't be put on the
    coroutine frame. Although it looks dirty, it is probably do-able since
    the compiler are allowed to do special tricks to standard library
    components.

 clang/lib/Sema/SemaCoroutine.cpp         | 22 ++++++++++++
 clang/test/CodeGenCoroutines/pr65054.cpp | 60 ++++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+)
 create mode 100644 clang/test/CodeGenCoroutines/pr65054.cpp
bisect found first bad commit

@alanzhao1
Copy link
Contributor

Actually, nvm - it turns out that this is broken if you add all the other sanitizer flags: https://godbolt.org/z/zaTa6jKfr

@alanzhao1
Copy link
Contributor

By adding and removing -fsanitize=undefined, we see that UBSan is causing Clang to generate code that reuses the frame object.

@alanzhao1
Copy link
Contributor

I can confirm that #79712 will resolve this bug.

Per the LLVM IR generated with -fsanitize=address and -fsanitize-undefined, we do not store anything to the frame object after await_suspend:

AfterCoroSave37:                                  ; preds = %86
  %87 = call ptr @__await_suspend_wrapper__Z6FooBarv_await(ptr %ref.tmp5.reload.addr, ptr %0)
  %88 = getelementptr inbounds { ptr, ptr }, ptr %87, i32 0, i32 0
  %89 = ptrtoint ptr %88 to i64
  %90 = lshr i64 %89, 3
  %91 = add i64 %90, 2147450880
  %92 = inttoptr i64 %91 to ptr
  %93 = load i8, ptr %92, align 1
  %94 = icmp ne i8 %93, 0
  br i1 %94, label %95, label %96

95:                                               ; preds = %AfterCoroSave37
  call void @__asan_report_load8(i64 %89) #10
  unreachable

96:                                               ; preds = %AfterCoroSave37
  %97 = load ptr, ptr %88, align 8
  musttail call fastcc void %97(ptr %87)
  ret void

@ChuanqiXu9
Copy link
Member

Given #79712 get landed, I'll close this.

Feel free to send similar issue reports founded. Thanks a lot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
coroutines C++20 coroutines
Projects
Status: Done
Development

No branches or pull requests

6 participants