-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[bug] clang incorrectly uses coroutine frame for scratch space after suspending (take 3) #72006
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@llvm/issue-subscribers-coroutines Author: Aaron Jacobs (jacobsa)
This issue is related to #65018 and #65054, where clang incorrectly uses a coroutine frame for scratch space after the coroutine suspended. To recap: **clang must not insert code that accesses the coroutine frame after it suspends, because this may introduce a data race**. Another thread may already have resumed or destroyed the coroutine by the time this thread gets around to accessing its frame.
The immediate symptoms in those two issues were fixed by @ChuanqiXu9, by manipulating whether clang will inline With a large piece of internal Google code we've found that AddressSanitizer sometimes reports an incorrect use-after-free error in coroutine code, claiming that the frame is being written to after it was already destroyed by another thread that ran concurrently after it suspended. I don't have a nicely packaged version of the original code to share, but I believe the reproducer from #65018 also shows this issue with the appropriate build settings. Here is the code again, containing a function that uses a foreign #include <coroutine>
// A simple awaiter type with an await_suspend method that can't be
// inlined.
struct Awaiter {
const int& x;
bool await_ready() { return false; }
std::coroutine_handle<> await_suspend(const std::coroutine_handle<> h);
void await_resume() {}
};
struct MyTask {
// A lazy promise with an await_transform method that supports awaiting
// integer references using the Awaiter struct above.
struct promise_type {
MyTask get_return_object() { return {}; }
std::suspend_always initial_suspend() { return {}; }
std::suspend_always final_suspend() noexcept { return {}; }
void unhandled_exception();
auto await_transform(const int& x) { return Awaiter{x}; }
};
};
// A global array of integers.
int g_array[32];
// A coroutine that awaits each integer in the global array.
MyTask FooBar() {
for (const int& x : g_array) {
co_await x;
}
} If you build this on armv8-a clang trunk with bl Awaiter::await_suspend(std::__n4861::coroutine_handle<void>)
ldr x9, [x19, #<!-- -->536] // 8-byte Folded Reload
str x0, [x19, #<!-- -->240] // 8-byte Folded Spill
str x9, [x19, #<!-- -->248] // 8-byte Folded Spill
mov x8, #<!-- -->68719476736 // =0x1000000000
add x8, x8, x9, lsr #<!-- -->3
ldrb w8, [x8]
cbz w8, .LBB16_52
b .LBB16_51
.LBB16_51:
ldr x0, [x19, #<!-- -->248] // 8-byte Folded Reload
bl __asan_report_store8 This shows that after suspension asan is being told that there is a store to an address loaded from %call29 = call i64 @<!-- -->Awaiter::await_suspend(std::__n4861::coroutine_handle<void>)(ptr noundef nonnull align 8 dereferenceable(8) %ref.tmp11.reload.addr, i64 %coerce.val.pi28), !dbg !329
%coerce.dive30 = getelementptr inbounds %"struct.std::__n4861::coroutine_handle", ptr %ref.tmp18.reload.addr, i32 0, i32 0, !dbg !329
%coerce.val.ip31 = inttoptr i64 %call29 to ptr, !dbg !329
%144 = ptrtoint ptr %coerce.dive30 to i64, !dbg !329
%145 = lshr i64 %144, 3, !dbg !329
%146 = add i64 %145, 68719476736, !dbg !329
%147 = inttoptr i64 %146 to ptr, !dbg !329
%148 = load i8, ptr %147, align 1, !dbg !329
%149 = icmp ne i8 %148, 0, !dbg !329
br i1 %149, label %150, label %151, !dbg !329
150:
call void @<!-- -->__asan_report_store8(i64 %144) #<!-- -->10, !dbg !329
unreachable, !dbg !329
151:
store ptr %coerce.val.ip31, ptr %coerce.dive30, align 8, !dbg !329 There is a store to %FooBar().Frame = type { ptr, ptr, %"struct.MyTask::promise_type", i2, %"struct.std::__n4861::suspend_always", %"struct.std::__n4861::suspend_always", ptr, ptr, ptr, ptr, %struct.Awaiter, %"struct.std::__n4861::coroutine_handle", %"struct.std::__n4861::coroutine_handle.0" }
[...]
%ref.tmp18.reload.addr = getelementptr inbounds %FooBar().Frame, ptr %0, i32 0, i32 11, !dbg !320 In other words, with these build settings clang still incorrectly dumps a coroutine handle to the coroutine frame after it suspends. |
So my understanding here is that Clang should not write anything to the stack/execution frame that called If so, I'm not sure how (or even if its possible) to achieve this. |
No, it can freely use the stack frame, which belongs to a single thread that is still devoted to dealing with cleanup. It can't write anything to the coroutine frame once suspended, because it may no longer even exist anymore. |
It sounds like the invariant is that the compiler is not supposed to touch the coroutine frame after the suspend point, and somewhere in the optimization pipeline, this invariant is broken. I think the next step is to dump the IR after every pass using I think ASan is detecting a true positive issue, the store of the handle into the frame exists before ASan instrumentation runs. |
IIUC, this may be fixed fundamentally in #79712. |
I'm a little confused by the issue description. You state that:
and
However, neither the ARM assembly nor the LLVM IR in your example has a store instruction. Do you mean there's a store elsewhere that we don't see here? |
This is subtle, sorry. I'll highlight the important part:
I'm not good enough at reading ARM assembly to say for sure whether there is an actual store. If there was, it would be a correctness problem too. But the important part here is that a store is being reported to asan, after the coroutine has been suspended and therefore after the frame potentially being destroyed on another thread. Even if there is no real store, this causes a use-after-free report from asan due to clang's actions, not the user's. In the llvm IR shown after that it seems that there is an incorrect store introduced in the IR at some stage in the pipeline. I don't know if it made it into the ARM assembly or not. I'm also not 100% sure of this, because I find IR hard to read. Either way I'm fairly confident that the report of a store is wrong. There is no use-after-free in the user code, so one of the two following things happened:
Either way this is a bug from the user's point of view. The only part I'm unsure about is where the bug lies. |
That is to say, I think all of the following are true:
|
I think this was actually fixed sometime since the bug was reported. The latest LLVM IR is: %call20 = call ptr @Awaiter::await_suspend(std::__1::coroutine_handle<void>)(ptr noundef nonnull align 8 dereferenceable(8) %ref.tmp8.reload.addr, ptr %117)
%coerce.dive21 = getelementptr inbounds %"struct.std::__1::coroutine_handle", ptr %15, i32 0, i32 0
store ptr %call20, ptr %coerce.dive21, align 8 The store is now to I'm doing a git bisect to figure out which commit changed this. |
git bisect result:
|
Actually, nvm - it turns out that this is broken if you add all the other sanitizer flags: https://godbolt.org/z/zaTa6jKfr |
By adding and removing |
I can confirm that #79712 will resolve this bug. Per the LLVM IR generated with
|
Given #79712 get landed, I'll close this. Feel free to send similar issue reports founded. Thanks a lot. |
This issue is related to #65018 and #65054, where clang incorrectly uses a coroutine frame for scratch space after the coroutine suspended. To recap: clang must not insert code that accesses the coroutine frame after it suspends, because this may introduce a data race. Another thread may already have resumed or destroyed the coroutine by the time this thread gets around to accessing its frame.
The immediate symptoms in those two issues were fixed by @ChuanqiXu9, by manipulating whether clang will inline
std::coroutine_handle::address
. But as far as I'm aware there was no systematic fix to teach clang that it's not allowed to access the frame after suspending. I think the new issue here is a consequence of this.With a large piece of internal Google code we've found that AddressSanitizer sometimes reports an incorrect use-after-free error in coroutine code, claiming that the frame is being written to after it was already destroyed by another thread that ran concurrently after it suspended.
I don't have a nicely packaged version of the original code to share, but I believe the reproducer from #65018 also shows this issue with the appropriate build settings. Here is the code again, containing a function that uses a foreign
await_suspend
method with symmetric transfer of control to await each element of an array of 32 integers:If you build this on armv8-a clang trunk with
-std=c++20 -fno-exceptions -fsanitize=address -fsanitize=undefined -fno-sanitize=nonnull-attribute -fno-sanitize=vptr -fno-sanitize=function -fno-sanitize=enum -fno-sanitize-trap=all -O0 -emit-llvm
(Compiler Explorer), you get this assembly as part of the resume function forFooBar
(full dump):This shows that after suspension asan is being told that there is a store to an address loaded from
x19+536
, which is something earlier spilled to the stack. What's going on is clearer with-emit-llvm
(full dump, justFooBar.resume
):There is a store to
%coerce.dive30
, which is a pointer to astd::coroutine_handle
on the coroutine frame:In other words, with these build settings clang still incorrectly dumps a coroutine handle to the coroutine frame after it suspends.
The text was updated successfully, but these errors were encountered: