[bug] clang miscompiles coroutine awaiter, moving write across a critical section #56301
By the way, I should mention that I discovered this because TSan reports it as a data race. And I think it's correct: clang has introduced a data race by putting a write after the call to `Register`.
I'm not sure if this is related to a known bug where coroutines don't cache TLS variables correctly. @jacobsa Could you build clang from source? If yes, could you test it again after applying https://reviews.llvm.org/D125291 and https://reviews.llvm.org/D127383?
@ChuanqiXu9: just saw your comment after writing this. I'll try that shortly, but it may take me some time because I've never done it before. In the meantime, here is some information about the IR; can you tell whether it's related based on that? Here is an IR dump after each optimization pass. You can see that in the version on line 3669 we still have the correct control flow:

```llvm
store i8 1, i8* %44, align 8, !dbg !913
%45 = getelementptr inbounds %struct.Awaiter, %struct.Awaiter* %5, i64 0, i32 0, !dbg !914
%46 = load %struct.SomeAwaitable*, %struct.SomeAwaitable** %45, align 8, !dbg !914
%47 = icmp eq %struct.SomeAwaitable* %46, null, !dbg !915, !nosanitize !82
br i1 %47, label %48, label %49, !dbg !915, !nosanitize !82

48:                                               ; preds = %37
  call void @llvm.ubsantrap(i8 22) #14, !nosanitize !82
  unreachable, !nosanitize !82

49:                                               ; preds = %37
  %50 = call i8* @_ZN13SomeAwaitable8RegisterENSt3__u16coroutine_handleIvEE(%struct.SomeAwaitable* noundef nonnull align 1 dereferenceable(1) %46, i8* %43) #2, !dbg !915
  call void @llvm.dbg.value(metadata i8* %50, metadata !909, metadata !DIExpression()) #2, !dbg !910
  call void @llvm.dbg.value(metadata %"struct.std::__u::coroutine_handle"* undef, metadata !916, metadata !DIExpression()) #2, !dbg !920
  %51 = icmp eq i8* %50, null, !dbg !923
  br i1 %51, label %52, label %53, !dbg !924

52:                                               ; preds = %49
  store i8 0, i8* %44, align 8, !dbg !925
  br label %53, !dbg !927
```

However, in a later version the store is gone, with the value plumbed through phi nodes instead:

```llvm
25:                                               ; preds = %21
  %26 = call i8* @_ZN13SomeAwaitable8RegisterENSt3__u16coroutine_handleIvEE(%struct.SomeAwaitable* noundef nonnull align 1 dereferenceable(1) %19, i8* %11) #2, !dbg !910
  call void @llvm.dbg.value(metadata i8* %26, metadata !907, metadata !DIExpression()) #2, !dbg !908
  call void @llvm.dbg.value(metadata %"struct.std::__u::coroutine_handle"* undef, metadata !911, metadata !DIExpression()) #2, !dbg !915
  %27 = icmp eq i8* %26, null, !dbg !918
  br i1 %27, label %28, label %29, !dbg !919

28:                                               ; preds = %25
  br label %29, !dbg !920

29:                                               ; preds = %25, %28
  %30 = phi i8 [ 0, %28 ], [ 1, %25 ], !dbg !908
  %31 = phi i8* [ %11, %28 ], [ %26, %25 ], !dbg !908
  call void @llvm.dbg.value(metadata %"struct.std::__u::coroutine_handle"* undef, metadata !922, metadata !DIExpression()), !dbg !925
  %32 = call i8* @llvm.coro.subfn.addr(i8* %31, i8 0)
  %33 = bitcast i8* %32 to void (i8*)*
  call fastcc void %33(i8* %31) #2, !dbg !890
  %34 = call i8 @llvm.coro.suspend(token %22, i1 false), !dbg !890
  switch i8 %34, label %52 [
    i8 0, label %35
    i8 1, label %46
  ], !dbg !890

35:                                               ; preds = %29, %15
  %36 = phi i8 [ %20, %15 ], [ %30, %29 ], !dbg !927
  call void @llvm.dbg.value(metadata %struct.Awaiter* undef, metadata !928, metadata !DIExpression()) #2, !dbg !931
  %37 = icmp eq i8 %36, 0, !dbg !933
  br i1 %37, label %38, label %39, !dbg !935

38:                                               ; preds = %35
  call void @_Z12DidntSuspendv() #2, !dbg !936
  br label %39, !dbg !938
```

The lack of a store is preserved up through the version on line 3068:

```llvm
  %14 = call i8* @_ZN13SomeAwaitable8RegisterENSt3__u16coroutine_handleIvEE(%struct.SomeAwaitable* noundef nonnull align 1 dereferenceable(1) %2, i8* %11) #2, !dbg !841
  call void @llvm.dbg.value(metadata i8* %14, metadata !838, metadata !DIExpression()) #2, !dbg !839
  call void @llvm.dbg.value(metadata %"struct.std::__u::coroutine_handle"* undef, metadata !842, metadata !DIExpression()) #2, !dbg !846
  %15 = icmp eq i8* %14, null, !dbg !849
  %16 = select i1 %15, i8* %11, i8* %14, !dbg !850
  %17 = call i8* @llvm.coro.subfn.addr(i8* %16, i8 0)
  %18 = bitcast i8* %17 to void (i8*)*
  call fastcc void %18(i8* %16) #2, !dbg !832
  %19 = call i8 @llvm.coro.suspend(token %13, i1 false), !dbg !832
  switch i8 %19, label %32 [
    i8 0, label %20
    i8 1, label %26
  ], !dbg !832

20:                                               ; preds = %9
  call void @llvm.dbg.value(metadata %struct.Awaiter* undef, metadata !851, metadata !DIExpression()) #2, !dbg !854
  br i1 %15, label %21, label %22, !dbg !856

21:                                               ; preds = %20
  call void @_Z12DidntSuspendv() #2, !dbg !857
  br label %22, !dbg !860
```

But then on line 6111 a store into the coroutine frame appears after the call:

```llvm
  %27 = call i8* @_ZN13SomeAwaitable8RegisterENSt3__u16coroutine_handleIvEE(%struct.SomeAwaitable* noundef nonnull align 1 dereferenceable(1) %19, i8* %13) #2, !dbg !858
  %28 = getelementptr inbounds %_Z6FooBarv.Frame, %_Z6FooBarv.Frame* %14, i32 0, i32 5, !dbg !850
  store i8* %27, i8** %28, align 8, !dbg !850
```

I'd appreciate anybody's thoughts about what could be done to prevent this.
@ChuanqiXu9 okay yes, I can reproduce this:

```
> ./bin/clang++ -std=c++20 -O1 -fno-exceptions -S -mllvm --x86-asm-syntax=intel ~/tmp/foo.cc
> grep -A 5 Register foo.s
	call	_ZN13SomeAwaitable8RegisterENSt7__n486116coroutine_handleIvEE@PLT
	mov	qword ptr [rbx + 24], rax
	test	rax, rax
	cmove	rax, rbx
	mov	rdi, rax
	call	qword ptr [rax]
```

I applied https://reviews.llvm.org/D125291 and https://reviews.llvm.org/D127383 in their current state and rebuilt clang, and still get the same result. I guess that makes sense: there is no TLS here.
Oh, sorry for the confusion. I think I get the problem now. Long story short, your analysis (and TSan's) is correct. This is a (potential) miscompile. Here is the reason:
The key issue here is that:
I think we need to introduce something like CoroutineAA to provide the information. I'll try to look into it.
This is not an option for me. The key reason Clang/LLVM constructs coroutine frames the way it does is performance. In fact, there have been many such coroutine bugs, which could be fixed in one shot if we disabled the optimizations. So our strategy is always to fix the actual issues. As a heavy user and developer of coroutines, I believe that's the right choice, since performance is a key reason we chose C++.
Yeah, I didn't mean disabling optimizations altogether, just recognizing that this particular optimization shouldn't be performed for objects that span an `llvm.coro.suspend`. It's probably more complicated than I realize. Thanks for looking; I look forward to seeing what fix you come up with. :-)
I have recently run into the same issue using clang 14.0.6. My conclusion is that the
GCC has far fewer coroutine bugs than clang, since all the coroutine-related work in GCC is done in the frontend, while in clang the middle end gets involved to optimize coroutines further.
I am aware that support for coroutines is much more limited in gcc; that is why I am experimenting with clang. I love the fact that clang is able to fully inline non-recursive synchronous generators. Here are some code snippets that might help pinpoint the underlying issue (hopefully the same one observed by @jacobsa). This code does not trigger the issue:
This code triggers the issue if the

The issue (according to TSan) is that the local
Bug report 59221 has a different cause than this one, so ignore what I said above. Now I feel this is more like a TSan bug/defect than a compiler/optimizer bug/defect. Here are my reasons:
In other words, in the example, the function

@jacobsa @havardpe Does the explanation make sense to you? If so, I think it may be better to file another issue asking the TSan folks to address this.
Are you saying this hinges on the type erasure of
No. I mean SomeAwaitable::Register isn't allowed to access

So it is clear that users are allowed to access the resume function, the destroy function, and the promise (if they know its type). But they are not allowed to access the

or

the above code is clearly not legal. So I want to say that users (or SomeAwaitable::Register in this example) shouldn't/can't access any other part of the coroutine frame (or
As far as I can see, both issues (write ordering, local variable placement) boil down to this: to work well in a multi-threaded environment, a coroutine needs to support being resumed on another thread before the await_suspend function has returned on the thread handling its suspension.
@ChuanqiXu9 Ah, I see what you mean. I think @havardpe has it right. This issue isn't about the fact that

The generated assembly code gets this wrong. It unconditionally writes to the coroutine frame after the call to `Register`. The source is:

```cpp
const auto to_resume = awaitable.Register(h);
if (!to_resume) {
  suspended = false;
  return h;
}
```

and the generated assembly is:

```asm
call	SomeAwaitable::Register(std::__n4861::coroutine_handle<void>)
mov	qword ptr [rbx + 24], rax
```

This is not correct, because

My proof from the original post about the write of
If my reading is correct, you're saying the coroutine

So these two sentences look like UB to me, if my understanding is right.
This is the problem, yes.
Could you please cite the standard in more detail if you think this is UB? My understanding is that the coroutine is suspended once it enters `await_suspend`.

I don't think there is any alternative to this. There is no way for the implementor to synchronize on
http://eel.is/c++draft/coroutine.handle.resumption says that we can't resume/destroy a non-suspended coroutine. And about the definition of

Luckily we found another defect report. Let me see how to fix it... I feel like it is a pretty tough one...
I got your point. Although we usually do this like:

The reason why we don't use
This sounds like it's the heart of the bug; thanks for finding it!
Also, I think you already agree with me, but I do want to drive home the point that clang's current definition doesn't make sense. Here's your example of more typical code again:

```cpp
void await_suspend(coroutine_handle<> h) {
  get_scheduler()->schedule(h);
  return;
}
```

But this code also doesn't synchronize on
If the definition is "a coroutine is suspended once
Yeah, you're right, and we've thought about this before too, although our previous explanation was "it is OK from the compiler's perspective since there is no code left after

In fact, I am surprised that the coroutine is considered suspended during the execution of
I leave it to you to say how hard it is to fix the implementation, but I'll defend the standard, since I think its definition makes sense:
I got your point. Your words make sense from the users' perspective.
Isn't this exactly why the LLVM model has the `llvm.coro.save` intrinsic? My understanding was that the LLVM code calls
… regressions

The fix we sent for llvm/llvm-project#56301 may bring performance regressions, but we didn't mention this in the release notes, so users may get confused, e.g. llvm/llvm-project#64933. So this patch mentions the possible side effect and the potential solutions in llvm/llvm-project#64945 to avoid misunderstandings.
Got it. I'll try to take a look, but I can't make a promise.

/branch ChuanqiXu9/llvm-project/release/17.x

/pull-request llvm/llvm-project-release-prs#655

/branch ChuanqiXu9/llvm-project/release/17.x

Merged into release/17.x

Removing this from the LLVM 17.x Release milestone since the fix won't be there.
… not empty

Close llvm#56301
Close llvm#64151

See the summary and the discussion of https://reviews.llvm.org/D157070 to get the full context.

As @rjmccall pointed out, the key point of the root cause is that we currently don't implement the semantics of '@llvm.coro.save' ("after await-ready returns false, the coroutine is considered to be suspended") well. Those semantics imply that the compiler shouldn't write spills into the coroutine frame inside await_suspend. But currently it can, due to certain combinations of optimizations, so the semantics are broken; inlining is the root optimization enabling those combinations. So in this patch we add the `noinline` attribute to the await_suspend call.

Also, as an optimization, we don't add the `noinline` attribute to the await_suspend call if the awaiter is an empty class. This should be correct since programmers can't access the local variables in await_suspend if the awaiter is empty. I think this is necessary for performance, since empty awaiters are pretty common.

Another potential optimization is:

```llvm
call @llvm.coro.await_suspend(ptr %awaiter, ptr %handle, ptr @awaitSuspendFn)
```

Then it is much easier to perform the safety analysis in the middle end. If it is safe to inline the call to awaitSuspend, we can replace it in the CoroEarly pass; otherwise we can replace it in the CoroSplit pass.

Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D157833
I've hit what I think is a miscompilation bug in clang, where a write is moved in an illegal way that introduces a data race and/or use of uninitialized memory. Here is a test case reduced from my real codebase (Compiler Explorer):

The idea is that the awaiter is implemented by calling a `Register` function in a foreign translation unit that decides what to do:

- If the coroutine should be resumed immediately, it returns a null handle to indicate this.
- If the coroutine will be resumed later, it returns some other handle to resume now, for symmetric control. (Maybe `std::noop_coroutine()`.)

Further, when we don't actually wind up suspending, we need `await_resume` to do some follow-up work, in this case represented by calling the `DidntSuspend` function. So we use a `suspended` member to track whether we actually suspended. This is written before calling `Register`, and read after resuming.

The bug I see in my codebase is that the write of `true` to `suspended` is delayed until after the call to `Register`. In the reduced test case, we have something similar. Here is what Compiler Explorer gives me for clang with `-std=c++20 -O1 -fno-exceptions`:
:The coroutine frame address is in
rbx
. After callingRegister
, the returned handle is stored into the coroutine frame at offset 24 and then resumed (or the original handle resumed if it's empty), and later in[clone .resume]
the handle in the frame at offset 24 is compared to zero to synthesize theif (!suspended)
condition.But it's not safe to store the returned handle in the coroutine frame unless it's zero: any other value indicates that
Register
took responsibility for the coroutine handle, and may have passed it off to another thread. So another thread may have calleddestroy
on the handle by the time we get around to writing into it. Similarly, the other thread may already have resumed the coroutine and see an uninitialized value at offset 24.I think this is a miscompilation. Consider for example that
Register
may contain a critical section under a mutex that hands the coroutine handle off to another thread to resume, with a similar critical section in the other thread synchronizing with the first. (This is the situation in my codebase.) So we have:The write of
suspended
inawait_suspend
is sequenced before the call toRegister
below it inawait_suspend
.The call to
Register
synchronizes with the function on the other thread that resumes the coroutine.That synchronization is sequenced before resuming the coroutine handle.
Resuming the coroutine handle is (I believe?) sequenced before the call to
await_resume
that readssuspended
.Therefore the write of
suspended
inter-thread happens before the read ofsuspended
.So there was no data race before, but clang has introduced one by delaying the write to the coroutine frame.
For what it's worth, I spent some time dumping IR after optimization passes with my real codebase, and in that case this seemed to be related to an interaction between `SROAPass` and `CoroSplitPass`:

- Until `SROAPass`, the write was a simple store to the coroutine frame, before the call to `Register`.
- `SROAPass` eliminated the write altogether, turning it into phi nodes that plumbed the value directly into the branch. The value was plumbed from before the call to `Register` to after it.
- `CoroSplitPass` re-introduced a `store` instruction, after the call to `Register`.

I am far from an expert here, but I wonder if `SROAPass` should be forbidden from making optimizations of this sort across an `llvm.coro.suspend`?