Skip to content

Commit b32aa72

Browse files
committed
Recommit [C++20] [Coroutines] Mark await_suspend as noinline if the awaiter is not empty
The original patch is incorrect since it marks too many calls to be noinline. It shows that it is bad to do analysis in the frontend again. This patch tries to mark the await_suspend function as noinlne only. --- Close #56301 Close #64151 Close #65018 See the summary and the discussion of https://reviews.llvm.org/D157070 to get the full context. As @rjmccall pointed out, the key point of the root cause is that currently we didn't implement the semantics for '@llvm.coro.save' well ("after the await-ready returns false, the coroutine is considered to be suspended ") well. Since the semantics implies that we (the compiler) shouldn't write the spills into the coroutine frame in the await_suspend. But now it is possible due to some combinations of the optimizations so the semantics are broken. And the inlining is the root optimization of such optimizations. So in this patch, we tried to add the `noinline` attribute to the await_suspend function. This looks slightly problematic since the users are able to call the await_suspend function standalone. This is limited by the implementation. On the one hand, we don't want the workaround solution (See the proposed solution later) to be too complex. On the other hand, it is rare to call await_suspend standalone. Also it is not semantically incorrect to do so since the inlining is not part of the C++ standard. Also as an optimization, we don't add the `noinline` attribute to the await_suspend function if the awaiter is an empty class. This should be correct since the programmers can't access the local variables in await_suspend if the awaiter is empty. I think this is necessary for the performance since it is pretty common. The long term solution is: call @llvm.coro.await_suspend(ptr %awaiter, ptr %handle, ptr @awaitSuspendFn) Then it is much easier to perform the safety analysis in the middle end. If it is safe to inline the call to awaitSuspend, we can replace it in the CoroEarly pass. Otherwise we could replace it in the CoroSplit pass. Reviewed By: rjmccall Differential Revision: https://reviews.llvm.org/D157833
1 parent 7ffc507 commit b32aa72

File tree

6 files changed

+365
-0
lines changed

6 files changed

+365
-0
lines changed

clang/docs/ReleaseNotes.rst

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -174,6 +174,13 @@ Bug Fixes in This Version
174174
``abs`` builtins.
175175
(`#45129 <https://github.com/llvm/llvm-project/issues/45129>`_,
176176
`#45794 <https://github.com/llvm/llvm-project/issues/45794>`_)
177+
- Fixed an issue where accesses to the local variables of a coroutine during
178+
``await_suspend`` could be misoptimized, including accesses to the awaiter
179+
object itself.
180+
(`#56301 <https://github.com/llvm/llvm-project/issues/56301>`_)
181+
The current solution may bring performance regressions if the awaiters have
182+
non-static data members. See
183+
`#64945 <https://github.com/llvm/llvm-project/issues/64945>`_ for details.
177184
- Clang now prints unnamed members in diagnostic messages instead of giving an
178185
empty ''. Fixes
179186
(`#63759 <https://github.com/llvm/llvm-project/issues/63759>`_)

clang/lib/CodeGen/CGCoroutine.cpp

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -201,6 +201,7 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
201201
CGF.CurCoro.InSuspendBlock = true;
202202
auto *SuspendRet = CGF.EmitScalarExpr(S.getSuspendExpr());
203203
CGF.CurCoro.InSuspendBlock = false;
204+
204205
if (SuspendRet != nullptr && SuspendRet->getType()->isIntegerTy(1)) {
205206
// Veto suspension if requested by bool returning await_suspend.
206207
BasicBlock *RealSuspendBlock =

clang/lib/Sema/SemaCoroutine.cpp

Lines changed: 61 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -360,6 +360,63 @@ static Expr *maybeTailCall(Sema &S, QualType RetType, Expr *E,
360360
JustAddress);
361361
}
362362

363+
/// The await_suspend call performed by co_await is essentially asynchronous
364+
/// to the execution of the coroutine. Inlining it normally into an unsplit
365+
/// coroutine can cause miscompilation because the coroutine CFG misrepresents
366+
/// the true control flow of the program: things that happen in the
367+
/// await_suspend are not guaranteed to happen prior to the resumption of the
368+
/// coroutine, and things that happen after the resumption of the coroutine
369+
/// (including its exit and the potential deallocation of the coroutine frame)
370+
/// are not guaranteed to happen only after the end of await_suspend.
371+
///
372+
/// See https://github.com/llvm/llvm-project/issues/56301 and
373+
/// https://reviews.llvm.org/D157070 for the example and the full discussion.
374+
///
375+
/// The short-term solution to this problem is to mark the call as uninlinable.
376+
/// But we don't want to do this if the call is known to be trivial, which is
377+
/// very common.
378+
///
379+
/// The long-term solution may introduce patterns like:
380+
///
381+
/// call @llvm.coro.await_suspend(ptr %awaiter, ptr %handle,
382+
/// ptr @awaitSuspendFn)
383+
///
384+
/// Then it is much easier to perform the safety analysis in the middle end.
385+
/// If it is safe to inline the call to awaitSuspend, we can replace it in the
386+
/// CoroEarly pass. Otherwise we could replace it in the CoroSplit pass.
387+
static void tryMarkAwaitSuspendNoInline(Sema &S, OpaqueValueExpr *Awaiter,
388+
CallExpr *AwaitSuspend) {
389+
// The method here to extract the awaiter decl is not precise.
390+
// This is intentional. Since it is hard to perform the analysis in the
391+
// frontend due to the complexity of C++'s type systems.
392+
// And we prefer to perform such analysis in the middle end since it is
393+
// easier to implement and more powerful.
394+
CXXRecordDecl *AwaiterDecl =
395+
Awaiter->getType().getNonReferenceType()->getAsCXXRecordDecl();
396+
397+
if (AwaiterDecl && AwaiterDecl->field_empty())
398+
return;
399+
400+
FunctionDecl *FD = AwaitSuspend->getDirectCallee();
401+
402+
assert(FD);
403+
404+
// If the `await_suspend()` function is marked as `always_inline` explicitly,
405+
// we should give the user the right to control the codegen.
406+
if (FD->hasAttr<NoInlineAttr>() || FD->hasAttr<AlwaysInlineAttr>())
407+
return;
408+
409+
// This is problematic if the user calls the await_suspend standalone. But on
410+
// the on hand, it is not incorrect semantically since inlining is not part
411+
// of the standard. On the other hand, it is relatively rare to call
412+
// the await_suspend function standalone.
413+
//
414+
// And given we've already had the long-term plan, the current workaround
415+
// looks relatively tolerant.
416+
FD->addAttr(
417+
NoInlineAttr::CreateImplicit(S.getASTContext(), FD->getLocation()));
418+
}
419+
363420
/// Build calls to await_ready, await_suspend, and await_resume for a co_await
364421
/// expression.
365422
/// The generated AST tries to clean up temporary objects as early as
@@ -431,6 +488,10 @@ static ReadySuspendResumeResult buildCoawaitCalls(Sema &S, VarDecl *CoroPromise,
431488
// type Z.
432489
QualType RetType = AwaitSuspend->getCallReturnType(S.Context);
433490

491+
// We need to mark await_suspend as noinline temporarily. See the comment
492+
// of tryMarkAwaitSuspendNoInline for details.
493+
tryMarkAwaitSuspendNoInline(S, Operand, AwaitSuspend);
494+
434495
// Support for coroutine_handle returning await_suspend.
435496
if (Expr *TailCallSuspend =
436497
maybeTailCall(S, RetType, AwaitSuspend, Loc))
Lines changed: 168 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,168 @@
1+
// Tests that we can mark await-suspend as noinline correctly.
2+
//
3+
// RUN: %clang_cc1 -std=c++20 -triple x86_64-unknown-linux-gnu -emit-llvm -o - %s \
4+
// RUN: -O1 -disable-llvm-passes | FileCheck %s
5+
6+
#include "Inputs/coroutine.h"
7+
8+
struct Task {
9+
struct promise_type {
10+
struct FinalAwaiter {
11+
bool await_ready() const noexcept { return false; }
12+
template <typename PromiseType>
13+
std::coroutine_handle<> await_suspend(std::coroutine_handle<PromiseType> h) noexcept {
14+
return h.promise().continuation;
15+
}
16+
void await_resume() noexcept {}
17+
};
18+
19+
Task get_return_object() noexcept {
20+
return std::coroutine_handle<promise_type>::from_promise(*this);
21+
}
22+
23+
std::suspend_always initial_suspend() noexcept { return {}; }
24+
FinalAwaiter final_suspend() noexcept { return {}; }
25+
void unhandled_exception() noexcept {}
26+
void return_void() noexcept {}
27+
28+
std::coroutine_handle<> continuation;
29+
};
30+
31+
Task(std::coroutine_handle<promise_type> handle);
32+
~Task();
33+
34+
private:
35+
std::coroutine_handle<promise_type> handle;
36+
};
37+
38+
struct StatefulAwaiter {
39+
int value;
40+
bool await_ready() const noexcept { return false; }
41+
template <typename PromiseType>
42+
void await_suspend(std::coroutine_handle<PromiseType> h) noexcept {}
43+
void await_resume() noexcept {}
44+
};
45+
46+
typedef std::suspend_always NoStateAwaiter;
47+
using AnotherStatefulAwaiter = StatefulAwaiter;
48+
49+
template <class T>
50+
struct TemplatedAwaiter {
51+
T value;
52+
bool await_ready() const noexcept { return false; }
53+
template <typename PromiseType>
54+
void await_suspend(std::coroutine_handle<PromiseType> h) noexcept {}
55+
void await_resume() noexcept {}
56+
};
57+
58+
59+
class Awaitable {};
60+
StatefulAwaiter operator co_await(Awaitable) {
61+
return StatefulAwaiter{};
62+
}
63+
64+
StatefulAwaiter GlobalAwaiter;
65+
class Awaitable2 {};
66+
StatefulAwaiter& operator co_await(Awaitable2) {
67+
return GlobalAwaiter;
68+
}
69+
70+
struct AlwaysInlineStatefulAwaiter {
71+
void* value;
72+
bool await_ready() const noexcept { return false; }
73+
74+
template <typename PromiseType>
75+
__attribute__((always_inline))
76+
void await_suspend(std::coroutine_handle<PromiseType> h) noexcept {}
77+
78+
void await_resume() noexcept {}
79+
};
80+
81+
Task testing() {
82+
co_await std::suspend_always{};
83+
co_await StatefulAwaiter{};
84+
co_await AnotherStatefulAwaiter{};
85+
86+
// Test lvalue case.
87+
StatefulAwaiter awaiter;
88+
co_await awaiter;
89+
90+
// The explicit call to await_suspend is not considered suspended.
91+
awaiter.await_suspend(std::coroutine_handle<void>::from_address(nullptr));
92+
93+
co_await TemplatedAwaiter<int>{};
94+
TemplatedAwaiter<int> TemplatedAwaiterInstace;
95+
co_await TemplatedAwaiterInstace;
96+
97+
co_await Awaitable{};
98+
co_await Awaitable2{};
99+
100+
co_await AlwaysInlineStatefulAwaiter{};
101+
}
102+
103+
struct AwaitTransformTask {
104+
struct promise_type {
105+
struct FinalAwaiter {
106+
bool await_ready() const noexcept { return false; }
107+
template <typename PromiseType>
108+
std::coroutine_handle<> await_suspend(std::coroutine_handle<PromiseType> h) noexcept {
109+
return h.promise().continuation;
110+
}
111+
void await_resume() noexcept {}
112+
};
113+
114+
AwaitTransformTask get_return_object() noexcept {
115+
return std::coroutine_handle<promise_type>::from_promise(*this);
116+
}
117+
118+
std::suspend_always initial_suspend() noexcept { return {}; }
119+
FinalAwaiter final_suspend() noexcept { return {}; }
120+
void unhandled_exception() noexcept {}
121+
void return_void() noexcept {}
122+
123+
template <typename Awaitable>
124+
auto await_transform(Awaitable &&awaitable) {
125+
return awaitable;
126+
}
127+
128+
std::coroutine_handle<> continuation;
129+
};
130+
131+
AwaitTransformTask(std::coroutine_handle<promise_type> handle);
132+
~AwaitTransformTask();
133+
134+
private:
135+
std::coroutine_handle<promise_type> handle;
136+
};
137+
138+
struct awaitableWithGetAwaiter {
139+
bool await_ready() const noexcept { return false; }
140+
template <typename PromiseType>
141+
void await_suspend(std::coroutine_handle<PromiseType> h) noexcept {}
142+
void await_resume() noexcept {}
143+
};
144+
145+
AwaitTransformTask testingWithAwaitTransform() {
146+
co_await awaitableWithGetAwaiter{};
147+
}
148+
149+
// CHECK: define{{.*}}@_ZNSt14suspend_always13await_suspendESt16coroutine_handleIvE{{.*}}#[[NORMAL_ATTR:[0-9]+]]
150+
151+
// CHECK: define{{.*}}@_ZN15StatefulAwaiter13await_suspendIN4Task12promise_typeEEEvSt16coroutine_handleIT_E{{.*}}#[[NOINLINE_ATTR:[0-9]+]]
152+
153+
// CHECK: define{{.*}}@_ZN15StatefulAwaiter13await_suspendIvEEvSt16coroutine_handleIT_E{{.*}}#[[NORMAL_ATTR]]
154+
155+
// CHECK: define{{.*}}@_ZN16TemplatedAwaiterIiE13await_suspendIN4Task12promise_typeEEEvSt16coroutine_handleIT_E{{.*}}#[[NOINLINE_ATTR]]
156+
157+
// CHECK: define{{.*}}@_ZN27AlwaysInlineStatefulAwaiter13await_suspendIN4Task12promise_typeEEEvSt16coroutine_handleIT_E{{.*}}#[[ALWAYS_INLINE_ATTR:[0-9]+]]
158+
159+
// CHECK: define{{.*}}@_ZN4Task12promise_type12FinalAwaiter13await_suspendIS0_EESt16coroutine_handleIvES3_IT_E{{.*}}#[[NORMAL_ATTR]]
160+
161+
// CHECK: define{{.*}}@_ZN23awaitableWithGetAwaiter13await_suspendIN18AwaitTransformTask12promise_typeEEEvSt16coroutine_handleIT_E{{.*}}#[[NORMAL_ATTR]]
162+
163+
// CHECK: define{{.*}}@_ZN18AwaitTransformTask12promise_type12FinalAwaiter13await_suspendIS0_EESt16coroutine_handleIvES3_IT_E{{.*}}#[[NORMAL_ATTR]]
164+
165+
// CHECK-NOT: attributes #[[NORMAL_ATTR]] = noinline
166+
// CHECK: attributes #[[NOINLINE_ATTR]] = {{.*}}noinline
167+
// CHECK-NOT: attributes #[[ALWAYS_INLINE_ATTR]] = {{.*}}noinline
168+
// CHECK: attributes #[[ALWAYS_INLINE_ATTR]] = {{.*}}alwaysinline
Lines changed: 85 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,85 @@
1+
// An end-to-end test to make sure things get processed correctly.
2+
// RUN: %clang_cc1 -std=c++20 -triple x86_64-unknown-linux-gnu -emit-llvm -o - %s -O3 | \
3+
// RUN: FileCheck %s
4+
5+
#include "Inputs/coroutine.h"
6+
7+
struct SomeAwaitable {
8+
// Resume the supplied handle once the awaitable becomes ready,
9+
// returning a handle that should be resumed now for the sake of symmetric transfer.
10+
// If the awaitable is already ready, return an empty handle without doing anything.
11+
//
12+
// Defined in another translation unit. Note that this may contain
13+
// code that synchronizees with another thread.
14+
std::coroutine_handle<> Register(std::coroutine_handle<>);
15+
};
16+
17+
// Defined in another translation unit.
18+
void DidntSuspend();
19+
20+
struct Awaiter {
21+
SomeAwaitable&& awaitable;
22+
bool suspended;
23+
24+
bool await_ready() { return false; }
25+
26+
std::coroutine_handle<> await_suspend(const std::coroutine_handle<> h) {
27+
// Assume we will suspend unless proven otherwise below. We must do
28+
// this *before* calling Register, since we may be destroyed by another
29+
// thread asynchronously as soon as we have registered.
30+
suspended = true;
31+
32+
// Attempt to hand off responsibility for resuming/destroying the coroutine.
33+
const auto to_resume = awaitable.Register(h);
34+
35+
if (!to_resume) {
36+
// The awaitable is already ready. In this case we know that Register didn't
37+
// hand off responsibility for the coroutine. So record the fact that we didn't
38+
// actually suspend, and tell the compiler to resume us inline.
39+
suspended = false;
40+
return h;
41+
}
42+
43+
// Resume whatever Register wants us to resume.
44+
return to_resume;
45+
}
46+
47+
void await_resume() {
48+
// If we didn't suspend, make note of that fact.
49+
if (!suspended) {
50+
DidntSuspend();
51+
}
52+
}
53+
};
54+
55+
struct MyTask{
56+
struct promise_type {
57+
MyTask get_return_object() { return {}; }
58+
std::suspend_never initial_suspend() { return {}; }
59+
std::suspend_always final_suspend() noexcept { return {}; }
60+
void unhandled_exception();
61+
62+
Awaiter await_transform(SomeAwaitable&& awaitable) {
63+
return Awaiter{static_cast<SomeAwaitable&&>(awaitable)};
64+
}
65+
};
66+
};
67+
68+
MyTask FooBar() {
69+
co_await SomeAwaitable();
70+
}
71+
72+
// CHECK-LABEL: @_Z6FooBarv
73+
// CHECK: %[[to_resume:.*]] = {{.*}}call ptr @_ZN13SomeAwaitable8RegisterESt16coroutine_handleIvE
74+
// CHECK-NEXT: %[[to_bool:.*]] = icmp eq ptr %[[to_resume]], null
75+
// CHECK-NEXT: br i1 %[[to_bool]], label %[[then:.*]], label %[[else:.*]]
76+
77+
// CHECK: [[then]]:
78+
// We only access the coroutine frame conditionally as the sources did.
79+
// CHECK: store i8 0,
80+
// CHECK-NEXT: br label %[[else]]
81+
82+
// CHECK: [[else]]:
83+
// No more access to the coroutine frame until suspended.
84+
// CHECK-NOT: store
85+
// CHECK: }
Lines changed: 43 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,43 @@
1+
// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -std=c++20 \
2+
// RUN: -O1 -emit-llvm %s -o - | FileCheck %s
3+
4+
#include "Inputs/coroutine.h"
5+
6+
// A simple awaiter type with an await_suspend method that can't be
7+
// inlined.
8+
struct Awaiter {
9+
const int& x;
10+
11+
bool await_ready() { return false; }
12+
std::coroutine_handle<> await_suspend(const std::coroutine_handle<> h);
13+
void await_resume() {}
14+
};
15+
16+
struct MyTask {
17+
// A lazy promise with an await_transform method that supports awaiting
18+
// integer references using the Awaiter struct above.
19+
struct promise_type {
20+
MyTask get_return_object() { return {}; }
21+
std::suspend_always initial_suspend() { return {}; }
22+
std::suspend_always final_suspend() noexcept { return {}; }
23+
void unhandled_exception();
24+
25+
auto await_transform(const int& x) { return Awaiter{x}; }
26+
};
27+
};
28+
29+
// A global array of integers.
30+
int g_array[32];
31+
32+
// A coroutine that awaits each integer in the global array.
33+
MyTask FooBar() {
34+
for (const int& x : g_array) {
35+
co_await x;
36+
}
37+
}
38+
39+
// CHECK: %[[RET:.+]] = {{.*}}call{{.*}}@_ZN7Awaiter13await_suspendESt16coroutine_handleIvE
40+
// CHECK: %[[RESUME_ADDR:.+]] = load ptr, ptr %[[RET]],
41+
// CHECK: musttail call fastcc void %[[RESUME_ADDR]]({{.*}}%[[RET]]
42+
// CHECK: ret
43+

0 commit comments

Comments
 (0)