Skip to content

[Clang][Coroutines] Introducing the [[clang::coro_inplace_task]] attribute #94693

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

yuxuanchen1997
Copy link
Member

@yuxuanchen1997 yuxuanchen1997 commented Jun 6, 2024

This patch proposes a C++ struct/class attribute [[clang::coro_inplace_task]]. This notion of inplace task gives developers and library authors a certainty that coroutine heap elision happens in a predictable way.

This only changes things for C++ with Switch ABI.

Originally, after we lower a coroutine to LLVM IR, CoroElide is responsible for analysis of whether an elision can happen. Take this as an example:

Task foo();
Task bar() {
  co_await foo();
}

For CoroElide to happen, the ramp function of foo must be inlined into bar. This inlining happens after foo has been split but bar is usually still a presplit coroutine. If foo is indeed a coroutine, the inlined coro.id intrinsics of foo is visible within bar. CoroElide then runs an analysis to figure out whether the SSA value of coro.begin() of foo gets destroyed before bar terminates.

Task types are rarely simple enough for the destroy logic of the task to reference the SSA value from coro.begin() directly. Hence, the pass is very ineffective for even the most trivial C++ Task types. Improving CoroElide by implementing more powerful analyses is possible, however it doesn't give us the predictability when we expect elision to happen.

The approach we want to take with this language extension generally originates from the philosophy that library implementations of Task types has the control over the structured concurrency guarantees we demand for elision to happen. That is, the lifetime for the callee's frame is shorter to that of the caller.

The [[clang::coro_inplace_task]] is a class attribute which can be applied to a coroutine return type.

When a coroutine function that returns such a type calls another coroutine function, the compiler performs heap allocation elision when the following conditions are all met:

  • callee coroutine function returns a type that is annotated with [[clang::coro_inplace_task]].
  • The callee coroutine function is inlined.
  • In caller coroutine, the return value of the callee is a prvalue that is immediately co_awaited.

From the C++ perspective, it makes sense because we can ensure the lifetime of elided callee cannot exceed that of the caller if we can guarantee that the caller coroutine is never destroyed earlier than the callee coroutine. This is not generally true for any C++ programs. However, the library that implements Task types and executors may provide this guarantee to the compiler, providing the user with certainty that HALO will work on their programs.

After this patch, when compiling coroutines that return a type with such attribute, the frontend checks that the type of the operand of co_await expressions (not operator co_await). If it's also attributed with [[clang::coro_inplace_task]], the FE emits a marker against the return object of the callee as a hint for the middle end to elide the elision. This is necessary because we don't perform inlining until opt pipeline and FE cannot even know if the callee is a coroutine. The marker here as proposed is an intrinsic called llvm.coro.safe.elide. When CoroElide sees this intrinsic, it immediately conclude that the coroutine allocation backing the task object is safe to elide.

There are a few known implementation gaps:

  • Would like inputs on FE approach.
  • Because the llvm.coro.safe.elide intrinsic is applied on the return value of the callee, CoroElide has to find association between the coro.begin() ssa value and the return value. Such association currently only works with zero offset of the task object. Upstream comments are welcome to help with a resolution of this problem.

@yuxuanchen1997 yuxuanchen1997 self-assigned this Jun 7, 2024
@yuxuanchen1997 yuxuanchen1997 added clang:frontend Language frontend issues, e.g. anything involving "Sema" coroutines C++20 coroutines labels Jun 7, 2024
@llvmbot
Copy link
Member

llvmbot commented Jun 7, 2024

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-coroutines

Author: Yuxuan Chen (yuxuanchen1997)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/94693.diff

10 Files Affected:

  • (modified) clang/include/clang/Basic/Attr.td (+8)
  • (modified) clang/include/clang/Basic/AttrDocs.td (+20)
  • (modified) clang/lib/CodeGen/CGCoroutine.cpp (+71-5)
  • (modified) clang/lib/CodeGen/CGExpr.cpp (+5-1)
  • (modified) clang/lib/CodeGen/CodeGenFunction.h (+3)
  • (modified) clang/test/Misc/pragma-attribute-supported-attributes-list.test (+1)
  • (modified) llvm/include/llvm/IR/Intrinsics.td (+3)
  • (modified) llvm/lib/Transforms/Coroutines/CoroCleanup.cpp (+8-3)
  • (modified) llvm/lib/Transforms/Coroutines/CoroElide.cpp (+53-3)
  • (modified) llvm/lib/Transforms/Coroutines/Coroutines.cpp (+1)
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 17d9a710d948b..0fb623d6c0515 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -1212,6 +1212,14 @@ def CoroDisableLifetimeBound : InheritableAttr {
   let SimpleHandler = 1;
 }
 
+def CoroStructuredConcurrencyType : InheritableAttr {
+  let Spellings = [Clang<"coro_structured_concurrency">];
+  let Subjects = SubjectList<[CXXRecord]>;
+  let LangOpts = [CPlusPlus];
+  let Documentation = [CoroStructuredConcurrencyDoc];
+  let SimpleHandler = 1;
+}
+
 // OSObject-based attributes.
 def OSConsumed : InheritableParamAttr {
   let Spellings = [Clang<"os_consumed">];
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 70d5dfa8aaf86..50fc0fc2d16f6 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -8015,6 +8015,26 @@ but do not pass them to the underlying coroutine or pass them by value.
 }];
 }
 
+def CoroStructuredConcurrencyDoc : Documentation {
+  let Category = DocCatDecl;
+  let Content = [{
+The ``[[clang::coro_structured_concurrency]]`` is a class attribute which can be applied
+to a coroutine return type.
+
+When a coroutine function that returns such a type calls another coroutine function,
+the compiler performs heap allocation elision when the following conditions are all met:
+- callee coroutine function returns a type that is annotated with
+  ``[[clang::coro_structured_concurrency]]``.
+- The callee coroutine function is inlined.
+- In caller coroutine, the return value of the callee is a prvalue or an xvalue, and
+- The temporary expression containing the callee coroutine object is immediately co_awaited.
+
+The behavior is undefined if any of the following condition was met:
+- the caller coroutine is destroyed earlier than the callee coroutine.
+
+  }];
+}
+
 def CountedByDocs : Documentation {
   let Category = DocCatField;
   let Content = [{
diff --git a/clang/lib/CodeGen/CGCoroutine.cpp b/clang/lib/CodeGen/CGCoroutine.cpp
index b4c724422c14a..78ae04982cba0 100644
--- a/clang/lib/CodeGen/CGCoroutine.cpp
+++ b/clang/lib/CodeGen/CGCoroutine.cpp
@@ -12,9 +12,12 @@
 
 #include "CGCleanup.h"
 #include "CodeGenFunction.h"
-#include "llvm/ADT/ScopeExit.h"
+#include "clang/AST/ExprCXX.h"
 #include "clang/AST/StmtCXX.h"
 #include "clang/AST/StmtVisitor.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/ScopeExit.h"
+#include "llvm/IR/Intrinsics.h"
 
 using namespace clang;
 using namespace CodeGen;
@@ -219,17 +222,81 @@ namespace {
     RValue RV;
   };
 }
+
+static MaterializeTemporaryExpr *
+getStructuredConcurrencyOperand(ASTContext &Ctx,
+                                CoroutineSuspendExpr const &S) {
+  auto *E = S.getCommonExpr();
+  auto *Temporary = dyn_cast_or_null<MaterializeTemporaryExpr>(E);
+  if (!Temporary)
+    return nullptr;
+
+  auto *Operator =
+      dyn_cast_or_null<CXXOperatorCallExpr>(Temporary->getSubExpr());
+
+  if (!Operator ||
+      Operator->getOperator() != OverloadedOperatorKind::OO_Coawait ||
+      Operator->getNumArgs() != 1)
+    return nullptr;
+
+  Expr *Arg = Operator->getArg(0);
+  assert(Arg && "Arg to operator co_await should not be null");
+  auto *CalleeRetClass = Arg->getType()->getAsCXXRecordDecl();
+
+  if (!CalleeRetClass ||
+      !CalleeRetClass->hasAttr<CoroStructuredConcurrencyTypeAttr>())
+    return nullptr;
+
+  if (!Arg->isTemporaryObject(Ctx, CalleeRetClass)) {
+    return nullptr;
+  }
+
+  return dyn_cast<MaterializeTemporaryExpr>(Arg);
+}
+
 static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Coro,
                                     CoroutineSuspendExpr const &S,
                                     AwaitKind Kind, AggValueSlot aggSlot,
                                     bool ignoreResult, bool forLValue) {
   auto *E = S.getCommonExpr();
 
+  auto &Builder = CGF.Builder;
+  bool MarkOperandSafeIntrinsic = false;
+
+  auto *TemporaryOperand = [&]() -> MaterializeTemporaryExpr * {
+    bool CurFnRetTyHasAttr = false;
+    if (auto *RetTyPtr = CGF.FnRetTy.getTypePtrOrNull()) {
+      if (auto *CxxRecord = RetTyPtr->getAsCXXRecordDecl()) {
+        CurFnRetTyHasAttr =
+            CxxRecord->hasAttr<CoroStructuredConcurrencyTypeAttr>();
+      }
+    }
+
+    if (CurFnRetTyHasAttr) {
+      return getStructuredConcurrencyOperand(CGF.getContext(), S);
+    }
+    return nullptr;
+  }();
+
+  if (TemporaryOperand) {
+    CGF.TemporaryValues[TemporaryOperand] = nullptr;
+    MarkOperandSafeIntrinsic = true;
+  }
+
   auto CommonBinder =
       CodeGenFunction::OpaqueValueMappingData::bind(CGF, S.getOpaqueValue(), E);
   auto UnbindCommonOnExit =
       llvm::make_scope_exit([&] { CommonBinder.unbind(CGF); });
 
+  if (MarkOperandSafeIntrinsic) {
+    auto It = CGF.TemporaryValues.find(TemporaryOperand);
+    assert(It != CGF.TemporaryValues.end());
+    if (auto Value = It->second)
+      Builder.CreateCall(CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_safe_elide),
+                         {Value});
+    CGF.TemporaryValues.erase(TemporaryOperand);
+  }
+
   auto Prefix = buildSuspendPrefixStr(Coro, Kind);
   BasicBlock *ReadyBlock = CGF.createBasicBlock(Prefix + Twine(".ready"));
   BasicBlock *SuspendBlock = CGF.createBasicBlock(Prefix + Twine(".suspend"));
@@ -241,7 +308,6 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
   // Otherwise, emit suspend logic.
   CGF.EmitBlock(SuspendBlock);
 
-  auto &Builder = CGF.Builder;
   llvm::Function *CoroSave = CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_save);
   auto *NullPtr = llvm::ConstantPointerNull::get(CGF.CGM.Int8PtrTy);
   auto *SaveCall = Builder.CreateCall(CoroSave, {NullPtr});
@@ -255,9 +321,9 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
          "expected to be called in coroutine context");
 
   SmallVector<llvm::Value *, 3> SuspendIntrinsicCallArgs;
-  SuspendIntrinsicCallArgs.push_back(
-      CGF.getOrCreateOpaqueLValueMapping(S.getOpaqueValue()).getPointer(CGF));
-
+  auto *BoundAwaiterValue =
+      CGF.getOrCreateOpaqueLValueMapping(S.getOpaqueValue()).getPointer(CGF);
+  SuspendIntrinsicCallArgs.push_back(BoundAwaiterValue);
   SuspendIntrinsicCallArgs.push_back(CGF.CurCoro.Data->CoroBegin);
   SuspendIntrinsicCallArgs.push_back(SuspendWrapper);
 
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index d6478cc6835d8..139a602650b6d 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -618,7 +618,11 @@ EmitMaterializeTemporaryExpr(const MaterializeTemporaryExpr *M) {
     }
   }
 
-  return MakeAddrLValue(Object, M->getType(), AlignmentSource::Decl);
+  auto Ret = MakeAddrLValue(Object, M->getType(), AlignmentSource::Decl);
+  if (TemporaryValues.contains(M)) {
+    TemporaryValues[M] = Ret.getPointer(*this);
+  }
+  return Ret;
 }
 
 RValue
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index 5739fbaaa9194..083cb8e2f376e 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -371,6 +371,9 @@ class CodeGenFunction : public CodeGenTypeCache {
   };
   CGCoroInfo CurCoro;
 
+  llvm::SmallDenseMap<const MaterializeTemporaryExpr *, llvm::Value *>
+      TemporaryValues;
+
   bool isCoroutine() const {
     return CurCoro.Data != nullptr;
   }
diff --git a/clang/test/Misc/pragma-attribute-supported-attributes-list.test b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
index 99732694f72a5..0369906d7c275 100644
--- a/clang/test/Misc/pragma-attribute-supported-attributes-list.test
+++ b/clang/test/Misc/pragma-attribute-supported-attributes-list.test
@@ -62,6 +62,7 @@
 // CHECK-NEXT: CoroLifetimeBound (SubjectMatchRule_record)
 // CHECK-NEXT: CoroOnlyDestroyWhenComplete (SubjectMatchRule_record)
 // CHECK-NEXT: CoroReturnType (SubjectMatchRule_record)
+// CHECK-NEXT: CoroStructuredConcurrencyType (SubjectMatchRule_record)
 // CHECK-NEXT: CoroWrapper (SubjectMatchRule_function)
 // CHECK-NEXT: DLLExport (SubjectMatchRule_function, SubjectMatchRule_variable, SubjectMatchRule_record, SubjectMatchRule_objc_interface)
 // CHECK-NEXT: DLLImport (SubjectMatchRule_function, SubjectMatchRule_variable, SubjectMatchRule_record, SubjectMatchRule_objc_interface)
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index 107442623ab7b..8e554d99ee3b6 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -1729,6 +1729,9 @@ def int_coro_subfn_addr : DefaultAttrsIntrinsic<
     [IntrReadMem, IntrArgMemOnly, ReadOnly<ArgIndex<0>>,
      NoCapture<ArgIndex<0>>]>;
 
+def int_coro_safe_elide : DefaultAttrsIntrinsic<
+    [], [llvm_ptr_ty], []>;
+
 ///===-------------------------- Other Intrinsics --------------------------===//
 //
 // TODO: We should introduce a new memory kind fo traps (and other side effects
diff --git a/llvm/lib/Transforms/Coroutines/CoroCleanup.cpp b/llvm/lib/Transforms/Coroutines/CoroCleanup.cpp
index 3e3825fcd50e2..71229eae5cb47 100644
--- a/llvm/lib/Transforms/Coroutines/CoroCleanup.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroCleanup.cpp
@@ -8,10 +8,11 @@
 
 #include "llvm/Transforms/Coroutines/CoroCleanup.h"
 #include "CoroInternal.h"
+#include "llvm/IR/Function.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/InstIterator.h"
+#include "llvm/IR/Intrinsics.h"
 #include "llvm/IR/PassManager.h"
-#include "llvm/IR/Function.h"
 #include "llvm/Transforms/Scalar/SimplifyCFG.h"
 
 using namespace llvm;
@@ -80,7 +81,7 @@ bool Lowerer::lower(Function &F) {
         } else
           continue;
         break;
-      case Intrinsic::coro_async_size_replace:
+      case Intrinsic::coro_async_size_replace: {
         auto *Target = cast<ConstantStruct>(
             cast<GlobalVariable>(II->getArgOperand(0)->stripPointerCasts())
                 ->getInitializer());
@@ -98,6 +99,9 @@ bool Lowerer::lower(Function &F) {
         Target->replaceAllUsesWith(NewFuncPtrStruct);
         break;
       }
+      case Intrinsic::coro_safe_elide:
+        break;
+      }
       II->eraseFromParent();
       Changed = true;
     }
@@ -111,7 +115,8 @@ static bool declaresCoroCleanupIntrinsics(const Module &M) {
       M, {"llvm.coro.alloc", "llvm.coro.begin", "llvm.coro.subfn.addr",
           "llvm.coro.free", "llvm.coro.id", "llvm.coro.id.retcon",
           "llvm.coro.id.async", "llvm.coro.id.retcon.once",
-          "llvm.coro.async.size.replace", "llvm.coro.async.resume"});
+          "llvm.coro.async.size.replace", "llvm.coro.async.resume",
+          "llvm.coro.safe.elide"});
 }
 
 PreservedAnalyses CoroCleanupPass::run(Module &M,
diff --git a/llvm/lib/Transforms/Coroutines/CoroElide.cpp b/llvm/lib/Transforms/Coroutines/CoroElide.cpp
index 74b5ccb7b9b71..dd2f72410c931 100644
--- a/llvm/lib/Transforms/Coroutines/CoroElide.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroElide.cpp
@@ -7,12 +7,14 @@
 //===----------------------------------------------------------------------===//
 
 #include "llvm/Transforms/Coroutines/CoroElide.h"
+#include "CoroInstr.h"
 #include "CoroInternal.h"
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Analysis/InstructionSimplify.h"
 #include "llvm/Analysis/OptimizationRemarkEmitter.h"
+#include "llvm/Analysis/PostDominators.h"
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/InstIterator.h"
 #include "llvm/Support/ErrorHandling.h"
@@ -56,7 +58,8 @@ class FunctionElideInfo {
 class CoroIdElider {
 public:
   CoroIdElider(CoroIdInst *CoroId, FunctionElideInfo &FEI, AAResults &AA,
-               DominatorTree &DT, OptimizationRemarkEmitter &ORE);
+               DominatorTree &DT, PostDominatorTree &PDT,
+               OptimizationRemarkEmitter &ORE);
   void elideHeapAllocations(uint64_t FrameSize, Align FrameAlign);
   bool lifetimeEligibleForElide() const;
   bool attemptElide();
@@ -68,6 +71,7 @@ class CoroIdElider {
   FunctionElideInfo &FEI;
   AAResults &AA;
   DominatorTree &DT;
+  PostDominatorTree &PDT;
   OptimizationRemarkEmitter &ORE;
 
   SmallVector<CoroBeginInst *, 1> CoroBegins;
@@ -183,8 +187,9 @@ void FunctionElideInfo::collectPostSplitCoroIds() {
 
 CoroIdElider::CoroIdElider(CoroIdInst *CoroId, FunctionElideInfo &FEI,
                            AAResults &AA, DominatorTree &DT,
+                           PostDominatorTree &PDT,
                            OptimizationRemarkEmitter &ORE)
-    : CoroId(CoroId), FEI(FEI), AA(AA), DT(DT), ORE(ORE) {
+    : CoroId(CoroId), FEI(FEI), AA(AA), DT(DT), PDT(PDT), ORE(ORE) {
   // Collect all coro.begin and coro.allocs associated with this coro.id.
   for (User *U : CoroId->users()) {
     if (auto *CB = dyn_cast<CoroBeginInst>(U))
@@ -336,6 +341,41 @@ bool CoroIdElider::canCoroBeginEscape(
   return false;
 }
 
+// FIXME: This is not accounting for the stores to tasks whose handle is not
+// zero offset.
+static const StoreInst *getPostDominatingStoreToTask(const CoroBeginInst *CB,
+                                                     PostDominatorTree &PDT) {
+  const StoreInst *OnlyStore = nullptr;
+
+  for (auto *U : CB->users()) {
+    auto *Store = dyn_cast<StoreInst>(U);
+    if (Store && Store->getValueOperand() == CB) {
+      if (OnlyStore) {
+        // Store must be unique. one coro begin getting stored to multiple
+        // stores is not accepted.
+        return nullptr;
+      }
+      OnlyStore = Store;
+    }
+  }
+
+  if (!OnlyStore || !PDT.dominates(OnlyStore, CB)) {
+    return nullptr;
+  }
+
+  return OnlyStore;
+}
+
+static bool isMarkedSafeElide(const llvm::Value *V) {
+  for (auto *U : V->users()) {
+    auto *II = dyn_cast<IntrinsicInst>(U);
+    if (II && (II->getIntrinsicID() == Intrinsic::coro_safe_elide)) {
+      return true;
+    }
+  }
+  return false;
+}
+
 bool CoroIdElider::lifetimeEligibleForElide() const {
   // If no CoroAllocs, we cannot suppress allocation, so elision is not
   // possible.
@@ -364,6 +404,15 @@ bool CoroIdElider::lifetimeEligibleForElide() const {
 
   // Filter out the coro.destroy that lie along exceptional paths.
   for (const auto *CB : CoroBegins) {
+    // This might be too strong of a condition but should be very safe.
+    // If the CB is unconditionally stored into a "Task Like Object",
+    // and such object is "safe elide".
+    if (auto *MaybeStoreToTask = getPostDominatingStoreToTask(CB, PDT)) {
+      auto Dest = MaybeStoreToTask->getPointerOperand();
+      if (isMarkedSafeElide(Dest))
+        continue;
+    }
+
     auto It = DestroyAddr.find(CB);
 
     // FIXME: If we have not found any destroys for this coro.begin, we
@@ -476,11 +525,12 @@ PreservedAnalyses CoroElidePass::run(Function &F, FunctionAnalysisManager &AM) {
 
   AAResults &AA = AM.getResult<AAManager>(F);
   DominatorTree &DT = AM.getResult<DominatorTreeAnalysis>(F);
+  PostDominatorTree &PDT = AM.getResult<PostDominatorTreeAnalysis>(F);
   auto &ORE = AM.getResult<OptimizationRemarkEmitterAnalysis>(F);
 
   bool Changed = false;
   for (auto *CII : FEI.getCoroIds()) {
-    CoroIdElider CIE(CII, FEI, AA, DT, ORE);
+    CoroIdElider CIE(CII, FEI, AA, DT, PDT, ORE);
     Changed |= CIE.attemptElide();
   }
 
diff --git a/llvm/lib/Transforms/Coroutines/Coroutines.cpp b/llvm/lib/Transforms/Coroutines/Coroutines.cpp
index 1a92bc1636257..48c02e5406b75 100644
--- a/llvm/lib/Transforms/Coroutines/Coroutines.cpp
+++ b/llvm/lib/Transforms/Coroutines/Coroutines.cpp
@@ -86,6 +86,7 @@ static const char *const CoroIntrinsics[] = {
     "llvm.coro.prepare.retcon",
     "llvm.coro.promise",
     "llvm.coro.resume",
+    "llvm.coro.safe.elide",
     "llvm.coro.save",
     "llvm.coro.size",
     "llvm.coro.subfn.addr",

@yuxuanchen1997 yuxuanchen1997 changed the title [Clang] Introduce [[clang::structured_concurrency]] [Clang] Introduce [[clang::coro_structured_concurrency]] Jun 7, 2024
@yuxuanchen1997 yuxuanchen1997 changed the title [Clang] Introduce [[clang::coro_structured_concurrency]] [Clang] Improve CoroElide with [[clang::coro_structured_concurrency]] attribute for C++ Jun 14, 2024
@yuxuanchen1997 yuxuanchen1997 changed the title [Clang] Improve CoroElide with [[clang::coro_structured_concurrency]] attribute for C++ [Clang][Coroutines] Improve CoroElide with [[clang::coro_structured_concurrency]] attribute for C++ Jun 14, 2024
@yuxuanchen1997 yuxuanchen1997 force-pushed the structured_concurrency branch from 9907bd2 to 049c29d Compare June 14, 2024 20:22
@yuxuanchen1997
Copy link
Member Author

@ChuanqiXu9, this is a part of the systematic changes I was talking about. At this stage I know there are quite some rough edges but would like your input on the direction here.

@yuxuanchen1997 yuxuanchen1997 marked this pull request as ready for review June 14, 2024 20:27
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. llvm:ir llvm:transforms labels Jun 14, 2024
- The temporary expression containing the callee coroutine object is immediately co_awaited.

The behavior is undefined if any of the following condition was met:
- the caller coroutine is destroyed earlier than the callee coroutine.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to carefully analyze any attributes that add new forms of undefined behavior. How do we expect the user to avoid this case? Is there some way we can make the behavior here deterministic? If we can't make it deterministic, is there some sanitizer that would catch this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the scrutiny here. In coroutine's case, developers don't author Task types themselves usually. The use case of this attribute is mostly within library/framework code. The attribute should only be used when such a library needs to communicate such a guarantee to the compiler.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure we're clear about exactly which case we're talking about, can you write an example that triggers undefined behavior?

I'm not sure I see the connection between writing a task type and ensuring coroutines are destroyed in the right order... are you saying that a well-behaved Task will ensure destruction always happens in the right order, regardless of how it's used?

I'd still like an answer to my question about sanitizers.

Copy link
Member Author

@yuxuanchen1997 yuxuanchen1997 Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure we're clear about exactly which case we're talking about, can you write an example that triggers undefined behavior?

Sure. Though the UB needs to be triggered from a place that's either:

  1. you have access to the handle to the callee after the caller has been destroyed.
  2. you destroy the caller of a currently running callee (potentially from another thread of execution).

An example would be

std::coroutine_handle<> await_suspend(std::coroutine_handle<> caller_handle) {
  caller_handle.destroy();
  return this->handle;
}

A task type whose associated awaiter implements its await_suspend like this should not be attributed structured concurrency. Same goes for other customization points where you get a hold of handles from both caller and the callee. Same goes for APIs in Task and Awaiter types that help other code extract both handles.

are you saying that a well-behaved Task will ensure destruction always happens in the right order, regardless of how it's used?

Yes. This is the assumption. The Task/Promise and even Awaiter types holding this attribute should not save/allow extraction of callee handle for the purpose of resumption. When such a way to break the structuredness is provided, the Task type should not be attributed as coro_structured_concurrency. This patch has no intention to eradicate the use of nonstructured concurrency. There are legitimate uses of them. It's just close to impossible to perform HALO.

I'd still like an answer to my question about sanitizers.

Missed this one in my prior response. The UB triggered from violation of the contract is effectively a use-after-free. Not a sanitizers expert, but ASan sounds like able to catch this?

@ChuanqiXu9
Copy link
Member

Thanks for the patch. I like it in the very high level. I did a quick scanning over the PR and here is some comments:

  • I feel the name containing concurrency is not proper. I don't feel it relates concurrency in any level.
  • Every time we add or change IR related to coroutines, we need to update https://llvm.org/docs/Coroutines.html. So that we can understand the semantics of the proposed llvm.coro.safe.elide much easier.
  • I'd like to add a new effect to the attribute to always inline (or an inline hint) every such callee function. Note that this won't be part of semantics but the implementation details.

For implementations,

  • It looks like the argument of llvm.coro.safe.elide is the object. This is not good. The object is the concept in the higher level language but not the middle end language.
  • What I prefer is to add a middle end function attribute (must-coro-elide) and apply this attribute and (always inline attribute) to the calls:
    • In the inliner, when we inlined a call with attribute must-coro-elide, we can add the attribute again to inlined intrinsics llvm.coro.id.
    • Then in the coro elider, we can decide whether or not to elide the specific corotuines by judging if the llvm.coro.id has the attribute very easily.

For FE implementation,

  • To implement the semantics precisely, I think we have to touch the Parser and the Sema. Otherwise, it may be pretty tricky to handle cases like co_await foo(bar(...)) or co_await (foo() + bar());. It is hacky to implement this in CodeGen.

@ChuanqiXu9
Copy link
Member

Then we can find that the frontend part is much more complex and harder than the middle end part. So maybe we can introduce the middle end part first and introduce a not so powerful attribute but introducing a function and statement level attribute must_elide in the beginning.

@yuxuanchen1997
Copy link
Member Author

Thanks for the feedback. This patch is the first iteration to model this idea as quickly as I can. In general, I agree with your comments.

  • I feel the name containing concurrency is not proper

The name is bikesheddable as always. I was also thinking around the line of [[clang::coro_inplace_awaitable_task]].

  • Every time we add or change IR related to coroutines, we need to update https://llvm.org/docs/Coroutines.html. So that we can understand the semantics of the proposed llvm.coro.safe.elide much easier.

Will do once we agree on a design.

  • I'd like to add a new effect to the attribute to always inline (or an inline hint) every such callee function. Note that this won't be part of semantics but the implementation details.

This is a good suggestion for the scope of another PR.

  • What I prefer is to add a middle end function attribute (must-coro-elide) and apply this attribute and (always inline attribute) to the calls

Do you mean the caller or the callee? I think both, right?

@yuxuanchen1997
Copy link
Member Author

What I prefer is to add a middle end function attribute (must-coro-elide) and apply this attribute and (always inline attribute) to the calls:

Actually this might be problematic. The same coroutine called in different contexts (e.g. one coroutine that is also attributed, another is a coroutine that does not follow the semantics of the said attribute) can have different elidability. We need to attribute the actual emitted CallInst/InvokeInst from the CallExpr in FE.

@ChuanqiXu9
Copy link
Member

Thanks for the feedback. This patch is the first iteration to model this idea as quickly as I can. In general, I agree with your comments.

  • I feel the name containing concurrency is not proper

The name is bikesheddable as always. I was also thinking around the line of [[clang::coro_inplace_awaitable_task]].

  • Every time we add or change IR related to coroutines, we need to update https://llvm.org/docs/Coroutines.html. So that we can understand the semantics of the proposed llvm.coro.safe.elide much easier.

Will do once we agree on a design.

  • I'd like to add a new effect to the attribute to always inline (or an inline hint) every such callee function. Note that this won't be part of semantics but the implementation details.

This is a good suggestion for the scope of another PR.

  • What I prefer is to add a middle end function attribute (must-coro-elide) and apply this attribute and (always inline attribute) to the calls

Do you mean the caller or the callee? I think both, right?

To calls. We can add attribute to calls instead of functions.

@ChuanqiXu9
Copy link
Member

Thanks for the feedback. This patch is the first iteration to model this idea as quickly as I can. In general, I agree with your comments.

  • I feel the name containing concurrency is not proper

The name is bikesheddable as always. I was also thinking around the line of [[clang::coro_inplace_awaitable_task]].

  • Every time we add or change IR related to coroutines, we need to update https://llvm.org/docs/Coroutines.html. So that we can understand the semantics of the proposed llvm.coro.safe.elide much easier.

Will do once we agree on a design.

It would be helpful for reviewers to understand the design ahead of time. For example, I can't be sure I understand your design 100% right now.

@yuxuanchen1997 yuxuanchen1997 changed the title [Clang][Coroutines] Improve CoroElide with [[clang::coro_structured_concurrency]] attribute for C++ [Clang][Coroutines] Introducing the [[clang::coro_inplace_task]] attribute Jun 17, 2024
@yuxuanchen1997 yuxuanchen1997 marked this pull request as draft June 17, 2024 21:34
#include "Inputs/utility.h"

template <typename T>
struct [[clang::coro_structured_concurrency]] Task {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afaict, this particular Task can also be annotated using coro_only_destroy_when_complete?

I guess many libraries would want to use both coro_structured_concurrency / coro_inplace_task and coro_only_destroy_when_complete. Do those optimizations interact well with each other? Does it make sense to add a test case which tests both attributes together?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually no. It's perfect valid to destroy this task before completion. See this as an example:

Task bar();
Task foo() {
   co_await bar();
}

You can totally call foo and destroy it, as long as bar is not running in parallel and you don't try to resume bar (through a handle you smuggled with the task, which this type does not allow to do)

Copy link
Member

@vogelsgesang vogelsgesang Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

through a handle you smuggled with the task, which this type does not allow to do

that's the point I am getting at. Most task types do not allow smuggling handles out of the types. As such, I expect that many task types would be annotated with both coro_structured_concurrency / coro_inplace_task and coro_only_destroy_when_complete. I guess we should have a test case that both attributes interact nicely with each other?

@yuxuanchen1997 yuxuanchen1997 force-pushed the structured_concurrency branch 3 times, most recently from 349a9b5 to 7488b18 Compare June 18, 2024 05:57
Copy link

github-actions bot commented Jun 29, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@yuxuanchen1997 yuxuanchen1997 force-pushed the structured_concurrency branch 2 times, most recently from 3806693 to b263674 Compare July 8, 2024 18:14
@yuxuanchen1997 yuxuanchen1997 force-pushed the structured_concurrency branch 19 times, most recently from bcd6e4f to 4f959ee Compare July 15, 2024 21:54
Implement noalloc copy

add CoroAnnotationElidePass
@yuxuanchen1997
Copy link
Member Author

Closing this PR as we are discussing on https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/

and as suggested this PR is split into a FE PR #98971 and a ME PR #98974

yuxuanchen1997 added a commit that referenced this pull request Sep 9, 2024
…await_elidable]] (#99282)

This patch is the frontend implementation of the coroutine elide
improvement project detailed in this discourse post:
https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

This patch proposes a C++ struct/class attribute
`[[clang::coro_await_elidable]]`. This notion of await elidable task
gives developers and library authors a certainty that coroutine heap
elision happens in a predictable way.

Originally, after we lower a coroutine to LLVM IR, CoroElide is
responsible for analysis of whether an elision can happen. Take this as
an example:
```
Task foo();
Task bar() {
  co_await foo();
}
```
For CoroElide to happen, the ramp function of `foo` must be inlined into
`bar`. This inlining happens after `foo` has been split but `bar` is
usually still a presplit coroutine. If `foo` is indeed a coroutine, the
inlined `coro.id` intrinsics of `foo` is visible within `bar`. CoroElide
then runs an analysis to figure out whether the SSA value of
`coro.begin()` of `foo` gets destroyed before `bar` terminates.

`Task` types are rarely simple enough for the destroy logic of the task
to reference the SSA value from `coro.begin()` directly. Hence, the pass
is very ineffective for even the most trivial C++ Task types. Improving
CoroElide by implementing more powerful analyses is possible, however it
doesn't give us the predictability when we expect elision to happen.

The approach we want to take with this language extension generally
originates from the philosophy that library implementations of `Task`
types has the control over the structured concurrency guarantees we
demand for elision to happen. That is, the lifetime for the callee's
frame is shorter to that of the caller.

The ``[[clang::coro_await_elidable]]`` is a class attribute which can be
applied to a coroutine return type.

When a coroutine function that returns such a type calls another
coroutine function, the compiler performs heap allocation elision when
the following conditions are all met:
- callee coroutine function returns a type that is annotated with
``[[clang::coro_await_elidable]]``.
- In caller coroutine, the return value of the callee is a prvalue that
is immediately `co_await`ed.

From the C++ perspective, it makes sense because we can ensure the
lifetime of elided callee cannot exceed that of the caller if we can
guarantee that the caller coroutine is never destroyed earlier than the
callee coroutine. This is not generally true for any C++ programs.
However, the library that implements `Task` types and executors may
provide this guarantee to the compiler, providing the user with
certainty that HALO will work on their programs.

After this patch, when compiling coroutines that return a type with such
attribute, the frontend checks that the type of the operand of
`co_await` expressions (not `operator co_await`). If it's also
attributed with `[[clang::coro_await_elidable]]`, the FE emits metadata
on the call or invoke instruction as a hint for a later middle end pass
to elide the elision.

The original patch version is
#94693 and as suggested, the
patch is split into frontend and middle end solutions into stacked PRs.

The middle end CoroSplit patch can be found at
#99283
The middle end transformation that performs the elide can be found at
#99285
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category coroutines C++20 coroutines llvm:ir llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants