Skip to content

[Clang][OpenMP] Fix tile/unroll on iterator- and foreach-loops. #91459

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 22, 2024

Conversation

Meinersbur
Copy link
Member

OpenMP loop transformation did not work on a for-loop using an iterator or range-based for-loops. The first reason is that it combined the iterator's type for generated loops with the type of NumIterations as generated for any OMPLoopBasedDirective which is an integer. Fixed by basing all generated loop variables on NumIterations.

Second, C++11 range-based for-loops include syntactic sugar that needs to be executed before the loop. This additional code is now added to the construct's Pre-Init lists.

Third, C++20 added an initializer statement to range-based for-loops which is also added to the pre-init statement. PreInits used to be a DeclStmt which made it difficult to add arbitrary statements from CXXRangeForStmt's syntactic sugar, especially the for-loops init statement which does not need to be a declaration. Change it to be a general Stmt that can be a CompoundStmt to hold arbitrary Stmts, including DeclStmts. This also avoids the PointerUnion workaround used by checkTransformableLoopNest.

End-to-end tests are added to verify the expected number and order of loop execution and evaluations of expressions (such as iterator dereference). The order and number of evaluations of expressions in canonical loops is explicitly undefined by OpenMP but checked here for clarification and for changes to be noticed.

@Meinersbur Meinersbur marked this pull request as ready for review May 13, 2024 08:51
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang openmp:libomp OpenMP host runtime labels May 13, 2024
@llvmbot
Copy link
Member

llvmbot commented May 13, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: Michael Kruse (Meinersbur)

Changes

OpenMP loop transformation did not work on a for-loop using an iterator or range-based for-loops. The first reason is that it combined the iterator's type for generated loops with the type of NumIterations as generated for any OMPLoopBasedDirective which is an integer. Fixed by basing all generated loop variables on NumIterations.

Second, C++11 range-based for-loops include syntactic sugar that needs to be executed before the loop. This additional code is now added to the construct's Pre-Init lists.

Third, C++20 added an initializer statement to range-based for-loops which is also added to the pre-init statement. PreInits used to be a DeclStmt which made it difficult to add arbitrary statements from CXXRangeForStmt's syntactic sugar, especially the for-loops init statement which does not need to be a declaration. Change it to be a general Stmt that can be a CompoundStmt to hold arbitrary Stmts, including DeclStmts. This also avoids the PointerUnion workaround used by checkTransformableLoopNest.

End-to-end tests are added to verify the expected number and order of loop execution and evaluations of expressions (such as iterator dereference). The order and number of evaluations of expressions in canonical loops is explicitly undefined by OpenMP but checked here for clarification and for changes to be noticed.


Patch is 170.36 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/91459.diff

18 Files Affected:

  • (modified) clang/include/clang/Sema/SemaOpenMP.h (+1-3)
  • (modified) clang/lib/CodeGen/CGStmtOpenMP.cpp (+23-6)
  • (modified) clang/lib/Sema/SemaOpenMP.cpp (+135-55)
  • (modified) clang/test/OpenMP/tile_codegen.cpp (+678-209)
  • (modified) clang/test/OpenMP/tile_codegen_for_dependent.cpp (+65-65)
  • (modified) clang/test/OpenMP/tile_codegen_tile_for.cpp (+111-107)
  • (modified) openmp/runtime/test/lit.cfg (+4)
  • (added) openmp/runtime/test/transform/tile/foreach.cpp (+228)
  • (added) openmp/runtime/test/transform/tile/iterfor.cpp (+233)
  • (added) openmp/runtime/test/transform/tile/parallel-wsloop-collapse-foreach.cpp (+366)
  • (added) openmp/runtime/test/transform/unroll/factor_foreach.cpp (+162)
  • (added) openmp/runtime/test/transform/unroll/factor_intfor.c (+25)
  • (added) openmp/runtime/test/transform/unroll/factor_iterfor.cpp (+169)
  • (added) openmp/runtime/test/transform/unroll/factor_parallel-wsloop-collapse-foreach.cpp (+199)
  • (added) openmp/runtime/test/transform/unroll/factor_parallel-wsloop-collapse-intfor.cpp (+32)
  • (added) openmp/runtime/test/transform/unroll/full_intfor.c (+25)
  • (added) openmp/runtime/test/transform/unroll/heuristic_intfor.c (+25)
  • (added) openmp/runtime/test/transform/unroll/partial_intfor.c (+25)
diff --git a/clang/include/clang/Sema/SemaOpenMP.h b/clang/include/clang/Sema/SemaOpenMP.h
index 9927459bbc594..51981e1c9a8b9 100644
--- a/clang/include/clang/Sema/SemaOpenMP.h
+++ b/clang/include/clang/Sema/SemaOpenMP.h
@@ -1390,9 +1390,7 @@ class SemaOpenMP : public SemaBase {
   bool checkTransformableLoopNest(
       OpenMPDirectiveKind Kind, Stmt *AStmt, int NumLoops,
       SmallVectorImpl<OMPLoopBasedDirective::HelperExprs> &LoopHelpers,
-      Stmt *&Body,
-      SmallVectorImpl<SmallVector<llvm::PointerUnion<Stmt *, Decl *>, 0>>
-          &OriginalInits);
+      Stmt *&Body, SmallVectorImpl<SmallVector<Stmt *, 0>> &OriginalInits);
 
   /// Helper to keep information about the current `omp begin/end declare
   /// variant` nesting.
diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp b/clang/lib/CodeGen/CGStmtOpenMP.cpp
index ef3aa3a8e0dc6..5069f5b5a291b 100644
--- a/clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -142,7 +142,7 @@ class OMPTeamsScope final : public OMPLexicalScope {
 /// of used expression from loop statement.
 class OMPLoopScope : public CodeGenFunction::RunCleanupsScope {
   void emitPreInitStmt(CodeGenFunction &CGF, const OMPLoopBasedDirective &S) {
-    const DeclStmt *PreInits;
+    const Stmt *PreInits;
     CodeGenFunction::OMPMapVars PreCondVars;
     if (auto *LD = dyn_cast<OMPLoopDirective>(&S)) {
       llvm::DenseSet<const VarDecl *> EmittedAsPrivate;
@@ -182,17 +182,34 @@ class OMPLoopScope : public CodeGenFunction::RunCleanupsScope {
             }
             return false;
           });
-      PreInits = cast_or_null<DeclStmt>(LD->getPreInits());
+      PreInits = LD->getPreInits();
     } else if (const auto *Tile = dyn_cast<OMPTileDirective>(&S)) {
-      PreInits = cast_or_null<DeclStmt>(Tile->getPreInits());
+      PreInits = Tile->getPreInits();
     } else if (const auto *Unroll = dyn_cast<OMPUnrollDirective>(&S)) {
-      PreInits = cast_or_null<DeclStmt>(Unroll->getPreInits());
+      PreInits = Unroll->getPreInits();
     } else {
       llvm_unreachable("Unknown loop-based directive kind.");
     }
     if (PreInits) {
-      for (const auto *I : PreInits->decls())
-        CGF.EmitVarDecl(cast<VarDecl>(*I));
+      // CompoundStmts and DeclStmts are used as lists of PreInit statements and
+      // declarations. Since declarations must be visible in the the following
+      // that they initialize, unpack the ComboundStmt they are nested in.
+      SmallVector<const Stmt *> PreInitStmts;
+      if (auto *PreInitCompound = dyn_cast<CompoundStmt>(PreInits))
+        llvm::append_range(PreInitStmts, PreInitCompound->body());
+      else
+        PreInitStmts.push_back(PreInits);
+
+      for (const Stmt *S : PreInitStmts) {
+        // EmitStmt skips any OMPCapturedExprDecls, but needs to be emitted
+        // here.
+        if (auto *PreInitDecl = dyn_cast<DeclStmt>(S)) {
+          for (Decl *I : PreInitDecl->decls())
+            CGF.EmitVarDecl(cast<VarDecl>(*I));
+          continue;
+        }
+        CGF.EmitStmt(S);
+      }
     }
     PreCondVars.restore(CGF);
   }
diff --git a/clang/lib/Sema/SemaOpenMP.cpp b/clang/lib/Sema/SemaOpenMP.cpp
index 6e19f47356964..da60fe4461ef2 100644
--- a/clang/lib/Sema/SemaOpenMP.cpp
+++ b/clang/lib/Sema/SemaOpenMP.cpp
@@ -9827,6 +9827,23 @@ buildPreInits(ASTContext &Context,
   return nullptr;
 }
 
+/// Build pre-init statement for the given statements.
+static Stmt *buildPreInits(ASTContext &Context, ArrayRef<Stmt *> PreInits) {
+  if (!PreInits.empty()) {
+    SmallVector<Stmt *> Stmts;
+    for (Stmt *S : PreInits) {
+      // Do not nest CompoundStmts.
+      if (auto *CS = dyn_cast<CompoundStmt>(S)) {
+        llvm::append_range(Stmts, CS->body());
+        continue;
+      }
+      Stmts.push_back(S);
+    }
+    return CompoundStmt::Create(Context, PreInits, FPOptionsOverride(), {}, {});
+  }
+  return nullptr;
+}
+
 /// Build postupdate expression for the given list of postupdates expressions.
 static Expr *buildPostUpdate(Sema &S, ArrayRef<Expr *> PostUpdates) {
   Expr *PostUpdate = nullptr;
@@ -9923,11 +9940,24 @@ checkOpenMPLoop(OpenMPDirectiveKind DKind, Expr *CollapseLoopCountExpr,
             Stmt *DependentPreInits = Transform->getPreInits();
             if (!DependentPreInits)
               return;
-            for (Decl *C : cast<DeclStmt>(DependentPreInits)->getDeclGroup()) {
-              auto *D = cast<VarDecl>(C);
-              DeclRefExpr *Ref = buildDeclRefExpr(SemaRef, D, D->getType(),
-                                                  Transform->getBeginLoc());
-              Captures[Ref] = Ref;
+
+            // Search for pre-init declared variables that need to be captured
+            // to be referenceable inside the directive.
+            SmallVector<Stmt *> Constituents;
+            if (auto *CS = dyn_cast<CompoundStmt>(DependentPreInits))
+              llvm::append_range(Constituents, CS->body());
+            else
+              Constituents.push_back(DependentPreInits);
+            for (Stmt *S : Constituents) {
+              if (DeclStmt *DC = dyn_cast<DeclStmt>(S)) {
+                for (Decl *C : DC->decls()) {
+                  auto *D = cast<VarDecl>(C);
+                  DeclRefExpr *Ref = buildDeclRefExpr(
+                      SemaRef, D, D->getType().getNonReferenceType(),
+                      Transform->getBeginLoc());
+                  Captures[Ref] = Ref;
+                }
+              }
             }
           }))
     return 0;
@@ -15058,9 +15088,7 @@ StmtResult SemaOpenMP::ActOnOpenMPTargetTeamsDistributeSimdDirective(
 bool SemaOpenMP::checkTransformableLoopNest(
     OpenMPDirectiveKind Kind, Stmt *AStmt, int NumLoops,
     SmallVectorImpl<OMPLoopBasedDirective::HelperExprs> &LoopHelpers,
-    Stmt *&Body,
-    SmallVectorImpl<SmallVector<llvm::PointerUnion<Stmt *, Decl *>, 0>>
-        &OriginalInits) {
+    Stmt *&Body, SmallVectorImpl<SmallVector<Stmt *, 0>> &OriginalInits) {
   OriginalInits.emplace_back();
   bool Result = OMPLoopBasedDirective::doForAllLoops(
       AStmt->IgnoreContainers(), /*TryImperfectlyNestedLoops=*/false, NumLoops,
@@ -15096,14 +15124,75 @@ bool SemaOpenMP::checkTransformableLoopNest(
           llvm_unreachable("Unhandled loop transformation");
         if (!DependentPreInits)
           return;
-        llvm::append_range(OriginalInits.back(),
-                           cast<DeclStmt>(DependentPreInits)->getDeclGroup());
+        // CompoundStmts are used as lists of other statements, add their
+        // contents, not the lists themselves to avoid nesting. This is
+        // necessary because DeclStmts need to be visible after the pre-init.
+        else if (auto *CS = dyn_cast<CompoundStmt>(DependentPreInits))
+          llvm::append_range(OriginalInits.back(), CS->body());
+        else
+          OriginalInits.back().push_back(DependentPreInits);
       });
   assert(OriginalInits.back().empty() && "No preinit after innermost loop");
   OriginalInits.pop_back();
   return Result;
 }
 
+/// Add preinit statements that need to be propageted from the selected loop.
+static void addLoopPreInits(ASTContext &Context,
+                            OMPLoopBasedDirective::HelperExprs &LoopHelper,
+                            Stmt *LoopStmt, ArrayRef<Stmt *> OriginalInit,
+                            SmallVectorImpl<Stmt *> &PreInits) {
+
+  // For range-based for-statements, ensure that their syntactic sugar is
+  // executed by adding them as pre-init statements.
+  if (auto *CXXRangeFor = dyn_cast<CXXForRangeStmt>(LoopStmt)) {
+    Stmt *RangeInit = CXXRangeFor->getInit();
+    if (RangeInit)
+      PreInits.push_back(RangeInit);
+
+    DeclStmt *RangeStmt = CXXRangeFor->getRangeStmt();
+    PreInits.push_back(new (Context) DeclStmt(RangeStmt->getDeclGroup(),
+                                              RangeStmt->getBeginLoc(),
+                                              RangeStmt->getEndLoc()));
+
+    DeclStmt *RangeEnd = CXXRangeFor->getEndStmt();
+    PreInits.push_back(new (Context) DeclStmt(RangeEnd->getDeclGroup(),
+                                              RangeEnd->getBeginLoc(),
+                                              RangeEnd->getEndLoc()));
+  }
+
+  llvm::append_range(PreInits, OriginalInit);
+
+  // List of OMPCapturedExprDecl, for __begin, __end, and NumIterations
+  if (auto *PI = cast_or_null<DeclStmt>(LoopHelper.PreInits)) {
+    PreInits.push_back(new (Context) DeclStmt(
+        PI->getDeclGroup(), PI->getBeginLoc(), PI->getEndLoc()));
+  }
+
+  // Gather declarations for the data members used as counters.
+  for (Expr *CounterRef : LoopHelper.Counters) {
+    auto *CounterDecl = cast<DeclRefExpr>(CounterRef)->getDecl();
+    if (isa<OMPCapturedExprDecl>(CounterDecl))
+      PreInits.push_back(new (Context) DeclStmt(
+          DeclGroupRef(CounterDecl), SourceLocation(), SourceLocation()));
+  }
+}
+
+/// Collect the loop statements (ForStmt or CXXRangeForStmt) of the affected
+/// loop of a construct.
+static void collectLoopStmts(Stmt *AStmt, MutableArrayRef<Stmt *> LoopStmts) {
+  size_t NumLoops = LoopStmts.size();
+  OMPLoopBasedDirective::doForAllLoops(
+      AStmt, /*TryImperfectlyNestedLoops=*/false, NumLoops,
+      [LoopStmts](unsigned Cnt, Stmt *CurStmt) {
+        assert(!LoopStmts[Cnt] && "Loop statement must not yet be assigned");
+        LoopStmts[Cnt] = CurStmt;
+        return false;
+      });
+  assert(llvm::all_of(LoopStmts, [](Stmt *LoopStmt) { return LoopStmt; }) &&
+         "Expecting a loop statement for each affected loop");
+}
+
 StmtResult SemaOpenMP::ActOnOpenMPTileDirective(ArrayRef<OMPClause *> Clauses,
                                                 Stmt *AStmt,
                                                 SourceLocation StartLoc,
@@ -15125,8 +15214,7 @@ StmtResult SemaOpenMP::ActOnOpenMPTileDirective(ArrayRef<OMPClause *> Clauses,
   // Verify and diagnose loop nest.
   SmallVector<OMPLoopBasedDirective::HelperExprs, 4> LoopHelpers(NumLoops);
   Stmt *Body = nullptr;
-  SmallVector<SmallVector<llvm::PointerUnion<Stmt *, Decl *>, 0>, 4>
-      OriginalInits;
+  SmallVector<SmallVector<Stmt *, 0>, 4> OriginalInits;
   if (!checkTransformableLoopNest(OMPD_tile, AStmt, NumLoops, LoopHelpers, Body,
                                   OriginalInits))
     return StmtError();
@@ -15143,7 +15231,11 @@ StmtResult SemaOpenMP::ActOnOpenMPTileDirective(ArrayRef<OMPClause *> Clauses,
          "Expecting loop iteration space dimensionality to match number of "
          "affected loops");
 
-  SmallVector<Decl *, 4> PreInits;
+  // Collect all affected loop statements.
+  SmallVector<Stmt *> LoopStmts(NumLoops, nullptr);
+  collectLoopStmts(AStmt, LoopStmts);
+
+  SmallVector<Stmt *, 4> PreInits;
   CaptureVars CopyTransformer(SemaRef);
 
   // Create iteration variables for the generated loops.
@@ -15183,20 +15275,9 @@ StmtResult SemaOpenMP::ActOnOpenMPTileDirective(ArrayRef<OMPClause *> Clauses,
           &SemaRef.PP.getIdentifierTable().get(TileCntName));
       TileIndVars[I] = TileCntDecl;
     }
-    for (auto &P : OriginalInits[I]) {
-      if (auto *D = P.dyn_cast<Decl *>())
-        PreInits.push_back(D);
-      else if (auto *PI = dyn_cast_or_null<DeclStmt>(P.dyn_cast<Stmt *>()))
-        PreInits.append(PI->decl_begin(), PI->decl_end());
-    }
-    if (auto *PI = cast_or_null<DeclStmt>(LoopHelper.PreInits))
-      PreInits.append(PI->decl_begin(), PI->decl_end());
-    // Gather declarations for the data members used as counters.
-    for (Expr *CounterRef : LoopHelper.Counters) {
-      auto *CounterDecl = cast<DeclRefExpr>(CounterRef)->getDecl();
-      if (isa<OMPCapturedExprDecl>(CounterDecl))
-        PreInits.push_back(CounterDecl);
-    }
+
+    addLoopPreInits(Context, LoopHelper, LoopStmts[I], OriginalInits[I],
+                    PreInits);
   }
 
   // Once the original iteration values are set, append the innermost body.
@@ -15237,19 +15318,20 @@ StmtResult SemaOpenMP::ActOnOpenMPTileDirective(ArrayRef<OMPClause *> Clauses,
     OMPLoopBasedDirective::HelperExprs &LoopHelper = LoopHelpers[I];
     Expr *NumIterations = LoopHelper.NumIterations;
     auto *OrigCntVar = cast<DeclRefExpr>(LoopHelper.Counters[0]);
-    QualType CntTy = OrigCntVar->getType();
+    QualType IVTy = NumIterations->getType();
+    Stmt *LoopStmt = LoopStmts[I];
 
     // Commonly used variables. One of the constraints of an AST is that every
     // node object must appear at most once, hence we define lamdas that create
     // a new AST node at every use.
-    auto MakeTileIVRef = [&SemaRef = this->SemaRef, &TileIndVars, I, CntTy,
+    auto MakeTileIVRef = [&SemaRef = this->SemaRef, &TileIndVars, I, IVTy,
                           OrigCntVar]() {
-      return buildDeclRefExpr(SemaRef, TileIndVars[I], CntTy,
+      return buildDeclRefExpr(SemaRef, TileIndVars[I], IVTy,
                               OrigCntVar->getExprLoc());
     };
-    auto MakeFloorIVRef = [&SemaRef = this->SemaRef, &FloorIndVars, I, CntTy,
+    auto MakeFloorIVRef = [&SemaRef = this->SemaRef, &FloorIndVars, I, IVTy,
                            OrigCntVar]() {
-      return buildDeclRefExpr(SemaRef, FloorIndVars[I], CntTy,
+      return buildDeclRefExpr(SemaRef, FloorIndVars[I], IVTy,
                               OrigCntVar->getExprLoc());
     };
 
@@ -15311,6 +15393,8 @@ StmtResult SemaOpenMP::ActOnOpenMPTileDirective(ArrayRef<OMPClause *> Clauses,
     // further into the inner loop.
     SmallVector<Stmt *, 4> BodyParts;
     BodyParts.append(LoopHelper.Updates.begin(), LoopHelper.Updates.end());
+    if (auto *SourceCXXFor = dyn_cast<CXXForRangeStmt>(LoopStmt))
+      BodyParts.push_back(SourceCXXFor->getLoopVarStmt());
     BodyParts.push_back(Inner);
     Inner = CompoundStmt::Create(Context, BodyParts, FPOptionsOverride(),
                                  Inner->getBeginLoc(), Inner->getEndLoc());
@@ -15325,12 +15409,14 @@ StmtResult SemaOpenMP::ActOnOpenMPTileDirective(ArrayRef<OMPClause *> Clauses,
     auto &LoopHelper = LoopHelpers[I];
     Expr *NumIterations = LoopHelper.NumIterations;
     DeclRefExpr *OrigCntVar = cast<DeclRefExpr>(LoopHelper.Counters[0]);
-    QualType CntTy = OrigCntVar->getType();
+    QualType IVTy = NumIterations->getType();
 
-    // Commonly used variables.
-    auto MakeFloorIVRef = [&SemaRef = this->SemaRef, &FloorIndVars, I, CntTy,
+    // Commonly used variables. One of the constraints of an AST is that every
+    // node object must appear at most once, hence we define lamdas that create
+    // a new AST node at every use.
+    auto MakeFloorIVRef = [&SemaRef = this->SemaRef, &FloorIndVars, I, IVTy,
                            OrigCntVar]() {
-      return buildDeclRefExpr(SemaRef, FloorIndVars[I], CntTy,
+      return buildDeclRefExpr(SemaRef, FloorIndVars[I], IVTy,
                               OrigCntVar->getExprLoc());
     };
 
@@ -15396,8 +15482,7 @@ StmtResult SemaOpenMP::ActOnOpenMPUnrollDirective(ArrayRef<OMPClause *> Clauses,
   Stmt *Body = nullptr;
   SmallVector<OMPLoopBasedDirective::HelperExprs, NumLoops> LoopHelpers(
       NumLoops);
-  SmallVector<SmallVector<llvm::PointerUnion<Stmt *, Decl *>, 0>, NumLoops + 1>
-      OriginalInits;
+  SmallVector<SmallVector<Stmt *, 0>, NumLoops + 1> OriginalInits;
   if (!checkTransformableLoopNest(OMPD_unroll, AStmt, NumLoops, LoopHelpers,
                                   Body, OriginalInits))
     return StmtError();
@@ -15409,6 +15494,10 @@ StmtResult SemaOpenMP::ActOnOpenMPUnrollDirective(ArrayRef<OMPClause *> Clauses,
     return OMPUnrollDirective::Create(Context, StartLoc, EndLoc, Clauses, AStmt,
                                       NumGeneratedLoops, nullptr, nullptr);
 
+  assert(LoopHelpers.size() == NumLoops &&
+         "Expecting a single-dimensional loop iteration space");
+  assert(OriginalInits.size() == NumLoops &&
+         "Expecting a single-dimensional loop iteration space");
   OMPLoopBasedDirective::HelperExprs &LoopHelper = LoopHelpers.front();
 
   if (FullClause) {
@@ -15472,24 +15561,13 @@ StmtResult SemaOpenMP::ActOnOpenMPUnrollDirective(ArrayRef<OMPClause *> Clauses,
   // of a canonical loop nest where these PreInits are emitted before the
   // outermost directive.
 
+  // Find the loop statement.
+  Stmt *LoopStmt = nullptr;
+  collectLoopStmts(AStmt, {LoopStmt});
+
   // Determine the PreInit declarations.
-  SmallVector<Decl *, 4> PreInits;
-  assert(OriginalInits.size() == 1 &&
-         "Expecting a single-dimensional loop iteration space");
-  for (auto &P : OriginalInits[0]) {
-    if (auto *D = P.dyn_cast<Decl *>())
-      PreInits.push_back(D);
-    else if (auto *PI = dyn_cast_or_null<DeclStmt>(P.dyn_cast<Stmt *>()))
-      PreInits.append(PI->decl_begin(), PI->decl_end());
-  }
-  if (auto *PI = cast_or_null<DeclStmt>(LoopHelper.PreInits))
-    PreInits.append(PI->decl_begin(), PI->decl_end());
-  // Gather declarations for the data members used as counters.
-  for (Expr *CounterRef : LoopHelper.Counters) {
-    auto *CounterDecl = cast<DeclRefExpr>(CounterRef)->getDecl();
-    if (isa<OMPCapturedExprDecl>(CounterDecl))
-      PreInits.push_back(CounterDecl);
-  }
+  SmallVector<Stmt *, 4> PreInits;
+  addLoopPreInits(Context, LoopHelper, LoopStmt, OriginalInits[0], PreInits);
 
   auto *IterationVarRef = cast<DeclRefExpr>(LoopHelper.IterationVarRef);
   QualType IVTy = IterationVarRef->getType();
@@ -15595,6 +15673,8 @@ StmtResult SemaOpenMP::ActOnOpenMPUnrollDirective(ArrayRef<OMPClause *> Clauses,
   // Inner For statement.
   SmallVector<Stmt *> InnerBodyStmts;
   InnerBodyStmts.append(LoopHelper.Updates.begin(), LoopHelper.Updates.end());
+  if (auto *CXXRangeFor = dyn_cast<CXXForRangeStmt>(LoopStmt))
+    InnerBodyStmts.push_back(CXXRangeFor->getLoopVarStmt());
   InnerBodyStmts.push_back(Body);
   CompoundStmt *InnerBody =
       CompoundStmt::Create(getASTContext(), InnerBodyStmts, FPOptionsOverride(),
diff --git a/clang/test/OpenMP/tile_codegen.cpp b/clang/test/OpenMP/tile_codegen.cpp
index 93a3a14133ab5..5fd5609b844cc 100644
--- a/clang/test/OpenMP/tile_codegen.cpp
+++ b/clang/test/OpenMP/tile_codegen.cpp
@@ -1,10 +1,10 @@
-// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _ --version 4
 // Check code generation
-// RUN: %clang_cc1 -verify -triple x86_64-pc-linux-gnu -fclang-abi-compat=latest -fopenmp -emit-llvm %s -o - | FileCheck %s --check-prefix=CHECK1
+// RUN: %clang_cc1 -verify -triple x86_64-pc-linux-gnu -fclang-abi-compat=latest -std=c++20 -fopenmp -emit-llvm %s -o - | FileCheck %s --check-prefix=CHECK1
 
 // Check same results after serialization round-trip
-// RUN: %clang_cc1 -verify -triple x86_64-pc-linux-gnu -fclang-abi-compat=latest -fopenmp -emit-pch -o %t %s
-// RUN: %clang_cc1 -verify -triple x86_64-pc-linux-gnu -fclang-abi-compat=latest -fopenmp -include-pch %t -emit-llvm %s -o - | FileCheck %s --check-prefix=CHECK2
+// RUN: %clang_cc1 -verify -triple x86_64-pc-linux-gnu -fclang-abi-compat=latest -std=c++20 -fopenmp -emit-pch -o %t %s
+// RUN: %clang_cc1 -verify -triple x86_64-pc-linux-gnu -fclang-abi-compat=latest -std=c++20 -fopenmp -include-pch %t -emit-llvm %s -o - | FileCheck %s --check-prefix=CHECK2
 // expected-no-diagnostics
 
 #ifndef HEADER
@@ -91,22 +91,38 @@ extern "C" void foo8(int a) {
 }
 
 
+typedef struct { double array[12]; } data_t;
+extern "C" void foo9(data_t data) {
+#pragma omp tile sizes(5)
+  for (double v : data.array)
+    body(v);
+}
+
+
+extern "C" void foo10(data_t data) {
+#pragma omp tile sizes(5)
+  for (double c = 42.0; double v : data.array)
+    body(c, v);
+}
+
+
 #endif /* HEADER */
-// CHECK1-LABEL: define {{[^@]+}}@body
-// CHECK1-SAME: (...) #[[ATTR0:[0-9]+]] {
+
+// CHECK1-LABEL: define dso_local void @body(
+// CHECK1-SAME: ...) #[[ATTR0:[0-9]+]] {
 // CHECK1-NEXT:  entry:
 // CHECK1-NEXT:    ret void
 //
 //
-// CHECK1-LABEL: define {{[^@]+}}@__cxx_global_var_init
-// CHECK1-SAME: () #[[ATTR1:[0-9]+]] section ".text.startup" {
+// CHECK1-LABEL: ...
[truncated]

@Meinersbur Meinersbur requested a review from alexey-bataev May 13, 2024 08:52
// declarations. Since declarations must be visible in the the following
// that they initialize, unpack the ComboundStmt they are nested in.
SmallVector<const Stmt *> PreInitStmts;
if (auto *PreInitCompound = dyn_cast<CompoundStmt>(PreInits))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why need to create coumpound stmt? DeclStmt support multiple declarations itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is explained in the summary. In essence, the init-statement for a C++20 foreach-loop does not need to be a DeclStmt, but can be an arbitrary Stmt.

Base automatically changed from users/meinersbur/clang_openmp_tile_sizes_nonconstant to main May 13, 2024 14:10
@Meinersbur Meinersbur force-pushed the users/meinersbur/clang_openmp_unroll-tile_foreach branch from b8b1260 to d6057c4 Compare May 21, 2024 13:22
for (const Stmt *S : PreInitStmts) {
// EmitStmt skips any OMPCapturedExprDecls, but needs to be emitted
// here.
if (auto *PreInitDecl = dyn_cast<DeclStmt>(S)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, you still emit only DeclStmts? Then why you can't put all Decls into a single one DeclStmt upon creation in Sema?

Copy link
Member Author

@Meinersbur Meinersbur May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything else is emitted in CGF.EmitStmt(S); at line 211.

CGF.EmitStmt(S) does itself call CGF.EmitVarDecl(S) if passed a VarDecl, so this special handling should not be necessary. It includes, however, an exception for OMPCapturedExprDecl (subclass of VarDecl) that are NOT emitted so we need to do this explicitly here. Otherwise, lines 203-212 would be just a single CGF.EmitStmt(S).

case Decl::OMPCapturedExpr:

@@ -156,9 +156,9 @@ extern "C" void body(...) {}
// IR-EMPTY:
// IR-NEXT: [[FOR_INC]]:
// IR-NEXT: %[[TMP34:.+]] = load i32, ptr %[[DOTTILE_0_IV_I]], align 4
// IR-NEXT: %[[INC:.+]] = add nsw i32 %[[TMP34]], 1
// IR-NEXT: %[[INC:.+]] = add i32 %[[TMP34]], 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why nsw is dropped? Another question - can be nuw added?

Copy link
Member Author

@Meinersbur Meinersbur May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is due to the change of using the type of LoopHelper.NumIterations (an unsigned integer) as the new loop's induction variable (instead of the original loop's iteration variable type). In iterator-based loops that type would be incompatible with LoopHelper.NumIterations which remains an integer.

Even if the original loop variable is a signed integer (it could also be a std::iota_view over the same range), using and unsigned type is more correct as long as we start counting at 0. For instance, the loop

#pragma omp tile sizes(2)
for (int i = INT_MIN; i < INT_MAX; ++i)

would overflow the nsw flag.

The "most correct" fix would be to add the nuw flag. Unfortunately there is no AST expression we could create that would make CodeGen generate one. Alternatively, I could make generated loops start counting at INT_MIN.

Copy link
Member Author

@Meinersbur Meinersbur May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nuw would be an idea for a Clang extension in the footsteps of vector extensions or _ExtInt which also exposed a feature that LLVM always supported. E.g.

typedef unsigned nowrap_unsigned_t __attribute__((ext_no_wrap));
for (nowrap_unsigned_t i = 0; i < n; ++i)

However, the language consequences are immense.

Copy link
Member

@alexey-bataev alexey-bataev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG

@Meinersbur Meinersbur merged commit 9120562 into main May 22, 2024
3 of 4 checks passed
@Meinersbur Meinersbur deleted the users/meinersbur/clang_openmp_unroll-tile_foreach branch May 22, 2024 12:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category openmp:libomp OpenMP host runtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants