Skip to content

[Delinearization] Add function for fixed size array without relying on GEP #145050

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

kasuga-fj
Copy link
Contributor

@kasuga-fj kasuga-fj commented Jun 20, 2025

The existing functions getIndexExpressionsFromGEP and tryDelinearizeFixedSizeImpl provide functionality to delinearize memory accesses for fixed size array. They use the GEP source element type in their optimization heuristics. However, driving optimization heuristics based on GEP type information is not allowed.

This patch introduces a new function delinearizeFixedSizeArray to delinearize a fixed size array without using the type information in GEP. The new function infers the size of each dimension of the array based on the value to be added to the address as induction variables are incremented. This is an initial implementation that may not cover all cases, but is intended to replace the existing function in the future.

Related:

@llvmbot llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label Jun 20, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 20, 2025

@llvm/pr-subscribers-llvm-analysis

Author: Ryotaro Kasuga (kasuga-fj)

Changes

The existing functions getIndexExpressionsFromGEP and tryDelinearizeFixedSizeImpl provide functionality to delinearize memory accesses for fixed size array. They use the GEP source element type in their optimization heuristics. However, driving optimization heuristics based on GEP type information is not allowed.
This patch introduces a new function delinearizeFixedSizeArray to remove them. This is an initial implementation that may not cover all cases, but is intended to replace the existing function in the future.


Patch is 22.93 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/145050.diff

3 Files Affected:

  • (modified) llvm/include/llvm/Analysis/Delinearization.h (+23)
  • (modified) llvm/lib/Analysis/Delinearization.cpp (+188-2)
  • (added) llvm/test/Analysis/Delinearization/fixed_size_array.ll (+446)
diff --git a/llvm/include/llvm/Analysis/Delinearization.h b/llvm/include/llvm/Analysis/Delinearization.h
index eb775babd6061..dca423235b3c0 100644
--- a/llvm/include/llvm/Analysis/Delinearization.h
+++ b/llvm/include/llvm/Analysis/Delinearization.h
@@ -112,6 +112,29 @@ void delinearize(ScalarEvolution &SE, const SCEV *Expr,
                  SmallVectorImpl<const SCEV *> &Subscripts,
                  SmallVectorImpl<const SCEV *> &Sizes, const SCEV *ElementSize);
 
+/// Split this SCEVAddRecExpr into two vectors of SCEVs representing the
+/// subscripts and sizes of an access to a fixed size array. This is a special
+/// case of delinearization for fixed size arrays.
+///
+/// The delinearization is a 2 step process: the first step estimates the sizes
+/// of each dimension of the array. The second step computes the access
+/// functions for the delinearized array:
+///
+/// 1. Compute the array size
+/// 2. Compute the access function: same as normal delinearization
+///
+/// Different from the normal delinearization, this function assumes that NO
+/// terms exist in the \p Expr. In other words, it assumes that the all step
+/// values are constant.
+///
+/// This function is intended to replace getIndexExpressionsFromGEP and
+/// tryDelinearizeFixedSizeImpl. They rely on the GEP source element type so
+/// that they will be removed in the future.
+void delinearizeFixedSizeArray(ScalarEvolution &SE, const SCEV *Expr,
+                               SmallVectorImpl<const SCEV *> &Subscripts,
+                               SmallVectorImpl<const SCEV *> &Sizes,
+                               const SCEV *ElementSize);
+
 /// Gathers the individual index expressions from a GEP instruction.
 ///
 /// This function optimistically assumes the GEP references into a fixed size
diff --git a/llvm/lib/Analysis/Delinearization.cpp b/llvm/lib/Analysis/Delinearization.cpp
index 329bd35530c72..30400b4e39cf0 100644
--- a/llvm/lib/Analysis/Delinearization.cpp
+++ b/llvm/lib/Analysis/Delinearization.cpp
@@ -24,6 +24,7 @@
 #include "llvm/IR/InstIterator.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/PassManager.h"
+#include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"
 
@@ -32,6 +33,11 @@ using namespace llvm;
 #define DL_NAME "delinearize"
 #define DEBUG_TYPE DL_NAME
 
+static cl::opt<bool> UseFixedSizeArrayHeuristic(
+    "delinearize-use-fixed-size-array-heuristic", cl::init(false), cl::Hidden,
+    cl::desc("When printing analysis, use the heuristic for fixed-size arrays "
+             "if the default delinearizetion fails."));
+
 // Return true when S contains at least an undef value.
 static inline bool containsUndefs(const SCEV *S) {
   return SCEVExprContains(S, [](const SCEV *S) {
@@ -480,6 +486,175 @@ void llvm::delinearize(ScalarEvolution &SE, const SCEV *Expr,
   });
 }
 
+static std::optional<APInt> tryIntoAPInt(const SCEV *S) {
+  if (const auto *Const = dyn_cast<SCEVConstant>(S))
+    return Const->getAPInt();
+  return std::nullopt;
+}
+
+/// Collects the absolute values of constant steps for all induction variables.
+/// Returns true if we can prove that all step values are constants and \p Expr
+/// is dividable by \p ElementSize. Each step value is stored in \p Steps after
+/// divided by \p ElementSize.
+static bool collectConstantAbsSteps(ScalarEvolution &SE, const SCEV *Expr,
+                                    SmallVectorImpl<unsigned> &Steps,
+                                    unsigned ElementSize) {
+  // End of recursion. The constant value also must be a multiple of
+  // ElementSize.
+  if (const auto *Const = dyn_cast<SCEVConstant>(Expr)) {
+    const unsigned Mod = Const->getAPInt().urem(ElementSize);
+    return Mod == 0;
+  }
+
+  const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Expr);
+  if (!AR || !AR->isAffine())
+    return false;
+
+  const SCEV *Step = AR->getStepRecurrence(SE);
+  std::optional<APInt> StepAPInt = tryIntoAPInt(Step);
+  if (!StepAPInt)
+    return false;
+
+  APInt Q;
+  uint64_t R;
+  APInt::udivrem(StepAPInt->abs(), ElementSize, Q, R);
+  if (R != 0)
+    return false;
+
+  // Bail out when the step is too large.
+  std::optional<unsigned> StepVal = Q.tryZExtValue();
+  if (!StepVal)
+    return false;
+
+  Steps.push_back(*StepVal);
+  return collectConstantAbsSteps(SE, AR->getStart(), Steps, ElementSize);
+}
+
+static bool findFixedSizeArrayDimensions(ScalarEvolution &SE, const SCEV *Expr,
+                                         SmallVectorImpl<unsigned> &Sizes,
+                                         const SCEV *ElementSize) {
+  if (!ElementSize)
+    return false;
+
+  std::optional<APInt> ElementSizeAPInt = tryIntoAPInt(ElementSize);
+  if (!ElementSizeAPInt || *ElementSizeAPInt == 0)
+    return false;
+
+  std::optional<unsigned> ElementSizeConst = ElementSizeAPInt->tryZExtValue();
+
+  // Early exit when ElementSize is not a positive constant.
+  if (!ElementSizeConst)
+    return false;
+
+  if (!collectConstantAbsSteps(SE, Expr, Sizes, *ElementSizeConst) ||
+      Sizes.empty()) {
+    Sizes.clear();
+    return false;
+  }
+
+  // At this point, Sizes contains the absolute step values for all induction
+  // variables. Each step value must be a multiple of the size of the array
+  // element. Assuming that the each value represents the size of an array for
+  // each dimension, attempts to restore the length of each dimension by
+  // dividing the step value by the next smaller value. For example, if we have
+  // the following AddRec SCEV:
+  //
+  //   AddRec: {{{0,+,2048}<%for.i>,+,256}<%for.j>,+,8}<%for.k> (ElementSize=8)
+  //
+  // Then Sizes will become [256, 32, 1] after sorted. We don't know the size
+  // of the outermost dimension, the next dimension will be computed as
+  // 256 / 32 = 8, and the last dimension will be computed as 32 / 1 = 32. Thus
+  // it results in like Arr[UnknownSize][8][32] with elements of size 8 bytes,
+  // where Arr is a base pointer.
+  //
+  // TODO: Catch more cases, e.g., when a step value is not dividable by the
+  // next smaller one, like A[i][3*j].
+  llvm::sort(Sizes.rbegin(), Sizes.rend());
+  Sizes.erase(llvm::unique(Sizes), Sizes.end());
+  for (unsigned I = 0; I + 1 < Sizes.size(); I++) {
+    unsigned PrevSize = Sizes[I + 1];
+    if (Sizes[I] % PrevSize) {
+      Sizes.clear();
+      return false;
+    }
+    Sizes[I] /= PrevSize;
+  }
+
+  // The last element should be ElementSize.
+  Sizes.back() = *ElementSizeConst;
+  return true;
+}
+
+/// Splits the SCEV into two vectors of SCEVs representing the subscripts and
+/// sizes of an array access, assuming that the array is a fixed size array.
+///
+/// E.g., if we have the code like as follows:
+///
+///  double A[42][8][32];
+///  for i
+///    for j
+///      for k
+///        use A[i][j][k]
+///
+/// The access function will be represented as an AddRec SCEV like:
+///
+///  AddRec: {{{0,+,2048}<%for.i>,+,256}<%for.j>,+,8}<%for.k> (ElementSize=8)
+///
+/// Then findFixedSizeArrayDimensions will compute the array size as follows:
+///
+///  CHECK: ArrayDecl[UnknownSize][8][32] with elements of 8 bytes.
+///
+/// Finally each subscript will be computed as follows:
+///
+///  CHECK: ArrayRef[{0,+,1}<%for.i>][{0,+,1}<%for.j>][{0,+,1}<%for.k>]
+///
+/// Note that this function doesn't check the range of possible values for each
+/// subscript, so the caller should perform additional boundary checks if
+/// necessary.
+///
+/// TODO: At the moment, this function can handle only simple cases. For
+/// example, we cannot handle a case where a step value is not dividable by the
+/// next smaller step value, e.g., A[i][3*j]. Furthermore, this function
+/// doesn't guarantee that the original array size is restored "correctly". For
+/// example, in the following case:
+///
+///  double A[42][4][32];
+///  double B[42][8][64];
+///  for i
+///    for j
+///      for k
+///        use A[i][j][k]
+///        use B[i][2*j][k]
+///
+/// The access function for both accesses will be the same:
+///
+///  AddRec: {{{0,+,2048}<%for.i>,+,512}<%for.j>,+,8}<%for.k> (ElementSize=8)
+///
+/// The array sizes for both A and B will be computed as
+/// ArrayDecl[UnknownSize][4][64], which matches for A, but not for B.
+void llvm::delinearizeFixedSizeArray(ScalarEvolution &SE, const SCEV *Expr,
+                                     SmallVectorImpl<const SCEV *> &Subscripts,
+                                     SmallVectorImpl<const SCEV *> &Sizes,
+                                     const SCEV *ElementSize) {
+
+  // First step: find the fixed array size.
+  SmallVector<unsigned, 4> ConstSizes;
+  if (!findFixedSizeArrayDimensions(SE, Expr, ConstSizes, ElementSize)) {
+    Sizes.clear();
+    return;
+  }
+
+  // Convert the constant size to SCEV.
+  for (unsigned Size : ConstSizes)
+    Sizes.push_back(SE.getConstant(Expr->getType(), Size));
+
+  // Second step: compute the access functions for each subscript.
+  computeAccessFunctions(SE, Expr, Subscripts, Sizes);
+
+  if (Subscripts.empty())
+    return;
+}
+
 bool llvm::getIndexExpressionsFromGEP(ScalarEvolution &SE,
                                       const GetElementPtrInst *GEP,
                                       SmallVectorImpl<const SCEV *> &Subscripts,
@@ -586,9 +761,20 @@ void printDelinearization(raw_ostream &O, Function *F, LoopInfo *LI,
       O << "AccessFunction: " << *AccessFn << "\n";
 
       SmallVector<const SCEV *, 3> Subscripts, Sizes;
+
+      auto IsDelinearizationFailed = [&]() {
+        return Subscripts.size() == 0 || Sizes.size() == 0 ||
+               Subscripts.size() != Sizes.size();
+      };
+
       delinearize(*SE, AccessFn, Subscripts, Sizes, SE->getElementSize(&Inst));
-      if (Subscripts.size() == 0 || Sizes.size() == 0 ||
-          Subscripts.size() != Sizes.size()) {
+      if (UseFixedSizeArrayHeuristic && IsDelinearizationFailed()) {
+        Subscripts.clear();
+        Sizes.clear();
+        delinearizeFixedSizeArray(*SE, AccessFn, Subscripts, Sizes, SE->getElementSize(&Inst));
+      }
+
+      if (IsDelinearizationFailed()) {
         O << "failed to delinearize\n";
         continue;
       }
diff --git a/llvm/test/Analysis/Delinearization/fixed_size_array.ll b/llvm/test/Analysis/Delinearization/fixed_size_array.ll
new file mode 100644
index 0000000000000..98f2d3f194584
--- /dev/null
+++ b/llvm/test/Analysis/Delinearization/fixed_size_array.ll
@@ -0,0 +1,446 @@
+; RUN: opt < %s -passes='print<delinearization>' -disable-output -delinearize-use-fixed-size-array-heuristic 2>&1 | FileCheck %s
+
+; void f(int A[][8][32]) {
+;   for (i = 0; i < 42; i++)
+;    for (j = 0; j < 8; j++)
+;     for (k = 0; k < 32; k++)
+;       A[i][j][k] = 1;
+; }
+
+; CHECK:      Delinearization on function a_i_j_k:
+; CHECK:      Base offset: %a
+; CHECK-NEXT: ArrayDecl[UnknownSize][8][32] with elements of 4 bytes.
+; CHECK-NEXT: ArrayRef[{0,+,1}<nuw><nsw><%for.i.header>][{0,+,1}<nuw><nsw><%for.j.header>][{0,+,1}<nuw><nsw><%for.k>]
+define void @a_i_j_k(ptr %a) {
+entry:
+  br label %for.i.header
+
+for.i.header:
+  %i = phi i32 [ 0, %entry ], [ %i.inc, %for.i.latch ]
+  br label %for.j.header
+
+for.j.header:
+  %j = phi i32 [ 0, %for.i.header ], [ %j.inc, %for.j.latch ]
+  br label %for.k
+
+for.k:
+  %k = phi i32 [ 0, %for.j.header ], [ %k.inc, %for.k ]
+  %idx = getelementptr [8 x [32 x i32]], ptr %a, i32 %i, i32 %j, i32 %k
+  store i32 1, ptr %idx
+  %k.inc = add i32 %k, 1
+  %cmp.k = icmp slt i32 %k.inc, 32
+  br i1 %cmp.k, label %for.k, label %for.j.latch
+
+for.j.latch:
+  %j.inc = add i32 %j, 1
+  %cmp.j = icmp slt i32 %j.inc, 8
+  br i1 %cmp.j, label %for.j.header, label %for.i.latch
+
+for.i.latch:
+  %i.inc = add i32 %i, 1
+  %cmp.i = icmp slt i32 %i.inc, 42
+  br i1 %cmp.i, label %for.i.header, label %exit
+
+exit:
+  ret void
+}
+
+; void f(int A[][8][32]) {
+;   for (i = 0; i < 42; i++)
+;    for (j = 0; j < 8; j++)
+;     for (k = 0; k < 32; k++)
+;       A[i][7-j][k] = 1;
+; }
+
+; CHECK:      Delinearization on function a_i_nj_k:
+; CHECK:      Base offset: %a
+; CHECK-NEXT: ArrayDecl[UnknownSize][8][32] with elements of 4 bytes.
+; CHECK-NEXT: ArrayRef[{0,+,1}<nuw><nsw><%for.i.header>][{7,+,-1}<nsw><%for.j.header>][{0,+,1}<nuw><nsw><%for.k>]
+define void @a_i_nj_k(ptr %a) {
+entry:
+  br label %for.i.header
+
+for.i.header:
+  %i = phi i32 [ 0, %entry ], [ %i.inc, %for.i.latch ]
+  br label %for.j.header
+
+for.j.header:
+  %j = phi i32 [ 0, %for.i.header ], [ %j.inc, %for.j.latch ]
+  %j.subscript = sub i32 7, %j
+  br label %for.k
+
+for.k:
+  %k = phi i32 [ 0, %for.j.header ], [ %k.inc, %for.k ]
+  %idx = getelementptr [8 x [32 x i32]], ptr %a, i32 %i, i32 %j.subscript, i32 %k
+  store i32 1, ptr %idx
+  %k.inc = add i32 %k, 1
+  %cmp.k = icmp slt i32 %k.inc, 32
+  br i1 %cmp.k, label %for.k, label %for.j.latch
+
+for.j.latch:
+  %j.inc = add i32 %j, 1
+  %cmp.j = icmp slt i32 %j.inc, 8
+  br i1 %cmp.j, label %for.j.header, label %for.i.latch
+
+for.i.latch:
+  %i.inc = add i32 %i, 1
+  %cmp.i = icmp slt i32 %i.inc, 42
+  br i1 %cmp.i, label %for.i.header, label %exit
+
+exit:
+  ret void
+}
+
+; In the following code, the access functions for both stores are represented
+; in the same way in SCEV, so the delinearization results are also the same. We
+; don't have any type information of the underlying objects.
+;
+; void f(int A[][4][64], int B[][8][32]) {
+;   for (i = 0; i < 42; i++)
+;    for (j = 0; j < 4; j++)
+;     for (k = 0; k < 32; k++) {
+;       A[i][j][k] = 1;
+;       B[i][2*j][k] = 1;
+;     }
+; }
+
+; CHECK:      Delinearization on function a_ijk_b_i2jk:
+; CHECK:      Base offset: %a
+; CHECK-NEXT: ArrayDecl[UnknownSize][4][64] with elements of 4 bytes.
+; CHECK-NEXT: ArrayRef[{0,+,1}<nuw><nsw><%for.i.header>][{0,+,1}<nuw><nsw><%for.j.header>][{0,+,1}<nuw><nsw><%for.k>]
+; CHECK:      Base offset: %b
+; CHECK-NEXT: ArrayDecl[UnknownSize][4][64] with elements of 4 bytes.
+; CHECK-NEXT: ArrayRef[{0,+,1}<nuw><nsw><%for.i.header>][{0,+,1}<nuw><nsw><%for.j.header>][{0,+,1}<nuw><nsw><%for.k>]
+define void @a_ijk_b_i2jk(ptr %a, ptr %b) {
+entry:
+  br label %for.i.header
+
+for.i.header:
+  %i = phi i32 [ 0, %entry ], [ %i.inc, %for.i.latch ]
+  br label %for.j.header
+
+for.j.header:
+  %j = phi i32 [ 0, %for.i.header ], [ %j.inc, %for.j.latch ]
+  %j2 = shl i32 %j, 1
+  br label %for.k
+
+for.k:
+  %k = phi i32 [ 0, %for.j.header ], [ %k.inc, %for.k ]
+  %a.idx = getelementptr [4 x [64 x i32]], ptr %a, i32 %i, i32 %j, i32 %k
+  %b.idx = getelementptr [8 x [32 x i32]], ptr %b, i32 %i, i32 %j2, i32 %k
+  store i32 1, ptr %a.idx
+  store i32 1, ptr %b.idx
+  %k.inc = add i32 %k, 1
+  %cmp.k = icmp slt i32 %k.inc, 32
+  br i1 %cmp.k, label %for.k, label %for.j.latch
+
+for.j.latch:
+  %j.inc = add i32 %j, 1
+  %cmp.j = icmp slt i32 %j.inc, 4
+  br i1 %cmp.j, label %for.j.header, label %for.i.latch
+
+for.i.latch:
+  %i.inc = add i32 %i, 1
+  %cmp.i = icmp slt i32 %i.inc, 42
+  br i1 %cmp.i, label %for.i.header, label %exit
+
+exit:
+  ret void
+}
+
+; The type information of the underlying object is not available, so the
+; delinearization result is different from it.
+;
+; void f(int A[][8][32]) {
+;   for (i = 0; i < 42; i++)
+;    for (j = 0; j < 3; j++)
+;     for (k = 0; k < 32; k++)
+;       A[i][2*j+1][k] = 1;
+; }
+
+; CHECK:      Delinearization on function a_i_2j1_k:
+; CHECK:      Base offset: %a
+; CHECK-NEXT: ArrayDecl[UnknownSize][4][64] with elements of 4 bytes.
+; CHECK-NEXT: ArrayRef[{0,+,1}<nuw><nsw><%for.i.header>][{0,+,1}<nuw><%for.j.header>][{32,+,1}<nw><%for.k>]
+define void @a_i_2j1_k(ptr %a) {
+entry:
+  br label %for.i.header
+
+for.i.header:
+  %i = phi i32 [ 0, %entry ], [ %i.inc, %for.i.latch ]
+  br label %for.j.header
+
+for.j.header:
+  %j = phi i32 [ 0, %for.i.header ], [ %j.inc, %for.j.latch ]
+  %j2 = shl i32 %j, 1
+  %j.subscript = add i32 %j2, 1
+  br label %for.k
+
+for.k:
+  %k = phi i32 [ 0, %for.j.header ], [ %k.inc, %for.k ]
+  %idx = getelementptr [8 x [32 x i32]], ptr %a, i32 %i, i32 %j.subscript, i32 %k
+  store i32 1, ptr %idx
+  %k.inc = add i32 %k, 1
+  %cmp.k = icmp slt i32 %k.inc, 32
+  br i1 %cmp.k, label %for.k, label %for.j.latch
+
+for.j.latch:
+  %j.inc = add i32 %j, 1
+  %cmp.j = icmp slt i32 %j.inc, 3
+  br i1 %cmp.j, label %for.j.header, label %for.i.latch
+
+for.i.latch:
+  %i.inc = add i32 %i, 1
+  %cmp.i = icmp slt i32 %i.inc, 42
+  br i1 %cmp.i, label %for.i.header, label %exit
+
+exit:
+  ret void
+}
+
+; Fail to delinearize because the step value of the j-loop is not dividable by
+; that of the k-loop.
+;
+; void f(int A[][8][32]) {
+;   for (i = 0; i < 42; i++)
+;    for (j = 0; j < 8; j++)
+;     for (k = 0; k < 10; k++)
+;       A[i][j][3*k] = 1;
+; }
+
+; CHECK:      Delinearization on function a_i_j_3k:
+; CHECK:      AccessFunction: {{...}}0,+,1024}<nuw><nsw><%for.i.header>,+,128}<nw><%for.j.header>,+,12}<nw><%for.k>
+; CHECK-NEXT: failed to delinearize
+define void @a_i_j_3k(ptr %a) {
+entry:
+  br label %for.i.header
+
+for.i.header:
+  %i = phi i32 [ 0, %entry ], [ %i.inc, %for.i.latch ]
+  br label %for.j.header
+
+for.j.header:
+  %j = phi i32 [ 0, %for.i.header ], [ %j.inc, %for.j.latch ]
+  br label %for.k
+
+for.k:
+  %k = phi i32 [ 0, %for.j.header ], [ %k.inc, %for.k ]
+  %k.subscript = mul i32 %k, 3
+  %idx = getelementptr [8 x [32 x i32]], ptr %a, i32 %i, i32 %j, i32 %k.subscript
+  store i32 1, ptr %idx
+  %k.inc = add i32 %k, 1
+  %cmp.k = icmp slt i32 %k.inc, 10
+  br i1 %cmp.k, label %for.k, label %for.j.latch
+
+for.j.latch:
+  %j.inc = add i32 %j, 1
+  %cmp.j = icmp slt i32 %j.inc, 8
+  br i1 %cmp.j, label %for.j.header, label %for.i.latch
+
+for.i.latch:
+  %i.inc = add i32 %i, 1
+  %cmp.i = icmp slt i32 %i.inc, 42
+  br i1 %cmp.i, label %for.i.header, label %exit
+
+exit:
+  ret void
+}
+
+; Fail to delinearize because i is used in multiple subscripts that are not adjacent.
+;
+; void f(int A[][8][32]) {
+;   for (i = 0; i < 32; i++)
+;    for (j = 0; j < 4; j++)
+;     for (k = 0; k < 4; k++)
+;       A[i][j+k][i] = 1;
+; }
+
+; CHECK:      Delinearization on function a_i_jk_i:
+; CHECK:      AccessFunction: {{...}}0,+,1028}<%for.i.header>,+,128}<nw><%for.j.header>,+,128}<nw><%for.k>
+; CHECK-NEXT: failed to delinearize
+define void @a_i_jk_i(ptr %a) {
+entry:
+  br label %for.i.header
+
+for.i.header:
+  %i = phi i32 [ 0, %entry ], [ %i.inc, %for.i.latch ]
+  br label %for.j.header
+
+for.j.header:
+  %j = phi i32 [ 0, %for.i.header ], [ %j.inc, %for.j.latch ]
+  br label %for.k
+
+for.k:
+  %k = phi i32 [ 0, %for.j.header ], [ %k.inc, %for.k ]
+  %jk = add i32 %j, %k
+  %idx = getelementptr [8 x [32 x i32]], ptr %a, i32 %i, i32 %jk, i32 %i
+  store i32 1, ptr %idx
+  %k.inc = add i32 %k, 1
+  %cmp.k = icmp slt i32 %k.inc, 4
+  br i1 %cmp.k, label %for.k, label %for.j.latch
+
+for.j.latch:
+  %j.inc = add i32 %j, 1
+  %cmp.j = icmp slt i32 %j.inc, 4
+  br i1 %cmp.j, label %for.j.header, label %for.i.latch
+
+for.i.latch:
+  %i.inc = add i32 %i, 1
+  %cmp.i = icmp slt i32 %i.inc, 32
+  br i1 %cmp.i, label %for.i.header, label %exit
+
+exit:
+  ret void
+}
+
+; Can delinearize, but the result is different from the original array size. In
+; this case, the outermost two dimensions are melded into one.
+;
+; void f(int A[][8][32]) {
+;   for (i = 0; i < 8; i++)
+;    for (j = 0; j < 10; j++)
+;     for (k = 0; k < 10; k++)
+;       A[i][i][j+k] = 1;
+; }
+
+; CHECK:      Delinearization on function a_i_i_jk:
+; CHECK:      Base offset: %a
+; CHECK-NEXT: ArrayDecl[UnknownSize][288] with elements of 4 bytes.
+; CHECK-NEXT: ArrayRef[{0,+,1}<nuw><nsw><%for.i.header>][{{..}}0,+,1}<nuw><nsw><%for.j.header>,+,1}<nuw><nsw><%for.k>]
+define void @a_i_i_jk(ptr %a) {
+entry:
+  br label %for.i.header
+
+for.i.header:
+  %i = phi i32 [ 0, %entry ], [ %i.inc, %for.i.latch ]
+  br label %for.j.header
+
+for.j.header:
+  %j = phi i32 [ 0, %for.i.header ], [ %j.inc, %for.j.latch ]
+  br label %for.k
+
+for.k:
+  %k = phi i32 [ 0, %for.j.header ], [ %k.inc, %for.k ]
+  %jk = add i32 %j, %k
+  %idx = getelementptr [8 x [32 x i32]], ptr %a, i32 %i, i32 %i, i32 %jk
+  store i32 1, ptr %idx
+  %k.inc = add i32 %k, 1
+  %cmp.k = icmp slt i32 %k.inc, 10
+  br i1 %cmp.k, label %for.k, label %for.j.latch
+
+for.j.latch:
+  %j.inc = add i32 %j, 1
+  %cmp.j = icmp slt i32 %j.inc, 10
+  br i1 %cmp.j, label %for.j.header, label %for.i.latch
+
+for.i.latch:
+  %i.inc = add i32 %i, 1
+  %cmp.i = icmp slt...
[truncated]

Copy link

github-actions bot commented Jun 20, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@kasuga-fj kasuga-fj force-pushed the fixed-size-array-deliniarize-0 branch 2 times, most recently from 145e22a to a1d611b Compare June 20, 2025 15:51
…n GEP

The existing functions `getIndexExpressionsFromGEP` and
`tryDelinearizeFixedSizeImpl` provide functionality to delinearize
memory accesses for fixed size array. They use the GEP source element
type in their optimization heuristics. However, driving optimization
heuristics based on GEP type information is not allowed.
This patch introduces a new function `delinearizeFixedSizeArray` to
remove them. This is an initial implementation that may not cover all
cases, but is intended to replace the existing function in the future.
@kasuga-fj kasuga-fj force-pushed the fixed-size-array-deliniarize-0 branch from a1d611b to e3e2f5a Compare June 20, 2025 15:54
/// AddRec: {{{0,+,2048}<%for.i>,+,512}<%for.j>,+,8}<%for.k> (ElementSize=8)
///
/// The array sizes for both A and B will be computed as
/// ArrayDecl[UnknownSize][4][64], which matches for A, but not for B.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To disambiguate this case, we need to transmit the array declarations double A[42][4][32]; and double B[42][8][64]; to the LLVM IR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To disambiguate this case, we need to transmit the array declarations double A[42][4][32]; and double B[42][8][64]; to the LLVM IR.

Probably true, but I'm not sure if it makes sense to distinguish between them in this case. Also, my comment was wrong: The correct sizes are A[42][4][64] and B[42][8][32].

if (UseFixedSizeArrayHeuristic && IsDelinearizationFailed()) {
Subscripts.clear();
Sizes.clear();
delinearizeFixedSizeArray(*SE, AccessFn, Subscripts, Sizes,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need to call delinearizeFixedSizeArray from other places that call delinearize?

Or maybe we can call delinearizeFixedSizeArray from delinearize when it fails?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't intend to change the behavior of existing passes with this patch. This is just an initial implementation, and this part of the code is for testing purposes.

@sjoerdmeijer
Copy link
Collaborator

Sorry for asking a very high level question: if this is just for testing purposes, what is the direction of travel and what are the next steps? In other words, do we think we think we will be able to properly fix it? Do we have confidence that we can promote this as a solution soon'ish?

@kasuga-fj
Copy link
Contributor Author

If you're referring to my previous comment, I meant "this part of the code" as in "the code from lines 774 to 777". Sorry for confusion, but I didn't mean to imply that the entire change in this PR is just for testing purposes. Once this is merged, I'm considering submitting another patch to replace the call to tryDelinearizeFixedSizeImpl in DA with the new function.

Copy link
Collaborator

@sjoerdmeijer sjoerdmeijer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're referring to #145050 (comment), I meant "this part of the code" as in "the code from lines 774 to 777". Sorry for confusion, but I didn't mean to imply that the entire change in this PR is just for testing purposes. Once this is merged, I'm considering submitting another patch to replace the call to tryDelinearizeFixedSizeImpl in DA with the new function.

Okay, got it, thanks for clarifying!

@@ -32,6 +33,11 @@ using namespace llvm;
#define DL_NAME "delinearize"
#define DEBUG_TYPE DL_NAME

static cl::opt<bool> UseFixedSizeArrayHeuristic(
"delinearize-use-fixed-size-array-heuristic", cl::init(false), cl::Hidden,
cl::desc("When printing analysis, use the heuristic for fixed-size arrays "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit of a nitpick on the wording here: this heuristic is used more than just for printing, isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be a misunderstanding? Currently there are two functions for delinearization; delinearize and tryDelinearizeFixedSizeImpl. These two functions are independent of each other, and the newly added function delinearizeFixedSize in this patch is intended to eventually replace all of calls to tryDelinearizeFixedSizeImpl. So this flag only affects the behavior of printing.

Although I think it would be ideal to merge delinearize and delinearizeFixedSize into a single function, that is a different topic.

@@ -32,6 +33,11 @@ using namespace llvm;
#define DL_NAME "delinearize"
#define DEBUG_TYPE DL_NAME

static cl::opt<bool> UseFixedSizeArrayHeuristic(
"delinearize-use-fixed-size-array-heuristic", cl::init(false), cl::Hidden,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to improve this, then my preference would be to have it on by default. But I now understand you would like to follow up on this soon.


// At this point, Sizes contains the absolute step recurrences for all
// induction variables. Each step recurrence must be a multiple of the size of
// the array element. Assuming that the each value represents the size of an
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably need to look a little more, but just wanted to add that it isn't clear to me yet if there are any problems with this assumption. So questions I have at this point are if this is safe, does it always hold, and what if it doesn't?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is safe, does it always hold, and what if it doesn't?

To answer the last two questions: This assumption does NOT always hold. In cases where it doesn't, I believe the additional checks on the caller side can catch them (example in DA). While I think such checks ideally belong inside the delinearization function itself, I’m beginning to suspect that skipping them may actually be beneficial in some scenarios, such as LoopCacheAnalysis. Either way, I need to investigate and consider more about this.
To answer the first question: I believe this is safe at least in the context of DA, thanks to the additional checks I mentioned above.


for.i.latch:
%i.inc = add i32 %i, 1
%cmp.i = icmp slt i32 %i.inc, 42
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just looking at this test, in cases like this where we have known backedge-taken counts, can we make us of it to determine the UnknownSize?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can do it. The main reason for doing it this way is to make its behavior closer to the existing delinearize function. I'd prefer to open another PR to handle outermost dimension size.

@sjoerdmeijer
Copy link
Collaborator

While I am still digesting some parts of this change, here are some high level comments (and possibly nitpicks):

  • New function delinearizeFixedSizeArray will replace old function tryDelinearizeFixedSizeImpl: shall we therefore make the signature the same, i.e. should it at least return a bool upon success/failure of delinearization?
  • Shall we rename delinearizeFixedSizeArray to tryDelinearizeFixedSizeImplExperimental; maybe that makes things more clear/explicit that one is going to replace the other?
  • Is there a way we can compare the the new and old delinearization? Or is it too early for that? For example, can we compare the return values where this is used, e.g.:
    bool TrySrcOld = tryDelinearizeFixedSizeImpl(SE, Src, SrcAccessFn, SrcSubscripts, SrcSizes);
    bool TrySrcNew = tryDelinearizeFixedSizeImplExperimental(SE, Src, SrcAccessFn, SrcSubscripts, SrcSizes);
    assert(TrySrcOld == TrySrcNew && "something");
    
  • In this comment for another change you provided another example where DA is not doing the right thing. Is that case handled here too? Is it worth adding this here, or is it exactly the same as one of the other cases?

@kasuga-fj
Copy link
Contributor Author

  • New function delinearizeFixedSizeArray will replace old function tryDelinearizeFixedSizeImpl: shall we therefore make the signature the same, i.e. should it at least return a bool upon success/failure of delinearization?

Yes, returning a bool makes perfect sense to me. I will update it.

  • Shall we rename delinearizeFixedSizeArray to tryDelinearizeFixedSizeImplExperimental; maybe that makes things more clear/explicit that one is going to replace the other?

I don't have a strong preference for the current name. However, I'd rather avoid using a word like experimental in the function name (similarly, I'm not a big fan of intrinsics such as llvm.experimental.* either). IMHO, it's preferable to attach that kind of information via an attribute or a comment instead. So my suggestion would be to mark the existing tryDelinearizeFixedSizeImpl as deprecated with [[deprecated]] attribute. WDYT?

  • Is there a way we can compare the the new and old delinearization? Or is it too early for that? For example, can we compare the return values where this is used, e.g.:
    bool TrySrcOld = tryDelinearizeFixedSizeImpl(SE, Src, SrcAccessFn, SrcSubscripts, SrcSizes);
    bool TrySrcNew = tryDelinearizeFixedSizeImplExperimental(SE, Src, SrcAccessFn, SrcSubscripts, SrcSizes);
    assert(TrySrcOld == TrySrcNew && "something");
    

What are you trying to achieve? As things stand, there are still cases where the old function works but the new one doesn't, so I think adding an assert like that would be too early. If your goal is just to compare the capabilities of the two functions, I think that's possible, but I don’t intend to upstream that kind of code.

  • In this comment for another change you provided another example where DA is not doing the right thing. Is that case handled here too? Is it worth adding this here, or is it exactly the same as one of the other cases?

I owe you an apology regarding this. I realized the corner case I mentioned was slightly wrong. Sorry about that! (I had used a small script to generate the test case, but it was buggy. I’ve discarded that script immediately). I'll add more details in the comment you mentioned.
Anyway, these cases still cause issues in DA, and I believe replacing tryDelinearizeFixedSizeImpl with the new function could resolve them. I'm not entirely sure whether it's worth adding a test case, but if so, it would be better to include it in a separate patch.

@sjoerdmeijer
Copy link
Collaborator

  • Shall we rename delinearizeFixedSizeArray to tryDelinearizeFixedSizeImplExperimental; maybe that makes things more clear/explicit that one is going to replace the other?

I don't have a strong preference for the current name. However, I'd rather avoid using a word like experimental in the function name (similarly, I'm not a big fan of intrinsics such as llvm.experimental.* either). IMHO, it's preferable to attach that kind of information via an attribute or a comment instead. So my suggestion would be to mark the existing tryDelinearizeFixedSizeImpl as deprecated with [[deprecated]] attribute. WDYT?

That will work for me, and I don't have very strong opinions in the first place. My only observation was that having both functions is confusing for people who have less context. So if [[deprecated]] is one way to do this, then that sounds good to me.

  • Is there a way we can compare the the new and old delinearization? Or is it too early for that? For example, can we compare the return values where this is used, e.g.:
    bool TrySrcOld = tryDelinearizeFixedSizeImpl(SE, Src, SrcAccessFn, SrcSubscripts, SrcSizes);
    bool TrySrcNew = tryDelinearizeFixedSizeImplExperimental(SE, Src, SrcAccessFn, SrcSubscripts, SrcSizes);
    assert(TrySrcOld == TrySrcNew && "something");
    

What are you trying to achieve? As things stand, there are still cases where the old function works but the new one doesn't, so I think adding an assert like that would be too early. If your goal is just to compare the capabilities of the two functions, I think that's possible, but I don’t intend to upstream that kind of code.

Yeah, exactly, I was trying to see if we could systematically catch cases where there's divergence between old and new. But I see that it is early days, and maybe tricky.

  • In this comment for another change you provided another example where DA is not doing the right thing. Is that case handled here too? Is it worth adding this here, or is it exactly the same as one of the other cases?

I owe you an apology regarding this. I realized the corner case I mentioned was slightly wrong. Sorry about that! (I had used a small script to generate the test case, but it was buggy. I’ve discarded that script immediately). I'll add more details in the comment you mentioned. Anyway, these cases still cause issues in DA, and I believe replacing tryDelinearizeFixedSizeImpl with the new function could resolve them. I'm not entirely sure whether it's worth adding a test case, but if so, it would be better to include it in a separate patch.

No worries, and thanks for adding the test case. That's the only thing I wanted to achieve: collect all interesting test cases that were floating around in different merge requests. :)

@kasuga-fj
Copy link
Contributor Author

I replaced the call to tryDelinearizeFixedSizeImpl in DA with the new function and tried running regression tests for DA and LoopInterchange. As a result, all tests under test/Transforms/LoopInterchange passed, but about half of the tests under test/Analysis/DependenceAnalysis failed. I haven't checked all the cases yet, but here's a quick summary of the current findings:

  • For example, triangular loops such as the one in NonCanonicalizedSubscript.ll were not handled properly. At the moment, I don't have a good idea for addressing these cases. However, since LoopInterchange also doesn't support triangular loops, I believe this limitation is acceptable in the context of enabling that pass.
    • On the other hand, I'm not sure if this is a problem for the other passes that depend on DA, such as LoopUnrollAndJam.
  • Regarding DA, it calls the delinearization function twice, once for Src and once for Dst. In some cases, inferred array sizes differ between them, causing DependenceInfo::tryDelinearizeFixedSize to fail (e.g., Invariant.ll, where rr[j][j] doesn't depend on i). However, if those two results are "compatible", delinearization may still succeed. It would be nice to add such logic into DA to detect such cases. From a brief test, it seemed to work well.
    • IIUIC, this is conceptually similar to how NumPy handles tensor strides when reshaping.
  • In some cases, the results seems to be improved. At first glance, dependencies like [* * *] often turns into ones like [S S 0]. I need to check if they are actually correct.

While investigating them, I noticed some minor glitches as well.

  • DependenceAnalysisPrinter seems a bit flaky. If a loop only has dependencies within a single iteration (i.e., no loop-carried dependencies), it prints either 0 or =. It seemed that this caused several tests to fail.
  • In addition to fixed-size array delinearization, we also need to incorporate logic to check whether the address is divisible by ElementSize in DependenceInfo::tryDelinearizeParametricSize. The first corner case I posted was resolved by fixing only the fixed-size array delinearization, but the second one required changes to this function as well.

@sjoerdmeijer
Copy link
Collaborator

I replaced the call to tryDelinearizeFixedSizeImpl in DA with the new function and tried running regression tests for DA and LoopInterchange. As a result, all tests under test/Transforms/LoopInterchange passed, but about half of the tests under test/Analysis/DependenceAnalysis failed. I haven't checked all the cases yet, but here's a quick summary of the current findings:

  • For example, triangular loops such as the one in NonCanonicalizedSubscript.ll were not handled properly. At the moment, I don't have a good idea for addressing these cases. However, since LoopInterchange also doesn't support triangular loops, I believe this limitation is acceptable in the context of enabling that pass.

    • On the other hand, I'm not sure if this is a problem for the other passes that depend on DA, such as LoopUnrollAndJam.
  • Regarding DA, it calls the delinearization function twice, once for Src and once for Dst. In some cases, inferred array sizes differ between them, causing DependenceInfo::tryDelinearizeFixedSize to fail (e.g., Invariant.ll, where rr[j][j] doesn't depend on i). However, if those two results are "compatible", delinearization may still succeed. It would be nice to add such logic into DA to detect such cases. From a brief test, it seemed to work well.

    • IIUIC, this is conceptually similar to how NumPy handles tensor strides when reshaping.
  • In some cases, the results seems to be improved. At first glance, dependencies like [* * *] often turns into ones like [S S 0]. I need to check if they are actually correct.

While investigating them, I noticed some minor glitches as well.

  • DependenceAnalysisPrinter seems a bit flaky. If a loop only has dependencies within a single iteration (i.e., no loop-carried dependencies), it prints either 0 or =. It seemed that this caused several tests to fail.
  • In addition to fixed-size array delinearization, we also need to incorporate logic to check whether the address is divisible by ElementSize in DependenceInfo::tryDelinearizeParametricSize. The first corner case I posted was resolved by fixing only the fixed-size array delinearization, but the second one required changes to this function as well.

I think this is a great result.

If this is already useful, and with all interchange tests passing I think that is the case, how about this as a strategy:

  • we replace the old function with the new one, and do that in this merge request. So, we just flip the switch straight away. The reason is: this new one handles less cases, but it is correct. I think this is always preferred over handling more cases but having correctness issues. We do not have any passes running by default in the optimisation pipeline that rely on this, so I don't really see the point of a prolonged deprecation time and using the old one any time longer.
  • But we keep the old function around, and guard it with a new option -use-old-type-based-gep-delinearization or something along those lines that is off by default, so that we use the new one by default. This provides downstream user the option to easily fall back on the old one if they rely on that version that handles more cases.
  • We XFAIL the failing DependenceAnalysis cases. We can then start chipping away at the different cases, one at a time, and I can then help with that, we can look into parallelising that (if that helps).

What do you think?

@kasuga-fj
Copy link
Contributor Author

I generally agree with the overall direction. Adding an option like -use-old-type-based-gep-delinearization seems a practical way at this moment (though I feel it's ideal to erase the GEP source element type driven heuristics entirely and literally from DA, if possible). Here are my thoughts regarding the minor points:

  • We should submit another PR to replace the old function with the new one. Since their signatures differ, this change will require more than just a simple renaming the calls to the old function in DA. I'd also prefer to keep commits as functionally separated as possible.
  • Instead of marking tests as XFAIL, I think it’s better to verify behavior with and without the -use-old-type-based-gep-delinearization option. This can be done using FileCheck's --check-prefix argument.
  • Before adding such an option, I'd like to fix the issue where DependenceAnalysisPrinter prints either 0 or = depending on the situation. This is not a correctness issue, but makes tests fragile. I'll work on it.

@sjoerdmeijer
Copy link
Collaborator

I generally agree with the overall direction. Adding an option like -use-old-type-based-gep-delinearization seems a practical way at this moment (though I feel it's ideal to erase the GEP source element type driven heuristics entirely and literally from DA, if possible). Here are my thoughts regarding the minor points:

  • We should submit another PR to replace the old function with the new one. Since their signatures differ, this change will require more than just a simple renaming the calls to the old function in DA. I'd also prefer to keep commits as functionally separated as possible.
  • Instead of marking tests as XFAIL, I think it’s better to verify behavior with and without the -use-old-type-based-gep-delinearization option. This can be done using FileCheck's --check-prefix argument.
  • Before adding such an option, I'd like to fix the issue where DependenceAnalysisPrinter prints either 0 or = depending on the situation. This is not a correctness issue, but makes tests fragile. I'll work on it.

Ok, sounds good, let me know if I can help somehow.

@kasuga-fj
Copy link
Contributor Author

Ok, sounds good, let me know if I can help somehow.

This approach would prevent us from entirely eliminating the calls to old function from DA (i.e., although the old function is unlikely to be invoked logically, we can’t eliminate the string tryDelinearizeFixedSizeImpl entirely from DependenceAnalysis.cpp). Therefore, it would be ideal to add tests to ensure the old function isn't invoked by default, though I'm not sure what the best method would be. Could you investigate this further, including whether such tests are actually necessary?

@kasuga-fj
Copy link
Contributor Author

I looked into the details of changes in the results when replacing the current tryDelinearizeFixedSizeImpl in DA with the new function delinearizeFixedSizeArray, and found the two issues (#148241 and #148813) on the DA side. However, the new function seems to behave correctly. So I believe this PR is ready to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:analysis Includes value tracking, cost tables and constant folding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants