Skip to content

Conversation

@fiigii
Copy link
Contributor

@fiigii fiigii commented Oct 10, 2025

This PR implements parts of #162376

  • Broader equivalence than constant index deltas:
    • Add Base-delta and Stride-delta matching for Add and GEP forms using ScalarEvolution deltas.
    • Reuse enabled for both constant and variable deltas when an available IR value dominates the user.
  • Dominance-aware dictionary instead of linear scans:
    • Tuple-keyed candidate dictionary grouped by basic block.
    • Walk the immediate-dominator chain to find the nearest dominating basis quickly and deterministically.
  • Simple cost model and best-rewrite selection:
    • Score candidate expressions and rewrites; select the highest-profit rewrite per instruction.
    • Skip rewriting when expressions are already foldable or high-efficiency.
  • Path compression for better ILP:
    • Compress chains of rewrites to a deeper dominating basis when a constant delta exists along the path, reducing dependent bumps on critical paths.
  • Dependency-aware rewrite ordering:
    • Build a dependency graph (basis, stride, variable delta producers) and rewrite in topological order.
    • This dependency graph will be needed by the next PR that adds partial strength reduction.

@github-actions
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Oct 10, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Fei Peng (fiigii)

Changes

This PR implements parts of #162376

  • Broader equivalence than constant index deltas:
    • Add Base-delta and Stride-delta matching for Add and GEP forms using ScalarEvolution deltas.
    • Reuse enabled for both constant and variable deltas when an available IR value dominates the user.
  • Dominance-aware dictionary instead of linear scans:
    • Tuple-keyed candidate dictionary grouped by basic block.
    • Walk the immediate-dominator chain to find the nearest dominating basis quickly and deterministically.
  • Simple cost model and best-rewrite selection:
    • Score candidate expressions and rewrites; select the highest-profit rewrite per instruction.
    • Skip rewriting when expressions are already foldable or high-efficiency.
  • Path compression for better ILP:
    • Compress chains of rewrites to a deeper dominating basis when a constant delta exists along the path, reducing dependent bumps on critical paths.
  • Dependency-aware rewrite ordering:
    • Build a dependency graph (basis, stride, variable delta producers) and rewrite in topological order.
    • This dependency graph will be needed by the next PR that adds partial strength reduction.

Patch is 163.12 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/162930.diff

17 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp (+833-271)
  • (modified) llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll (+9-11)
  • (modified) llvm/test/CodeGen/AMDGPU/dagcombine-reassociate-bug.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/idot2.ll (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/idot4s.ll (+82-79)
  • (modified) llvm/test/CodeGen/AMDGPU/idot8u.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll (+230-231)
  • (modified) llvm/test/CodeGen/AMDGPU/splitkit-getsubrangeformask.ll (+9-9)
  • (modified) llvm/test/CodeGen/AMDGPU/waitcnt-vscnt.ll (+135-200)
  • (modified) llvm/test/Transforms/StraightLineStrengthReduce/AMDGPU/pr23975.ll (+1-1)
  • (modified) llvm/test/Transforms/StraightLineStrengthReduce/AMDGPU/reassociate-geps-and-slsr-addrspace.ll (+5-5)
  • (added) llvm/test/Transforms/StraightLineStrengthReduce/NVPTX/slsr-i8-gep.ll (+156)
  • (added) llvm/test/Transforms/StraightLineStrengthReduce/NVPTX/slsr-var-delta.ll (+49)
  • (added) llvm/test/Transforms/StraightLineStrengthReduce/path-compression.ll (+35)
  • (added) llvm/test/Transforms/StraightLineStrengthReduce/pick-candidate.ll (+32)
  • (modified) llvm/test/Transforms/StraightLineStrengthReduce/slsr-add.ll (+96)
  • (modified) llvm/test/Transforms/StraightLineStrengthReduce/slsr-gep.ll (+148)
diff --git a/llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp b/llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
index 7d017095c88ce..e6d1b168cf69d 100644
--- a/llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
+++ b/llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
@@ -12,17 +12,16 @@
 // effective in simplifying arithmetic statements derived from an unrolled loop.
 // It can also simplify the logic of SeparateConstOffsetFromGEP.
 //
-// There are many optimizations we can perform in the domain of SLSR. This file
-// for now contains only an initial step. Specifically, we look for strength
-// reduction candidates in the following forms:
+// There are many optimizations we can perform in the domain of SLSR.
+// We look for strength reduction candidates in the following forms:
 //
-// Form 1: B + i * S
-// Form 2: (B + i) * S
-// Form 3: &B[i * S]
+// Form Add: B + i * S
+// Form Mul: (B + i) * S
+// Form GEP: &B[i * S]
 //
 // where S is an integer variable, and i is a constant integer. If we found two
 // candidates S1 and S2 in the same form and S1 dominates S2, we may rewrite S2
-// in a simpler way with respect to S1. For example,
+// in a simpler way with respect to S1 (index delta). For example,
 //
 // S1: X = B + i * S
 // S2: Y = B + i' * S   => X + (i' - i) * S
@@ -35,8 +34,29 @@
 //
 // Note: (i' - i) * S is folded to the extent possible.
 //
+// For form Add and GEP, we can also rewrite a candidate in a simpler way
+// with respect to other dominating candidates if their B or S are different
+// but other parts are the same. For example,
+//
+// Base Delta:
+// S1: X = B  + i * S
+// S2: Y = B' + i * S   => X + (B' - B)
+//
+// S1: X = &B [i * S]
+// S2: Y = &B'[i * S]   => X + (B' - B)
+//
+// Stride Delta:
+// S1: X = B + i * S
+// S2: Y = B + i * S'   => X + i * (S' - S)
+//
+// S1: X = &B[i * S]
+// S2: Y = &B[i * S']   => X + i * (S' - S)
+//
+// PS: Stride delta write on form Mul is usually non-profitable, and Base delta
+// write sometimes is profitable, so we do not support them on form Mul.
+//
 // This rewriting is in general a good idea. The code patterns we focus on
-// usually come from loop unrolling, so (i' - i) * S is likely the same
+// usually come from loop unrolling, so the delta is likely the same
 // across iterations and can be reused. When that happens, the optimized form
 // takes only one add starting from the second iteration.
 //
@@ -47,19 +67,14 @@
 // TODO:
 //
 // - Floating point arithmetics when fast math is enabled.
-//
-// - SLSR may decrease ILP at the architecture level. Targets that are very
-//   sensitive to ILP may want to disable it. Having SLSR to consider ILP is
-//   left as future work.
-//
-// - When (i' - i) is constant but i and i' are not, we could still perform
-//   SLSR.
 
 #include "llvm/Transforms/Scalar/StraightLineStrengthReduce.h"
 #include "llvm/ADT/APInt.h"
 #include "llvm/ADT/DepthFirstIterator.h"
+#include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/Analysis/ScalarEvolution.h"
+#include "llvm/Analysis/ScalarEvolutionExpressions.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/Constants.h"
@@ -86,11 +101,14 @@
 #include <cstdint>
 #include <limits>
 #include <list>
+#include <queue>
 #include <vector>
 
 using namespace llvm;
 using namespace PatternMatch;
 
+#define DEBUG_TYPE "slsr"
+
 static const unsigned UnknownAddressSpace =
     std::numeric_limits<unsigned>::max();
 
@@ -142,15 +160,23 @@ class StraightLineStrengthReduce {
       GEP,     // &B[..][i * S][..]
     };
 
+    enum DKind {
+      InvalidDelta, // reserved for the default constructor
+      IndexDelta,   // Delta is a constant from Index
+      BaseDelta,    // Delta is a constant or variable from Base
+      StrideDelta,  // Delta is a constant or variable from Stride
+    };
+
     Candidate() = default;
     Candidate(Kind CT, const SCEV *B, ConstantInt *Idx, Value *S,
-              Instruction *I)
-        : CandidateKind(CT), Base(B), Index(Idx), Stride(S), Ins(I) {}
+              Instruction *I, const SCEV *StrideSCEV)
+        : CandidateKind(CT), Base(B), Index(Idx), Stride(S), Ins(I),
+          StrideSCEV(StrideSCEV) {}
 
     Kind CandidateKind = Invalid;
 
     const SCEV *Base = nullptr;
-
+    // TODO: Swap Index and Stride's name.
     // Note that Index and Stride of a GEP candidate do not necessarily have the
     // same integer type. In that case, during rewriting, Stride will be
     // sign-extended or truncated to Index's type.
@@ -177,22 +203,136 @@ class StraightLineStrengthReduce {
     // Points to the immediate basis of this candidate, or nullptr if we cannot
     // find any basis for this candidate.
     Candidate *Basis = nullptr;
+
+    DKind DeltaKind = InvalidDelta;
+
+    // Store SCEV of Stride to compute delta from different strides
+    const SCEV *StrideSCEV = nullptr;
+
+    // Points to (Y - X) that will be used to rewrite this candidate.
+    Value *Delta = nullptr;
+
+    /// Cost model: Evaluate the computational efficiency of the candidate.
+    ///
+    /// Efficiency levels (higher is better):
+    ///   5 - No instruction:
+    ///       [Variable] or [Const]
+    ///   4 - One instruction with one variable:
+    ///       [Variable + Const] or [Variable * Const]
+    ///   3 - One instruction with two variables:
+    ///       [Variable + Variable] or [Variable * Variable]
+    ///   2 - Two instructions with one variable:
+    ///       [Const + Const * Variable]
+    ///   1 - Two instructions with two variables:
+    ///       [Variable + Const * Variable]
+    static unsigned getComputationEfficiency(Kind CandidateKind,
+                                             const ConstantInt *Index,
+                                             const Value *Stride,
+                                             const SCEV *Base = nullptr) {
+      bool IsConstantBase = false;
+      bool IsZeroBase = false;
+      // When evaluating the efficiency of a rewrite, if the Basis's SCEV is
+      // not available, conservatively assume the base is not constant.
+      if (auto *ConstBase = dyn_cast_or_null<SCEVConstant>(Base)) {
+        IsConstantBase = true;
+        IsZeroBase = ConstBase->getValue()->isZero();
+      }
+
+      bool IsConstantStride = isa<ConstantInt>(Stride);
+      bool IsZeroStride =
+          IsConstantStride && cast<ConstantInt>(Stride)->isZero();
+      // All constants
+      if (IsConstantBase && IsConstantStride)
+        return 5;
+
+      // [(Base + Index) * Stride]
+      if (CandidateKind == Mul) {
+        if (IsZeroStride)
+          return 5;
+        if (Index->isZero())
+          return (IsConstantStride || IsConstantBase) ? 4 : 3;
+
+        if (IsConstantBase)
+          return IsZeroBase && (Index->isOne() || Index->isMinusOne()) ? 5 : 4;
+
+        if (IsConstantStride) {
+          auto *CI = cast<ConstantInt>(Stride);
+          return (CI->isOne() || CI->isMinusOne()) ? 4 : 2;
+        }
+        return 1;
+      }
+
+      // Base + Index * Stride
+      assert(CandidateKind == Add || CandidateKind == GEP);
+      if (Index->isZero() || IsZeroStride)
+        return 5;
+
+      bool IsSimpleIndex = Index->isOne() || Index->isMinusOne();
+
+      if (IsConstantBase)
+        return IsZeroBase ? (IsSimpleIndex ? 5 : 4) : (IsSimpleIndex ? 4 : 2);
+
+      if (IsConstantStride)
+        return IsZeroStride ? 5 : 4;
+
+      if (IsSimpleIndex)
+        return 3;
+
+      return 1;
+    }
+
+    // Evaluate if the given delta is profitable to rewrite this candidate.
+    bool isProfitableRewrite(const Value *Delta, const DKind DeltaKind) const {
+      // This function cannot accurately evaluate the profit of whole expression
+      // with context. A candidate (B + I * S) cannot express whether this
+      // instruction needs to compute on its own (I * S), which may be shared
+      // with other candidates or may need instructions to compute.
+      // If the rewritten form has the same strength, still rewrite to
+      // (X + Delta) since it may expose more CSE opportunities on Delta, as
+      // unrolled loops usually have identical Delta for each unrolled body.
+      //
+      // Note, this function should only be used on Index Delta rewrite.
+      // Base and Stride delta need context info to evaluate the register
+      // pressure impact from variable delta.
+      return getComputationEfficiency(CandidateKind, Index, Stride, Base) <=
+             getRewriteProfit(Delta, DeltaKind);
+    }
+
+    // Evaluate the rewrite profit of this candidate with its Basis
+    unsigned getRewriteProfit() const {
+      return Basis ? getRewriteProfit(Delta, DeltaKind) : 0;
+    }
+
+    // Evaluate the rewrite profit of this candidate with a given delta
+    unsigned getRewriteProfit(const Value *Delta, const DKind DeltaKind) const {
+      switch (DeltaKind) {
+      case BaseDelta: // [X + Delta]
+        return getComputationEfficiency(
+            CandidateKind,
+            ConstantInt::get(cast<IntegerType>(Delta->getType()), 1), Delta);
+      case StrideDelta: // [X + Index * Delta]
+        return getComputationEfficiency(CandidateKind, Index, Delta);
+      case IndexDelta: // [X + Delta * Stride]
+        return getComputationEfficiency(CandidateKind, cast<ConstantInt>(Delta),
+                                        Stride);
+      default:
+        return 0;
+      }
+    }
+
+    bool isHighEfficiency() const {
+      return getComputationEfficiency(CandidateKind, Index, Stride, Base) >= 4;
+    }
   };
 
   bool runOnFunction(Function &F);
 
 private:
-  // Returns true if Basis is a basis for C, i.e., Basis dominates C and they
-  // share the same base and stride.
-  bool isBasisFor(const Candidate &Basis, const Candidate &C);
-
+  // Fetch straight-line basis for rewriting C, update C.Basis to point to it,
+  // and store the delta between C and its Basis in C.Delta.
+  void setBasisAndDeltaFor(Candidate &C);
   // Returns whether the candidate can be folded into an addressing mode.
-  bool isFoldable(const Candidate &C, TargetTransformInfo *TTI,
-                  const DataLayout *DL);
-
-  // Returns true if C is already in a simplest form and not worth being
-  // rewritten.
-  bool isSimplestForm(const Candidate &C);
+  bool isFoldable(const Candidate &C, TargetTransformInfo *TTI);
 
   // Checks whether I is in a candidate form. If so, adds all the matching forms
   // to Candidates, and tries to find the immediate basis for each of them.
@@ -216,12 +356,6 @@ class StraightLineStrengthReduce {
   // Allocate candidates and find bases for GetElementPtr instructions.
   void allocateCandidatesAndFindBasisForGEP(GetElementPtrInst *GEP);
 
-  // A helper function that scales Idx with ElementSize before invoking
-  // allocateCandidatesAndFindBasis.
-  void allocateCandidatesAndFindBasisForGEP(const SCEV *B, ConstantInt *Idx,
-                                            Value *S, uint64_t ElementSize,
-                                            Instruction *I);
-
   // Adds the given form <CT, B, Idx, S> to Candidates, and finds its immediate
   // basis.
   void allocateCandidatesAndFindBasis(Candidate::Kind CT, const SCEV *B,
@@ -231,12 +365,6 @@ class StraightLineStrengthReduce {
   // Rewrites candidate C with respect to Basis.
   void rewriteCandidateWithBasis(const Candidate &C, const Candidate &Basis);
 
-  // A helper function that factors ArrayIdx to a product of a stride and a
-  // constant index, and invokes allocateCandidatesAndFindBasis with the
-  // factorings.
-  void factorArrayIndex(Value *ArrayIdx, const SCEV *Base, uint64_t ElementSize,
-                        GetElementPtrInst *GEP);
-
   // Emit code that computes the "bump" from Basis to C.
   static Value *emitBump(const Candidate &Basis, const Candidate &C,
                          IRBuilder<> &Builder, const DataLayout *DL);
@@ -247,12 +375,205 @@ class StraightLineStrengthReduce {
   TargetTransformInfo *TTI = nullptr;
   std::list<Candidate> Candidates;
 
-  // Temporarily holds all instructions that are unlinked (but not deleted) by
-  // rewriteCandidateWithBasis. These instructions will be actually removed
-  // after all rewriting finishes.
-  std::vector<Instruction *> UnlinkedInstructions;
+  // Map from SCEV to instructions that represent the value,
+  // instructions are sorted in depth-first order.
+  DenseMap<const SCEV *, SmallSetVector<Instruction *, 2>> SCEVToInsts;
+
+  // Record the dependency between instructions. If C.Basis == B, we would have
+  // {B.Ins -> {C.Ins, ...}}.
+  MapVector<Instruction *, std::vector<Instruction *>> DependencyGraph;
+
+  // Map between each instruction and its possible candidates.
+  DenseMap<Instruction *, SmallVector<Candidate *, 3>> RewriteCandidates;
+
+  // All instructions that have candidates sort in topological order based on
+  // dependency graph, from roots to leaves.
+  std::vector<Instruction *> SortedCandidateInsts;
+
+  // Record all instructions that are already rewritten and will be removed
+  // later.
+  std::vector<Instruction *> DeadInstructions;
+
+  // Classify candidates against Delta kind
+  class CandidateDictTy {
+  public:
+    using CandsTy = SmallVector<Candidate *, 8>;
+    using BBToCandsTy = DenseMap<const BasicBlock *, CandsTy>;
+
+  private:
+    // Index delta Basis must have the same (Base, StrideSCEV, Inst.Type)
+    using IndexDeltaKeyTy = std::tuple<const SCEV *, const SCEV *, Type *>;
+    DenseMap<IndexDeltaKeyTy, BBToCandsTy> IndexDeltaCandidates;
+
+    // Base delta Basis must have the same (StrideSCEV, Index, Inst.Type)
+    using BaseDeltaKeyTy = std::tuple<const SCEV *, ConstantInt *, Type *>;
+    DenseMap<BaseDeltaKeyTy, BBToCandsTy> BaseDeltaCandidates;
+
+    // Stride delta Basis must have the same (Base, Index, Inst.Type)
+    using StrideDeltaKeyTy = std::tuple<const SCEV *, ConstantInt *, Type *>;
+    DenseMap<StrideDeltaKeyTy, BBToCandsTy> StrideDeltaCandidates;
+
+  public:
+    // TODO: Disable index delta on GEP after we completely move
+    // from typed GEP to PtrAdd.
+    const BBToCandsTy *getCandidatesWithDeltaKind(const Candidate &C,
+                                                  Candidate::DKind K) const {
+      assert(K != Candidate::InvalidDelta);
+      if (K == Candidate::IndexDelta) {
+        IndexDeltaKeyTy IndexDeltaKey(C.Base, C.StrideSCEV, C.Ins->getType());
+        auto It = IndexDeltaCandidates.find(IndexDeltaKey);
+        if (It != IndexDeltaCandidates.end())
+          return &It->second;
+      } else if (K == Candidate::BaseDelta) {
+        BaseDeltaKeyTy BaseDeltaKey(C.StrideSCEV, C.Index, C.Ins->getType());
+        auto It = BaseDeltaCandidates.find(BaseDeltaKey);
+        if (It != BaseDeltaCandidates.end())
+          return &It->second;
+      } else {
+        assert(K == Candidate::StrideDelta);
+        StrideDeltaKeyTy StrideDeltaKey(C.Base, C.Index, C.Ins->getType());
+        auto It = StrideDeltaCandidates.find(StrideDeltaKey);
+        if (It != StrideDeltaCandidates.end())
+          return &It->second;
+      }
+      return nullptr;
+    }
+
+    // Pointers to C must remain valid until CandidateDict is cleared.
+    void add(Candidate &C) {
+      Type *ValueType = C.Ins->getType();
+      BasicBlock *BB = C.Ins->getParent();
+      IndexDeltaKeyTy IndexDeltaKey(C.Base, C.StrideSCEV, ValueType);
+      BaseDeltaKeyTy BaseDeltaKey(C.StrideSCEV, C.Index, ValueType);
+      StrideDeltaKeyTy StrideDeltaKey(C.Base, C.Index, ValueType);
+      IndexDeltaCandidates[IndexDeltaKey][BB].push_back(&C);
+      BaseDeltaCandidates[BaseDeltaKey][BB].push_back(&C);
+      StrideDeltaCandidates[StrideDeltaKey][BB].push_back(&C);
+    }
+    // Remove all mappings from set
+    void clear() {
+      IndexDeltaCandidates.clear();
+      BaseDeltaCandidates.clear();
+      StrideDeltaCandidates.clear();
+    }
+  } CandidateDict;
+
+  const SCEV *getAndRecordSCEV(Value *V) {
+    auto *S = SE->getSCEV(V);
+    if (auto *I = dyn_cast<Instruction>(V))
+      if (!isa<SCEVCouldNotCompute>(S) && !isa<SCEVUnknown>(S) &&
+          !isa<SCEVConstant>(S))
+        SCEVToInsts[S].insert(I);
+
+    return S;
+  }
+
+  // Get the nearest instruction before CI that represents the value of S,
+  // return nullptr if no instruction is associated with S or S is not a
+  // reusable expression.
+  Value *getNearestValueOfSCEV(const SCEV *S, const Instruction *CI) const {
+    if (isa<SCEVCouldNotCompute>(S))
+      return nullptr;
+
+    if (auto *SU = dyn_cast<SCEVUnknown>(S))
+      return SU->getValue();
+    if (auto *SC = dyn_cast<SCEVConstant>(S))
+      return SC->getValue();
+
+    auto It = SCEVToInsts.find(S);
+    if (It == SCEVToInsts.end())
+      return nullptr;
+
+    for (Instruction *I : reverse(It->second))
+      if (DT->dominates(I, CI))
+        return I;
+
+    return nullptr;
+  }
+
+  struct DeltaInfo {
+    Candidate *Cand;
+    Candidate::DKind DeltaKind;
+    Value *Delta;
+
+    DeltaInfo()
+        : Cand(nullptr), DeltaKind(Candidate::InvalidDelta), Delta(nullptr) {}
+    DeltaInfo(Candidate *Cand, Candidate::DKind DeltaKind, Value *Delta)
+        : Cand(Cand), DeltaKind(DeltaKind), Delta(Delta) {}
+    operator bool() const { return Cand != nullptr; }
+  };
+
+  friend raw_ostream &operator<<(raw_ostream &OS, const DeltaInfo &DI);
+
+  DeltaInfo compressPath(Candidate &C, Candidate *Basis) const;
+
+  Candidate *pickRewriteCandidate(Instruction *I) const;
+  void sortCandidateInstructions();
+  static Constant *getIndexDelta(Candidate &C, Candidate &Basis);
+  static bool isSimilar(Candidate &C, Candidate &Basis, Candidate::DKind K);
+
+  // Add Basis -> C in DependencyGraph and propagate
+  // C.Stride and C.Delta's dependency to C
+  void addDependency(Candidate &C, Candidate *Basis) {
+    if (Basis)
+      DependencyGraph[Basis->Ins].emplace_back(C.Ins);
+
+    // If any candidate of Inst has a basis, then Inst will be rewritten,
+    // C must be rewritten after rewriting Inst, so we need to propagate
+    // the dependency to C
+    auto PropagateDependency = [&](Instruction *Inst) {
+      if (auto CandsIt = RewriteCandidates.find(Inst);
+          CandsIt != RewriteCandidates.end())
+        if (std::any_of(CandsIt->second.begin(), CandsIt->second.end(),
+                        [](Candidate *Cand) { return Cand->Basis; }))
+          DependencyGraph[Inst].emplace_back(C.Ins);
+    };
+
+    // If C has a variable delta and the delta is a candidate,
+    // propagate its dependency to C
+    if (auto *DeltaInst = dyn_cast_or_null<Instruction>(C.Delta))
+      PropagateDependency(DeltaInst);
+
+    // If the stride is a candidate, propagate its dependency to C
+    if (auto *StrideInst = dyn_cast<Instruction>(C.Stride))
+      PropagateDependency(StrideInst);
+  };
 };
 
+inline llvm::raw_ostream &
+operator<<(llvm::raw_ostream &OS,
+           const StraightLineStrengthReduce::Candidate &C) {
+  OS << "Ins: " << *C.Ins << "\n  Base: " << *C.Base
+     << "\n  Index: " << *C.Index << "\n  Stride: " << *C.Stride
+     << "\n  StrideSCEV: " << *C.StrideSCEV;
+  if (C.Basis)
+    OS << "\n  Delta: " << *C.Delta << "\n  Basis: \n  [ " << *C.Basis << " ]";
+  return OS;
+}
+
+LLVM_ATTRIBUTE_UNUSED
+inline llvm::raw_ostream &
+operator<<(llvm::raw_ostream &OS,
+           const StraightLineStrengthReduce::DeltaInfo &DI) {
+  OS << "Cand: " << *DI.Cand << "\n";
+  OS << "Delta Kind: ";
+  switch (DI.DeltaKind) {
+  case StraightLineStrengthReduce::Candidate::IndexDelta:
+    OS << "Index";
+    break;
+  case StraightLineStrengthReduce::Candidate::BaseDelta:
+    OS << "Base";
+    break;
+  case StraightLineStrengthReduce::Candidate::StrideDelta:
+    OS << "Stride";
+    break;
+  default:
+    break;
+  }
+  OS << "\nDelta: " << *DI.Delta;
+  return OS;
+}
+
 } // end anonymous namespace
 
 char StraightLineStrengthReduceLegacyPass::ID = 0;
@@ -269,17 +590,284 @@ FunctionPass *llvm::createStraightLineStrengthReducePass() {
   return new StraightLineStrengthReduceLegacyPass();
 }
 
-bool StraightLineStrengthReduce::isBasisFor(const Candidate &Basis,
-                                            const Candidate &C) {
-  return (Basis.Ins != C.Ins && // skip the same instruction
-          // They must have the same type too. Basis.Base == C.Base
-          // doesn't guarantee thei...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Oct 10, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Fei Peng (fiigii)

Changes

This PR implements parts of #162376

  • Broader equivalence than constant index deltas:
    • Add Base-delta and Stride-delta matching for Add and GEP forms using ScalarEvolution deltas.
    • Reuse enabled for both constant and variable deltas when an available IR value dominates the user.
  • Dominance-aware dictionary instead of linear scans:
    • Tuple-keyed candidate dictionary grouped by basic block.
    • Walk the immediate-dominator chain to find the nearest dominating basis quickly and deterministically.
  • Simple cost model and best-rewrite selection:
    • Score candidate expressions and rewrites; select the highest-profit rewrite per instruction.
    • Skip rewriting when expressions are already foldable or high-efficiency.
  • Path compression for better ILP:
    • Compress chains of rewrites to a deeper dominating basis when a constant delta exists along the path, reducing dependent bumps on critical paths.
  • Dependency-aware rewrite ordering:
    • Build a dependency graph (basis, stride, variable delta producers) and rewrite in topological order.
    • This dependency graph will be needed by the next PR that adds partial strength reduction.

Patch is 163.12 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/162930.diff

17 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp (+833-271)
  • (modified) llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll (+9-11)
  • (modified) llvm/test/CodeGen/AMDGPU/dagcombine-reassociate-bug.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/idot2.ll (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/idot4s.ll (+82-79)
  • (modified) llvm/test/CodeGen/AMDGPU/idot8u.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll (+230-231)
  • (modified) llvm/test/CodeGen/AMDGPU/splitkit-getsubrangeformask.ll (+9-9)
  • (modified) llvm/test/CodeGen/AMDGPU/waitcnt-vscnt.ll (+135-200)
  • (modified) llvm/test/Transforms/StraightLineStrengthReduce/AMDGPU/pr23975.ll (+1-1)
  • (modified) llvm/test/Transforms/StraightLineStrengthReduce/AMDGPU/reassociate-geps-and-slsr-addrspace.ll (+5-5)
  • (added) llvm/test/Transforms/StraightLineStrengthReduce/NVPTX/slsr-i8-gep.ll (+156)
  • (added) llvm/test/Transforms/StraightLineStrengthReduce/NVPTX/slsr-var-delta.ll (+49)
  • (added) llvm/test/Transforms/StraightLineStrengthReduce/path-compression.ll (+35)
  • (added) llvm/test/Transforms/StraightLineStrengthReduce/pick-candidate.ll (+32)
  • (modified) llvm/test/Transforms/StraightLineStrengthReduce/slsr-add.ll (+96)
  • (modified) llvm/test/Transforms/StraightLineStrengthReduce/slsr-gep.ll (+148)
diff --git a/llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp b/llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
index 7d017095c88ce..e6d1b168cf69d 100644
--- a/llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
+++ b/llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
@@ -12,17 +12,16 @@
 // effective in simplifying arithmetic statements derived from an unrolled loop.
 // It can also simplify the logic of SeparateConstOffsetFromGEP.
 //
-// There are many optimizations we can perform in the domain of SLSR. This file
-// for now contains only an initial step. Specifically, we look for strength
-// reduction candidates in the following forms:
+// There are many optimizations we can perform in the domain of SLSR.
+// We look for strength reduction candidates in the following forms:
 //
-// Form 1: B + i * S
-// Form 2: (B + i) * S
-// Form 3: &B[i * S]
+// Form Add: B + i * S
+// Form Mul: (B + i) * S
+// Form GEP: &B[i * S]
 //
 // where S is an integer variable, and i is a constant integer. If we found two
 // candidates S1 and S2 in the same form and S1 dominates S2, we may rewrite S2
-// in a simpler way with respect to S1. For example,
+// in a simpler way with respect to S1 (index delta). For example,
 //
 // S1: X = B + i * S
 // S2: Y = B + i' * S   => X + (i' - i) * S
@@ -35,8 +34,29 @@
 //
 // Note: (i' - i) * S is folded to the extent possible.
 //
+// For form Add and GEP, we can also rewrite a candidate in a simpler way
+// with respect to other dominating candidates if their B or S are different
+// but other parts are the same. For example,
+//
+// Base Delta:
+// S1: X = B  + i * S
+// S2: Y = B' + i * S   => X + (B' - B)
+//
+// S1: X = &B [i * S]
+// S2: Y = &B'[i * S]   => X + (B' - B)
+//
+// Stride Delta:
+// S1: X = B + i * S
+// S2: Y = B + i * S'   => X + i * (S' - S)
+//
+// S1: X = &B[i * S]
+// S2: Y = &B[i * S']   => X + i * (S' - S)
+//
+// PS: Stride delta write on form Mul is usually non-profitable, and Base delta
+// write sometimes is profitable, so we do not support them on form Mul.
+//
 // This rewriting is in general a good idea. The code patterns we focus on
-// usually come from loop unrolling, so (i' - i) * S is likely the same
+// usually come from loop unrolling, so the delta is likely the same
 // across iterations and can be reused. When that happens, the optimized form
 // takes only one add starting from the second iteration.
 //
@@ -47,19 +67,14 @@
 // TODO:
 //
 // - Floating point arithmetics when fast math is enabled.
-//
-// - SLSR may decrease ILP at the architecture level. Targets that are very
-//   sensitive to ILP may want to disable it. Having SLSR to consider ILP is
-//   left as future work.
-//
-// - When (i' - i) is constant but i and i' are not, we could still perform
-//   SLSR.
 
 #include "llvm/Transforms/Scalar/StraightLineStrengthReduce.h"
 #include "llvm/ADT/APInt.h"
 #include "llvm/ADT/DepthFirstIterator.h"
+#include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/Analysis/ScalarEvolution.h"
+#include "llvm/Analysis/ScalarEvolutionExpressions.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/Constants.h"
@@ -86,11 +101,14 @@
 #include <cstdint>
 #include <limits>
 #include <list>
+#include <queue>
 #include <vector>
 
 using namespace llvm;
 using namespace PatternMatch;
 
+#define DEBUG_TYPE "slsr"
+
 static const unsigned UnknownAddressSpace =
     std::numeric_limits<unsigned>::max();
 
@@ -142,15 +160,23 @@ class StraightLineStrengthReduce {
       GEP,     // &B[..][i * S][..]
     };
 
+    enum DKind {
+      InvalidDelta, // reserved for the default constructor
+      IndexDelta,   // Delta is a constant from Index
+      BaseDelta,    // Delta is a constant or variable from Base
+      StrideDelta,  // Delta is a constant or variable from Stride
+    };
+
     Candidate() = default;
     Candidate(Kind CT, const SCEV *B, ConstantInt *Idx, Value *S,
-              Instruction *I)
-        : CandidateKind(CT), Base(B), Index(Idx), Stride(S), Ins(I) {}
+              Instruction *I, const SCEV *StrideSCEV)
+        : CandidateKind(CT), Base(B), Index(Idx), Stride(S), Ins(I),
+          StrideSCEV(StrideSCEV) {}
 
     Kind CandidateKind = Invalid;
 
     const SCEV *Base = nullptr;
-
+    // TODO: Swap Index and Stride's name.
     // Note that Index and Stride of a GEP candidate do not necessarily have the
     // same integer type. In that case, during rewriting, Stride will be
     // sign-extended or truncated to Index's type.
@@ -177,22 +203,136 @@ class StraightLineStrengthReduce {
     // Points to the immediate basis of this candidate, or nullptr if we cannot
     // find any basis for this candidate.
     Candidate *Basis = nullptr;
+
+    DKind DeltaKind = InvalidDelta;
+
+    // Store SCEV of Stride to compute delta from different strides
+    const SCEV *StrideSCEV = nullptr;
+
+    // Points to (Y - X) that will be used to rewrite this candidate.
+    Value *Delta = nullptr;
+
+    /// Cost model: Evaluate the computational efficiency of the candidate.
+    ///
+    /// Efficiency levels (higher is better):
+    ///   5 - No instruction:
+    ///       [Variable] or [Const]
+    ///   4 - One instruction with one variable:
+    ///       [Variable + Const] or [Variable * Const]
+    ///   3 - One instruction with two variables:
+    ///       [Variable + Variable] or [Variable * Variable]
+    ///   2 - Two instructions with one variable:
+    ///       [Const + Const * Variable]
+    ///   1 - Two instructions with two variables:
+    ///       [Variable + Const * Variable]
+    static unsigned getComputationEfficiency(Kind CandidateKind,
+                                             const ConstantInt *Index,
+                                             const Value *Stride,
+                                             const SCEV *Base = nullptr) {
+      bool IsConstantBase = false;
+      bool IsZeroBase = false;
+      // When evaluating the efficiency of a rewrite, if the Basis's SCEV is
+      // not available, conservatively assume the base is not constant.
+      if (auto *ConstBase = dyn_cast_or_null<SCEVConstant>(Base)) {
+        IsConstantBase = true;
+        IsZeroBase = ConstBase->getValue()->isZero();
+      }
+
+      bool IsConstantStride = isa<ConstantInt>(Stride);
+      bool IsZeroStride =
+          IsConstantStride && cast<ConstantInt>(Stride)->isZero();
+      // All constants
+      if (IsConstantBase && IsConstantStride)
+        return 5;
+
+      // [(Base + Index) * Stride]
+      if (CandidateKind == Mul) {
+        if (IsZeroStride)
+          return 5;
+        if (Index->isZero())
+          return (IsConstantStride || IsConstantBase) ? 4 : 3;
+
+        if (IsConstantBase)
+          return IsZeroBase && (Index->isOne() || Index->isMinusOne()) ? 5 : 4;
+
+        if (IsConstantStride) {
+          auto *CI = cast<ConstantInt>(Stride);
+          return (CI->isOne() || CI->isMinusOne()) ? 4 : 2;
+        }
+        return 1;
+      }
+
+      // Base + Index * Stride
+      assert(CandidateKind == Add || CandidateKind == GEP);
+      if (Index->isZero() || IsZeroStride)
+        return 5;
+
+      bool IsSimpleIndex = Index->isOne() || Index->isMinusOne();
+
+      if (IsConstantBase)
+        return IsZeroBase ? (IsSimpleIndex ? 5 : 4) : (IsSimpleIndex ? 4 : 2);
+
+      if (IsConstantStride)
+        return IsZeroStride ? 5 : 4;
+
+      if (IsSimpleIndex)
+        return 3;
+
+      return 1;
+    }
+
+    // Evaluate if the given delta is profitable to rewrite this candidate.
+    bool isProfitableRewrite(const Value *Delta, const DKind DeltaKind) const {
+      // This function cannot accurately evaluate the profit of whole expression
+      // with context. A candidate (B + I * S) cannot express whether this
+      // instruction needs to compute on its own (I * S), which may be shared
+      // with other candidates or may need instructions to compute.
+      // If the rewritten form has the same strength, still rewrite to
+      // (X + Delta) since it may expose more CSE opportunities on Delta, as
+      // unrolled loops usually have identical Delta for each unrolled body.
+      //
+      // Note, this function should only be used on Index Delta rewrite.
+      // Base and Stride delta need context info to evaluate the register
+      // pressure impact from variable delta.
+      return getComputationEfficiency(CandidateKind, Index, Stride, Base) <=
+             getRewriteProfit(Delta, DeltaKind);
+    }
+
+    // Evaluate the rewrite profit of this candidate with its Basis
+    unsigned getRewriteProfit() const {
+      return Basis ? getRewriteProfit(Delta, DeltaKind) : 0;
+    }
+
+    // Evaluate the rewrite profit of this candidate with a given delta
+    unsigned getRewriteProfit(const Value *Delta, const DKind DeltaKind) const {
+      switch (DeltaKind) {
+      case BaseDelta: // [X + Delta]
+        return getComputationEfficiency(
+            CandidateKind,
+            ConstantInt::get(cast<IntegerType>(Delta->getType()), 1), Delta);
+      case StrideDelta: // [X + Index * Delta]
+        return getComputationEfficiency(CandidateKind, Index, Delta);
+      case IndexDelta: // [X + Delta * Stride]
+        return getComputationEfficiency(CandidateKind, cast<ConstantInt>(Delta),
+                                        Stride);
+      default:
+        return 0;
+      }
+    }
+
+    bool isHighEfficiency() const {
+      return getComputationEfficiency(CandidateKind, Index, Stride, Base) >= 4;
+    }
   };
 
   bool runOnFunction(Function &F);
 
 private:
-  // Returns true if Basis is a basis for C, i.e., Basis dominates C and they
-  // share the same base and stride.
-  bool isBasisFor(const Candidate &Basis, const Candidate &C);
-
+  // Fetch straight-line basis for rewriting C, update C.Basis to point to it,
+  // and store the delta between C and its Basis in C.Delta.
+  void setBasisAndDeltaFor(Candidate &C);
   // Returns whether the candidate can be folded into an addressing mode.
-  bool isFoldable(const Candidate &C, TargetTransformInfo *TTI,
-                  const DataLayout *DL);
-
-  // Returns true if C is already in a simplest form and not worth being
-  // rewritten.
-  bool isSimplestForm(const Candidate &C);
+  bool isFoldable(const Candidate &C, TargetTransformInfo *TTI);
 
   // Checks whether I is in a candidate form. If so, adds all the matching forms
   // to Candidates, and tries to find the immediate basis for each of them.
@@ -216,12 +356,6 @@ class StraightLineStrengthReduce {
   // Allocate candidates and find bases for GetElementPtr instructions.
   void allocateCandidatesAndFindBasisForGEP(GetElementPtrInst *GEP);
 
-  // A helper function that scales Idx with ElementSize before invoking
-  // allocateCandidatesAndFindBasis.
-  void allocateCandidatesAndFindBasisForGEP(const SCEV *B, ConstantInt *Idx,
-                                            Value *S, uint64_t ElementSize,
-                                            Instruction *I);
-
   // Adds the given form <CT, B, Idx, S> to Candidates, and finds its immediate
   // basis.
   void allocateCandidatesAndFindBasis(Candidate::Kind CT, const SCEV *B,
@@ -231,12 +365,6 @@ class StraightLineStrengthReduce {
   // Rewrites candidate C with respect to Basis.
   void rewriteCandidateWithBasis(const Candidate &C, const Candidate &Basis);
 
-  // A helper function that factors ArrayIdx to a product of a stride and a
-  // constant index, and invokes allocateCandidatesAndFindBasis with the
-  // factorings.
-  void factorArrayIndex(Value *ArrayIdx, const SCEV *Base, uint64_t ElementSize,
-                        GetElementPtrInst *GEP);
-
   // Emit code that computes the "bump" from Basis to C.
   static Value *emitBump(const Candidate &Basis, const Candidate &C,
                          IRBuilder<> &Builder, const DataLayout *DL);
@@ -247,12 +375,205 @@ class StraightLineStrengthReduce {
   TargetTransformInfo *TTI = nullptr;
   std::list<Candidate> Candidates;
 
-  // Temporarily holds all instructions that are unlinked (but not deleted) by
-  // rewriteCandidateWithBasis. These instructions will be actually removed
-  // after all rewriting finishes.
-  std::vector<Instruction *> UnlinkedInstructions;
+  // Map from SCEV to instructions that represent the value,
+  // instructions are sorted in depth-first order.
+  DenseMap<const SCEV *, SmallSetVector<Instruction *, 2>> SCEVToInsts;
+
+  // Record the dependency between instructions. If C.Basis == B, we would have
+  // {B.Ins -> {C.Ins, ...}}.
+  MapVector<Instruction *, std::vector<Instruction *>> DependencyGraph;
+
+  // Map between each instruction and its possible candidates.
+  DenseMap<Instruction *, SmallVector<Candidate *, 3>> RewriteCandidates;
+
+  // All instructions that have candidates sort in topological order based on
+  // dependency graph, from roots to leaves.
+  std::vector<Instruction *> SortedCandidateInsts;
+
+  // Record all instructions that are already rewritten and will be removed
+  // later.
+  std::vector<Instruction *> DeadInstructions;
+
+  // Classify candidates against Delta kind
+  class CandidateDictTy {
+  public:
+    using CandsTy = SmallVector<Candidate *, 8>;
+    using BBToCandsTy = DenseMap<const BasicBlock *, CandsTy>;
+
+  private:
+    // Index delta Basis must have the same (Base, StrideSCEV, Inst.Type)
+    using IndexDeltaKeyTy = std::tuple<const SCEV *, const SCEV *, Type *>;
+    DenseMap<IndexDeltaKeyTy, BBToCandsTy> IndexDeltaCandidates;
+
+    // Base delta Basis must have the same (StrideSCEV, Index, Inst.Type)
+    using BaseDeltaKeyTy = std::tuple<const SCEV *, ConstantInt *, Type *>;
+    DenseMap<BaseDeltaKeyTy, BBToCandsTy> BaseDeltaCandidates;
+
+    // Stride delta Basis must have the same (Base, Index, Inst.Type)
+    using StrideDeltaKeyTy = std::tuple<const SCEV *, ConstantInt *, Type *>;
+    DenseMap<StrideDeltaKeyTy, BBToCandsTy> StrideDeltaCandidates;
+
+  public:
+    // TODO: Disable index delta on GEP after we completely move
+    // from typed GEP to PtrAdd.
+    const BBToCandsTy *getCandidatesWithDeltaKind(const Candidate &C,
+                                                  Candidate::DKind K) const {
+      assert(K != Candidate::InvalidDelta);
+      if (K == Candidate::IndexDelta) {
+        IndexDeltaKeyTy IndexDeltaKey(C.Base, C.StrideSCEV, C.Ins->getType());
+        auto It = IndexDeltaCandidates.find(IndexDeltaKey);
+        if (It != IndexDeltaCandidates.end())
+          return &It->second;
+      } else if (K == Candidate::BaseDelta) {
+        BaseDeltaKeyTy BaseDeltaKey(C.StrideSCEV, C.Index, C.Ins->getType());
+        auto It = BaseDeltaCandidates.find(BaseDeltaKey);
+        if (It != BaseDeltaCandidates.end())
+          return &It->second;
+      } else {
+        assert(K == Candidate::StrideDelta);
+        StrideDeltaKeyTy StrideDeltaKey(C.Base, C.Index, C.Ins->getType());
+        auto It = StrideDeltaCandidates.find(StrideDeltaKey);
+        if (It != StrideDeltaCandidates.end())
+          return &It->second;
+      }
+      return nullptr;
+    }
+
+    // Pointers to C must remain valid until CandidateDict is cleared.
+    void add(Candidate &C) {
+      Type *ValueType = C.Ins->getType();
+      BasicBlock *BB = C.Ins->getParent();
+      IndexDeltaKeyTy IndexDeltaKey(C.Base, C.StrideSCEV, ValueType);
+      BaseDeltaKeyTy BaseDeltaKey(C.StrideSCEV, C.Index, ValueType);
+      StrideDeltaKeyTy StrideDeltaKey(C.Base, C.Index, ValueType);
+      IndexDeltaCandidates[IndexDeltaKey][BB].push_back(&C);
+      BaseDeltaCandidates[BaseDeltaKey][BB].push_back(&C);
+      StrideDeltaCandidates[StrideDeltaKey][BB].push_back(&C);
+    }
+    // Remove all mappings from set
+    void clear() {
+      IndexDeltaCandidates.clear();
+      BaseDeltaCandidates.clear();
+      StrideDeltaCandidates.clear();
+    }
+  } CandidateDict;
+
+  const SCEV *getAndRecordSCEV(Value *V) {
+    auto *S = SE->getSCEV(V);
+    if (auto *I = dyn_cast<Instruction>(V))
+      if (!isa<SCEVCouldNotCompute>(S) && !isa<SCEVUnknown>(S) &&
+          !isa<SCEVConstant>(S))
+        SCEVToInsts[S].insert(I);
+
+    return S;
+  }
+
+  // Get the nearest instruction before CI that represents the value of S,
+  // return nullptr if no instruction is associated with S or S is not a
+  // reusable expression.
+  Value *getNearestValueOfSCEV(const SCEV *S, const Instruction *CI) const {
+    if (isa<SCEVCouldNotCompute>(S))
+      return nullptr;
+
+    if (auto *SU = dyn_cast<SCEVUnknown>(S))
+      return SU->getValue();
+    if (auto *SC = dyn_cast<SCEVConstant>(S))
+      return SC->getValue();
+
+    auto It = SCEVToInsts.find(S);
+    if (It == SCEVToInsts.end())
+      return nullptr;
+
+    for (Instruction *I : reverse(It->second))
+      if (DT->dominates(I, CI))
+        return I;
+
+    return nullptr;
+  }
+
+  struct DeltaInfo {
+    Candidate *Cand;
+    Candidate::DKind DeltaKind;
+    Value *Delta;
+
+    DeltaInfo()
+        : Cand(nullptr), DeltaKind(Candidate::InvalidDelta), Delta(nullptr) {}
+    DeltaInfo(Candidate *Cand, Candidate::DKind DeltaKind, Value *Delta)
+        : Cand(Cand), DeltaKind(DeltaKind), Delta(Delta) {}
+    operator bool() const { return Cand != nullptr; }
+  };
+
+  friend raw_ostream &operator<<(raw_ostream &OS, const DeltaInfo &DI);
+
+  DeltaInfo compressPath(Candidate &C, Candidate *Basis) const;
+
+  Candidate *pickRewriteCandidate(Instruction *I) const;
+  void sortCandidateInstructions();
+  static Constant *getIndexDelta(Candidate &C, Candidate &Basis);
+  static bool isSimilar(Candidate &C, Candidate &Basis, Candidate::DKind K);
+
+  // Add Basis -> C in DependencyGraph and propagate
+  // C.Stride and C.Delta's dependency to C
+  void addDependency(Candidate &C, Candidate *Basis) {
+    if (Basis)
+      DependencyGraph[Basis->Ins].emplace_back(C.Ins);
+
+    // If any candidate of Inst has a basis, then Inst will be rewritten,
+    // C must be rewritten after rewriting Inst, so we need to propagate
+    // the dependency to C
+    auto PropagateDependency = [&](Instruction *Inst) {
+      if (auto CandsIt = RewriteCandidates.find(Inst);
+          CandsIt != RewriteCandidates.end())
+        if (std::any_of(CandsIt->second.begin(), CandsIt->second.end(),
+                        [](Candidate *Cand) { return Cand->Basis; }))
+          DependencyGraph[Inst].emplace_back(C.Ins);
+    };
+
+    // If C has a variable delta and the delta is a candidate,
+    // propagate its dependency to C
+    if (auto *DeltaInst = dyn_cast_or_null<Instruction>(C.Delta))
+      PropagateDependency(DeltaInst);
+
+    // If the stride is a candidate, propagate its dependency to C
+    if (auto *StrideInst = dyn_cast<Instruction>(C.Stride))
+      PropagateDependency(StrideInst);
+  };
 };
 
+inline llvm::raw_ostream &
+operator<<(llvm::raw_ostream &OS,
+           const StraightLineStrengthReduce::Candidate &C) {
+  OS << "Ins: " << *C.Ins << "\n  Base: " << *C.Base
+     << "\n  Index: " << *C.Index << "\n  Stride: " << *C.Stride
+     << "\n  StrideSCEV: " << *C.StrideSCEV;
+  if (C.Basis)
+    OS << "\n  Delta: " << *C.Delta << "\n  Basis: \n  [ " << *C.Basis << " ]";
+  return OS;
+}
+
+LLVM_ATTRIBUTE_UNUSED
+inline llvm::raw_ostream &
+operator<<(llvm::raw_ostream &OS,
+           const StraightLineStrengthReduce::DeltaInfo &DI) {
+  OS << "Cand: " << *DI.Cand << "\n";
+  OS << "Delta Kind: ";
+  switch (DI.DeltaKind) {
+  case StraightLineStrengthReduce::Candidate::IndexDelta:
+    OS << "Index";
+    break;
+  case StraightLineStrengthReduce::Candidate::BaseDelta:
+    OS << "Base";
+    break;
+  case StraightLineStrengthReduce::Candidate::StrideDelta:
+    OS << "Stride";
+    break;
+  default:
+    break;
+  }
+  OS << "\nDelta: " << *DI.Delta;
+  return OS;
+}
+
 } // end anonymous namespace
 
 char StraightLineStrengthReduceLegacyPass::ID = 0;
@@ -269,17 +590,284 @@ FunctionPass *llvm::createStraightLineStrengthReducePass() {
   return new StraightLineStrengthReduceLegacyPass();
 }
 
-bool StraightLineStrengthReduce::isBasisFor(const Candidate &Basis,
-                                            const Candidate &C) {
-  return (Basis.Ins != C.Ins && // skip the same instruction
-          // They must have the same type too. Basis.Base == C.Base
-          // doesn't guarantee thei...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Oct 10, 2025

@llvm/pr-subscribers-backend-nvptx

Author: Fei Peng (fiigii)

Changes

This PR implements parts of #162376

  • Broader equivalence than constant index deltas:
    • Add Base-delta and Stride-delta matching for Add and GEP forms using ScalarEvolution deltas.
    • Reuse enabled for both constant and variable deltas when an available IR value dominates the user.
  • Dominance-aware dictionary instead of linear scans:
    • Tuple-keyed candidate dictionary grouped by basic block.
    • Walk the immediate-dominator chain to find the nearest dominating basis quickly and deterministically.
  • Simple cost model and best-rewrite selection:
    • Score candidate expressions and rewrites; select the highest-profit rewrite per instruction.
    • Skip rewriting when expressions are already foldable or high-efficiency.
  • Path compression for better ILP:
    • Compress chains of rewrites to a deeper dominating basis when a constant delta exists along the path, reducing dependent bumps on critical paths.
  • Dependency-aware rewrite ordering:
    • Build a dependency graph (basis, stride, variable delta producers) and rewrite in topological order.
    • This dependency graph will be needed by the next PR that adds partial strength reduction.

Patch is 163.12 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/162930.diff

17 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp (+833-271)
  • (modified) llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll (+9-11)
  • (modified) llvm/test/CodeGen/AMDGPU/dagcombine-reassociate-bug.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/idot2.ll (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/idot4s.ll (+82-79)
  • (modified) llvm/test/CodeGen/AMDGPU/idot8u.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll (+230-231)
  • (modified) llvm/test/CodeGen/AMDGPU/splitkit-getsubrangeformask.ll (+9-9)
  • (modified) llvm/test/CodeGen/AMDGPU/waitcnt-vscnt.ll (+135-200)
  • (modified) llvm/test/Transforms/StraightLineStrengthReduce/AMDGPU/pr23975.ll (+1-1)
  • (modified) llvm/test/Transforms/StraightLineStrengthReduce/AMDGPU/reassociate-geps-and-slsr-addrspace.ll (+5-5)
  • (added) llvm/test/Transforms/StraightLineStrengthReduce/NVPTX/slsr-i8-gep.ll (+156)
  • (added) llvm/test/Transforms/StraightLineStrengthReduce/NVPTX/slsr-var-delta.ll (+49)
  • (added) llvm/test/Transforms/StraightLineStrengthReduce/path-compression.ll (+35)
  • (added) llvm/test/Transforms/StraightLineStrengthReduce/pick-candidate.ll (+32)
  • (modified) llvm/test/Transforms/StraightLineStrengthReduce/slsr-add.ll (+96)
  • (modified) llvm/test/Transforms/StraightLineStrengthReduce/slsr-gep.ll (+148)
diff --git a/llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp b/llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
index 7d017095c88ce..e6d1b168cf69d 100644
--- a/llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
+++ b/llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
@@ -12,17 +12,16 @@
 // effective in simplifying arithmetic statements derived from an unrolled loop.
 // It can also simplify the logic of SeparateConstOffsetFromGEP.
 //
-// There are many optimizations we can perform in the domain of SLSR. This file
-// for now contains only an initial step. Specifically, we look for strength
-// reduction candidates in the following forms:
+// There are many optimizations we can perform in the domain of SLSR.
+// We look for strength reduction candidates in the following forms:
 //
-// Form 1: B + i * S
-// Form 2: (B + i) * S
-// Form 3: &B[i * S]
+// Form Add: B + i * S
+// Form Mul: (B + i) * S
+// Form GEP: &B[i * S]
 //
 // where S is an integer variable, and i is a constant integer. If we found two
 // candidates S1 and S2 in the same form and S1 dominates S2, we may rewrite S2
-// in a simpler way with respect to S1. For example,
+// in a simpler way with respect to S1 (index delta). For example,
 //
 // S1: X = B + i * S
 // S2: Y = B + i' * S   => X + (i' - i) * S
@@ -35,8 +34,29 @@
 //
 // Note: (i' - i) * S is folded to the extent possible.
 //
+// For form Add and GEP, we can also rewrite a candidate in a simpler way
+// with respect to other dominating candidates if their B or S are different
+// but other parts are the same. For example,
+//
+// Base Delta:
+// S1: X = B  + i * S
+// S2: Y = B' + i * S   => X + (B' - B)
+//
+// S1: X = &B [i * S]
+// S2: Y = &B'[i * S]   => X + (B' - B)
+//
+// Stride Delta:
+// S1: X = B + i * S
+// S2: Y = B + i * S'   => X + i * (S' - S)
+//
+// S1: X = &B[i * S]
+// S2: Y = &B[i * S']   => X + i * (S' - S)
+//
+// PS: Stride delta write on form Mul is usually non-profitable, and Base delta
+// write sometimes is profitable, so we do not support them on form Mul.
+//
 // This rewriting is in general a good idea. The code patterns we focus on
-// usually come from loop unrolling, so (i' - i) * S is likely the same
+// usually come from loop unrolling, so the delta is likely the same
 // across iterations and can be reused. When that happens, the optimized form
 // takes only one add starting from the second iteration.
 //
@@ -47,19 +67,14 @@
 // TODO:
 //
 // - Floating point arithmetics when fast math is enabled.
-//
-// - SLSR may decrease ILP at the architecture level. Targets that are very
-//   sensitive to ILP may want to disable it. Having SLSR to consider ILP is
-//   left as future work.
-//
-// - When (i' - i) is constant but i and i' are not, we could still perform
-//   SLSR.
 
 #include "llvm/Transforms/Scalar/StraightLineStrengthReduce.h"
 #include "llvm/ADT/APInt.h"
 #include "llvm/ADT/DepthFirstIterator.h"
+#include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/Analysis/ScalarEvolution.h"
+#include "llvm/Analysis/ScalarEvolutionExpressions.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/Constants.h"
@@ -86,11 +101,14 @@
 #include <cstdint>
 #include <limits>
 #include <list>
+#include <queue>
 #include <vector>
 
 using namespace llvm;
 using namespace PatternMatch;
 
+#define DEBUG_TYPE "slsr"
+
 static const unsigned UnknownAddressSpace =
     std::numeric_limits<unsigned>::max();
 
@@ -142,15 +160,23 @@ class StraightLineStrengthReduce {
       GEP,     // &B[..][i * S][..]
     };
 
+    enum DKind {
+      InvalidDelta, // reserved for the default constructor
+      IndexDelta,   // Delta is a constant from Index
+      BaseDelta,    // Delta is a constant or variable from Base
+      StrideDelta,  // Delta is a constant or variable from Stride
+    };
+
     Candidate() = default;
     Candidate(Kind CT, const SCEV *B, ConstantInt *Idx, Value *S,
-              Instruction *I)
-        : CandidateKind(CT), Base(B), Index(Idx), Stride(S), Ins(I) {}
+              Instruction *I, const SCEV *StrideSCEV)
+        : CandidateKind(CT), Base(B), Index(Idx), Stride(S), Ins(I),
+          StrideSCEV(StrideSCEV) {}
 
     Kind CandidateKind = Invalid;
 
     const SCEV *Base = nullptr;
-
+    // TODO: Swap Index and Stride's name.
     // Note that Index and Stride of a GEP candidate do not necessarily have the
     // same integer type. In that case, during rewriting, Stride will be
     // sign-extended or truncated to Index's type.
@@ -177,22 +203,136 @@ class StraightLineStrengthReduce {
     // Points to the immediate basis of this candidate, or nullptr if we cannot
     // find any basis for this candidate.
     Candidate *Basis = nullptr;
+
+    DKind DeltaKind = InvalidDelta;
+
+    // Store SCEV of Stride to compute delta from different strides
+    const SCEV *StrideSCEV = nullptr;
+
+    // Points to (Y - X) that will be used to rewrite this candidate.
+    Value *Delta = nullptr;
+
+    /// Cost model: Evaluate the computational efficiency of the candidate.
+    ///
+    /// Efficiency levels (higher is better):
+    ///   5 - No instruction:
+    ///       [Variable] or [Const]
+    ///   4 - One instruction with one variable:
+    ///       [Variable + Const] or [Variable * Const]
+    ///   3 - One instruction with two variables:
+    ///       [Variable + Variable] or [Variable * Variable]
+    ///   2 - Two instructions with one variable:
+    ///       [Const + Const * Variable]
+    ///   1 - Two instructions with two variables:
+    ///       [Variable + Const * Variable]
+    static unsigned getComputationEfficiency(Kind CandidateKind,
+                                             const ConstantInt *Index,
+                                             const Value *Stride,
+                                             const SCEV *Base = nullptr) {
+      bool IsConstantBase = false;
+      bool IsZeroBase = false;
+      // When evaluating the efficiency of a rewrite, if the Basis's SCEV is
+      // not available, conservatively assume the base is not constant.
+      if (auto *ConstBase = dyn_cast_or_null<SCEVConstant>(Base)) {
+        IsConstantBase = true;
+        IsZeroBase = ConstBase->getValue()->isZero();
+      }
+
+      bool IsConstantStride = isa<ConstantInt>(Stride);
+      bool IsZeroStride =
+          IsConstantStride && cast<ConstantInt>(Stride)->isZero();
+      // All constants
+      if (IsConstantBase && IsConstantStride)
+        return 5;
+
+      // [(Base + Index) * Stride]
+      if (CandidateKind == Mul) {
+        if (IsZeroStride)
+          return 5;
+        if (Index->isZero())
+          return (IsConstantStride || IsConstantBase) ? 4 : 3;
+
+        if (IsConstantBase)
+          return IsZeroBase && (Index->isOne() || Index->isMinusOne()) ? 5 : 4;
+
+        if (IsConstantStride) {
+          auto *CI = cast<ConstantInt>(Stride);
+          return (CI->isOne() || CI->isMinusOne()) ? 4 : 2;
+        }
+        return 1;
+      }
+
+      // Base + Index * Stride
+      assert(CandidateKind == Add || CandidateKind == GEP);
+      if (Index->isZero() || IsZeroStride)
+        return 5;
+
+      bool IsSimpleIndex = Index->isOne() || Index->isMinusOne();
+
+      if (IsConstantBase)
+        return IsZeroBase ? (IsSimpleIndex ? 5 : 4) : (IsSimpleIndex ? 4 : 2);
+
+      if (IsConstantStride)
+        return IsZeroStride ? 5 : 4;
+
+      if (IsSimpleIndex)
+        return 3;
+
+      return 1;
+    }
+
+    // Evaluate if the given delta is profitable to rewrite this candidate.
+    bool isProfitableRewrite(const Value *Delta, const DKind DeltaKind) const {
+      // This function cannot accurately evaluate the profit of whole expression
+      // with context. A candidate (B + I * S) cannot express whether this
+      // instruction needs to compute on its own (I * S), which may be shared
+      // with other candidates or may need instructions to compute.
+      // If the rewritten form has the same strength, still rewrite to
+      // (X + Delta) since it may expose more CSE opportunities on Delta, as
+      // unrolled loops usually have identical Delta for each unrolled body.
+      //
+      // Note, this function should only be used on Index Delta rewrite.
+      // Base and Stride delta need context info to evaluate the register
+      // pressure impact from variable delta.
+      return getComputationEfficiency(CandidateKind, Index, Stride, Base) <=
+             getRewriteProfit(Delta, DeltaKind);
+    }
+
+    // Evaluate the rewrite profit of this candidate with its Basis
+    unsigned getRewriteProfit() const {
+      return Basis ? getRewriteProfit(Delta, DeltaKind) : 0;
+    }
+
+    // Evaluate the rewrite profit of this candidate with a given delta
+    unsigned getRewriteProfit(const Value *Delta, const DKind DeltaKind) const {
+      switch (DeltaKind) {
+      case BaseDelta: // [X + Delta]
+        return getComputationEfficiency(
+            CandidateKind,
+            ConstantInt::get(cast<IntegerType>(Delta->getType()), 1), Delta);
+      case StrideDelta: // [X + Index * Delta]
+        return getComputationEfficiency(CandidateKind, Index, Delta);
+      case IndexDelta: // [X + Delta * Stride]
+        return getComputationEfficiency(CandidateKind, cast<ConstantInt>(Delta),
+                                        Stride);
+      default:
+        return 0;
+      }
+    }
+
+    bool isHighEfficiency() const {
+      return getComputationEfficiency(CandidateKind, Index, Stride, Base) >= 4;
+    }
   };
 
   bool runOnFunction(Function &F);
 
 private:
-  // Returns true if Basis is a basis for C, i.e., Basis dominates C and they
-  // share the same base and stride.
-  bool isBasisFor(const Candidate &Basis, const Candidate &C);
-
+  // Fetch straight-line basis for rewriting C, update C.Basis to point to it,
+  // and store the delta between C and its Basis in C.Delta.
+  void setBasisAndDeltaFor(Candidate &C);
   // Returns whether the candidate can be folded into an addressing mode.
-  bool isFoldable(const Candidate &C, TargetTransformInfo *TTI,
-                  const DataLayout *DL);
-
-  // Returns true if C is already in a simplest form and not worth being
-  // rewritten.
-  bool isSimplestForm(const Candidate &C);
+  bool isFoldable(const Candidate &C, TargetTransformInfo *TTI);
 
   // Checks whether I is in a candidate form. If so, adds all the matching forms
   // to Candidates, and tries to find the immediate basis for each of them.
@@ -216,12 +356,6 @@ class StraightLineStrengthReduce {
   // Allocate candidates and find bases for GetElementPtr instructions.
   void allocateCandidatesAndFindBasisForGEP(GetElementPtrInst *GEP);
 
-  // A helper function that scales Idx with ElementSize before invoking
-  // allocateCandidatesAndFindBasis.
-  void allocateCandidatesAndFindBasisForGEP(const SCEV *B, ConstantInt *Idx,
-                                            Value *S, uint64_t ElementSize,
-                                            Instruction *I);
-
   // Adds the given form <CT, B, Idx, S> to Candidates, and finds its immediate
   // basis.
   void allocateCandidatesAndFindBasis(Candidate::Kind CT, const SCEV *B,
@@ -231,12 +365,6 @@ class StraightLineStrengthReduce {
   // Rewrites candidate C with respect to Basis.
   void rewriteCandidateWithBasis(const Candidate &C, const Candidate &Basis);
 
-  // A helper function that factors ArrayIdx to a product of a stride and a
-  // constant index, and invokes allocateCandidatesAndFindBasis with the
-  // factorings.
-  void factorArrayIndex(Value *ArrayIdx, const SCEV *Base, uint64_t ElementSize,
-                        GetElementPtrInst *GEP);
-
   // Emit code that computes the "bump" from Basis to C.
   static Value *emitBump(const Candidate &Basis, const Candidate &C,
                          IRBuilder<> &Builder, const DataLayout *DL);
@@ -247,12 +375,205 @@ class StraightLineStrengthReduce {
   TargetTransformInfo *TTI = nullptr;
   std::list<Candidate> Candidates;
 
-  // Temporarily holds all instructions that are unlinked (but not deleted) by
-  // rewriteCandidateWithBasis. These instructions will be actually removed
-  // after all rewriting finishes.
-  std::vector<Instruction *> UnlinkedInstructions;
+  // Map from SCEV to instructions that represent the value,
+  // instructions are sorted in depth-first order.
+  DenseMap<const SCEV *, SmallSetVector<Instruction *, 2>> SCEVToInsts;
+
+  // Record the dependency between instructions. If C.Basis == B, we would have
+  // {B.Ins -> {C.Ins, ...}}.
+  MapVector<Instruction *, std::vector<Instruction *>> DependencyGraph;
+
+  // Map between each instruction and its possible candidates.
+  DenseMap<Instruction *, SmallVector<Candidate *, 3>> RewriteCandidates;
+
+  // All instructions that have candidates sort in topological order based on
+  // dependency graph, from roots to leaves.
+  std::vector<Instruction *> SortedCandidateInsts;
+
+  // Record all instructions that are already rewritten and will be removed
+  // later.
+  std::vector<Instruction *> DeadInstructions;
+
+  // Classify candidates against Delta kind
+  class CandidateDictTy {
+  public:
+    using CandsTy = SmallVector<Candidate *, 8>;
+    using BBToCandsTy = DenseMap<const BasicBlock *, CandsTy>;
+
+  private:
+    // Index delta Basis must have the same (Base, StrideSCEV, Inst.Type)
+    using IndexDeltaKeyTy = std::tuple<const SCEV *, const SCEV *, Type *>;
+    DenseMap<IndexDeltaKeyTy, BBToCandsTy> IndexDeltaCandidates;
+
+    // Base delta Basis must have the same (StrideSCEV, Index, Inst.Type)
+    using BaseDeltaKeyTy = std::tuple<const SCEV *, ConstantInt *, Type *>;
+    DenseMap<BaseDeltaKeyTy, BBToCandsTy> BaseDeltaCandidates;
+
+    // Stride delta Basis must have the same (Base, Index, Inst.Type)
+    using StrideDeltaKeyTy = std::tuple<const SCEV *, ConstantInt *, Type *>;
+    DenseMap<StrideDeltaKeyTy, BBToCandsTy> StrideDeltaCandidates;
+
+  public:
+    // TODO: Disable index delta on GEP after we completely move
+    // from typed GEP to PtrAdd.
+    const BBToCandsTy *getCandidatesWithDeltaKind(const Candidate &C,
+                                                  Candidate::DKind K) const {
+      assert(K != Candidate::InvalidDelta);
+      if (K == Candidate::IndexDelta) {
+        IndexDeltaKeyTy IndexDeltaKey(C.Base, C.StrideSCEV, C.Ins->getType());
+        auto It = IndexDeltaCandidates.find(IndexDeltaKey);
+        if (It != IndexDeltaCandidates.end())
+          return &It->second;
+      } else if (K == Candidate::BaseDelta) {
+        BaseDeltaKeyTy BaseDeltaKey(C.StrideSCEV, C.Index, C.Ins->getType());
+        auto It = BaseDeltaCandidates.find(BaseDeltaKey);
+        if (It != BaseDeltaCandidates.end())
+          return &It->second;
+      } else {
+        assert(K == Candidate::StrideDelta);
+        StrideDeltaKeyTy StrideDeltaKey(C.Base, C.Index, C.Ins->getType());
+        auto It = StrideDeltaCandidates.find(StrideDeltaKey);
+        if (It != StrideDeltaCandidates.end())
+          return &It->second;
+      }
+      return nullptr;
+    }
+
+    // Pointers to C must remain valid until CandidateDict is cleared.
+    void add(Candidate &C) {
+      Type *ValueType = C.Ins->getType();
+      BasicBlock *BB = C.Ins->getParent();
+      IndexDeltaKeyTy IndexDeltaKey(C.Base, C.StrideSCEV, ValueType);
+      BaseDeltaKeyTy BaseDeltaKey(C.StrideSCEV, C.Index, ValueType);
+      StrideDeltaKeyTy StrideDeltaKey(C.Base, C.Index, ValueType);
+      IndexDeltaCandidates[IndexDeltaKey][BB].push_back(&C);
+      BaseDeltaCandidates[BaseDeltaKey][BB].push_back(&C);
+      StrideDeltaCandidates[StrideDeltaKey][BB].push_back(&C);
+    }
+    // Remove all mappings from set
+    void clear() {
+      IndexDeltaCandidates.clear();
+      BaseDeltaCandidates.clear();
+      StrideDeltaCandidates.clear();
+    }
+  } CandidateDict;
+
+  const SCEV *getAndRecordSCEV(Value *V) {
+    auto *S = SE->getSCEV(V);
+    if (auto *I = dyn_cast<Instruction>(V))
+      if (!isa<SCEVCouldNotCompute>(S) && !isa<SCEVUnknown>(S) &&
+          !isa<SCEVConstant>(S))
+        SCEVToInsts[S].insert(I);
+
+    return S;
+  }
+
+  // Get the nearest instruction before CI that represents the value of S,
+  // return nullptr if no instruction is associated with S or S is not a
+  // reusable expression.
+  Value *getNearestValueOfSCEV(const SCEV *S, const Instruction *CI) const {
+    if (isa<SCEVCouldNotCompute>(S))
+      return nullptr;
+
+    if (auto *SU = dyn_cast<SCEVUnknown>(S))
+      return SU->getValue();
+    if (auto *SC = dyn_cast<SCEVConstant>(S))
+      return SC->getValue();
+
+    auto It = SCEVToInsts.find(S);
+    if (It == SCEVToInsts.end())
+      return nullptr;
+
+    for (Instruction *I : reverse(It->second))
+      if (DT->dominates(I, CI))
+        return I;
+
+    return nullptr;
+  }
+
+  struct DeltaInfo {
+    Candidate *Cand;
+    Candidate::DKind DeltaKind;
+    Value *Delta;
+
+    DeltaInfo()
+        : Cand(nullptr), DeltaKind(Candidate::InvalidDelta), Delta(nullptr) {}
+    DeltaInfo(Candidate *Cand, Candidate::DKind DeltaKind, Value *Delta)
+        : Cand(Cand), DeltaKind(DeltaKind), Delta(Delta) {}
+    operator bool() const { return Cand != nullptr; }
+  };
+
+  friend raw_ostream &operator<<(raw_ostream &OS, const DeltaInfo &DI);
+
+  DeltaInfo compressPath(Candidate &C, Candidate *Basis) const;
+
+  Candidate *pickRewriteCandidate(Instruction *I) const;
+  void sortCandidateInstructions();
+  static Constant *getIndexDelta(Candidate &C, Candidate &Basis);
+  static bool isSimilar(Candidate &C, Candidate &Basis, Candidate::DKind K);
+
+  // Add Basis -> C in DependencyGraph and propagate
+  // C.Stride and C.Delta's dependency to C
+  void addDependency(Candidate &C, Candidate *Basis) {
+    if (Basis)
+      DependencyGraph[Basis->Ins].emplace_back(C.Ins);
+
+    // If any candidate of Inst has a basis, then Inst will be rewritten,
+    // C must be rewritten after rewriting Inst, so we need to propagate
+    // the dependency to C
+    auto PropagateDependency = [&](Instruction *Inst) {
+      if (auto CandsIt = RewriteCandidates.find(Inst);
+          CandsIt != RewriteCandidates.end())
+        if (std::any_of(CandsIt->second.begin(), CandsIt->second.end(),
+                        [](Candidate *Cand) { return Cand->Basis; }))
+          DependencyGraph[Inst].emplace_back(C.Ins);
+    };
+
+    // If C has a variable delta and the delta is a candidate,
+    // propagate its dependency to C
+    if (auto *DeltaInst = dyn_cast_or_null<Instruction>(C.Delta))
+      PropagateDependency(DeltaInst);
+
+    // If the stride is a candidate, propagate its dependency to C
+    if (auto *StrideInst = dyn_cast<Instruction>(C.Stride))
+      PropagateDependency(StrideInst);
+  };
 };
 
+inline llvm::raw_ostream &
+operator<<(llvm::raw_ostream &OS,
+           const StraightLineStrengthReduce::Candidate &C) {
+  OS << "Ins: " << *C.Ins << "\n  Base: " << *C.Base
+     << "\n  Index: " << *C.Index << "\n  Stride: " << *C.Stride
+     << "\n  StrideSCEV: " << *C.StrideSCEV;
+  if (C.Basis)
+    OS << "\n  Delta: " << *C.Delta << "\n  Basis: \n  [ " << *C.Basis << " ]";
+  return OS;
+}
+
+LLVM_ATTRIBUTE_UNUSED
+inline llvm::raw_ostream &
+operator<<(llvm::raw_ostream &OS,
+           const StraightLineStrengthReduce::DeltaInfo &DI) {
+  OS << "Cand: " << *DI.Cand << "\n";
+  OS << "Delta Kind: ";
+  switch (DI.DeltaKind) {
+  case StraightLineStrengthReduce::Candidate::IndexDelta:
+    OS << "Index";
+    break;
+  case StraightLineStrengthReduce::Candidate::BaseDelta:
+    OS << "Base";
+    break;
+  case StraightLineStrengthReduce::Candidate::StrideDelta:
+    OS << "Stride";
+    break;
+  default:
+    break;
+  }
+  OS << "\nDelta: " << *DI.Delta;
+  return OS;
+}
+
 } // end anonymous namespace
 
 char StraightLineStrengthReduceLegacyPass::ID = 0;
@@ -269,17 +590,284 @@ FunctionPass *llvm::createStraightLineStrengthReducePass() {
   return new StraightLineStrengthReduceLegacyPass();
 }
 
-bool StraightLineStrengthReduce::isBasisFor(const Candidate &Basis,
-                                            const Candidate &C) {
-  return (Basis.Ins != C.Ins && // skip the same instruction
-          // They must have the same type too. Basis.Base == C.Base
-          // doesn't guarantee thei...
[truncated]

@github-actions
Copy link

github-actions bot commented Oct 10, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@fiigii
Copy link
Contributor Author

fiigii commented Oct 13, 2025

@arsenm Thank you for reviewing. I have addressed the comments, PTAL

@fiigii
Copy link
Contributor Author

fiigii commented Oct 14, 2025

Kindly ping @arsenm for reviewing.

Copy link
Contributor

@dakersnar dakersnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, this seems like a big improvement. LGTM, with the caveat that I'm somewhat new to this pass and I would recommend getting a second approval.

@fiigii
Copy link
Contributor Author

fiigii commented Oct 21, 2025

@arsenm @Artem-B Does the PR look good to you?

@fiigii fiigii force-pushed the dev/feip/slsr branch 2 times, most recently from 48cc008 to 90710b5 Compare October 22, 2025 18:54
Copy link
Member

@Artem-B Artem-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got a handful of minor cosmetic/style nits.

No opinion on the substance of the pass changes. Based on the test diffs, it appears to be a wash in terms of code size for AMDGPU. It may be useful to extract the tests into a separate patch and submit it first, so it's easier to see the effect of this patch on the generated code.

@fiigii
Copy link
Contributor Author

fiigii commented Oct 28, 2025

@Artem-B Thank you for reviewing. I have addressed your comments, PTAL.

I’ve reorganized the commits

  1. the minor AMDGPU test changes are grouped into one commit, you may ignore this commit
  2. the new SLSR and NVPTX changes are in another. Their checks match the IR/Codegen without this PR
  3. the main commit now only includes the meaningful new and existing test updates. So, you can focus on this commit.

I hope this structure makes it clearer for you to see the impact of the PR during your review.

yzhang93 added a commit to iree-org/iree that referenced this pull request Nov 25, 2025
- Still carrying revert from last week
llvm/llvm-project#162930
- New reverts of llvm/llvm-project#168933 and
llvm/llvm-project#168928 because of bazel build
failure. There are some discussions on discord
https://discord.com/channels/689900678990135345/1080178290188374049/1442537425086709772.

---------

Signed-off-by: yzhang93 <[email protected]>
yzhang93 pushed a commit to iree-org/llvm-project that referenced this pull request Nov 25, 2025
@akuegel
Copy link
Member

akuegel commented Nov 25, 2025

@fiigii I have a reproducer where this change seems to introduce wrong behavior:

target datalayout = "e-p6:32:32-i64:64-i128:128-i256:256-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite)
define ptx_kernel void @loop_subtract_fusion(ptr noalias readonly align 16 captures(none) dereferenceable(12) %0, ptr noalias writeonly align 256 captures(none) dereferenceable(15) %1, i32 %row, i32 %column) {
  %3 = addrspacecast ptr %0 to ptr addrspace(1)
  %4 = addrspacecast ptr %1 to ptr addrspace(1)
  %8 = icmp samesign ult i32 %column, 4
  %9 = shl nuw nsw i32 %row, 2
  %10 = or disjoint i32 %9, %column
  %11 = zext nneg i32 %10 to i64
  %12 = getelementptr inbounds i8, ptr addrspace(1) %3, i64 %11
  br i1 %8, label %13, label %15

13:                                               ; preds = %2
  %14 = load i8, ptr addrspace(1) %12, align 1
  br label %15

15:                                               ; preds = %13, %2
  %16 = phi i8 [ %14, %13 ], [ 1, %2 ]
  %.not = icmp eq i32 %column, 0
  %17 = shl nuw nsw i32 %row, 2
  %18 = add nuw nsw i32 %17, %column
  %19 = zext nneg i32 %18 to i64
  %20 = getelementptr i8, ptr addrspace(1) %3, i64 %19
  %21 = getelementptr i8, ptr addrspace(1) %20, i64 -1
  br i1 %.not, label %24, label %22

22:                                               ; preds = %15
  %23 = load i8, ptr addrspace(1) %21, align 1
  br label %24

24:                                               ; preds = %22, %15
  ret void
}

When you run the opt tool on this IR with -passes=slsr -S arguments, you can see that it will reuse the GEP %12 for %21. It does this under the assumption that %10 and %18 are equivalent. However that ignores the control flow, and in fact the unoptimized LLVM IR has add instead of disjoint or, and InstCombine pass turns it into disjoint or because the usage of the result is guarded in a way that makes this transformation correct. However by reusing the GEP that is based on disjoint or calculation in a different block, it becomes incorrect. Control flow has to be taken into account when making use of disjoint or.

yzhang93 pushed a commit to iree-org/llvm-project that referenced this pull request Nov 25, 2025
lialan added a commit to lialan/llvm-project that referenced this pull request Nov 25, 2025
yzhang93 added a commit to iree-org/iree that referenced this pull request Nov 25, 2025
bangtianliu pushed a commit to bangtianliu/iree that referenced this pull request Nov 25, 2025
bangtianliu pushed a commit to bangtianliu/iree that referenced this pull request Nov 25, 2025
Still carrying yesterday's revert of:
llvm/llvm-project#162930

Per the bug, we've identify it as an intermittent, non-deterministic
issue: iree-org#22649
bangtianliu pushed a commit to bangtianliu/iree that referenced this pull request Nov 25, 2025
bangtianliu pushed a commit to bangtianliu/iree that referenced this pull request Nov 25, 2025
- Still carrying revert from last week
llvm/llvm-project#162930
- New reverts of llvm/llvm-project#168933 and
llvm/llvm-project#168928 because of bazel build
failure. There are some discussions on discord
https://discord.com/channels/689900678990135345/1080178290188374049/1442537425086709772.

---------

Signed-off-by: yzhang93 <[email protected]>
bangtianliu pushed a commit to bangtianliu/iree that referenced this pull request Nov 25, 2025
lialan added a commit that referenced this pull request Nov 25, 2025
…169546)

This reverts commit f67409c.

cc @fiigii 
Including us, several separate groups are experiencing regressions with
this change. This is the smallest reproducer pasted by @akuegel :
#162930 (comment)
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Nov 25, 2025
…#162930)" (#169546)

This reverts commit f67409c.

cc @fiigii
Including us, several separate groups are experiencing regressions with
this change. This is the smallest reproducer pasted by @akuegel :
llvm/llvm-project#162930 (comment)
@fiigii
Copy link
Contributor Author

fiigii commented Nov 25, 2025

@akuegel Thank you for reporting and analyzing the issue. I did not know SCEV 's equivalent check is so optimistic. Let me investigate how to prevent it.

Btw, the original SLSR also has this problem, but my change triggered this issue with more cases. I believe the example below is still miscompiled even if reverting my change.

; RUN: opt < %s -passes=slsr -S | FileCheck %s
target datalayout = "e-p6:32:32-i64:64-i128:128-i256:256-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite)
define ptx_kernel void @loop_subtract_fusion(i64 %0, ptr  writeonly align 256 captures(none) dereferenceable(15) %1, i32 %row, i32 %column) {
  %3 = icmp samesign ult i32 %column, 4
  %4 = shl nuw nsw i32 %row, 2
  %5 = or disjoint i32 %4, %column
  %6 = zext nneg i32 %5 to i64
  %7 = add i64 %0, %6
  br i1 %3, label %8, label %10

8:                                                ; preds = %2
  %9 = call i64 @foo(i64 %7)
  br label %10

10:                                               ; preds = %8, %2
  %11 = phi i64 [ %9, %8 ], [ 1, %2 ]
  %.not = icmp eq i32 %column, 0
  %12 = shl nuw nsw i32 %row, 2
  %13 = add nuw nsw i32 %12, %column
  %14 = zext nneg i32 %13 to i64
  %15 = mul i64 %0, 4
  %16 = add i64 %15, %14
  %17 = add i64 %16, -1
  br i1 %.not, label %20, label %18

18:                                               ; preds = %10
  %19 = call i64 @bar(i64 %17)
  br label %20

20:                                               ; preds = %17, %10
  ret void
}

declare i64 @foo(i64)
declare i64 @bar(i64)

fiigii added a commit to fiigii/llvm-project that referenced this pull request Nov 26, 2025
This PR implements parts of
llvm#162376

- **Broader equivalence than constant index deltas**:
- Add Base-delta and Stride-delta matching for Add and GEP forms using
ScalarEvolution deltas.
- Reuse enabled for both constant and variable deltas when an available
IR value dominates the user.
- **Dominance-aware dictionary instead of linear scans**:
  - Tuple-keyed candidate dictionary grouped by basic block.
- Walk the immediate-dominator chain to find the nearest dominating
basis quickly and deterministically.
- **Simple cost model and best-rewrite selection**:
- Score candidate expressions and rewrites; select the highest-profit
rewrite per instruction.
- Skip rewriting when expressions are already foldable or
high-efficiency.
- **Path compression for better ILP**:
- Compress chains of rewrites to a deeper dominating basis when a
constant delta exists along the path, reducing dependent bumps on
critical paths.
- **Dependency-aware rewrite ordering**:
- Build a dependency graph (basis, stride, variable delta producers) and
rewrite in topological order.
- This dependency graph will be needed by the next PR that adds partial
strength reduction.
lialan pushed a commit to iree-org/llvm-project that referenced this pull request Nov 27, 2025
This PR implements parts of
llvm#162376

- **Broader equivalence than constant index deltas**:
- Add Base-delta and Stride-delta matching for Add and GEP forms using
ScalarEvolution deltas.
- Reuse enabled for both constant and variable deltas when an available
IR value dominates the user.
- **Dominance-aware dictionary instead of linear scans**:
  - Tuple-keyed candidate dictionary grouped by basic block.
- Walk the immediate-dominator chain to find the nearest dominating
basis quickly and deterministically.
- **Simple cost model and best-rewrite selection**:
- Score candidate expressions and rewrites; select the highest-profit
rewrite per instruction.
- Skip rewriting when expressions are already foldable or
high-efficiency.
- **Path compression for better ILP**:
- Compress chains of rewrites to a deeper dominating basis when a
constant delta exists along the path, reducing dependent bumps on
critical paths.
- **Dependency-aware rewrite ordering**:
- Build a dependency graph (basis, stride, variable delta producers) and
rewrite in topological order.
- This dependency graph will be needed by the next PR that adds partial
strength reduction.
pstarkcdpr pushed a commit to pstarkcdpr/iree that referenced this pull request Nov 28, 2025
pstarkcdpr pushed a commit to pstarkcdpr/iree that referenced this pull request Nov 28, 2025
Still carrying yesterday's revert of:
llvm/llvm-project#162930

Per the bug, we've identify it as an intermittent, non-deterministic
issue: iree-org#22649
pstarkcdpr pushed a commit to pstarkcdpr/iree that referenced this pull request Nov 28, 2025
pstarkcdpr pushed a commit to pstarkcdpr/iree that referenced this pull request Nov 28, 2025
- Still carrying revert from last week
llvm/llvm-project#162930
- New reverts of llvm/llvm-project#168933 and
llvm/llvm-project#168928 because of bazel build
failure. There are some discussions on discord
https://discord.com/channels/689900678990135345/1080178290188374049/1442537425086709772.

---------

Signed-off-by: yzhang93 <[email protected]>
augusto2112 pushed a commit to augusto2112/llvm-project that referenced this pull request Dec 3, 2025
…" (llvm#169546)

This reverts commit f67409c.

cc @fiigii 
Including us, several separate groups are experiencing regressions with
this change. This is the smallest reproducer pasted by @akuegel :
llvm#162930 (comment)
kcloudy0717 pushed a commit to kcloudy0717/llvm-project that referenced this pull request Dec 4, 2025
…" (llvm#169546)

This reverts commit f67409c.

cc @fiigii 
Including us, several separate groups are experiencing regressions with
this change. This is the smallest reproducer pasted by @akuegel :
llvm#162930 (comment)
dakersnar pushed a commit that referenced this pull request Dec 8, 2025
…169614)

This PR implements parts of
#162376

- **Broader equivalence than constant index deltas**:
- Add Base-delta and Stride-delta matching for Add and GEP forms using
ScalarEvolution deltas.
- Reuse enabled for both constant and variable deltas when an available
IR value dominates the user.
- **Dominance-aware dictionary instead of linear scans**:
  - Tuple-keyed candidate dictionary grouped by basic block.
- Walk the immediate-dominator chain to find the nearest dominating
basis quickly and deterministically.
- **Simple cost model and best-rewrite selection**:
- Score candidate expressions and rewrites; select the highest-profit
rewrite per instruction.
- Skip rewriting when expressions are already foldable or
high-efficiency.
- **Path compression for better ILP**:
- Compress chains of rewrites to a deeper dominating basis when a
constant delta exists along the path, reducing dependent bumps on
critical paths.
- **Dependency-aware rewrite ordering**:
- Build a dependency graph (basis, stride, variable delta producers) and
rewrite in topological order.
- This dependency graph will be needed by the next PR that adds partial
strength reduction.
- **Correctness enhencment**
- Fix a correctness issue that reusing instructions with the same SCEV
may introduce poison.

---------

Co-authored-by: Kazu Hirata <[email protected]>
honeygoyal pushed a commit to honeygoyal/llvm-project that referenced this pull request Dec 9, 2025
…" (llvm#169614)

This PR implements parts of
llvm#162376

- **Broader equivalence than constant index deltas**:
- Add Base-delta and Stride-delta matching for Add and GEP forms using
ScalarEvolution deltas.
- Reuse enabled for both constant and variable deltas when an available
IR value dominates the user.
- **Dominance-aware dictionary instead of linear scans**:
  - Tuple-keyed candidate dictionary grouped by basic block.
- Walk the immediate-dominator chain to find the nearest dominating
basis quickly and deterministically.
- **Simple cost model and best-rewrite selection**:
- Score candidate expressions and rewrites; select the highest-profit
rewrite per instruction.
- Skip rewriting when expressions are already foldable or
high-efficiency.
- **Path compression for better ILP**:
- Compress chains of rewrites to a deeper dominating basis when a
constant delta exists along the path, reducing dependent bumps on
critical paths.
- **Dependency-aware rewrite ordering**:
- Build a dependency graph (basis, stride, variable delta producers) and
rewrite in topological order.
- This dependency graph will be needed by the next PR that adds partial
strength reduction.
- **Correctness enhencment**
- Fix a correctness issue that reusing instructions with the same SCEV
may introduce poison.

---------

Co-authored-by: Kazu Hirata <[email protected]>
bjacob added a commit to iree-org/llvm-project that referenced this pull request Dec 17, 2025
bjacob added a commit to iree-org/llvm-project that referenced this pull request Dec 17, 2025
bjacob added a commit to iree-org/llvm-project that referenced this pull request Dec 18, 2025
bjacob added a commit to iree-org/llvm-project that referenced this pull request Dec 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants