-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[DA] disambiguate evolution of base addresses #116628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-testing-tools @llvm/pr-subscribers-llvm-analysis Author: Sebastian Pop (sebpop) ChangesThis patch fixes two bugs: The dependence analysis assumes that the base address of array accesses is invariant across loop iterations. In both bugs the base address evolves following loop iterations: the base address flip-flops between two different memory objects. Based on the scalar evolution of base addresses, the patch adds code to separate the 3 alias cases {must, no, may}-alias where the base addresses are identical at every iteration, never the same, and unknown. Full diff: https://github.com/llvm/llvm-project/pull/116628.diff 2 Files Affected:
diff --git a/llvm/lib/Analysis/DependenceAnalysis.cpp b/llvm/lib/Analysis/DependenceAnalysis.cpp
index a4a98ea0bae146..cd7dd481c16513 100644
--- a/llvm/lib/Analysis/DependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/DependenceAnalysis.cpp
@@ -712,10 +712,60 @@ void Dependence::dump(raw_ostream &OS) const {
// tbaa, non-overlapping regions etc), then it is known there is no dependecy.
// Otherwise the underlying objects are checked to see if they point to
// different identifiable objects.
-static AliasResult underlyingObjectsAlias(AAResults *AA,
- const DataLayout &DL,
- const MemoryLocation &LocA,
- const MemoryLocation &LocB) {
+static AliasResult underlyingObjectsAlias(AAResults *AA, LoopInfo *LI,
+ ScalarEvolution *SE, Instruction *A,
+ Instruction *B) {
+ const MemoryLocation &LocA = MemoryLocation::get(A);
+ const MemoryLocation &LocB = MemoryLocation::get(B);
+
+ // Check the underlying objects are the same
+ const Value *AObj = getUnderlyingObject(LocA.Ptr);
+ const Value *BObj = getUnderlyingObject(LocB.Ptr);
+
+ // If the underlying objects are the same, they must alias.
+ if (AObj == BObj)
+ return AliasResult::MustAlias;
+
+ if (auto *APhi = dyn_cast<PHINode>(AObj)) {
+ if (auto *BPhi = dyn_cast<PHINode>(BObj)) {
+ Loop *ALoop = LI->getLoopFor(APhi->getParent());
+ Loop *BLoop = LI->getLoopFor(BPhi->getParent());
+ if (ALoop == BLoop) {
+ auto *SCEVa = SE->getSCEV(const_cast<Value *>(AObj));
+ auto *SCEVb = SE->getSCEV(const_cast<Value *>(BObj));
+
+ // If the SCEVs are the same, they must alias.
+ if (SCEVa == SCEVb)
+ return AliasResult::MustAlias;
+
+ // If SCEV cannot analyze one of the values, then they may alias.
+ if (isa<SCEVUnknown>(SCEVa) || isa<SCEVUnknown>(SCEVb))
+ return AliasResult::MayAlias;
+
+ // Check whether the start values alias.
+ const SCEV *StartA = SCEVa;
+ while (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(StartA))
+ StartA = AR->getStart();
+
+ const SCEV *StartB = SCEVb;
+ while (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(StartB))
+ StartB = AR->getStart();
+
+ if (const SCEVUnknown *UA = dyn_cast<SCEVUnknown>(StartA)) {
+ if (const SCEVUnknown *UB = dyn_cast<SCEVUnknown>(StartB)) {
+ MemoryLocation LocAS =
+ MemoryLocation::getBeforeOrAfter(UA->getValue());
+ MemoryLocation LocBS =
+ MemoryLocation::getBeforeOrAfter(UB->getValue());
+ if (AA->isNoAlias(LocAS, LocBS))
+ return AliasResult::NoAlias;
+ }
+ }
+ return AliasResult::MayAlias;
+ }
+ }
+ }
+
// Check the original locations (minus size) for noalias, which can happen for
// tbaa, incompatible underlying object locations, etc.
MemoryLocation LocAS =
@@ -725,14 +775,6 @@ static AliasResult underlyingObjectsAlias(AAResults *AA,
if (AA->isNoAlias(LocAS, LocBS))
return AliasResult::NoAlias;
- // Check the underlying objects are the same
- const Value *AObj = getUnderlyingObject(LocA.Ptr);
- const Value *BObj = getUnderlyingObject(LocB.Ptr);
-
- // If the underlying objects are the same, they must alias
- if (AObj == BObj)
- return AliasResult::MustAlias;
-
// We may have hit the recursion limit for underlying objects, or have
// underlying objects where we don't know they will alias.
if (!isIdentifiedObject(AObj) || !isIdentifiedObject(BObj))
@@ -743,7 +785,6 @@ static AliasResult underlyingObjectsAlias(AAResults *AA,
return AliasResult::NoAlias;
}
-
// Returns true if the load or store can be analyzed. Atomic and volatile
// operations have properties which this analysis does not understand.
static
@@ -3606,9 +3647,7 @@ DependenceInfo::depends(Instruction *Src, Instruction *Dst,
Value *SrcPtr = getLoadStorePointerOperand(Src);
Value *DstPtr = getLoadStorePointerOperand(Dst);
- switch (underlyingObjectsAlias(AA, F->getDataLayout(),
- MemoryLocation::get(Dst),
- MemoryLocation::get(Src))) {
+ switch (underlyingObjectsAlias(AA, LI, SE, Dst, Src)) {
case AliasResult::MayAlias:
case AliasResult::PartialAlias:
// cannot analyse objects if we don't understand their aliasing.
@@ -4030,11 +4069,8 @@ const SCEV *DependenceInfo::getSplitIteration(const Dependence &Dep,
assert(Dst->mayReadFromMemory() || Dst->mayWriteToMemory());
assert(isLoadOrStore(Src));
assert(isLoadOrStore(Dst));
- Value *SrcPtr = getLoadStorePointerOperand(Src);
- Value *DstPtr = getLoadStorePointerOperand(Dst);
- assert(underlyingObjectsAlias(
- AA, F->getDataLayout(), MemoryLocation::get(Dst),
- MemoryLocation::get(Src)) == AliasResult::MustAlias);
+ assert(underlyingObjectsAlias(AA, LI, SE, Dst, Src) ==
+ AliasResult::MustAlias);
// establish loop nesting levels
establishNestingLevels(Src, Dst);
@@ -4043,6 +4079,8 @@ const SCEV *DependenceInfo::getSplitIteration(const Dependence &Dep,
unsigned Pairs = 1;
SmallVector<Subscript, 2> Pair(Pairs);
+ Value *SrcPtr = getLoadStorePointerOperand(Src);
+ Value *DstPtr = getLoadStorePointerOperand(Dst);
const SCEV *SrcSCEV = SE->getSCEV(SrcPtr);
const SCEV *DstSCEV = SE->getSCEV(DstPtr);
Pair[0].Src = SrcSCEV;
diff --git a/llvm/test/Analysis/DependenceAnalysis/FlipFlopBaseAddress.ll b/llvm/test/Analysis/DependenceAnalysis/FlipFlopBaseAddress.ll
new file mode 100644
index 00000000000000..2d80b2493a389b
--- /dev/null
+++ b/llvm/test/Analysis/DependenceAnalysis/FlipFlopBaseAddress.ll
@@ -0,0 +1,136 @@
+; RUN: opt < %s -disable-output "-passes=print<da>" -aa-pipeline=basic-aa 2>&1 \
+; RUN: | FileCheck %s
+
+; Check that dependence analysis correctly handles flip-flop of base addresses.
+; Bug 41488 - https://github.com/llvm/llvm-project/issues/41488
+
+; CHECK-LABEL: bug41488_test1
+; CHECK-NOT: da analyze - none!
+
+define float @bug41488_test1() {
+entry:
+ %g = alloca float, align 4
+ %h = alloca float, align 4
+ br label %for.body
+
+for.body:
+ %p = phi float* [ %g, %entry ], [ %q, %for.body ]
+ %q = phi float* [ %h, %entry ], [ %p, %for.body ]
+ %0 = load float, float* %p, align 4
+ store float undef, float* %q, align 4
+ %branch_cond = fcmp ugt float %0, 0.0
+ br i1 %branch_cond, label %for.cond.cleanup, label %for.body
+
+for.cond.cleanup:
+ ret float undef
+}
+
+; CHECK-LABEL: bug41488_test2
+; CHECK-NOT: da analyze - none!
+
+define void @bug41488_test2(i32 %n) {
+entry:
+ %g = alloca float, align 4
+ %h = alloca float, align 4
+ br label %for.body
+
+for.body:
+ %i = phi i32 [0, %entry ], [ %inc, %for.body ]
+ %p = phi float* [ %g, %entry ], [ %q, %for.body ]
+ %q = phi float* [ %h, %entry ], [ %p, %for.body ]
+ %0 = load float, float* %p, align 4
+ store float 0.0, float* %q, align 4
+ %inc = add nuw i32 %i, 1
+ %branch_cond = icmp ult i32 %i, %n
+ br i1 %branch_cond, label %for.body, label %for.cond.cleanup
+
+for.cond.cleanup:
+ ret void
+}
+
+; Bug 53942 - https://github.com/llvm/llvm-project/issues/53942
+; CHECK-LABEL: bug53942_foo
+; CHECK-NOT: da analyze - none!
+
+define void @bug53942_foo(i32 noundef %n, ptr noalias nocapture noundef writeonly %A, ptr noalias nocapture noundef %B) {
+entry:
+ %cmp8 = icmp sgt i32 %n, 1
+ br i1 %cmp8, label %for.body.preheader, label %for.cond.cleanup
+
+for.body.preheader: ; preds = %entry
+ %wide.trip.count = zext nneg i32 %n to i64
+ br label %for.body
+
+for.cond.cleanup: ; preds = %for.body, %entry
+ ret void
+
+for.body: ; preds = %for.body.preheader, %for.body
+ %indvars.iv = phi i64 [ 1, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
+ %ptr1.011 = phi ptr [ %A, %for.body.preheader ], [ %ptr2.09, %for.body ]
+ %ptr2.09 = phi ptr [ %B, %for.body.preheader ], [ %ptr1.011, %for.body ]
+ %.pre = load double, ptr %B, align 8
+ %arrayidx2 = getelementptr inbounds double, ptr %ptr1.011, i64 %indvars.iv
+ store double %.pre, ptr %arrayidx2, align 8
+ %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+ %exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
+ br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+}
+
+
+; Bug 53942 - https://github.com/llvm/llvm-project/issues/53942
+; CHECK-LABEL: bug53942_bar
+; CHECK-NOT: da analyze - none!
+
+define void @bug53942_bar(i32 noundef %n, ptr noalias noundef %A, ptr noalias noundef %B) {
+entry:
+ br label %for.cond
+
+for.cond: ; preds = %for.inc, %entry
+ %i.0 = phi i32 [ 1, %entry ], [ %inc, %for.inc ]
+ %cmp = icmp slt i32 %i.0, %n
+ br i1 %cmp, label %for.body, label %for.cond.cleanup
+
+for.cond.cleanup: ; preds = %for.cond
+ br label %for.end
+
+for.body: ; preds = %for.cond
+ %and = and i32 %i.0, 2
+ %tobool.not = icmp eq i32 %and, 0
+ br i1 %tobool.not, label %cond.false, label %cond.true
+
+cond.true: ; preds = %for.body
+ br label %cond.end
+
+cond.false: ; preds = %for.body
+ br label %cond.end
+
+cond.end: ; preds = %cond.false, %cond.true
+ %cond = phi ptr [ %A, %cond.true ], [ %B, %cond.false ]
+ %and1 = and i32 %i.0, 2
+ %tobool2.not = icmp eq i32 %and1, 0
+ br i1 %tobool2.not, label %cond.false4, label %cond.true3
+
+cond.true3: ; preds = %cond.end
+ br label %cond.end5
+
+cond.false4: ; preds = %cond.end
+ br label %cond.end5
+
+cond.end5: ; preds = %cond.false4, %cond.true3
+ %cond6 = phi ptr [ %B, %cond.true3 ], [ %A, %cond.false4 ]
+ %sub = add nsw i32 %i.0, -1
+ %idxprom = sext i32 %sub to i64
+ %arrayidx = getelementptr inbounds double, ptr %cond6, i64 %idxprom
+ %0 = load double, ptr %arrayidx, align 8
+ %idxprom7 = zext nneg i32 %i.0 to i64
+ %arrayidx8 = getelementptr inbounds double, ptr %cond, i64 %idxprom7
+ store double %0, ptr %arrayidx8, align 8
+ br label %for.inc
+
+for.inc: ; preds = %cond.end5
+ %inc = add nuw nsw i32 %i.0, 1
+ br label %for.cond
+
+for.end: ; preds = %for.cond.cleanup
+ ret void
+}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the overall approach here isn't right. This is basically trying to mitigate a specific case in the BasicAA implementation, but doesn't really provide a guarantee that we can't hit the same underlying issue through a different pathway. I would expect that it's possible to construct a similar problematic case using selects instead of phis.
I think the correct way to address this issue would be to enable the cross-iteration mode of AA, which is also what LAA does. However, it currently doesn't actually catch the examples here, which I'm addressing in #116802.
Though I should also note that doing this alone is not sufficient to make this code correct, because it also makes an additional incorrect assumptions about AATags. In particular, this will handle scoped noalias metadata incorrectly for cases where the metadata is only valid within one loop iteration. It is necessary to inspect noalias.scope intrinsics to disambiguate these cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that #116802 has landed, I think you should be able to fix the issue using BatchAAResults::enableCrossIterationMode.
I see 2 extra errors when using BatchAA in cross iter mode:
Both fails are in https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/BasicAliasAnalysis.cpp#L1620 where the dyn_cast fails for some reason on a LoadInst. |
✅ With the latest revision this PR passed the C/C++ code formatter. |
✅ With the latest revision this PR passed the undef deprecator. |
@sebpop Just from a quick look, it's probably because IR is modified during BatchAA lifetime and the cache becomes invalid. To start with you can just create the BatchAA directly inside underlyingObjectsAlias (without any batching). It would be nice to use it in a larger scope for compile-time reasons, but it requires some more care. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
@@ -0,0 +1,136 @@ | |||
; RUN: opt < %s -disable-output "-passes=print<da>" -aa-pipeline=basic-aa 2>&1 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can use the update_analyze_checks support you added recently?
This is a work in progress patch to enable loop-interchange by default and is a continuation of the RFC: https://discourse.llvm.org/t/enabling-loop-interchange/82589 Basically, we promised to fix any compile-time and correctness issues in the different components involved here (loop-interchange and dependence analaysis.) before discussing enabling interchange by default. We think are close to complete this; I would like to explain where we are and wanted to check if there are any thoughts or concerns. A quick overview of the correctness and compile-time improvements that we have made include: Correctness: - [LoopInterchange] Remove 'S' Scalar Dependencies (llvm#119345) - [LoopInterchange] Fix overflow in cost calculation (llvm#111807) - [LoopInterchange] Handle LE and GE correctly (PR llvm#124901) @kasuga-fj - [DA] disambiguate evolution of base addresses (llvm#116628) Compile-times: - [LoopInterchange] Constrain number of load/stores in a loop (llvm#118973) - [LoopInterchange] Bail out early if minimum loop nest is not met (llvm#115128) - [LoopInterchange] Hoist isComputableLoopNest() in the control flow (llvm#124247) And in terms of remaining work, we think we are very close to fixing these depenence analysis issues: - [DA] do not handle array accesses of different offsets (llvm#123436) - [DA] Dependence analysis does not handle array accesses of different sizes (llvm#116630) - [DA] use NSW arithmetic llvm#116632 The compile-time increase with a geomean increase of 0.19% looks good (after committing llvm#124247), I think: stage1-O3: Benchmark kimwitu++ +0.10% sqlite3 +0.14% consumer-typeset +0.07% Bullet +0.06% tramp3d-v4 +0.21% mafft +0.39% ClamAVi +0.06% lencod +0.61% SPASS +0.17% 7zip +0.08% geomean +0.19% See also: http://llvm-compile-time-tracker.com/compare.php?from=19a7fe03b4f58c4f73ea91d5e63bc4c6e61f987b&to=b24f1367d68ee675ea93ecda4939208c6b68ae4b&stat=instructions%3Au We might want to look into lencod to see if we can improve more, but not sure it is strictly necessary.
This patch fixes two bugs: llvm#41488 llvm#53942 The dependence analysis assumes that the base address of array accesses is invariant across loop iterations. In both bugs the base address evolves following loop iterations: the base address flip-flops between two different memory objects. The patch uses the cross iteration mode of alias analysis to disambiguate the base objects.
This patch fixes two bugs:
#41488
#53942
The dependence analysis assumes that the base address of array accesses is invariant across loop iterations. In both bugs the base address evolves following loop iterations: the base address flip-flops between two different memory objects.
Based on the scalar evolution of base addresses, the patch adds code to separate the 3 alias cases {must, no, may}-alias where the base addresses are identical at every iteration, never the same, and unknown.