-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. #72679
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. #72679
Conversation
@llvm/pr-subscribers-llvm-transforms Author: Alexey Bataev (alexey-bataev) ChangesAfter changes, that does not require support from InstCombine, we can Full diff: https://github.com/llvm/llvm-project/pull/72679.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index bff3e49b14bc6bc..42e4540eb8ecbd7 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -13078,10 +13078,13 @@ bool BoUpSLP::collectValuesToDemote(Value *V,
return true;
}
- // If the value is not a vectorized instruction in the expression with only
- // one use, it cannot be demoted.
+ // If the value is not a vectorized instruction in the expression and not the
+ // used by the insertelement instruction, it cannot be demoted.
auto *I = dyn_cast<Instruction>(V);
- if (!I || !I->hasOneUse() || !getTreeEntry(I) || !Visited.insert(I).second)
+ if (!I || !getTreeEntry(I) || !Visited.insert(I).second ||
+ all_of(I->users(), [&](User *U) {
+ return isa<InsertElementInst>(U) && !getTreeEntry(U);
+ }))
return false;
switch (I->getOpcode()) {
@@ -13140,11 +13143,6 @@ bool BoUpSLP::collectValuesToDemote(Value *V,
}
void BoUpSLP::computeMinimumValueSizes() {
- // If there are no external uses, the expression tree must be rooted by a
- // store. We can't demote in-memory values, so there is nothing to do here.
- if (ExternalUses.empty())
- return;
-
// We only attempt to truncate integer expressions.
auto &TreeRoot = VectorizableTree[0]->Scalars;
auto *TreeRootIT = dyn_cast<IntegerType>(TreeRoot[0]->getType());
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/root-trunc-extract-reuse.ll b/llvm/test/Transforms/SLPVectorizer/X86/root-trunc-extract-reuse.ll
index f48528e502b8cf1..af46b4f576234b2 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/root-trunc-extract-reuse.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/root-trunc-extract-reuse.ll
@@ -8,20 +8,18 @@ define i1 @test() {
; CHECK: then:
; CHECK-NEXT: br label [[ELSE]]
; CHECK: else:
-; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x i1> [ zeroinitializer, [[THEN]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
-; CHECK-NEXT: [[TMP1:%.*]] = zext <2 x i1> [[TMP0]] to <2 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i1> [[TMP0]], i32 0
-; CHECK-NEXT: [[TMP3:%.*]] = zext i1 [[TMP2]] to i32
-; CHECK-NEXT: [[BF_CAST162:%.*]] = and i32 [[TMP3]], 0
-; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> zeroinitializer, <2 x i32> [[TMP1]], <2 x i32> <i32 3, i32 1>
-; CHECK-NEXT: [[T13:%.*]] = and <2 x i32> [[TMP4]], zeroinitializer
+; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x i32> [ zeroinitializer, [[THEN]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
+; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i32> [[TMP0]], i32 0
+; CHECK-NEXT: [[BF_CAST162:%.*]] = and i32 [[TMP1]], 0
+; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> zeroinitializer, <2 x i32> [[TMP0]], <2 x i32> <i32 3, i32 1>
+; CHECK-NEXT: [[T13:%.*]] = and <2 x i32> [[TMP2]], zeroinitializer
; CHECK-NEXT: br label [[ELSE1:%.*]]
; CHECK: else1:
-; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[T13]], <2 x i32> poison, <2 x i32> <i32 poison, i32 0>
-; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[BF_CAST162]], i32 0
-; CHECK-NEXT: [[TMP7:%.*]] = icmp ugt <2 x i32> [[TMP6]], zeroinitializer
-; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i1> [[TMP7]], i32 1
-; CHECK-NEXT: ret i1 [[TMP8]]
+; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[T13]], <2 x i32> poison, <2 x i32> <i32 poison, i32 0>
+; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> [[TMP3]], i32 [[BF_CAST162]], i32 0
+; CHECK-NEXT: [[TMP5:%.*]] = icmp ugt <2 x i32> [[TMP4]], zeroinitializer
+; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i1> [[TMP5]], i32 1
+; CHECK-NEXT: ret i1 [[TMP6]]
;
entry:
br i1 false, label %then, label %else
|
0b9adf8
to
2c9bc62
Compare
Ping! |
@alexey-bataev please can you rebase? |
1c699bc
to
0a526cc
Compare
Ping! |
0a526cc
to
708edc7
Compare
Ping! |
708edc7
to
9f174f5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
// If the value is not a vectorized instruction in the expression with only | ||
// one use, it cannot be demoted. | ||
// If the value is not a vectorized instruction in the expression and not the | ||
// used by the insertelement instruction, it cannot be demoted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"and not the used by the insertelement instruction" -> "and not used by the insertelement instruction"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment correct? It still doesn't seem right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed your comments, for some reason did not get the updates in e-mails.
Fixed this message.
; CHECK-NEXT: [[TMP0:%.*]] = call i16 @llvm.vector.reduce.or.v4i16(<4 x i16> zeroinitializer) | ||
; CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[TMP0]] to i32 | ||
; CHECK-NEXT: [[TMP2:%.*]] = and i32 [[TMP1]], 65535 | ||
; CHECK-NEXT: store i32 [[TMP2]], ptr null, align 4 | ||
; CHECK-NEXT: ret void |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have better tests than these cases that will all constant fold away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few. I'll try to improve this one.
instructions in computeMinimumValueSizes. After changes, that does not require support from InstCombine, we can drop some extra requirements for values-to-be-demoted. No need to check for external uses for roots/other instructions, just check that the no non-vectorized insertelement instruction, which may require widening.
9f174f5
to
486f4bb
Compare
This triggers failed asserts: char a[];
short *b;
int c, d, e, f;
void g() {
char *h;
for (;;) {
for (; f; ++f) {
h[f] = b[0] * a[e] + b[c] * a[1] >> 7;
++b;
}
h += d;
}
} $ clang -target x86_64-linux-gnu -c repro.c -O2
clang: ../lib/IR/Instructions.cpp:3335: static llvm::CastInst* llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, const llvm::Twine&, llvm::Instruction*): Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed. This comes up in compiling libvpx and openh264. I'll push a revert soon. |
…r other instructions in computeMinimumValueSizes. (#72679)" This reverts commit 408dce8. This triggered failed asserts with code like this: char a[]; short *b; int c, d, e, f; void g() { char *h; for (;;) { for (; f; ++f) { h[f] = b[0] * a[e] + b[c] * a[1] >> 7; ++b; } h += d; } } Compiled like this: $ clang -target x86_64-linux-gnu -c repro.c -O2 clang: ../lib/IR/Instructions.cpp:3335: static llvm::CastInst* llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, const llvm::Twine&, llvm::Instruction*): Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
…instructions in computeMinimumValueSizes. (#72679) After changes, that does not require support from InstCombine, we can drop some extra requirements for values-to-be-demoted. No need to check for external uses for roots/other instructions, just check that the no non-vectorized insertelement instruction, which may require widening. Review: #72679
Hi @alexey-bataev
It crashes with
|
Thanks, will fix it ASAP |
Fixed in 39b2104 |
Thanks! |
…instructions in computeMinimumValueSizes. (llvm#72679) After changes, that does not require support from InstCombine, we can drop some extra requirements for values-to-be-demoted. No need to check for external uses for roots/other instructions, just check that the no non-vectorized insertelement instruction, which may require widening.
…r other instructions in computeMinimumValueSizes. (llvm#72679)" This reverts commit 408dce8. This triggered failed asserts with code like this: char a[]; short *b; int c, d, e, f; void g() { char *h; for (;;) { for (; f; ++f) { h[f] = b[0] * a[e] + b[c] * a[1] >> 7; ++b; } h += d; } } Compiled like this: $ clang -target x86_64-linux-gnu -c repro.c -O2 clang: ../lib/IR/Instructions.cpp:3335: static llvm::CastInst* llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, const llvm::Twine&, llvm::Instruction*): Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
…instructions in computeMinimumValueSizes. (llvm#72679) After changes, that does not require support from InstCombine, we can drop some extra requirements for values-to-be-demoted. No need to check for external uses for roots/other instructions, just check that the no non-vectorized insertelement instruction, which may require widening. Review: llvm#72679
After changes, that does not require support from InstCombine, we can
drop some extra requirements for values-to-be-demoted. No need to check
for external uses for roots/other instructions, just check that the
no non-vectorized insertelement instruction, which may require
widening.