Skip to content

[SLPVectorizer][AArch64][SVE] Opt crashed when optimizing #68953

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
XChy opened this issue Oct 13, 2023 · 8 comments · Fixed by #69617
Closed

[SLPVectorizer][AArch64][SVE] Opt crashed when optimizing #68953

XChy opened this issue Oct 13, 2023 · 8 comments · Fixed by #69617
Labels
backend:AArch64 SVE ARM Scalable Vector Extensions

Comments

@XChy
Copy link
Member

XChy commented Oct 13, 2023

Clang crashed when compiling flang/lib/Evaluate/intrinsics-library.cpp after #67275 in clang-aarch64-sve-vls-2stage1.
After reproducing and reducing, I get the IR causing crash: aarch-sve-crash.ll.

I'm not a expert in backend. So I try opt -S -O3 -vectorize-slp -vectorize-loops, and it performed well.
But when I tried opt -S -O3 -vectorize-loops -vectorize-slp -mtriple aarch64-unknown-linux-gnu --mcpu=neoverse-512tvb, it crashed and reported "Not a vector MVT!"

It seems to be a backend/target bug.

@EugeneZelenko EugeneZelenko added backend:AArch64 SVE ARM Scalable Vector Extensions and removed new issue labels Oct 13, 2023
@llvmbot
Copy link
Member

llvmbot commented Oct 13, 2023

@llvm/issue-subscribers-backend-aarch64

Author: XChy (XChy)

Clang crashed when compiling `flang/lib/Evaluate/intrinsics-library.cpp` after #67275 in [clang-aarch64-sve-vls-2stage1](https://lab.llvm.org/buildbot/#/builders/176/builds/5474). After reproducing and reducing, I get the IR causing crash: [aarch-sve-crash.ll](https://gist.github.com/XChy/4cb02b756de70d807440aaa88dbeeab3)

I'm not a expert in backend. So I try opt -S -O3 -vectorize-slp -vectorize-loops, and it performs well.
But when I try opt -S -O3 -vectorize-loops -vectorize-slp -mtriple aarch64-unknown-linux-gnu --mcpu=neoverse-512tvb, it crashed and reported "Not a vector MVT!"

It seems to be a backend/target bug.

@nikic
Copy link
Contributor

nikic commented Oct 13, 2023

Reduced:

; RUN: opt -S -passes=slp-vectorizer < %s
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-gnu"

; Function Attrs: vscale_range(2,2)
define void @ham(ptr %arg) #0 {
bb:
  %load = load fp128, ptr %arg, align 1
  %getelementptr = getelementptr i8, ptr %arg, i64 16
  %load1 = load fp128, ptr %getelementptr, align 1
  %load2 = load fp128, ptr null, align 1
  %load3 = load fp128, ptr null, align 1
  %bitcast = bitcast fp128 %load2 to i128
  %trunc = trunc i128 %bitcast to i32
  %icmp = icmp eq i32 %trunc, 0
  %bitcast4 = bitcast fp128 %load3 to i128
  %trunc5 = trunc i128 %bitcast4 to i32
  %icmp6 = icmp eq i32 %trunc5, 0
  %bitcast7 = bitcast fp128 %load to i128
  %trunc8 = trunc i128 %bitcast7 to i32
  %icmp9 = icmp eq i32 %trunc8, 0
  %bitcast10 = bitcast fp128 %load1 to i128
  %trunc11 = trunc i128 %bitcast10 to i32
  %icmp12 = icmp eq i32 %trunc11, 0
  ret void
}

attributes #0 = { vscale_range(2,2) "target-cpu"="neoverse-512tvb" }

@antoniofrighetto
Copy link
Contributor

Seems like we're querying TTI to retrieve the cost of a gather/scatter vector operation on a non-vector type. Taking a look as well.

@kawashima-fj
Copy link
Member

A shorter reproducer based on #68953 (comment) is:

; RUN: opt -S -passes=slp-vectorizer < %s
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-gnu"

; Function Attrs: vscale_range(2,2)
define void @ham(ptr %arg) #0 {
bb:
  %gep = getelementptr i8, ptr %arg, i64 16
  %load0 = load fp128, ptr %arg, align 1
  %load1 = load fp128, ptr %gep, align 1
  %load2 = load fp128, ptr null, align 1
  %load3 = load fp128, ptr null, align 1
  %fcmp0 = fcmp oeq fp128 %load0, 0xL00000000000000000000000000000000
  %fcmp1 = fcmp oeq fp128 %load1, 0xL00000000000000000000000000000000
  %fcmp2 = fcmp oeq fp128 %load2, 0xL00000000000000000000000000000000
  %fcmp3 = fcmp oeq fp128 %load3, 0xL00000000000000000000000000000000
  ret void
}

attributes #0 = { vscale_range(2,2) "target-cpu"="neoverse-512tvb" }
$ opt -S -passes=slp-vectorizer < reproducer.ll
Not a vector MVT!
UNREACHABLE executed at /src/llvm/llvm/include/llvm/CodeGen/MachineValueType.h:276!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /opt/llvm/bin/opt -S -passes=slp-vectorizer
 #0 0x00007ff6be2a6eca llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /src/llvm/llvm/lib/Support/Unix/Signals.inc:723:22
 #1 0x00007ff6be2a72e6 PrintStackTraceSignalHandler(void*) /src/llvm/llvm/lib/Support/Unix/Signals.inc:798:1
 #2 0x00007ff6be2a4733 llvm::sys::RunSignalHandlers() /src/llvm/llvm/lib/Support/Signals.cpp:105:20
 #3 0x00007ff6be2a6762 SignalHandler(int) /src/llvm/llvm/lib/Support/Unix/Signals.inc:413:1
 #4 0x00007ff6bda8b520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #5 0x00007ff6bdadf9fc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #6 0x00007ff6bdadf9fc __pthread_kill_internal ./nptl/pthread_kill.c:78:10
 #7 0x00007ff6bdadf9fc pthread_kill ./nptl/pthread_kill.c:89:10
 #8 0x00007ff6bda8b476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #9 0x00007ff6bda717f3 abort ./stdlib/abort.c:81:7
#10 0x00007ff6be16e3be bindingsErrorHandler(void*, char const*, bool) /src/llvm/llvm/lib/Support/ErrorHandling.cpp:221:55
#11 0x00007ff6c4f4216d llvm::MVT::getVectorMinNumElements() const /src/llvm/builds/xcgldcnstd/include/llvm/CodeGen/GenVT.inc:242:3
#12 0x00007ff6c4f4280e llvm::MVT::getVectorElementCount() const /src/llvm/llvm/include/llvm/CodeGen/MachineValueType.h:287:31
#13 0x00007ff6c51ad045 llvm::AArch64TTIImpl::getGatherScatterOpCost(unsigned int, llvm::Type*, llvm::Value const*, bool, llvm::Align, llvm::TargetTransformInfo::TargetCostKind, llvm::Instruction const*) /src/llvm/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2992:57
#14 0x00007ff6c5194d20 llvm::TargetTransformInfo::Model<llvm::AArch64TTIImpl>::getGatherScatterOpCost(unsigned int, llvm::Type*, llvm::Value const*, bool, llvm::Align, llvm::TargetTransformInfo::TargetCostKind, llvm::Instruction const*) /src/llvm/llvm/include/llvm/Analysis/TargetTransformInfo.h:2580:39
#15 0x00007ff6bfcd0423 llvm::TargetTransformInfo::getGatherScatterOpCost(unsigned int, llvm::Type*, llvm::Value const*, bool, llvm::Align, llvm::TargetTransformInfo::TargetCostKind, llvm::Instruction const*) const /src/llvm/llvm/lib/Analysis/TargetTransformInfo.cpp:1006:57
#16 0x00007ff6c09eb466 llvm::slpvectorizer::BoUpSLP::ShuffleCostEstimator::getBuildVectorCost(llvm::ArrayRef<llvm::Value*>, llvm::Value*) /src/llvm/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:7074:49
#17 0x00007ff6c09ee50a llvm::slpvectorizer::BoUpSLP::ShuffleCostEstimator::gather(llvm::ArrayRef<llvm::Value*>, llvm::Value*) /src/llvm/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:7500:31
#18 0x00007ff6c098be45 llvm::slpvectorizer::BoUpSLP::getEntryCost(llvm::slpvectorizer::BoUpSLP::TreeEntry const*, llvm::ArrayRef<llvm::Value*>, llvm::SmallPtrSetImpl<llvm::Value*>&) /src/llvm/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:7688:35
#19 0x00007ff6c09914cb llvm::slpvectorizer::BoUpSLP::getTreeCost(llvm::ArrayRef<llvm::Value*>) /src/llvm/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:8800:37
#20 0x00007ff6c09a743e llvm::SLPVectorizerPass::tryToVectorizeList(llvm::ArrayRef<llvm::Value*>, llvm::slpvectorizer::BoUpSLP&, bool) /src/llvm/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:13119:43
#21 0x00007ff6c0a18882 bool llvm::SLPVectorizerPass::vectorizeCmpInsts<std::reverse_iterator<llvm::CmpInst* const*>>(llvm::iterator_range<std::reverse_iterator<llvm::CmpInst* const*>>, llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&)::'lambda1'(llvm::ArrayRef<llvm::Value*>, bool)::operator()(llvm::ArrayRef<llvm::Value*>, bool) const /src/llvm/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:15139:59
#22 0x00007ff6c0a5c545 bool llvm::function_ref<bool (llvm::ArrayRef<llvm::Value*>, bool)>::callback_fn<bool llvm::SLPVectorizerPass::vectorizeCmpInsts<std::reverse_iterator<llvm::CmpInst* const*>>(llvm::iterator_range<std::reverse_iterator<llvm::CmpInst* const*>>, llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&)::'lambda1'(llvm::ArrayRef<llvm::Value*>, bool)>(long, llvm::ArrayRef<llvm::Value*>, bool) /src/llvm/llvm/include/llvm/ADT/STLFunctionalExtras.h:47:3
#23 0x00007ff6c0a3bcfe llvm::function_ref<bool (llvm::ArrayRef<llvm::Value*>, bool)>::operator()(llvm::ArrayRef<llvm::Value*>, bool) const /src/llvm/llvm/include/llvm/ADT/STLFunctionalExtras.h:69:3
#24 0x00007ff6c09bb35c bool tryToVectorizeSequence<llvm::Value>(llvm::SmallVectorImpl<llvm::Value*>&, llvm::function_ref<bool (llvm::Value*, llvm::Value*)>, llvm::function_ref<bool (llvm::Value*, llvm::Value*)>, llvm::function_ref<bool (llvm::ArrayRef<llvm::Value*>, bool)>, bool, llvm::slpvectorizer::BoUpSLP&) /src/llvm/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:14973:21
#25 0x00007ff6c0a18ddd bool llvm::SLPVectorizerPass::vectorizeCmpInsts<std::reverse_iterator<llvm::CmpInst* const*>>(llvm::iterator_range<std::reverse_iterator<llvm::CmpInst* const*>>, llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) /src/llvm/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:15126:43
#26 0x00007ff6c09b2fff llvm::SLPVectorizerPass::vectorizeChainsInBlock(llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&)::'lambda2'(bool)::operator()(bool) const /src/llvm/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:15342:15
#27 0x00007ff6c09b3e23 llvm::SLPVectorizerPass::vectorizeChainsInBlock(llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) /src/llvm/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:15449:18
#28 0x00007ff6c09a476a llvm::SLPVectorizerPass::runImpl(llvm::Function&, llvm::ScalarEvolution*, llvm::TargetTransformInfo*, llvm::TargetLibraryInfo*, llvm::AAResults*, llvm::LoopInfo*, llvm::DominatorTree*, llvm::AssumptionCache*, llvm::DemandedBits*, llvm::OptimizationRemarkEmitter*) /src/llvm/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:12716:13
#29 0x00007ff6c09a41de llvm::SLPVectorizerPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) /src/llvm/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:12647:25
#30 0x00007ff6c3e08c97 llvm::detail::PassModel<llvm::Function, llvm::SLPVectorizerPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) /src/llvm/llvm/include/llvm/IR/PassManagerInternal.h:90:3
#31 0x00007ff6bed06733 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) /src/llvm/llvm/include/llvm/IR/PassManager.h:521:20
#32 0x00007ff6c3e14ea7 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) /src/llvm/llvm/include/llvm/IR/PassManagerInternal.h:90:3
#33 0x00007ff6bed05a80 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /src/llvm/llvm/lib/IR/PassManager.cpp:129:23
#34 0x00007ff6c3e14dd7 llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /src/llvm/llvm/include/llvm/IR/PassManagerInternal.h:90:3
#35 0x00007ff6bed063f3 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /src/llvm/llvm/include/llvm/IR/PassManager.h:521:20
#36 0x0000556cdf6df4f1 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) /src/llvm/llvm/tools/opt/NewPMDriver.cpp:527:10
#37 0x0000556cdf70e222 main /src/llvm/llvm/tools/opt/opt.cpp:698:27
#38 0x00007ff6bda72d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#39 0x00007ff6bda72e40 call_init ./csu/../csu/libc-start.c:128:20
#40 0x00007ff6bda72e40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#41 0x0000556cdf6dc785 _start (/opt/llvm/bin/opt+0x56785)

When the error occured in llvm::AArch64TTIImpl::getGatherScatterOpCost (#13), DataTy is <2 x fp128> and LT->second is llvm::MVT::f128, which is the cause of "Not a vector MVT!".

When SVE is not fixed-sized (no vscale_range(2,2) in IR), DataTy is <2 x fp128> but this function returns at if (useNeonVector(DataTy)) and LT.second.getVectorElementCount() is not called.

InstructionCost AArch64TTIImpl::getGatherScatterOpCost(
unsigned Opcode, Type *DataTy, const Value *Ptr, bool VariableMask,
Align Alignment, TTI::TargetCostKind CostKind, const Instruction *I) {
if (useNeonVector(DataTy))
return BaseT::getGatherScatterOpCost(Opcode, DataTy, Ptr, VariableMask,
Alignment, CostKind, I);
auto *VT = cast<VectorType>(DataTy);
auto LT = getTypeLegalizationCost(DataTy);
if (!LT.first.isValid())
return InstructionCost::getInvalid();
// The code-generator is currently not able to handle scalable vectors
// of <vscale x 1 x eltty> yet, so return an invalid cost to avoid selecting
// it. This change will be removed when code-generation for these types is
// sufficiently reliable.
if (cast<VectorType>(DataTy)->getElementCount() ==
ElementCount::getScalable(1))
return InstructionCost::getInvalid();
ElementCount LegalVF = LT.second.getVectorElementCount();
InstructionCost MemOpCost =
getMemoryOpCost(Opcode, VT->getElementType(), Alignment, 0, CostKind,
{TTI::OK_AnyValue, TTI::OP_None}, I);
// Add on an overhead cost for using gathers/scatters.
// TODO: At the moment this is applied unilaterally for all CPUs, but at some
// point we may want a per-CPU overhead.
MemOpCost *= getSVEGatherScatterOverhead(Opcode);
return LT.first * MemOpCost * getMaxNumElements(LegalVF);
}

I'm not familiar with this code and don't know how to fix it.

@antoniofrighetto
Copy link
Contributor

By the time getGatherScatterOpCost is queried, we are handling a fixed-size vector of type <2 x fp128>, and getTypeLegalizationCost is called with such type. In the first iteration, the legalized kind action is properly set to TypeSplitVector. Thus, the type becomes <1 x fp128>. During the second iteration, the kind action is set to TypeScalarizeVector (as vectors with only one element are always scalarized). There seems to be no handling for it, but I guess we would simplify it further anyways. The type is finally simplified to simple type fp128, which is returned.

Considering that the current state of the legalization seems to be always reducing the vector type to its scalar element type, I think the safest approach here is the following one:

+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -2989,7 +2989,14 @@ InstructionCost AArch64TTIImpl::getGatherScatterOpCost(
       ElementCount::getScalable(1))
     return InstructionCost::getInvalid();
 
-  ElementCount LegalVF = LT.second.getVectorElementCount();
+  ElementCount LegalVF;
+  if (LT.second.isVector()) {
+    LegalVF = LT.second.getVectorElementCount();
+  } else {
+    // If the legalized type is a simple type, treat it as a 1-element vector.
+    LegalVF = ElementCount::getFixed(1);
+  }
+
   InstructionCost MemOpCost =
       getMemoryOpCost(Opcode, VT->getElementType(), Alignment, 0, CostKind,
                       {TTI::OK_AnyValue, TTI::OP_None}, I);

LT.first has already been set to have an instruction cost of 2. Though, I really wonder if LT.second can ever be a vector type, as of now. @rengolin, does that sound good to you?

@rengolin
Copy link
Member

My read of GenVT.inc is that getVectorMinNumElements() works even with non-vector types, including f128, so I'm surprised that you get into the default case in that switch. @fhahn, ideas?

@antoniofrighetto
Copy link
Contributor

@rengolin, is that? Looking at GenVT.inc, it seems that f128 is defined with GET_VT_ATTR, whereas getVectorMinNumElements expects type defined via GET_VT_VECATTR (vector-only types).

@rengolin
Copy link
Member

Ah! Right! I missed that. Ignore me, then. I don't see a problem with the solution, but @fhahn has been working on it for much longer, I defer to him to give the final answer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AArch64 SVE ARM Scalable Vector Extensions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants