[clang 17.0.1] [regression] Hangs when compiling NSS with -O2 -march=skylake #67333
I can reproduce it with:
clang -O2 -march=skylake sha512.i
If I attach gdb to the process after it's been running for a little while:
0x00007f54f0f4cab2 in llvm::APInt::maskBit (bitPosition=<optimized out>) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/include/llvm/ADT/APInt.h:1900
1900 return 1ULL << whichBit(bitPosition);
(gdb) bt
#0 0x00007f54f0f4cab2 in llvm::APInt::maskBit (bitPosition=<optimized out>) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/include/llvm/ADT/APInt.h:1900
#1 llvm::APInt::operator[] (bitPosition=<optimized out>, this=<optimized out>) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/include/llvm/ADT/APInt.h:1017
#2 llvm::APInt::isSignBitSet (this=0x7ffefaa604e0) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/include/llvm/ADT/APInt.h:319
#3 llvm::KnownBits::isNegative (this=0x7ffefaa604d0) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/include/llvm/Support/KnownBits.h:96
#4 llvm::KnownBits::computeForAddSub (Add=<optimized out>, NSW=NSW@entry=false, LHS=..., RHS=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/Support/KnownBits.cpp:72
#5 0x00007f54f1c5bfa8 in llvm::TargetLowering::SimplifyDemandedBits (this=0x565551705ff0, Op=..., OriginalDemandedBits=..., OriginalDemandedElts=..., Known=..., TLO=..., Depth=2, AssumeSingleUse=false)
at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:2740
#6 0x00007f54f1c5e41d in llvm::TargetLowering::SimplifyDemandedBits (this=this@entry=0x565551705ff0, Op=..., OriginalDemandedBits=..., OriginalDemandedElts=..., Known=..., TLO=..., Depth=1,
AssumeSingleUse=false) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:1225
#7 0x00007f54f502dc6d in llvm::X86TargetLowering::SimplifyDemandedBitsForTargetNode (this=0x565551705ff0, Op=..., OriginalDemandedBits=..., OriginalDemandedElts=..., Known=..., TLO=..., Depth=0)
at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/Target/X86/X86ISelLowering.cpp:44365
#8 0x00007f54f1c60533 in llvm::TargetLowering::SimplifyDemandedBits (this=0x565551705ff0, Op=..., OriginalDemandedBits=..., OriginalDemandedElts=..., Known=..., TLO=..., Depth=0, AssumeSingleUse=false)
at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:2754
#9 0x00007f54f1c6bf57 in llvm::TargetLowering::SimplifyDemandedBits (this=0x565551705ff0, Op=..., DemandedBits=..., Known=..., TLO=..., Depth=0, AssumeSingleUse=false)
at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:646
#10 0x00007f54f1c6c0f2 in llvm::TargetLowering::SimplifyDemandedBits (this=this@entry=0x565551705ff0, Op=..., DemandedBits=..., DCI=...)
at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:608
#11 0x00007f54f5135377 in combineVectorShiftImm (N=<optimized out>, DAG=..., DCI=..., Subtarget=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:1139
#12 0x00007f54f19d47c4 in (anonymous namespace)::DAGCombiner::combine (this=this@entry=0x7ffefaa63130, N=N@entry=0x56555191d900)
at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:2049
#13 0x00007f54f19d62f9 in (anonymous namespace)::DAGCombiner::Run (AtLevel=llvm::AfterLegalizeDAG, this=0x7ffefaa63130)
at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:1827
#14 llvm::SelectionDAG::Combine (this=<optimized out>, Level=Level@entry=llvm::AfterLegalizeDAG, AA=<optimized out>, OptLevel=<optimized out>)
at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:27592
#15 0x00007f54f1bfbb63 in llvm::SelectionDAGISel::CodeGenAndEmitDAG (this=0x565551829cb0) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:925
#16 0x00007f54f1c00fc0 in llvm::SelectionDAGISel::SelectAllBasicBlocks (this=0x565551829cb0, Fn=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1700
#17 0x00007f54f1c027d6 in llvm::SelectionDAGISel::runOnMachineFunction (this=this@entry=0x565551829cb0, mf=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:482
#18 0x00007f54f4fd4619 in (anonymous namespace)::X86DAGToDAGISel::runOnMachineFunction (this=0x565551829cb0, MF=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp:191
#19 0x00007f54f1547cd4 in llvm::MachineFunctionPass::runOnFunction (this=0x565551829cb0, F=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/MachineFunctionPass.cpp:91
#20 0x00007f54f11bdb13 in llvm::FPPassManager::runOnFunction (this=0x565551826a40, F=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/IR/LegacyPassManager.cpp:1435
#21 0x00007f54f11bdd51 in llvm::FPPassManager::runOnModule (this=0x565551826a40, M=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/IR/LegacyPassManager.cpp:1481
#22 0x00007f54f11be7f4 in (anonymous namespace)::MPPassManager::runOnModule (M=..., this=<optimized out>) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/IR/LegacyPassManager.cpp:1550
#23 llvm::legacy::PassManagerImpl::run (this=0x5655517a7120, M=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/IR/LegacyPassManager.cpp:535
#24 0x00007f54fb97774a in (anonymous namespace)::EmitAssemblyHelper::RunCodegenPipeline (DwoOS=<synthetic pointer>std::unique_ptr<llvm::ToolOutputFile> = {...},
OS=std::unique_ptr<llvm::raw_pwrite_stream> = {...}, Action=clang::Backend_EmitObj, this=0x7ffefaa64b60) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/CodeGen/BackendUtil.cpp:1115
#25 (anonymous namespace)::EmitAssemblyHelper::EmitAssembly (OS=std::unique_ptr<llvm::raw_pwrite_stream> = {...}, Action=clang::Backend_EmitObj, this=0x7ffefaa64b60)
at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/CodeGen/BackendUtil.cpp:1137
#26 clang::EmitBackendOutput (Diags=..., HeaderOpts=..., CGOpts=..., TOpts=..., LOpts=..., TDesc=..., M=M@entry=0x5655513bb820, Action=clang::Backend_EmitObj, VFS=...,
OS=std::unique_ptr<llvm::raw_pwrite_stream> = {...}) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/CodeGen/BackendUtil.cpp:1299
#27 0x00007f54fbe13f15 in clang::BackendConsumer::HandleTranslationUnit (this=0x5655513b64d0, C=...) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/CodeGen/CodeGenAction.cpp:386
#28 0x00007f54fa41aa55 in clang::ParseAST (S=..., PrintStats=false, SkipFunctionBodies=<optimized out>) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Parse/ParseAST.cpp:176
#29 0x00007f54fca51be9 in clang::FrontendAction::Execute (this=this@entry=0x5655513b6b60) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Frontend/FrontendAction.cpp:1059
#30 0x00007f54fc9de17b in clang::CompilerInstance::ExecuteAction (this=this@entry=0x5655513ada00, Act=...) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Frontend/CompilerInstance.cpp:1053
#31 0x00007f54fcaebb2b in clang::ExecuteCompilerInvocation (Clang=Clang@entry=0x5655513ada00) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:272
#32 0x0000565550c63035 in cc1_main (Argv=..., Argv0=0x5655513a4f70 "/usr/lib/llvm/17/bin/clang-17", MainAddr=MainAddr@entry=0x565550c5c270 <GetExecutablePath[abi:cxx11](char const*, bool)>)
at /usr/src/debug/sys-devel/clang-17.0.1/clang/tools/driver/cc1_main.cpp:249
#33 0x0000565550c5bcab in ExecuteCC1Tool (ArgV=..., ToolContext=...) at /usr/src/debug/sys-devel/clang-17.0.1/clang/tools/driver/driver.cpp:366
#34 0x00007f54fc5d826d in llvm::function_ref<int (llvm::SmallVectorImpl<char const*>&)>::operator()(llvm::SmallVectorImpl<char const*>&) const (params#0=..., this=<optimized out>)
at /usr/lib/llvm/17/include/llvm/ADT/STLFunctionalExtras.h:68
#35 operator() (__closure=0x7ffefaa65f40) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Driver/Job.cpp:440
#36 llvm::function_ref<void()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef> >, std::string*, bool*) const::<lambda()> >(intptr_t) (
callable=callable@entry=140733103628048) at /usr/lib/llvm/17/include/llvm/ADT/STLFunctionalExtras.h:45
#37 0x00007f54f0f19b5e in llvm::function_ref<void ()>::operator()() const (this=<synthetic pointer>) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/include/llvm/ADT/STLFunctionalExtras.h:68
#38 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (this=this@entry=0x7ffefaa65ef0, Fn=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/Support/CrashRecoveryContext.cpp:426
#39 0x00007f54fc5dac70 in clang::driver::CC1Command::Execute (this=0x565551345560, Redirects=..., ErrMsg=<optimized out>, ExecutionFailed=<optimized out>)
at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Driver/Job.cpp:440
#40 0x00007f54fc598def in clang::driver::Compilation::ExecuteCommand (this=0x5655513a7550, C=..., FailingCommand=@0x7ffefaa66470: 0x0, LogOnly=<optimized out>)
at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Driver/Compilation.cpp:199
#41 0x00007f54fc5995f6 in clang::driver::Compilation::ExecuteJobs (this=this@entry=0x5655513a7550, Jobs=..., FailingCommands=..., LogOnly=LogOnly@entry=false)
at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Driver/Compilation.cpp:253
#42 0x00007f54fc5a87a4 in clang::driver::Driver::ExecuteCompilation (this=this@entry=0x7ffefaa668f0, C=..., FailingCommands=...) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Driver/Driver.cpp:1903
#43 0x0000565550c600b0 in clang_main (Argc=<optimized out>, Argv=<optimized out>, ToolContext=...) at /usr/src/debug/sys-devel/clang-17.0.1/clang/tools/driver/driver.cpp:542
#44 0x0000565550c59cc7 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/sys-devel/clang-17.0.1/x/y/clang-abi_x86_64.amd64/tools/driver/clang-driver.cpp:15
I've reduced it two ways.
#!/bin/sh
set -x
# The clang-16 one should build fine (and quickly, but don't bother checking that yet).
clang-16 -O2 -Werror=return-type -march=skylake -c sha512.c.i -S -o /dev/null || exit 1
timeout 45s clang-17 -O2 -Werror=return-type -march=skylake -c sha512.c.i -S -o /dev/null
ret=$?
case ${ret} in
0)
# If it built fine, it's uninteresting.
exit 1
;;
124)
# It timed out, yay!
exit 0
;;
*)
# It failed in some way but not a timeout, not interesting.
exit 1
;;
esac
This gives the following which takes 1m5s w/ clang 17 on a fast machine (it completes in 0.087s w/ clang 16) which is hopefully representative enough:
typedef unsigned int PRUint32;
typedef struct SHA256ContextStr SHA256Context;
typedef struct {
}
mp_int;
struct SHA256ContextStr {
union {
PRUint32 w[64];
}
u;
};
static const PRUint32 K256[64] __attribute__((aligned(16))) = {
0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5, 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174, 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da, 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967, 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85, 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070, 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3, 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2 };
void SHA256_Begin(SHA256Context *ctx) {
{
ctx->u.w[27] = ((((ctx->u.w[27 - 2] >> 17) | (ctx->u.w[27 - 2] << ((8 * sizeof ctx->u.w[27 - 2]) - 17))) ^ ((ctx->u.w[27 - 2] >> 19) | (ctx->u.w[27 - 2] << ((8 * sizeof ctx->u.w[27 - 2]) - 19))) ^ (ctx->u.w[27 - 2] >> 10)) + ctx->u.w[27 - 7] + (((ctx->u.w[27 - 15] >> 7) | (ctx->u.w[27 - 15] << ((8 * sizeof ctx->u.w[27 - 15]) - 7))) ^ ((ctx->u.w[27 - 15] >> 18) | (ctx->u.w[27 - 15] << ((8 * sizeof ctx->u.w[27 - 15]) - 18))) ^ (ctx->u.w[27 - 15] >> 3)) + ctx->u.w[27 - 16]);
ctx->u.w[28] = ((((ctx->u.w[28 - 2] >> 17) | (ctx->u.w[28 - 2] << ((8 * sizeof ctx->u.w[28 - 2]) - 17))) ^ ((ctx->u.w[28 - 2] >> 19) | (ctx->u.w[28 - 2] << ((8 * sizeof ctx->u.w[28 - 2]) - 19))) ^ (ctx->u.w[28 - 2] >> 10)) + ctx->u.w[28 - 7] + (((ctx->u.w[28 - 15] >> 7) | (ctx->u.w[28 - 15] << ((8 * sizeof ctx->u.w[28 - 15]) - 7))) ^ ((ctx->u.w[28 - 15] >> 18) | (ctx->u.w[28 - 15] << ((8 * sizeof ctx->u.w[28 - 15]) - 18))) ^ (ctx->u.w[28 - 15] >> 3)) + ctx->u.w[28 - 16]);
ctx->u.w[29] = ((((ctx->u.w[29 - 2] >> 17) | (ctx->u.w[29 - 2] << ((8 * sizeof ctx->u.w[29 - 2]) - 17))) ^ ((ctx->u.w[29 - 2] >> 19) | (ctx->u.w[29 - 2] << ((8 * sizeof ctx->u.w[29 - 2]) - 19))) ^ (ctx->u.w[29 - 2] >> 10)) + ctx->u.w[29 - 7] + (((ctx->u.w[29 - 15] >> 7) | (ctx->u.w[29 - 15] << ((8 * sizeof ctx->u.w[29 - 15]) - 7))) ^ ((ctx->u.w[29 - 15] >> 18) | (ctx->u.w[29 - 15] << ((8 * sizeof ctx->u.w[29 - 15]) - 18))) ^ (ctx->u.w[29 - 15] >> 3)) + ctx->u.w[29 - 16]);
ctx->u.w[30] = ((((ctx->u.w[30 - 2] >> 17) | (ctx->u.w[30 - 2] << ((8 * sizeof ctx->u.w[30 - 2]) - 17))) ^ ((ctx->u.w[30 - 2] >> 19) | (ctx->u.w[30 - 2] << ((8 * sizeof ctx->u.w[30 - 2]) - 19))) ^ (ctx->u.w[30 - 2] >> 10)) + ctx->u.w[30 - 7] + (((ctx->u.w[30 - 15] >> 7) | (ctx->u.w[30 - 15] << ((8 * sizeof ctx->u.w[30 - 15]) - 7))) ^ ((ctx->u.w[30 - 15] >> 18) | (ctx->u.w[30 - 15] << ((8 * sizeof ctx->u.w[30 - 15]) - 18))) ^ (ctx->u.w[30 - 15] >> 3)) + ctx->u.w[30 - 16]);
ctx->u.w[31] = ((((ctx->u.w[31 - 2] >> 17) | (ctx->u.w[31 - 2] << ((8 * sizeof ctx->u.w[31 - 2]) - 17))) ^ ((ctx->u.w[31 - 2] >> 19) | (ctx->u.w[31 - 2] << ((8 * sizeof ctx->u.w[31 - 2]) - 19))) ^ (ctx->u.w[31 - 2] >> 10)) + ctx->u.w[31 - 7] + (((ctx->u.w[31 - 15] >> 7) | (ctx->u.w[31 - 15] << ((8 * sizeof ctx->u.w[31 - 15]) - 7))) ^ ((ctx->u.w[31 - 15] >> 18) | (ctx->u.w[31 - 15] << ((8 * sizeof ctx->u.w[31 - 15]) - 18))) ^ (ctx->u.w[31 - 15] >> 3)) + ctx->u.w[31 - 16]);
ctx->u.w[32] = ((((ctx->u.w[32 - 2] >> 17) | (ctx->u.w[32 - 2] << ((8 * sizeof ctx->u.w[32 - 2]) - 17))) ^ ((ctx->u.w[32 - 2] >> 19) | (ctx->u.w[32 - 2] << ((8 * sizeof ctx->u.w[32 - 2]) - 19))) ^ (ctx->u.w[32 - 2] >> 10)) + ctx->u.w[32 - 7] + (((ctx->u.w[32 - 15] >> 7) | (ctx->u.w[32 - 15] << ((8 * sizeof ctx->u.w[32 - 15]) - 7))) ^ ((ctx->u.w[32 - 15] >> 18) | (ctx->u.w[32 - 15] << ((8 * sizeof ctx->u.w[32 - 15]) - 18))) ^ (ctx->u.w[32 - 15] >> 3)) + ctx->u.w[32 - 16]);
ctx->u.w[33] = ((((ctx->u.w[33 - 2] >> 17) | (ctx->u.w[33 - 2] << ((8 * sizeof ctx->u.w[33 - 2]) - 17))) ^ ((ctx->u.w[33 - 2] >> 19) | (ctx->u.w[33 - 2] << ((8 * sizeof ctx->u.w[33 - 2]) - 19))) ^ (ctx->u.w[33 - 2] >> 10)) + ctx->u.w[33 - 7] + (((ctx->u.w[33 - 15] >> 7) | (ctx->u.w[33 - 15] << ((8 * sizeof ctx->u.w[33 - 15]) - 7))) ^ ((ctx->u.w[33 - 15] >> 18) | (ctx->u.w[33 - 15] << ((8 * sizeof ctx->u.w[33 - 15]) - 18))) ^ (ctx->u.w[33 - 15] >> 3)) + ctx->u.w[33 - 16]);
ctx->u.w[36] = ((((ctx->u.w[36 - 2] >> 17) | (ctx->u.w[36 - 2] << ((8 * sizeof ctx->u.w[36 - 2]) - 17))) ^ ((ctx->u.w[36 - 2] >> 19) | (ctx->u.w[36 - 2] << ((8 * sizeof ctx->u.w[36 - 2]) - 19))) ^ (ctx->u.w[36 - 2] >> 10)) + ctx->u.w[36 - 7] + (((ctx->u.w[36 - 15] >> 7) | (ctx->u.w[36 - 15] << ((8 * sizeof ctx->u.w[36 - 15]) - 7))) ^ ((ctx->u.w[36 - 15] >> 18) | (ctx->u.w[36 - 15] << ((8 * sizeof ctx->u.w[36 - 15]) - 18))) ^ (ctx->u.w[36 - 15] >> 3)) + ctx->u.w[36 - 16]);
ctx->u.w[37] = ((((ctx->u.w[37 - 2] >> 17) | (ctx->u.w[37 - 2] << ((8 * sizeof ctx->u.w[37 - 2]) - 17))) ^ ((ctx->u.w[37 - 2] >> 19) | (ctx->u.w[37 - 2] << ((8 * sizeof ctx->u.w[37 - 2]) - 19))) ^ (ctx->u.w[37 - 2] >> 10)) + ctx->u.w[37 - 7] + (((ctx->u.w[37 - 15] >> 7) | (ctx->u.w[37 - 15] << ((8 * sizeof ctx->u.w[37 - 15]) - 7))) ^ ((ctx->u.w[37 - 15] >> 18) | (ctx->u.w[37 - 15] << ((8 * sizeof ctx->u.w[37 - 15]) - 18))) ^ (ctx->u.w[37 - 15] >> 3)) + ctx->u.w[37 - 16]);
ctx->u.w[38] = ((((ctx->u.w[38 - 2] >> 17) | (ctx->u.w[38 - 2] << ((8 * sizeof ctx->u.w[38 - 2]) - 17))) ^ ((ctx->u.w[38 - 2] >> 19) | (ctx->u.w[38 - 2] << ((8 * sizeof ctx->u.w[38 - 2]) - 19))) ^ (ctx->u.w[38 - 2] >> 10)) + ctx->u.w[38 - 7] + (((ctx->u.w[38 - 15] >> 7) | (ctx->u.w[38 - 15] << ((8 * sizeof ctx->u.w[38 - 15]) - 7))) ^ ((ctx->u.w[38 - 15] >> 18) | (ctx->u.w[38 - 15] << ((8 * sizeof ctx->u.w[38 - 15]) - 18))) ^ (ctx->u.w[38 - 15] >> 3)) + ctx->u.w[38 - 16]);
ctx->u.w[39] = ((((ctx->u.w[39 - 2] >> 17) | (ctx->u.w[39 - 2] << ((8 * sizeof ctx->u.w[39 - 2]) - 17))) ^ ((ctx->u.w[39 - 2] >> 19) | (ctx->u.w[39 - 2] << ((8 * sizeof ctx->u.w[39 - 2]) - 19))) ^ (ctx->u.w[39 - 2] >> 10)) + ctx->u.w[39 - 7] + (((ctx->u.w[39 - 15] >> 7) | (ctx->u.w[39 - 15] << ((8 * sizeof ctx->u.w[39 - 15]) - 7))) ^ ((ctx->u.w[39 - 15] >> 18) | (ctx->u.w[39 - 15] << ((8 * sizeof ctx->u.w[39 - 15]) - 18))) ^ (ctx->u.w[39 - 15] >> 3)) + ctx->u.w[39 - 16]);
ctx->u.w[40] = ((((ctx->u.w[40 - 2] >> 17) | (ctx->u.w[40 - 2] << ((8 * sizeof ctx->u.w[40 - 2]) - 17))) ^ ((ctx->u.w[40 - 2] >> 19) | (ctx->u.w[40 - 2] << ((8 * sizeof ctx->u.w[40 - 2]) - 19))) ^ (ctx->u.w[40 - 2] >> 10)) + ctx->u.w[40 - 7] + (((ctx->u.w[40 - 15] >> 7) | (ctx->u.w[40 - 15] << ((8 * sizeof ctx->u.w[40 - 15]) - 7))) ^ ((ctx->u.w[40 - 15] >> 18) | (ctx->u.w[40 - 15] << ((8 * sizeof ctx->u.w[40 - 15]) - 18))) ^ (ctx->u.w[40 - 15] >> 3)) + ctx->u.w[40 - 16]);
ctx->u.w[41] = ((((ctx->u.w[41 - 2] >> 17) | (ctx->u.w[41 - 2] << ((8 * sizeof ctx->u.w[41 - 2]) - 17))) ^ ((ctx->u.w[41 - 2] >> 19) | (ctx->u.w[41 - 2] << ((8 * sizeof ctx->u.w[41 - 2]) - 19))) ^ (ctx->u.w[41 - 2] >> 10)) + ctx->u.w[41 - 7] + (((ctx->u.w[41 - 15] >> 7) | (ctx->u.w[41 - 15] << ((8 * sizeof ctx->u.w[41 - 15]) - 7))) ^ ((ctx->u.w[41 - 15] >> 18) | (ctx->u.w[41 - 15] << ((8 * sizeof ctx->u.w[41 - 15]) - 18))) ^ (ctx->u.w[41 - 15] >> 3)) + ctx->u.w[41 - 16]);
ctx->u.w[43] = ((((ctx->u.w[43 - 2] >> 17) | (ctx->u.w[43 - 2] << ((8 * sizeof ctx->u.w[43 - 2]) - 17))) ^ ((ctx->u.w[43 - 2] >> 19) | (ctx->u.w[43 - 2] << ((8 * sizeof ctx->u.w[43 - 2]) - 19))) ^ (ctx->u.w[43 - 2] >> 10)) + ctx->u.w[43 - 7] + (((ctx->u.w[43 - 15] >> 7) | (ctx->u.w[43 - 15] << ((8 * sizeof ctx->u.w[43 - 15]) - 7))) ^ ((ctx->u.w[43 - 15] >> 18) | (ctx->u.w[43 - 15] << ((8 * sizeof ctx->u.w[43 - 15]) - 18))) ^ (ctx->u.w[43 - 15] >> 3)) + ctx->u.w[43 - 16]);
ctx->u.w[45] = ((((ctx->u.w[45 - 2] >> 17) | (ctx->u.w[45 - 2] << ((8 * sizeof ctx->u.w[45 - 2]) - 17))) ^ ((ctx->u.w[45 - 2] >> 19) | (ctx->u.w[45 - 2] << ((8 * sizeof ctx->u.w[45 - 2]) - 19))) ^ (ctx->u.w[45 - 2] >> 10)) + ctx->u.w[45 - 7] + (((ctx->u.w[45 - 15] >> 7) | (ctx->u.w[45 - 15] << ((8 * sizeof ctx->u.w[45 - 15]) - 7))) ^ ((ctx->u.w[45 - 15] >> 18) | (ctx->u.w[45 - 15] << ((8 * sizeof ctx->u.w[45 - 15]) - 18))) ^ (ctx->u.w[45 - 15] >> 3)) + ctx->u.w[45 - 16]);
ctx->u.w[46] = ((((ctx->u.w[46 - 2] >> 17) | (ctx->u.w[46 - 2] << ((8 * sizeof ctx->u.w[46 - 2]) - 17))) ^ ((ctx->u.w[46 - 2] >> 19) | (ctx->u.w[46 - 2] << ((8 * sizeof ctx->u.w[46 - 2]) - 19))) ^ (ctx->u.w[46 - 2] >> 10)) + ctx->u.w[46 - 7] + (((ctx->u.w[46 - 15] >> 7) | (ctx->u.w[46 - 15] << ((8 * sizeof ctx->u.w[46 - 15]) - 7))) ^ ((ctx->u.w[46 - 15] >> 18) | (ctx->u.w[46 - 15] << ((8 * sizeof ctx->u.w[46 - 15]) - 18))) ^ (ctx->u.w[46 - 15] >> 3)) + ctx->u.w[46 - 16]);
ctx->u.w[47] = ((((ctx->u.w[47 - 2] >> 17) | (ctx->u.w[47 - 2] << ((8 * sizeof ctx->u.w[47 - 2]) - 17))) ^ ((ctx->u.w[47 - 2] >> 19) | (ctx->u.w[47 - 2] << ((8 * sizeof ctx->u.w[47 - 2]) - 19))) ^ (ctx->u.w[47 - 2] >> 10)) + ctx->u.w[47 - 7] + (((ctx->u.w[47 - 15] >> 7) | (ctx->u.w[47 - 15] << ((8 * sizeof ctx->u.w[47 - 15]) - 7))) ^ ((ctx->u.w[47 - 15] >> 18) | (ctx->u.w[47 - 15] << ((8 * sizeof ctx->u.w[47 - 15]) - 18))) ^ (ctx->u.w[47 - 15] >> 3)) + ctx->u.w[47 - 16]);
ctx->u.w[48] = ((((ctx->u.w[48 - 2] >> 17) | (ctx->u.w[48 - 2] << ((8 * sizeof ctx->u.w[48 - 2]) - 17))) ^ ((ctx->u.w[48 - 2] >> 19) | (ctx->u.w[48 - 2] << ((8 * sizeof ctx->u.w[48 - 2]) - 19))) ^ (ctx->u.w[48 - 2] >> 10)) + ctx->u.w[48 - 7] + (((ctx->u.w[48 - 15] >> 7) | (ctx->u.w[48 - 15] << ((8 * sizeof ctx->u.w[48 - 15]) - 7))) ^ ((ctx->u.w[48 - 15] >> 18) | (ctx->u.w[48 - 15] << ((8 * sizeof ctx->u.w[48 - 15]) - 18))) ^ (ctx->u.w[48 - 15] >> 3)) + ctx->u.w[48 - 16]);
ctx->u.w[49] = ((((ctx->u.w[49 - 2] >> 17) | (ctx->u.w[49 - 2] << ((8 * sizeof ctx->u.w[49 - 2]) - 17))) ^ ((ctx->u.w[49 - 2] >> 19) | (ctx->u.w[49 - 2] << ((8 * sizeof ctx->u.w[49 - 2]) - 19))) ^ (ctx->u.w[49 - 2] >> 10)) + ctx->u.w[49 - 7] + (((ctx->u.w[49 - 15] >> 7) | (ctx->u.w[49 - 15] << ((8 * sizeof ctx->u.w[49 - 15]) - 7))) ^ ((ctx->u.w[49 - 15] >> 18) | (ctx->u.w[49 - 15] << ((8 * sizeof ctx->u.w[49 - 15]) - 18))) ^ (ctx->u.w[49 - 15] >> 3)) + ctx->u.w[49 - 16]);
ctx->u.w[50] = ((((ctx->u.w[50 - 2] >> 17) | (ctx->u.w[50 - 2] << ((8 * sizeof ctx->u.w[50 - 2]) - 17))) ^ ((ctx->u.w[50 - 2] >> 19) | (ctx->u.w[50 - 2] << ((8 * sizeof ctx->u.w[50 - 2]) - 19))) ^ (ctx->u.w[50 - 2] >> 10)) + ctx->u.w[50 - 7] + (((ctx->u.w[50 - 15] >> 7) | (ctx->u.w[50 - 15] << ((8 * sizeof ctx->u.w[50 - 15]) - 7))) ^ ((ctx->u.w[50 - 15] >> 18) | (ctx->u.w[50 - 15] << ((8 * sizeof ctx->u.w[50 - 15]) - 18))) ^ (ctx->u.w[50 - 15] >> 3)) + ctx->u.w[50 - 16]);
ctx->u.w[51] = ((((ctx->u.w[51 - 2] >> 17) | (ctx->u.w[51 - 2] << ((8 * sizeof ctx->u.w[51 - 2]) - 17))) ^ ((ctx->u.w[51 - 2] >> 19) | (ctx->u.w[51 - 2] << ((8 * sizeof ctx->u.w[51 - 2]) - 19))) ^ (ctx->u.w[51 - 2] >> 10)) + ctx->u.w[51 - 7] + (((ctx->u.w[51 - 15] >> 7) | (ctx->u.w[51 - 15] << ((8 * sizeof ctx->u.w[51 - 15]) - 7))) ^ ((ctx->u.w[51 - 15] >> 18) | (ctx->u.w[51 - 15] << ((8 * sizeof ctx->u.w[51 - 15]) - 18))) ^ (ctx->u.w[51 - 15] >> 3)) + ctx->u.w[51 - 16]);
ctx->u.w[52] = ((((ctx->u.w[52 - 2] >> 17) | (ctx->u.w[52 - 2] << ((8 * sizeof ctx->u.w[52 - 2]) - 17))) ^ ((ctx->u.w[52 - 2] >> 19) | (ctx->u.w[52 - 2] << ((8 * sizeof ctx->u.w[52 - 2]) - 19))) ^ (ctx->u.w[52 - 2] >> 10)) + ctx->u.w[52 - 7] + (((ctx->u.w[52 - 15] >> 7) | (ctx->u.w[52 - 15] << ((8 * sizeof ctx->u.w[52 - 15]) - 7))) ^ ((ctx->u.w[52 - 15] >> 18) | (ctx->u.w[52 - 15] << ((8 * sizeof ctx->u.w[52 - 15]) - 18))) ^ (ctx->u.w[52 - 15] >> 3)) + ctx->u.w[52 - 16]);
ctx->u.w[53] = ((((ctx->u.w[53 - 2] >> 17) | (ctx->u.w[53 - 2] << ((8 * sizeof ctx->u.w[53 - 2]) - 17))) ^ ((ctx->u.w[53 - 2] >> 19) | (ctx->u.w[53 - 2] << ((8 * sizeof ctx->u.w[53 - 2]) - 19))) ^ (ctx->u.w[53 - 2] >> 10)) + ctx->u.w[53 - 7] + (((ctx->u.w[53 - 15] >> 7) | (ctx->u.w[53 - 15] << ((8 * sizeof ctx->u.w[53 - 15]) - 7))) ^ ((ctx->u.w[53 - 15] >> 18) | (ctx->u.w[53 - 15] << ((8 * sizeof ctx->u.w[53 - 15]) - 18))) ^ (ctx->u.w[53 - 15] >> 3)) + ctx->u.w[53 - 16]);
ctx->u.w[54] = ((((ctx->u.w[54 - 2] >> 17) | (ctx->u.w[54 - 2] << ((8 * sizeof ctx->u.w[54 - 2]) - 17))) ^ ((ctx->u.w[54 - 2] >> 19) | (ctx->u.w[54 - 2] << ((8 * sizeof ctx->u.w[54 - 2]) - 19))) ^ (ctx->u.w[54 - 2] >> 10)) + ctx->u.w[54 - 7] + (((ctx->u.w[54 - 15] >> 7) | (ctx->u.w[54 - 15] << ((8 * sizeof ctx->u.w[54 - 15]) - 7))) ^ ((ctx->u.w[54 - 15] >> 18) | (ctx->u.w[54 - 15] << ((8 * sizeof ctx->u.w[54 - 15]) - 18))) ^ (ctx->u.w[54 - 15] >> 3)) + ctx->u.w[54 - 16]);
ctx->u.w[55] = ((((ctx->u.w[55 - 2] >> 17) | (ctx->u.w[55 - 2] << ((8 * sizeof ctx->u.w[55 - 2]) - 17))) ^ ((ctx->u.w[55 - 2] >> 19) | (ctx->u.w[55 - 2] << ((8 * sizeof ctx->u.w[55 - 2]) - 19))) ^ (ctx->u.w[55 - 2] >> 10)) + ctx->u.w[55 - 7] + (((ctx->u.w[55 - 15] >> 7) | (ctx->u.w[55 - 15] << ((8 * sizeof ctx->u.w[55 - 15]) - 7))) ^ ((ctx->u.w[55 - 15] >> 18) | (ctx->u.w[55 - 15] << ((8 * sizeof ctx->u.w[55 - 15]) - 18))) ^ (ctx->u.w[55 - 15] >> 3)) + ctx->u.w[55 - 16]);
ctx->u.w[56] = ((((ctx->u.w[56 - 2] >> 17) | (ctx->u.w[56 - 2] << ((8 * sizeof ctx->u.w[56 - 2]) - 17))) ^ ((ctx->u.w[56 - 2] >> 19) | (ctx->u.w[56 - 2] << ((8 * sizeof ctx->u.w[56 - 2]) - 19))) ^ (ctx->u.w[56 - 2] >> 10)) + ctx->u.w[56 - 7] + (((ctx->u.w[56 - 15] >> 7) | (ctx->u.w[56 - 15] << ((8 * sizeof ctx->u.w[56 - 15]) - 7))) ^ ((ctx->u.w[56 - 15] >> 18) | (ctx->u.w[56 - 15] << ((8 * sizeof ctx->u.w[56 - 15]) - 18))) ^ (ctx->u.w[56 - 15] >> 3)) + ctx->u.w[56 - 16]);
ctx->u.w[57] = ((((ctx->u.w[57 - 2] >> 17) | (ctx->u.w[57 - 2] << ((8 * sizeof ctx->u.w[57 - 2]) - 17))) ^ ((ctx->u.w[57 - 2] >> 19) | (ctx->u.w[57 - 2] << ((8 * sizeof ctx->u.w[57 - 2]) - 19))) ^ (ctx->u.w[57 - 2] >> 10)) + ctx->u.w[57 - 7] + (((ctx->u.w[57 - 15] >> 7) | (ctx->u.w[57 - 15] << ((8 * sizeof ctx->u.w[57 - 15]) - 7))) ^ ((ctx->u.w[57 - 15] >> 18) | (ctx->u.w[57 - 15] << ((8 * sizeof ctx->u.w[57 - 15]) - 18))) ^ (ctx->u.w[57 - 15] >> 3)) + ctx->u.w[57 - 16]);
ctx->u.w[58] = ((((ctx->u.w[58 - 2] >> 17) | (ctx->u.w[58 - 2] << ((8 * sizeof ctx->u.w[58 - 2]) - 17))) ^ ((ctx->u.w[58 - 2] >> 19) | (ctx->u.w[58 - 2] << ((8 * sizeof ctx->u.w[58 - 2]) - 19))) ^ (ctx->u.w[58 - 2] >> 10)) + ctx->u.w[58 - 7] + (((ctx->u.w[58 - 15] >> 7) | (ctx->u.w[58 - 15] << ((8 * sizeof ctx->u.w[58 - 15]) - 7))) ^ ((ctx->u.w[58 - 15] >> 18) | (ctx->u.w[58 - 15] << ((8 * sizeof ctx->u.w[58 - 15]) - 18))) ^ (ctx->u.w[58 - 15] >> 3)) + ctx->u.w[58 - 16]);
ctx->u.w[59] = ((((ctx->u.w[59 - 2] >> 17) | (ctx->u.w[59 - 2] << ((8 * sizeof ctx->u.w[59 - 2]) - 17))) ^ ((ctx->u.w[59 - 2] >> 19) | (ctx->u.w[59 - 2] << ((8 * sizeof ctx->u.w[59 - 2]) - 19))) ^ (ctx->u.w[59 - 2] >> 10)) + ctx->u.w[59 - 7] + (((ctx->u.w[59 - 15] >> 7) | (ctx->u.w[59 - 15] << ((8 * sizeof ctx->u.w[59 - 15]) - 7))) ^ ((ctx->u.w[59 - 15] >> 18) | (ctx->u.w[59 - 15] << ((8 * sizeof ctx->u.w[59 - 15]) - 18))) ^ (ctx->u.w[59 - 15] >> 3)) + ctx->u.w[59 - 16]);
ctx->u.w[60] = ((((ctx->u.w[60 - 2] >> 17) | (ctx->u.w[60 - 2] << ((8 * sizeof ctx->u.w[60 - 2]) - 17))) ^ ((ctx->u.w[60 - 2] >> 19) | (ctx->u.w[60 - 2] << ((8 * sizeof ctx->u.w[60 - 2]) - 19))) ^ (ctx->u.w[60 - 2] >> 10)) + ctx->u.w[60 - 7] + (((ctx->u.w[60 - 15] >> 7) | (ctx->u.w[60 - 15] << ((8 * sizeof ctx->u.w[60 - 15]) - 7))) ^ ((ctx->u.w[60 - 15] >> 18) | (ctx->u.w[60 - 15] << ((8 * sizeof ctx->u.w[60 - 15]) - 18))) ^ (ctx->u.w[60 - 15] >> 3)) + ctx->u.w[60 - 16]);
}
}
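The interestingness test above relies on GNU coreutils `timeout` exiting with status 124 when the time limit is hit. A minimal sketch of that classification follows. The helper name `is_interesting` is hypothetical, and `sleep`, `true` and `false` are stand-ins for the compiler invocation so the snippet runs without clang installed:

```shell
#!/bin/sh
# Hypothetical helper, not part of the original report: succeed only when
# the given command hits the time limit (status 124 from GNU `timeout`).
is_interesting() {
    limit=$1
    shift
    timeout "${limit}" "$@"
    case $? in
        124) return 0 ;;  # timed out: the hang reproduces
        *)   return 1 ;;  # finished, or failed in some other way
    esac
}

# Stand-ins for a hanging compile, a fast successful compile, and a compile error.
is_interesting 1s sleep 5 && echo "hang: interesting"
is_interesting 5s true    || echo "fast: uninteresting"
is_interesting 5s false   || echo "error: uninteresting"
```

Treating anything other than a timeout as uninteresting keeps the reducer from drifting toward inputs that merely fail to compile.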
$ clang-17 -O2 -c sha512.c.i -march=skylake -emit-llvm -S
$ llc sha512.c.ll # confirm this hangs
$ llvm-reduce --test=test2.sh sha512.c.ll
with
#!/usr/bin/env bash
timeout 30s llc "$@"
ret=$?
case ${ret} in
0)
# If it built fine, it's uninteresting.
exit 1
;;
124)
# It timed out, yay!
exit 0
;;
*)
# It failed in some way but not a timeout, not interesting.
exit 1
;;
esac
This kind of worked. The result takes ~30s for
; ModuleID = '<bc file>'
source_filename = "sha512.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"
; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #0
; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #0
; Function Attrs: nounwind sspstrong memory(argmem: readwrite) uwtable
define void @SHA256_Compress_Generic(ptr noundef %ctx) #1 {
entry:
%0 = load i32, ptr null, align 4
%1 = tail call i32 asm "bswap $0", "=r,0,~{dirflag},~{fpsr},~{flags}"(i32 %0) #5
%arrayidx14 = getelementptr inbounds [64 x i32], ptr %ctx, i64 0, i64 3
%2 = load i32, ptr %arrayidx14, align 4
%3 = tail call i32 asm "bswap $0", "=r,0,~{dirflag},~{fpsr},~{flags}"(i32 %2) #5
%4 = insertelement <2 x i32> zeroinitializer, i32 %1, i64 1
%5 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %4, <2 x i32> %4, <2 x i32> <i32 15, i32 15>)
%6 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %4, <2 x i32> %4, <2 x i32> <i32 13, i32 13>)
%7 = xor <2 x i32> %5, %6
%8 = lshr <2 x i32> %4, zeroinitializer
%9 = xor <2 x i32> %7, %8
%10 = insertelement <2 x i32> zeroinitializer, i32 %3, i64 0
%11 = shufflevector <2 x i32> zeroinitializer, <2 x i32> %10, <2 x i32> <i32 1, i32 2>
%12 = add <2 x i32> %11, %9
%13 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %12, <2 x i32> %12, <2 x i32> <i32 15, i32 15>)
%14 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %12, <2 x i32> %12, <2 x i32> <i32 13, i32 13>)
%15 = xor <2 x i32> %13, %14
%16 = lshr <2 x i32> %12, zeroinitializer
%17 = xor <2 x i32> %15, %16
%18 = add <2 x i32> %4, %17
%19 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %18, <2 x i32> %18, <2 x i32> <i32 15, i32 15>)
%20 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %18, <2 x i32> %18, <2 x i32> <i32 13, i32 13>)
%21 = xor <2 x i32> %19, %20
%22 = lshr <2 x i32> %18, <i32 10, i32 10>
%23 = xor <2 x i32> %21, %22
%24 = add <2 x i32> %4, %23
%25 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %24, <2 x i32> %24, <2 x i32> <i32 15, i32 15>)
%26 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %24, <2 x i32> %24, <2 x i32> <i32 13, i32 13>)
%27 = xor <2 x i32> %25, %26
%28 = lshr <2 x i32> %24, <i32 10, i32 10>
%29 = xor <2 x i32> %27, %28
%30 = shufflevector <2 x i32> %4, <2 x i32> %12, <2 x i32> <i32 1, i32 2>
%31 = add <2 x i32> %30, %29
%32 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %31, <2 x i32> %31, <2 x i32> <i32 15, i32 15>)
%33 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %31, <2 x i32> %31, <2 x i32> <i32 13, i32 13>)
%34 = xor <2 x i32> %32, %33
%35 = lshr <2 x i32> %31, <i32 10, i32 10>
%36 = xor <2 x i32> %34, %35
%37 = shufflevector <2 x i32> %12, <2 x i32> zeroinitializer, <2 x i32> <i32 1, i32 2>
%38 = add <2 x i32> %37, %36
%arrayidx918 = getelementptr inbounds [64 x i32], ptr %ctx, i64 0, i64 33
store <2 x i32> %38, ptr %arrayidx918, align 4
%arrayidx1012 = getelementptr inbounds [64 x i32], ptr %ctx, i64 0, i64 35
%39 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %38, <2 x i32> %38, <2 x i32> <i32 15, i32 15>)
%40 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %38, <2 x i32> %38, <2 x i32> <i32 13, i32 13>)
%41 = xor <2 x i32> %39, %40
%42 = lshr <2 x i32> %38, <i32 10, i32 10>
%43 = xor <2 x i32> %41, %42
%44 = add <2 x i32> %37, %43
store <2 x i32> zeroinitializer, ptr %arrayidx1012, align 4
%arrayidx1106 = getelementptr inbounds [64 x i32], ptr %ctx, i64 0, i64 37
%45 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %44, <2 x i32> %44, <2 x i32> <i32 15, i32 15>)
%46 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %44, <2 x i32> %44, <2 x i32> <i32 13, i32 13>)
%47 = xor <2 x i32> %45, %46
%48 = lshr <2 x i32> %44, <i32 10, i32 10>
%49 = xor <2 x i32> %47, %48
%50 = lshr <2 x i32> %24, zeroinitializer
%51 = add <2 x i32> %50, %49
store <2 x i32> %51, ptr %arrayidx1106, align 4
%arrayidx1200 = getelementptr inbounds [64 x i32], ptr %ctx, i64 0, i64 39
%52 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %51, <2 x i32> %51, <2 x i32> <i32 15, i32 15>)
%53 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %51, <2 x i32> %51, <2 x i32> <i32 13, i32 13>)
%54 = xor <2 x i32> %52, %53
%55 = lshr <2 x i32> %51, <i32 10, i32 10>
%56 = xor <2 x i32> %54, %55
%57 = shufflevector <2 x i32> %38, <2 x i32> zeroinitializer, <2 x i32> <i32 poison, i32 0>
%58 = insertelement <2 x i32> %57, i32 0, i64 0
%59 = add <2 x i32> %58, %56
store <2 x i32> %59, ptr %arrayidx1200, align 4
ret void
; uselistorder directives
uselistorder <2 x i32> %4, { 7, 0, 1, 6, 5, 4, 3, 2 }
uselistorder <2 x i32> %38, { 6, 5, 4, 3, 2, 1, 0 }
}
; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i64 @llvm.bswap.i64(i64) #2
; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i32 @llvm.fshl.i32(i32, i32, i32) #2
; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i32 @llvm.umin.i32(i32, i32) #2
; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i64 @llvm.fshl.i64(i64, i64, i64) #2
; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: readwrite)
declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg) #3
; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #4
; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare <2 x i32> @llvm.fshl.v2i32(<2 x i32>, <2 x i32>, <2 x i32>) #2
; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare <2 x i64> @llvm.bswap.v2i64(<2 x i64>) #2
; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare <2 x i64> @llvm.fshl.v2i64(<2 x i64>, <2 x i64>, <2 x i64>) #2
; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare <4 x i64> @llvm.bswap.v4i64(<4 x i64>) #2
; uselistorder directives
uselistorder ptr @llvm.fshl.v2i32, { 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }
attributes #0 = { nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) }
attributes #1 = { nounwind sspstrong memory(argmem: readwrite) uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "probe-stack"="inline-asm" "stack-protector-buffer-size"="8" "target-cpu"="skylake" "target-features"="+adx,+aes,+avx,+avx2,+bmi,+bmi2,+clflushopt,+cmov,+crc32,+cx16,+cx8,+f16c,+fma,+fsgsbase,+fxsr,+invpcid,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+prfchw,+rdrnd,+rdseed,+sahf,+sgx,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsavec,+xsaveopt,+xsaves" }
attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
attributes #3 = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
attributes #4 = { nocallback nofree nounwind willreturn memory(argmem: write) }
attributes #5 = { nounwind memory(none) } |
cc @RKSimon Bisect says af32e51:
If I revert it on release/17.x, I get decent performance (although seemingly consistently a bit slower by half a second or so than clang 16). |
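For anyone repeating the bisect: because the failure is a hang rather than a crash, the test command needs a deadline. Below is a minimal sketch of a helper for `git bisect run`, which treats exit code 0 as good, 125 as "skip this revision", and other codes from 1 to 127 as bad. The clang path, the 60-second budget, and the added `-c -o /dev/null` are assumptions; only the `-O2 -march=skylake sha512.i` part comes from the reproducer above.

```python
import subprocess
import sys

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def classify(cmd, timeout_s):
    """Run cmd; report BAD if it exceeds timeout_s, GOOD if it exits 0 in time."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return BAD   # still running at the deadline: treat it as the hang
    except OSError:
        return SKIP  # compiler binary missing, e.g. this revision did not build
    # A non-zero exit is some other failure, so the revision is untestable.
    return GOOD if result.returncode == 0 else SKIP


# Hypothetical invocation for this report (path and time budget are assumptions).
REPRO_CMD = ["./build/bin/clang", "-O2", "-march=skylake",
             "-c", "sha512.i", "-o", "/dev/null"]
REPRO_TIMEOUT_S = 60

# Entry point when saved as a script and passed to `git bisect run`:
# sys.exit(classify(REPRO_CMD, REPRO_TIMEOUT_S))
```

The time budget should sit well above the normal compile time of the file, so that a slow but terminating build is not misreported as the hang.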
@llvm/issue-subscribers-backend-x86
When using clang 17.0.1 on Gentoo to compile [dev-libs/nss](https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS), the compiler appears to loop forever on the sha512.c compilation unit (reported in a [downstream Gentoo bug report](https://bugs.gentoo.org/914657)). This is not reproducible with Clang 16.
Initial testing suggests that -march=native is the trigger (-march=skylake and -march=alderlake also reproduce the problem). To make testing easier, I have attached the preprocessed output for that compilation unit, which should reproduce the issue easily (remove the .txt extension). |
Finally worked this out: we're missing a one-use limit in combineConcatVectorOps. Fix incoming. |
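To illustrate what a one-use limit buys in a fold of this shape, here is a toy cost model. It is not LLVM's code: `Node`, `can_concat`, and `ops_after_fold` are hypothetical names, and the model only counts operations. It does not attempt to reproduce the hang itself.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy stand-in for a DAG node; not LLVM's SDNode."""
    opcode: str
    num_uses: int  # users of this node's value, including the concat

    def has_one_use(self):
        return self.num_uses == 1


def can_concat(subops, require_one_use=True):
    """Decide whether concat(op(a), op(b)) may become op(concat(a, b))."""
    if not subops or any(op.opcode != subops[0].opcode for op in subops):
        return False
    if require_one_use and not all(op.has_one_use() for op in subops):
        return False
    return True


def ops_after_fold(subops):
    """One wide op, plus every narrow op that other users keep alive."""
    return 1 + sum(1 for op in subops if not op.has_one_use())
```

In this model, two single-use narrow ops fold into one wide op, so the fold is a clear win. If both narrow ops have other users, the fold leaves three ops where there were two, because the narrow ops stay alive next to the new wide one.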
/cherry-pick 04b403d |
Failed to cherry-pick: 04b403d https://github.com/llvm/llvm-project/actions/runs/6389691511 Please manually backport the fix and push it to your github fork. Once this is done, please add a comment like this:
|
@RKSimon Could you handle the backport? Cheers. |
/branch RKSimon/llvm-project/PR67333 |
/pull-request llvm/llvm-project-release-prs#724 |
We could maybe extend this by allowing the lowest subop to have multiple uses and extract the lowest subvector result of the concatenated op, but let's just get the fix in first. Fixes #67333
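The possible extension mentioned in the commit message can be sketched with plain lists standing in for vectors. This is only an illustration of the idea under assumed names (`vec_add`, `extract_subvector`, `can_concat_relaxed`, and `fold` are hypothetical), not the eventual implementation: the lowest subop may keep extra users, because those users can read the low subvector of the wide result instead.

```python
MASK32 = 0xFFFFFFFF


def vec_add(a, b):
    """Lane-wise 32-bit wrapping add."""
    return [(x + y) & MASK32 for x, y in zip(a, b)]


def extract_subvector(v, start, length):
    return v[start:start + length]


def can_concat_relaxed(use_counts):
    """use_counts[i] is the number of users of subop i; index 0 is the lowest.

    Only the subops above the lowest one must be single-use.
    """
    return all(n == 1 for n in use_counts[1:])


def fold(a_lo, b_lo, a_hi, b_hi):
    """Compute the wide op once and recover the low narrow result from it."""
    wide = vec_add(a_lo + a_hi, b_lo + b_hi)
    low_for_other_users = extract_subvector(wide, 0, len(a_lo))
    return wide, low_for_other_users
```

For a lane-wise op such as `add`, the low subvector of the wide result equals the narrow result on the low halves, so the other users of the lowest subop would not need the narrow op to stay alive.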
When using clang 17.0.1 on Gentoo to compile dev-libs/nss, the compiler appears to loop forever on the sha512.c compilation unit (reported in a downstream Gentoo bug report). This is not reproducible with Clang 16.
Initial testing suggests that -march=native is the trigger (-march=skylake and -march=alderlake also reproduce the problem).
To make testing easier, I have attached the preprocessed output for that compilation unit, which should reproduce the issue easily (remove the .txt extension):
sha512-preprocessed.c.txt