[clang 17.0.1] [regression] Hangs when compiling NSS with -O2 -march=skylake #67333

Closed
StormBytePP opened this issue Sep 25, 2023 · 10 comments · Fixed by llvm/llvm-project-release-prs#724
Labels: backend:X86, hang, regression

Comments

@StormBytePP

StormBytePP commented Sep 25, 2023

When compiling dev-libs/nss with clang 17.0.1 on Gentoo, the compiler appears to loop forever on the sha512.c compilation unit (also reported downstream in a Gentoo bug report). The hang is not reproducible with Clang 16.

Initial testing suggests that -march=native may be the trigger (-march=skylake and -march=alderlake also reproduce the problem).

To make testing easier, I have attached the preprocessed output for that compilation unit, which I expect will reproduce the issue easily (remove the .txt extension):

sha512-preprocessed.c.txt

@github-actions github-actions bot added the clang Clang issues not falling into any other category label Sep 25, 2023
@StormBytePP StormBytePP changed the title from "[Clang 17.0.1] [Regression] Infinite loop in compile unit" to "[clang 17.0.1] [regression] Infinite loop in compile unit" Sep 25, 2023
@shafik shafik added the needs-reduction Large reproducer that should be reduced into a simpler form label Sep 25, 2023
@thesamesam thesamesam changed the title from "[clang 17.0.1] [regression] Infinite loop in compile unit" to "[clang 17.0.1] [regression] Hangs when compiling NSS with -O2 -march=skylake" Sep 25, 2023
@thesamesam
Member

thesamesam commented Sep 25, 2023

I can reproduce it with:

clang -O2 -march=skylake sha512.i

If I attach gdb to the process after it's been running for a little while:

0x00007f54f0f4cab2 in llvm::APInt::maskBit (bitPosition=<optimized out>) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/include/llvm/ADT/APInt.h:1900
1900        return 1ULL << whichBit(bitPosition);
(gdb) bt
#0  0x00007f54f0f4cab2 in llvm::APInt::maskBit (bitPosition=<optimized out>) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/include/llvm/ADT/APInt.h:1900
#1  llvm::APInt::operator[] (bitPosition=<optimized out>, this=<optimized out>) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/include/llvm/ADT/APInt.h:1017
#2  llvm::APInt::isSignBitSet (this=0x7ffefaa604e0) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/include/llvm/ADT/APInt.h:319
#3  llvm::KnownBits::isNegative (this=0x7ffefaa604d0) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/include/llvm/Support/KnownBits.h:96
#4  llvm::KnownBits::computeForAddSub (Add=<optimized out>, NSW=NSW@entry=false, LHS=..., RHS=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/Support/KnownBits.cpp:72
#5  0x00007f54f1c5bfa8 in llvm::TargetLowering::SimplifyDemandedBits (this=0x565551705ff0, Op=..., OriginalDemandedBits=..., OriginalDemandedElts=..., Known=..., TLO=..., Depth=2, AssumeSingleUse=false)
    at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:2740
#6  0x00007f54f1c5e41d in llvm::TargetLowering::SimplifyDemandedBits (this=this@entry=0x565551705ff0, Op=..., OriginalDemandedBits=..., OriginalDemandedElts=..., Known=..., TLO=..., Depth=1,
    AssumeSingleUse=false) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:1225
#7  0x00007f54f502dc6d in llvm::X86TargetLowering::SimplifyDemandedBitsForTargetNode (this=0x565551705ff0, Op=..., OriginalDemandedBits=..., OriginalDemandedElts=..., Known=..., TLO=..., Depth=0)
    at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/Target/X86/X86ISelLowering.cpp:44365
#8  0x00007f54f1c60533 in llvm::TargetLowering::SimplifyDemandedBits (this=0x565551705ff0, Op=..., OriginalDemandedBits=..., OriginalDemandedElts=..., Known=..., TLO=..., Depth=0, AssumeSingleUse=false)
    at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:2754
#9  0x00007f54f1c6bf57 in llvm::TargetLowering::SimplifyDemandedBits (this=0x565551705ff0, Op=..., DemandedBits=..., Known=..., TLO=..., Depth=0, AssumeSingleUse=false)
    at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:646
#10 0x00007f54f1c6c0f2 in llvm::TargetLowering::SimplifyDemandedBits (this=this@entry=0x565551705ff0, Op=..., DemandedBits=..., DCI=...)
    at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:608
#11 0x00007f54f5135377 in combineVectorShiftImm (N=<optimized out>, DAG=..., DCI=..., Subtarget=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:1139
#12 0x00007f54f19d47c4 in (anonymous namespace)::DAGCombiner::combine (this=this@entry=0x7ffefaa63130, N=N@entry=0x56555191d900)
    at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:2049
#13 0x00007f54f19d62f9 in (anonymous namespace)::DAGCombiner::Run (AtLevel=llvm::AfterLegalizeDAG, this=0x7ffefaa63130)
    at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:1827
#14 llvm::SelectionDAG::Combine (this=<optimized out>, Level=Level@entry=llvm::AfterLegalizeDAG, AA=<optimized out>, OptLevel=<optimized out>)
    at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:27592
#15 0x00007f54f1bfbb63 in llvm::SelectionDAGISel::CodeGenAndEmitDAG (this=0x565551829cb0) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:925
#16 0x00007f54f1c00fc0 in llvm::SelectionDAGISel::SelectAllBasicBlocks (this=0x565551829cb0, Fn=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1700
#17 0x00007f54f1c027d6 in llvm::SelectionDAGISel::runOnMachineFunction (this=this@entry=0x565551829cb0, mf=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:482
#18 0x00007f54f4fd4619 in (anonymous namespace)::X86DAGToDAGISel::runOnMachineFunction (this=0x565551829cb0, MF=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp:191
#19 0x00007f54f1547cd4 in llvm::MachineFunctionPass::runOnFunction (this=0x565551829cb0, F=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/MachineFunctionPass.cpp:91
#20 0x00007f54f11bdb13 in llvm::FPPassManager::runOnFunction (this=0x565551826a40, F=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/IR/LegacyPassManager.cpp:1435
#21 0x00007f54f11bdd51 in llvm::FPPassManager::runOnModule (this=0x565551826a40, M=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/IR/LegacyPassManager.cpp:1481
#22 0x00007f54f11be7f4 in (anonymous namespace)::MPPassManager::runOnModule (M=..., this=<optimized out>) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/IR/LegacyPassManager.cpp:1550
#23 llvm::legacy::PassManagerImpl::run (this=0x5655517a7120, M=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/IR/LegacyPassManager.cpp:535
#24 0x00007f54fb97774a in (anonymous namespace)::EmitAssemblyHelper::RunCodegenPipeline (DwoOS=<synthetic pointer>std::unique_ptr<llvm::ToolOutputFile> = {...},
    OS=std::unique_ptr<llvm::raw_pwrite_stream> = {...}, Action=clang::Backend_EmitObj, this=0x7ffefaa64b60) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/CodeGen/BackendUtil.cpp:1115
#25 (anonymous namespace)::EmitAssemblyHelper::EmitAssembly (OS=std::unique_ptr<llvm::raw_pwrite_stream> = {...}, Action=clang::Backend_EmitObj, this=0x7ffefaa64b60)
    at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/CodeGen/BackendUtil.cpp:1137
#26 clang::EmitBackendOutput (Diags=..., HeaderOpts=..., CGOpts=..., TOpts=..., LOpts=..., TDesc=..., M=M@entry=0x5655513bb820, Action=clang::Backend_EmitObj, VFS=...,
    OS=std::unique_ptr<llvm::raw_pwrite_stream> = {...}) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/CodeGen/BackendUtil.cpp:1299
#27 0x00007f54fbe13f15 in clang::BackendConsumer::HandleTranslationUnit (this=0x5655513b64d0, C=...) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/CodeGen/CodeGenAction.cpp:386
#28 0x00007f54fa41aa55 in clang::ParseAST (S=..., PrintStats=false, SkipFunctionBodies=<optimized out>) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Parse/ParseAST.cpp:176
#29 0x00007f54fca51be9 in clang::FrontendAction::Execute (this=this@entry=0x5655513b6b60) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Frontend/FrontendAction.cpp:1059
#30 0x00007f54fc9de17b in clang::CompilerInstance::ExecuteAction (this=this@entry=0x5655513ada00, Act=...) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Frontend/CompilerInstance.cpp:1053
#31 0x00007f54fcaebb2b in clang::ExecuteCompilerInvocation (Clang=Clang@entry=0x5655513ada00) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:272
#32 0x0000565550c63035 in cc1_main (Argv=..., Argv0=0x5655513a4f70 "/usr/lib/llvm/17/bin/clang-17", MainAddr=MainAddr@entry=0x565550c5c270 <GetExecutablePath[abi:cxx11](char const*, bool)>)
    at /usr/src/debug/sys-devel/clang-17.0.1/clang/tools/driver/cc1_main.cpp:249
#33 0x0000565550c5bcab in ExecuteCC1Tool (ArgV=..., ToolContext=...) at /usr/src/debug/sys-devel/clang-17.0.1/clang/tools/driver/driver.cpp:366
#34 0x00007f54fc5d826d in llvm::function_ref<int (llvm::SmallVectorImpl<char const*>&)>::operator()(llvm::SmallVectorImpl<char const*>&) const (params#0=..., this=<optimized out>)
    at /usr/lib/llvm/17/include/llvm/ADT/STLFunctionalExtras.h:68
#35 operator() (__closure=0x7ffefaa65f40) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Driver/Job.cpp:440
#36 llvm::function_ref<void()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef> >, std::string*, bool*) const::<lambda()> >(intptr_t) (
    callable=callable@entry=140733103628048) at /usr/lib/llvm/17/include/llvm/ADT/STLFunctionalExtras.h:45
#37 0x00007f54f0f19b5e in llvm::function_ref<void ()>::operator()() const (this=<synthetic pointer>) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/include/llvm/ADT/STLFunctionalExtras.h:68
#38 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (this=this@entry=0x7ffefaa65ef0, Fn=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/Support/CrashRecoveryContext.cpp:426
#39 0x00007f54fc5dac70 in clang::driver::CC1Command::Execute (this=0x565551345560, Redirects=..., ErrMsg=<optimized out>, ExecutionFailed=<optimized out>)
    at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Driver/Job.cpp:440
#40 0x00007f54fc598def in clang::driver::Compilation::ExecuteCommand (this=0x5655513a7550, C=..., FailingCommand=@0x7ffefaa66470: 0x0, LogOnly=<optimized out>)
    at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Driver/Compilation.cpp:199
#41 0x00007f54fc5995f6 in clang::driver::Compilation::ExecuteJobs (this=this@entry=0x5655513a7550, Jobs=..., FailingCommands=..., LogOnly=LogOnly@entry=false)
    at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Driver/Compilation.cpp:253
#42 0x00007f54fc5a87a4 in clang::driver::Driver::ExecuteCompilation (this=this@entry=0x7ffefaa668f0, C=..., FailingCommands=...) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Driver/Driver.cpp:1903
#43 0x0000565550c600b0 in clang_main (Argc=<optimized out>, Argv=<optimized out>, ToolContext=...) at /usr/src/debug/sys-devel/clang-17.0.1/clang/tools/driver/driver.cpp:542
#44 0x0000565550c59cc7 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/sys-devel/clang-17.0.1/x/y/clang-abi_x86_64.amd64/tools/driver/clang-driver.cpp:15
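
The backtrace above came from an interactive gdb session. A minimal sketch of a non-interactive equivalent follows, assuming gdb is installed and you have permission to attach to the process. `dump_backtrace` is a hypothetical helper name, not something used in this thread:

```shell
#!/bin/sh
# Sketch only: dump a backtrace from an already-running process in one shot.
# dump_backtrace is a hypothetical helper name, not part of the original report.
dump_backtrace() {
    pid=$1
    # kill -0 sends no signal; it only checks that the PID exists and is signalable.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # -p attaches to the PID, -batch exits once the commands have run,
    # and -ex executes a single gdb command (here: bt).
    gdb -p "$pid" -batch -ex bt
}
```

For a hung compile, the PID would typically be that of the clang-17 cc1 process. Taking a few samples some seconds apart can help show whether the compiler keeps returning to the same frames.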

@thesamesam thesamesam added the hang Compiler hang (infinite loop) label Sep 25, 2023
@thesamesam
Member

thesamesam commented Sep 26, 2023

I've reduced it two ways.

  1. Using cvise on the C reproducer:
#!/bin/sh
set -x

# The clang-16 one should build fine (and quickly, but don't bother checking that yet).
clang-16 -O2 -Werror=return-type -march=skylake -c sha512.c.i -S -o /dev/null || exit 1

timeout 45s clang-17 -O2 -Werror=return-type -march=skylake -c sha512.c.i -S -o /dev/null
ret=$?

case ${ret} in
        0)
                # If it built fine, it's uninteresting.
                exit 1
                ;;
        124)
                # It timed out, yay!
                exit 0
                ;;
        *)
                # It failed in some way but not a timeout, not interesting.
                exit 1
                ;;
esac

The cvise reduction gives the following, which takes 1m5s with clang 17 on a fast machine (it completes in 0.087s with clang 16) and is hopefully representative enough:

    typedef unsigned int PRUint32;
        typedef struct SHA256ContextStr SHA256Context;
        typedef struct {
       }
        mp_int;
        struct SHA256ContextStr {
           union {
              PRUint32 w[64];
          }
       u;
       };
                  static const PRUint32 K256[64] __attribute__((aligned(16))) = {
           0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,     0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,     0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3,     0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,     0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc,     0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,     0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7,     0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,     0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13,     0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,     0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3,     0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,     0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5,     0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,     0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208,     0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2 };
          void SHA256_Begin(SHA256Context *ctx) {
           {
              ctx->u.w[27] = ((((ctx->u.w[27 - 2] >> 17) | (ctx->u.w[27 - 2] << ((8 * sizeof ctx->u.w[27 - 2]) - 17))) ^ ((ctx->u.w[27 - 2] >> 19) | (ctx->u.w[27 - 2] << ((8 * sizeof ctx->u.w[27 - 2]) - 19))) ^ (ctx->u.w[27 - 2] >> 10)) + ctx->u.w[27 - 7] + (((ctx->u.w[27 - 15] >> 7) | (ctx->u.w[27 - 15] << ((8 * sizeof ctx->u.w[27 - 15]) - 7))) ^ ((ctx->u.w[27 - 15] >> 18) | (ctx->u.w[27 - 15] << ((8 * sizeof ctx->u.w[27 - 15]) - 18))) ^ (ctx->u.w[27 - 15] >> 3)) + ctx->u.w[27 - 16]);
              ctx->u.w[28] = ((((ctx->u.w[28 - 2] >> 17) | (ctx->u.w[28 - 2] << ((8 * sizeof ctx->u.w[28 - 2]) - 17))) ^ ((ctx->u.w[28 - 2] >> 19) | (ctx->u.w[28 - 2] << ((8 * sizeof ctx->u.w[28 - 2]) - 19))) ^ (ctx->u.w[28 - 2] >> 10)) + ctx->u.w[28 - 7] + (((ctx->u.w[28 - 15] >> 7) | (ctx->u.w[28 - 15] << ((8 * sizeof ctx->u.w[28 - 15]) - 7))) ^ ((ctx->u.w[28 - 15] >> 18) | (ctx->u.w[28 - 15] << ((8 * sizeof ctx->u.w[28 - 15]) - 18))) ^ (ctx->u.w[28 - 15] >> 3)) + ctx->u.w[28 - 16]);
              ctx->u.w[29] = ((((ctx->u.w[29 - 2] >> 17) | (ctx->u.w[29 - 2] << ((8 * sizeof ctx->u.w[29 - 2]) - 17))) ^ ((ctx->u.w[29 - 2] >> 19) | (ctx->u.w[29 - 2] << ((8 * sizeof ctx->u.w[29 - 2]) - 19))) ^ (ctx->u.w[29 - 2] >> 10)) + ctx->u.w[29 - 7] + (((ctx->u.w[29 - 15] >> 7) | (ctx->u.w[29 - 15] << ((8 * sizeof ctx->u.w[29 - 15]) - 7))) ^ ((ctx->u.w[29 - 15] >> 18) | (ctx->u.w[29 - 15] << ((8 * sizeof ctx->u.w[29 - 15]) - 18))) ^ (ctx->u.w[29 - 15] >> 3)) + ctx->u.w[29 - 16]);
              ctx->u.w[30] = ((((ctx->u.w[30 - 2] >> 17) | (ctx->u.w[30 - 2] << ((8 * sizeof ctx->u.w[30 - 2]) - 17))) ^ ((ctx->u.w[30 - 2] >> 19) | (ctx->u.w[30 - 2] << ((8 * sizeof ctx->u.w[30 - 2]) - 19))) ^ (ctx->u.w[30 - 2] >> 10)) + ctx->u.w[30 - 7] + (((ctx->u.w[30 - 15] >> 7) | (ctx->u.w[30 - 15] << ((8 * sizeof ctx->u.w[30 - 15]) - 7))) ^ ((ctx->u.w[30 - 15] >> 18) | (ctx->u.w[30 - 15] << ((8 * sizeof ctx->u.w[30 - 15]) - 18))) ^ (ctx->u.w[30 - 15] >> 3)) + ctx->u.w[30 - 16]);
              ctx->u.w[31] = ((((ctx->u.w[31 - 2] >> 17) | (ctx->u.w[31 - 2] << ((8 * sizeof ctx->u.w[31 - 2]) - 17))) ^ ((ctx->u.w[31 - 2] >> 19) | (ctx->u.w[31 - 2] << ((8 * sizeof ctx->u.w[31 - 2]) - 19))) ^ (ctx->u.w[31 - 2] >> 10)) + ctx->u.w[31 - 7] + (((ctx->u.w[31 - 15] >> 7) | (ctx->u.w[31 - 15] << ((8 * sizeof ctx->u.w[31 - 15]) - 7))) ^ ((ctx->u.w[31 - 15] >> 18) | (ctx->u.w[31 - 15] << ((8 * sizeof ctx->u.w[31 - 15]) - 18))) ^ (ctx->u.w[31 - 15] >> 3)) + ctx->u.w[31 - 16]);
              ctx->u.w[32] = ((((ctx->u.w[32 - 2] >> 17) | (ctx->u.w[32 - 2] << ((8 * sizeof ctx->u.w[32 - 2]) - 17))) ^ ((ctx->u.w[32 - 2] >> 19) | (ctx->u.w[32 - 2] << ((8 * sizeof ctx->u.w[32 - 2]) - 19))) ^ (ctx->u.w[32 - 2] >> 10)) + ctx->u.w[32 - 7] + (((ctx->u.w[32 - 15] >> 7) | (ctx->u.w[32 - 15] << ((8 * sizeof ctx->u.w[32 - 15]) - 7))) ^ ((ctx->u.w[32 - 15] >> 18) | (ctx->u.w[32 - 15] << ((8 * sizeof ctx->u.w[32 - 15]) - 18))) ^ (ctx->u.w[32 - 15] >> 3)) + ctx->u.w[32 - 16]);
              ctx->u.w[33] = ((((ctx->u.w[33 - 2] >> 17) | (ctx->u.w[33 - 2] << ((8 * sizeof ctx->u.w[33 - 2]) - 17))) ^ ((ctx->u.w[33 - 2] >> 19) | (ctx->u.w[33 - 2] << ((8 * sizeof ctx->u.w[33 - 2]) - 19))) ^ (ctx->u.w[33 - 2] >> 10)) + ctx->u.w[33 - 7] + (((ctx->u.w[33 - 15] >> 7) | (ctx->u.w[33 - 15] << ((8 * sizeof ctx->u.w[33 - 15]) - 7))) ^ ((ctx->u.w[33 - 15] >> 18) | (ctx->u.w[33 - 15] << ((8 * sizeof ctx->u.w[33 - 15]) - 18))) ^ (ctx->u.w[33 - 15] >> 3)) + ctx->u.w[33 - 16]);
              ctx->u.w[36] = ((((ctx->u.w[36 - 2] >> 17) | (ctx->u.w[36 - 2] << ((8 * sizeof ctx->u.w[36 - 2]) - 17))) ^ ((ctx->u.w[36 - 2] >> 19) | (ctx->u.w[36 - 2] << ((8 * sizeof ctx->u.w[36 - 2]) - 19))) ^ (ctx->u.w[36 - 2] >> 10)) + ctx->u.w[36 - 7] + (((ctx->u.w[36 - 15] >> 7) | (ctx->u.w[36 - 15] << ((8 * sizeof ctx->u.w[36 - 15]) - 7))) ^ ((ctx->u.w[36 - 15] >> 18) | (ctx->u.w[36 - 15] << ((8 * sizeof ctx->u.w[36 - 15]) - 18))) ^ (ctx->u.w[36 - 15] >> 3)) + ctx->u.w[36 - 16]);
              ctx->u.w[37] = ((((ctx->u.w[37 - 2] >> 17) | (ctx->u.w[37 - 2] << ((8 * sizeof ctx->u.w[37 - 2]) - 17))) ^ ((ctx->u.w[37 - 2] >> 19) | (ctx->u.w[37 - 2] << ((8 * sizeof ctx->u.w[37 - 2]) - 19))) ^ (ctx->u.w[37 - 2] >> 10)) + ctx->u.w[37 - 7] + (((ctx->u.w[37 - 15] >> 7) | (ctx->u.w[37 - 15] << ((8 * sizeof ctx->u.w[37 - 15]) - 7))) ^ ((ctx->u.w[37 - 15] >> 18) | (ctx->u.w[37 - 15] << ((8 * sizeof ctx->u.w[37 - 15]) - 18))) ^ (ctx->u.w[37 - 15] >> 3)) + ctx->u.w[37 - 16]);
              ctx->u.w[38] = ((((ctx->u.w[38 - 2] >> 17) | (ctx->u.w[38 - 2] << ((8 * sizeof ctx->u.w[38 - 2]) - 17))) ^ ((ctx->u.w[38 - 2] >> 19) | (ctx->u.w[38 - 2] << ((8 * sizeof ctx->u.w[38 - 2]) - 19))) ^ (ctx->u.w[38 - 2] >> 10)) + ctx->u.w[38 - 7] + (((ctx->u.w[38 - 15] >> 7) | (ctx->u.w[38 - 15] << ((8 * sizeof ctx->u.w[38 - 15]) - 7))) ^ ((ctx->u.w[38 - 15] >> 18) | (ctx->u.w[38 - 15] << ((8 * sizeof ctx->u.w[38 - 15]) - 18))) ^ (ctx->u.w[38 - 15] >> 3)) + ctx->u.w[38 - 16]);
              ctx->u.w[39] = ((((ctx->u.w[39 - 2] >> 17) | (ctx->u.w[39 - 2] << ((8 * sizeof ctx->u.w[39 - 2]) - 17))) ^ ((ctx->u.w[39 - 2] >> 19) | (ctx->u.w[39 - 2] << ((8 * sizeof ctx->u.w[39 - 2]) - 19))) ^ (ctx->u.w[39 - 2] >> 10)) + ctx->u.w[39 - 7] + (((ctx->u.w[39 - 15] >> 7) | (ctx->u.w[39 - 15] << ((8 * sizeof ctx->u.w[39 - 15]) - 7))) ^ ((ctx->u.w[39 - 15] >> 18) | (ctx->u.w[39 - 15] << ((8 * sizeof ctx->u.w[39 - 15]) - 18))) ^ (ctx->u.w[39 - 15] >> 3)) + ctx->u.w[39 - 16]);
              ctx->u.w[40] = ((((ctx->u.w[40 - 2] >> 17) | (ctx->u.w[40 - 2] << ((8 * sizeof ctx->u.w[40 - 2]) - 17))) ^ ((ctx->u.w[40 - 2] >> 19) | (ctx->u.w[40 - 2] << ((8 * sizeof ctx->u.w[40 - 2]) - 19))) ^ (ctx->u.w[40 - 2] >> 10)) + ctx->u.w[40 - 7] + (((ctx->u.w[40 - 15] >> 7) | (ctx->u.w[40 - 15] << ((8 * sizeof ctx->u.w[40 - 15]) - 7))) ^ ((ctx->u.w[40 - 15] >> 18) | (ctx->u.w[40 - 15] << ((8 * sizeof ctx->u.w[40 - 15]) - 18))) ^ (ctx->u.w[40 - 15] >> 3)) + ctx->u.w[40 - 16]);
              ctx->u.w[41] = ((((ctx->u.w[41 - 2] >> 17) | (ctx->u.w[41 - 2] << ((8 * sizeof ctx->u.w[41 - 2]) - 17))) ^ ((ctx->u.w[41 - 2] >> 19) | (ctx->u.w[41 - 2] << ((8 * sizeof ctx->u.w[41 - 2]) - 19))) ^ (ctx->u.w[41 - 2] >> 10)) + ctx->u.w[41 - 7] + (((ctx->u.w[41 - 15] >> 7) | (ctx->u.w[41 - 15] << ((8 * sizeof ctx->u.w[41 - 15]) - 7))) ^ ((ctx->u.w[41 - 15] >> 18) | (ctx->u.w[41 - 15] << ((8 * sizeof ctx->u.w[41 - 15]) - 18))) ^ (ctx->u.w[41 - 15] >> 3)) + ctx->u.w[41 - 16]);
              ctx->u.w[43] = ((((ctx->u.w[43 - 2] >> 17) | (ctx->u.w[43 - 2] << ((8 * sizeof ctx->u.w[43 - 2]) - 17))) ^ ((ctx->u.w[43 - 2] >> 19) | (ctx->u.w[43 - 2] << ((8 * sizeof ctx->u.w[43 - 2]) - 19))) ^ (ctx->u.w[43 - 2] >> 10)) + ctx->u.w[43 - 7] + (((ctx->u.w[43 - 15] >> 7) | (ctx->u.w[43 - 15] << ((8 * sizeof ctx->u.w[43 - 15]) - 7))) ^ ((ctx->u.w[43 - 15] >> 18) | (ctx->u.w[43 - 15] << ((8 * sizeof ctx->u.w[43 - 15]) - 18))) ^ (ctx->u.w[43 - 15] >> 3)) + ctx->u.w[43 - 16]);
              ctx->u.w[45] = ((((ctx->u.w[45 - 2] >> 17) | (ctx->u.w[45 - 2] << ((8 * sizeof ctx->u.w[45 - 2]) - 17))) ^ ((ctx->u.w[45 - 2] >> 19) | (ctx->u.w[45 - 2] << ((8 * sizeof ctx->u.w[45 - 2]) - 19))) ^ (ctx->u.w[45 - 2] >> 10)) + ctx->u.w[45 - 7] + (((ctx->u.w[45 - 15] >> 7) | (ctx->u.w[45 - 15] << ((8 * sizeof ctx->u.w[45 - 15]) - 7))) ^ ((ctx->u.w[45 - 15] >> 18) | (ctx->u.w[45 - 15] << ((8 * sizeof ctx->u.w[45 - 15]) - 18))) ^ (ctx->u.w[45 - 15] >> 3)) + ctx->u.w[45 - 16]);
              ctx->u.w[46] = ((((ctx->u.w[46 - 2] >> 17) | (ctx->u.w[46 - 2] << ((8 * sizeof ctx->u.w[46 - 2]) - 17))) ^ ((ctx->u.w[46 - 2] >> 19) | (ctx->u.w[46 - 2] << ((8 * sizeof ctx->u.w[46 - 2]) - 19))) ^ (ctx->u.w[46 - 2] >> 10)) + ctx->u.w[46 - 7] + (((ctx->u.w[46 - 15] >> 7) | (ctx->u.w[46 - 15] << ((8 * sizeof ctx->u.w[46 - 15]) - 7))) ^ ((ctx->u.w[46 - 15] >> 18) | (ctx->u.w[46 - 15] << ((8 * sizeof ctx->u.w[46 - 15]) - 18))) ^ (ctx->u.w[46 - 15] >> 3)) + ctx->u.w[46 - 16]);
              ctx->u.w[47] = ((((ctx->u.w[47 - 2] >> 17) | (ctx->u.w[47 - 2] << ((8 * sizeof ctx->u.w[47 - 2]) - 17))) ^ ((ctx->u.w[47 - 2] >> 19) | (ctx->u.w[47 - 2] << ((8 * sizeof ctx->u.w[47 - 2]) - 19))) ^ (ctx->u.w[47 - 2] >> 10)) + ctx->u.w[47 - 7] + (((ctx->u.w[47 - 15] >> 7) | (ctx->u.w[47 - 15] << ((8 * sizeof ctx->u.w[47 - 15]) - 7))) ^ ((ctx->u.w[47 - 15] >> 18) | (ctx->u.w[47 - 15] << ((8 * sizeof ctx->u.w[47 - 15]) - 18))) ^ (ctx->u.w[47 - 15] >> 3)) + ctx->u.w[47 - 16]);
              ctx->u.w[48] = ((((ctx->u.w[48 - 2] >> 17) | (ctx->u.w[48 - 2] << ((8 * sizeof ctx->u.w[48 - 2]) - 17))) ^ ((ctx->u.w[48 - 2] >> 19) | (ctx->u.w[48 - 2] << ((8 * sizeof ctx->u.w[48 - 2]) - 19))) ^ (ctx->u.w[48 - 2] >> 10)) + ctx->u.w[48 - 7] + (((ctx->u.w[48 - 15] >> 7) | (ctx->u.w[48 - 15] << ((8 * sizeof ctx->u.w[48 - 15]) - 7))) ^ ((ctx->u.w[48 - 15] >> 18) | (ctx->u.w[48 - 15] << ((8 * sizeof ctx->u.w[48 - 15]) - 18))) ^ (ctx->u.w[48 - 15] >> 3)) + ctx->u.w[48 - 16]);
              ctx->u.w[49] = ((((ctx->u.w[49 - 2] >> 17) | (ctx->u.w[49 - 2] << ((8 * sizeof ctx->u.w[49 - 2]) - 17))) ^ ((ctx->u.w[49 - 2] >> 19) | (ctx->u.w[49 - 2] << ((8 * sizeof ctx->u.w[49 - 2]) - 19))) ^ (ctx->u.w[49 - 2] >> 10)) + ctx->u.w[49 - 7] + (((ctx->u.w[49 - 15] >> 7) | (ctx->u.w[49 - 15] << ((8 * sizeof ctx->u.w[49 - 15]) - 7))) ^ ((ctx->u.w[49 - 15] >> 18) | (ctx->u.w[49 - 15] << ((8 * sizeof ctx->u.w[49 - 15]) - 18))) ^ (ctx->u.w[49 - 15] >> 3)) + ctx->u.w[49 - 16]);
              ctx->u.w[50] = ((((ctx->u.w[50 - 2] >> 17) | (ctx->u.w[50 - 2] << ((8 * sizeof ctx->u.w[50 - 2]) - 17))) ^ ((ctx->u.w[50 - 2] >> 19) | (ctx->u.w[50 - 2] << ((8 * sizeof ctx->u.w[50 - 2]) - 19))) ^ (ctx->u.w[50 - 2] >> 10)) + ctx->u.w[50 - 7] + (((ctx->u.w[50 - 15] >> 7) | (ctx->u.w[50 - 15] << ((8 * sizeof ctx->u.w[50 - 15]) - 7))) ^ ((ctx->u.w[50 - 15] >> 18) | (ctx->u.w[50 - 15] << ((8 * sizeof ctx->u.w[50 - 15]) - 18))) ^ (ctx->u.w[50 - 15] >> 3)) + ctx->u.w[50 - 16]);
              ctx->u.w[51] = ((((ctx->u.w[51 - 2] >> 17) | (ctx->u.w[51 - 2] << ((8 * sizeof ctx->u.w[51 - 2]) - 17))) ^ ((ctx->u.w[51 - 2] >> 19) | (ctx->u.w[51 - 2] << ((8 * sizeof ctx->u.w[51 - 2]) - 19))) ^ (ctx->u.w[51 - 2] >> 10)) + ctx->u.w[51 - 7] + (((ctx->u.w[51 - 15] >> 7) | (ctx->u.w[51 - 15] << ((8 * sizeof ctx->u.w[51 - 15]) - 7))) ^ ((ctx->u.w[51 - 15] >> 18) | (ctx->u.w[51 - 15] << ((8 * sizeof ctx->u.w[51 - 15]) - 18))) ^ (ctx->u.w[51 - 15] >> 3)) + ctx->u.w[51 - 16]);
              ctx->u.w[52] = ((((ctx->u.w[52 - 2] >> 17) | (ctx->u.w[52 - 2] << ((8 * sizeof ctx->u.w[52 - 2]) - 17))) ^ ((ctx->u.w[52 - 2] >> 19) | (ctx->u.w[52 - 2] << ((8 * sizeof ctx->u.w[52 - 2]) - 19))) ^ (ctx->u.w[52 - 2] >> 10)) + ctx->u.w[52 - 7] + (((ctx->u.w[52 - 15] >> 7) | (ctx->u.w[52 - 15] << ((8 * sizeof ctx->u.w[52 - 15]) - 7))) ^ ((ctx->u.w[52 - 15] >> 18) | (ctx->u.w[52 - 15] << ((8 * sizeof ctx->u.w[52 - 15]) - 18))) ^ (ctx->u.w[52 - 15] >> 3)) + ctx->u.w[52 - 16]);
              ctx->u.w[53] = ((((ctx->u.w[53 - 2] >> 17) | (ctx->u.w[53 - 2] << ((8 * sizeof ctx->u.w[53 - 2]) - 17))) ^ ((ctx->u.w[53 - 2] >> 19) | (ctx->u.w[53 - 2] << ((8 * sizeof ctx->u.w[53 - 2]) - 19))) ^ (ctx->u.w[53 - 2] >> 10)) + ctx->u.w[53 - 7] + (((ctx->u.w[53 - 15] >> 7) | (ctx->u.w[53 - 15] << ((8 * sizeof ctx->u.w[53 - 15]) - 7))) ^ ((ctx->u.w[53 - 15] >> 18) | (ctx->u.w[53 - 15] << ((8 * sizeof ctx->u.w[53 - 15]) - 18))) ^ (ctx->u.w[53 - 15] >> 3)) + ctx->u.w[53 - 16]);
              ctx->u.w[54] = ((((ctx->u.w[54 - 2] >> 17) | (ctx->u.w[54 - 2] << ((8 * sizeof ctx->u.w[54 - 2]) - 17))) ^ ((ctx->u.w[54 - 2] >> 19) | (ctx->u.w[54 - 2] << ((8 * sizeof ctx->u.w[54 - 2]) - 19))) ^ (ctx->u.w[54 - 2] >> 10)) + ctx->u.w[54 - 7] + (((ctx->u.w[54 - 15] >> 7) | (ctx->u.w[54 - 15] << ((8 * sizeof ctx->u.w[54 - 15]) - 7))) ^ ((ctx->u.w[54 - 15] >> 18) | (ctx->u.w[54 - 15] << ((8 * sizeof ctx->u.w[54 - 15]) - 18))) ^ (ctx->u.w[54 - 15] >> 3)) + ctx->u.w[54 - 16]);
              ctx->u.w[55] = ((((ctx->u.w[55 - 2] >> 17) | (ctx->u.w[55 - 2] << ((8 * sizeof ctx->u.w[55 - 2]) - 17))) ^ ((ctx->u.w[55 - 2] >> 19) | (ctx->u.w[55 - 2] << ((8 * sizeof ctx->u.w[55 - 2]) - 19))) ^ (ctx->u.w[55 - 2] >> 10)) + ctx->u.w[55 - 7] + (((ctx->u.w[55 - 15] >> 7) | (ctx->u.w[55 - 15] << ((8 * sizeof ctx->u.w[55 - 15]) - 7))) ^ ((ctx->u.w[55 - 15] >> 18) | (ctx->u.w[55 - 15] << ((8 * sizeof ctx->u.w[55 - 15]) - 18))) ^ (ctx->u.w[55 - 15] >> 3)) + ctx->u.w[55 - 16]);
              ctx->u.w[56] = ((((ctx->u.w[56 - 2] >> 17) | (ctx->u.w[56 - 2] << ((8 * sizeof ctx->u.w[56 - 2]) - 17))) ^ ((ctx->u.w[56 - 2] >> 19) | (ctx->u.w[56 - 2] << ((8 * sizeof ctx->u.w[56 - 2]) - 19))) ^ (ctx->u.w[56 - 2] >> 10)) + ctx->u.w[56 - 7] + (((ctx->u.w[56 - 15] >> 7) | (ctx->u.w[56 - 15] << ((8 * sizeof ctx->u.w[56 - 15]) - 7))) ^ ((ctx->u.w[56 - 15] >> 18) | (ctx->u.w[56 - 15] << ((8 * sizeof ctx->u.w[56 - 15]) - 18))) ^ (ctx->u.w[56 - 15] >> 3)) + ctx->u.w[56 - 16]);
              ctx->u.w[57] = ((((ctx->u.w[57 - 2] >> 17) | (ctx->u.w[57 - 2] << ((8 * sizeof ctx->u.w[57 - 2]) - 17))) ^ ((ctx->u.w[57 - 2] >> 19) | (ctx->u.w[57 - 2] << ((8 * sizeof ctx->u.w[57 - 2]) - 19))) ^ (ctx->u.w[57 - 2] >> 10)) + ctx->u.w[57 - 7] + (((ctx->u.w[57 - 15] >> 7) | (ctx->u.w[57 - 15] << ((8 * sizeof ctx->u.w[57 - 15]) - 7))) ^ ((ctx->u.w[57 - 15] >> 18) | (ctx->u.w[57 - 15] << ((8 * sizeof ctx->u.w[57 - 15]) - 18))) ^ (ctx->u.w[57 - 15] >> 3)) + ctx->u.w[57 - 16]);
              ctx->u.w[58] = ((((ctx->u.w[58 - 2] >> 17) | (ctx->u.w[58 - 2] << ((8 * sizeof ctx->u.w[58 - 2]) - 17))) ^ ((ctx->u.w[58 - 2] >> 19) | (ctx->u.w[58 - 2] << ((8 * sizeof ctx->u.w[58 - 2]) - 19))) ^ (ctx->u.w[58 - 2] >> 10)) + ctx->u.w[58 - 7] + (((ctx->u.w[58 - 15] >> 7) | (ctx->u.w[58 - 15] << ((8 * sizeof ctx->u.w[58 - 15]) - 7))) ^ ((ctx->u.w[58 - 15] >> 18) | (ctx->u.w[58 - 15] << ((8 * sizeof ctx->u.w[58 - 15]) - 18))) ^ (ctx->u.w[58 - 15] >> 3)) + ctx->u.w[58 - 16]);
              ctx->u.w[59] = ((((ctx->u.w[59 - 2] >> 17) | (ctx->u.w[59 - 2] << ((8 * sizeof ctx->u.w[59 - 2]) - 17))) ^ ((ctx->u.w[59 - 2] >> 19) | (ctx->u.w[59 - 2] << ((8 * sizeof ctx->u.w[59 - 2]) - 19))) ^ (ctx->u.w[59 - 2] >> 10)) + ctx->u.w[59 - 7] + (((ctx->u.w[59 - 15] >> 7) | (ctx->u.w[59 - 15] << ((8 * sizeof ctx->u.w[59 - 15]) - 7))) ^ ((ctx->u.w[59 - 15] >> 18) | (ctx->u.w[59 - 15] << ((8 * sizeof ctx->u.w[59 - 15]) - 18))) ^ (ctx->u.w[59 - 15] >> 3)) + ctx->u.w[59 - 16]);
              ctx->u.w[60] = ((((ctx->u.w[60 - 2] >> 17) | (ctx->u.w[60 - 2] << ((8 * sizeof ctx->u.w[60 - 2]) - 17))) ^ ((ctx->u.w[60 - 2] >> 19) | (ctx->u.w[60 - 2] << ((8 * sizeof ctx->u.w[60 - 2]) - 19))) ^ (ctx->u.w[60 - 2] >> 10)) + ctx->u.w[60 - 7] + (((ctx->u.w[60 - 15] >> 7) | (ctx->u.w[60 - 15] << ((8 * sizeof ctx->u.w[60 - 15]) - 7))) ^ ((ctx->u.w[60 - 15] >> 18) | (ctx->u.w[60 - 15] << ((8 * sizeof ctx->u.w[60 - 15]) - 18))) ^ (ctx->u.w[60 - 15] >> 3)) + ctx->u.w[60 - 16]);
          }
       }
  2. Using llvm-reduce, following https://discourse.llvm.org/t/suggestions-to-debug-a-forever-running-clang-compile-command/60420/6:
$ clang-17 -O2 -c sha512.c.i -march=skylake -emit-llvm -S
$ llc sha512.c.ll # confirm this hangs
$ llvm-reduce --test=test2.sh sha512.c.ll

with test2.sh being:

#!/usr/bin/env bash
timeout 30s llc "$@"

ret=$?

case ${ret} in
        0)
                # If it built fine, it's uninteresting.
                exit 1
                ;;
        124)
                # It timed out, yay!
                exit 0
                ;;
        *)
                # It failed in some way but not a timeout, not interesting.
                exit 1
                ;;
esac
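
Both interestingness tests in this comment rely on the same convention: GNU `timeout` exits with status 124 when the command is still running at the limit. A minimal sketch of that logic as a reusable function follows, assuming GNU coreutils `timeout`. `hangs_within` is a hypothetical name, not something from the original scripts:

```shell
#!/bin/sh
# Sketch only: succeed (exit 0) when a command is still running at the time limit.
# hangs_within is a hypothetical helper; the 124 convention is GNU timeout's.
hangs_within() {
    limit=$1
    shift
    status=0
    # Run the command under a time limit and discard its output.
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    # 124 means timeout had to stop the command. A clean build (0) or any
    # other failure is treated as uninteresting.
    [ "$status" -eq 124 ]
}

# Example use as the last line of a test script (not run here):
# hangs_within 45 clang-17 -O2 -Werror=return-type -march=skylake -c sha512.c.i -S -o /dev/null
```

Used as the last command of a test script, the function's exit status is what cvise and llvm-reduce expect: 0 for an interesting (timed-out) input, nonzero otherwise. It does not replace the extra clang-16 sanity check in the cvise script.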

The llvm-reduce run kind of worked. The result takes ~30s in llc on a fast machine with LLVM 17 (0.049s with LLVM 16), which I think is illustrative enough and should show that the full file will take orders of magnitude longer. I know it's not ideal, though.

; ModuleID = '<bc file>'
source_filename = "sha512.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #0

; Function Attrs: nounwind sspstrong memory(argmem: readwrite) uwtable
define void @SHA256_Compress_Generic(ptr noundef %ctx) #1 {
entry:
  %0 = load i32, ptr null, align 4
  %1 = tail call i32 asm "bswap $0", "=r,0,~{dirflag},~{fpsr},~{flags}"(i32 %0) #5
  %arrayidx14 = getelementptr inbounds [64 x i32], ptr %ctx, i64 0, i64 3
  %2 = load i32, ptr %arrayidx14, align 4
  %3 = tail call i32 asm "bswap $0", "=r,0,~{dirflag},~{fpsr},~{flags}"(i32 %2) #5
  %4 = insertelement <2 x i32> zeroinitializer, i32 %1, i64 1
  %5 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %4, <2 x i32> %4, <2 x i32> <i32 15, i32 15>)
  %6 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %4, <2 x i32> %4, <2 x i32> <i32 13, i32 13>)
  %7 = xor <2 x i32> %5, %6
  %8 = lshr <2 x i32> %4, zeroinitializer
  %9 = xor <2 x i32> %7, %8
  %10 = insertelement <2 x i32> zeroinitializer, i32 %3, i64 0
  %11 = shufflevector <2 x i32> zeroinitializer, <2 x i32> %10, <2 x i32> <i32 1, i32 2>
  %12 = add <2 x i32> %11, %9
  %13 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %12, <2 x i32> %12, <2 x i32> <i32 15, i32 15>)
  %14 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %12, <2 x i32> %12, <2 x i32> <i32 13, i32 13>)
  %15 = xor <2 x i32> %13, %14
  %16 = lshr <2 x i32> %12, zeroinitializer
  %17 = xor <2 x i32> %15, %16
  %18 = add <2 x i32> %4, %17
  %19 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %18, <2 x i32> %18, <2 x i32> <i32 15, i32 15>)
  %20 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %18, <2 x i32> %18, <2 x i32> <i32 13, i32 13>)
  %21 = xor <2 x i32> %19, %20
  %22 = lshr <2 x i32> %18, <i32 10, i32 10>
  %23 = xor <2 x i32> %21, %22
  %24 = add <2 x i32> %4, %23
  %25 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %24, <2 x i32> %24, <2 x i32> <i32 15, i32 15>)
  %26 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %24, <2 x i32> %24, <2 x i32> <i32 13, i32 13>)
  %27 = xor <2 x i32> %25, %26
  %28 = lshr <2 x i32> %24, <i32 10, i32 10>
  %29 = xor <2 x i32> %27, %28
  %30 = shufflevector <2 x i32> %4, <2 x i32> %12, <2 x i32> <i32 1, i32 2>
  %31 = add <2 x i32> %30, %29
  %32 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %31, <2 x i32> %31, <2 x i32> <i32 15, i32 15>)
  %33 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %31, <2 x i32> %31, <2 x i32> <i32 13, i32 13>)
  %34 = xor <2 x i32> %32, %33
  %35 = lshr <2 x i32> %31, <i32 10, i32 10>
  %36 = xor <2 x i32> %34, %35
  %37 = shufflevector <2 x i32> %12, <2 x i32> zeroinitializer, <2 x i32> <i32 1, i32 2>
  %38 = add <2 x i32> %37, %36
  %arrayidx918 = getelementptr inbounds [64 x i32], ptr %ctx, i64 0, i64 33
  store <2 x i32> %38, ptr %arrayidx918, align 4
  %arrayidx1012 = getelementptr inbounds [64 x i32], ptr %ctx, i64 0, i64 35
  %39 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %38, <2 x i32> %38, <2 x i32> <i32 15, i32 15>)
  %40 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %38, <2 x i32> %38, <2 x i32> <i32 13, i32 13>)
  %41 = xor <2 x i32> %39, %40
  %42 = lshr <2 x i32> %38, <i32 10, i32 10>
  %43 = xor <2 x i32> %41, %42
  %44 = add <2 x i32> %37, %43
  store <2 x i32> zeroinitializer, ptr %arrayidx1012, align 4
  %arrayidx1106 = getelementptr inbounds [64 x i32], ptr %ctx, i64 0, i64 37
  %45 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %44, <2 x i32> %44, <2 x i32> <i32 15, i32 15>)
  %46 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %44, <2 x i32> %44, <2 x i32> <i32 13, i32 13>)
  %47 = xor <2 x i32> %45, %46
  %48 = lshr <2 x i32> %44, <i32 10, i32 10>
  %49 = xor <2 x i32> %47, %48
  %50 = lshr <2 x i32> %24, zeroinitializer
  %51 = add <2 x i32> %50, %49
  store <2 x i32> %51, ptr %arrayidx1106, align 4
  %arrayidx1200 = getelementptr inbounds [64 x i32], ptr %ctx, i64 0, i64 39
  %52 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %51, <2 x i32> %51, <2 x i32> <i32 15, i32 15>)
  %53 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %51, <2 x i32> %51, <2 x i32> <i32 13, i32 13>)
  %54 = xor <2 x i32> %52, %53
  %55 = lshr <2 x i32> %51, <i32 10, i32 10>
  %56 = xor <2 x i32> %54, %55
  %57 = shufflevector <2 x i32> %38, <2 x i32> zeroinitializer, <2 x i32> <i32 poison, i32 0>
  %58 = insertelement <2 x i32> %57, i32 0, i64 0
  %59 = add <2 x i32> %58, %56
  store <2 x i32> %59, ptr %arrayidx1200, align 4
  ret void

; uselistorder directives
  uselistorder <2 x i32> %4, { 7, 0, 1, 6, 5, 4, 3, 2 }
  uselistorder <2 x i32> %38, { 6, 5, 4, 3, 2, 1, 0 }
}

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i64 @llvm.bswap.i64(i64) #2

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i32 @llvm.fshl.i32(i32, i32, i32) #2

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i32 @llvm.umin.i32(i32, i32) #2

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i64 @llvm.fshl.i64(i64, i64, i64) #2

; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: readwrite)
declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg) #3

; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #4

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare <2 x i32> @llvm.fshl.v2i32(<2 x i32>, <2 x i32>, <2 x i32>) #2

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare <2 x i64> @llvm.bswap.v2i64(<2 x i64>) #2

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare <2 x i64> @llvm.fshl.v2i64(<2 x i64>, <2 x i64>, <2 x i64>) #2

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare <4 x i64> @llvm.bswap.v4i64(<4 x i64>) #2

; uselistorder directives
uselistorder ptr @llvm.fshl.v2i32, { 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }

attributes #0 = { nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) }
attributes #1 = { nounwind sspstrong memory(argmem: readwrite) uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "probe-stack"="inline-asm" "stack-protector-buffer-size"="8" "target-cpu"="skylake" "target-features"="+adx,+aes,+avx,+avx2,+bmi,+bmi2,+clflushopt,+cmov,+crc32,+cx16,+cx8,+f16c,+fma,+fsgsbase,+fxsr,+invpcid,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+prfchw,+rdrnd,+rdseed,+sahf,+sgx,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsavec,+xsaveopt,+xsaves" }
attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
attributes #3 = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
attributes #4 = { nocallback nofree nounwind willreturn memory(argmem: write) }
attributes #5 = { nounwind memory(none) }

@thesamesam removed the needs-reduction label on Sep 26, 2023
@thesamesam
Member

cc @RKSimon

Bisect says af32e51:

af32e51a43fb4343f4c407bf1ee051ff78a57494 is the first bad commit
commit af32e51a43fb4343f4c407bf1ee051ff78a57494
Author: Simon Pilgrim <[email protected]>
Date:   Sat Jul 22 17:54:48 2023 +0100

    [X86] LowerRotate - manually expand rotate by splat constant patterns.

    Fixes issue identified on #63980 where the undef rotate amounts (during widening from v2i32 -> v4i32) were being constant folded to 0 when the shift amounts are created during expansion, losing the splat'd shift amounts.

 llvm/lib/Target/X86/X86ISelLowering.cpp         | 14 +++++++++++--
 llvm/test/CodeGen/X86/vector-fshl-rot-sub128.ll | 27 +++++++++----------------
 llvm/test/CodeGen/X86/vector-fshr-rot-sub128.ll | 27 +++++++++----------------
 3 files changed, 32 insertions(+), 36 deletions(-)
bisect found first bad commit

If I revert it on release/17.x, compilation finishes in a reasonable time again (although it seems consistently about half a second slower than with clang 16).
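For anyone repeating a bisect like this one, a hang has to be converted into an exit code before `git bisect run` can use it. The following is only a minimal sketch of one way to do that, not the script used in this thread: the `classify` helper, its timeout handling, and the choice to skip on unrelated failures are all assumptions made for illustration.

```python
import subprocess
import sys

# Exit codes understood by `git bisect run`:
# 0 = good, 1..127 (except 125) = bad, 125 = cannot test this revision.
GOOD, BAD, SKIP = 0, 1, 125


def classify(cmd, timeout_s):
    """Run cmd and map the outcome to a bisect verdict (hypothetical helper).

    A timeout is treated as the hang being bisected; a missing binary or
    any other failure is treated as "skip", since it is unrelated to the hang.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return BAD
    except FileNotFoundError:
        return SKIP
    return GOOD if result.returncode == 0 else SKIP


# Example wiring (assumed build path; each bisect step must rebuild clang first):
#   sys.exit(classify(["build/bin/clang", "-O2", "-march=skylake", "-c",
#                      "sha512.i", "-o", "/dev/null"], timeout_s=60))
```

The timeout needs to sit comfortably above the normal compile time of the file, otherwise a slow but terminating build is misreported as a hang.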

@llvmbot
Member

llvmbot commented Sep 26, 2023

@llvm/issue-subscribers-backend-x86

When using clang 17.0.1 in Gentoo and compiling [dev-libs/nss](https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS) an infinite loop is detected in sha512.c compilation unit (reported [downstream Gentoo bug report](https://bugs.gentoo.org/914657)) which is not reproducible using Clang 16.

An initial testing revealed that march=native (also tested march=skylake and march=alderlake triggering the problem) might be causing the trouble.

In order to ease the test process, I attach the preprocessed output for that compile unit which I expect can reproduce this issue easily (remove the .txt extension)

sha512-preprocessed.c.txt

@RKSimon
Collaborator

RKSimon commented Oct 1, 2023

Finally worked this out: we're missing a one-use limit in combineConcatVectorOps. Fix incoming.
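To illustrate what a one-use limit on a concat fold means, here is a toy model in Python. It is not LLVM's code and does not reproduce `combineConcatVectorOps`; the `Node`, `make`, and `try_fold_concat` names are hypothetical, and the thread does not spell out exactly how the missing check leads to the hang. The sketch only shows the shape of the guard: fold `concat(f(a), f(b))` into `f(concat(a, b))` only when each narrow `f(...)` has no other users.

```python
from dataclasses import dataclass, field


@dataclass(eq=False, repr=False)  # identity-based equality; no recursive repr
class Node:
    opcode: str
    operands: list = field(default_factory=list)
    users: list = field(default_factory=list)


def make(opcode, *operands):
    """Create a node and register it as a user of each operand."""
    node = Node(opcode, list(operands))
    for operand in operands:
        operand.users.append(node)
    return node


def has_one_use(node):
    return len(node.users) == 1


def try_fold_concat(concat):
    """Fold concat(f(a), f(b)) -> f(concat(a, b)), guarded by a one-use check.

    Returns the new wide node, or None when the fold is rejected.
    """
    subops = concat.operands
    same_opcode = len({op.opcode for op in subops}) == 1
    unary = all(len(op.operands) == 1 for op in subops)
    # The guard: if a narrow op has other users it stays alive after the
    # fold, so the wide op would duplicate work instead of replacing it.
    if not (same_opcode and unary and all(has_one_use(op) for op in subops)):
        return None
    wide_input = make("concat", *(op.operands[0] for op in subops))
    return make(subops[0].opcode, wide_input)
```

In this model, two `rotl` nodes that feed only the `concat` are folded, while the same pair with an additional user (for example an `xor`) is left alone.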

@RKSimon closed this as completed in 04b403d on Oct 1, 2023
@RKSimon reopened this on Oct 1, 2023
@Endilll removed the clang label on Oct 2, 2023
@thesamesam
Member

/cherry-pick 04b403d

@llvmbot
Member

llvmbot commented Oct 3, 2023

Failed to cherry-pick: 04b403d

https://github.com/llvm/llvm-project/actions/runs/6389691511

Please manually backport the fix and push it to your github fork. Once this is done, please add a comment like this:

/branch <user>/<repo>/<branch>

@thesamesam
Member

@RKSimon Could you handle the backport? Cheers.

@RKSimon
Collaborator

RKSimon commented Oct 4, 2023

/branch RKSimon/llvm-project/PR67333

@llvmbot
Member

llvmbot commented Oct 4, 2023

/pull-request llvm/llvm-project-release-prs#724

@tru moved this from Needs Triage to Needs Review in LLVM Release Status on Oct 5, 2023
tru pushed a commit that referenced this issue Oct 10, 2023
We could maybe extend this by allowing the lowest subop to have multiple uses and extract the lowest subvector result of the concatenated op, but let's just get the fix in first.

Fixes #67333
@tru moved this from Needs Review to Done in LLVM Release Status on Oct 10, 2023