[clang 17.0.1] [regression] Hangs when compiling NSS with -O2 -march=skylake #67333

StormBytePP · 2023-09-25T14:02:06Z

With clang 17.0.1 on Gentoo, compiling dev-libs/nss appears to hang (a possible infinite loop) in the sha512.c compilation unit. This was reported downstream in a Gentoo bug report and is not reproducible with Clang 16.

Initial testing suggests that -march=native may be the trigger; -march=skylake and -march=alderlake also reproduce the problem.

To make testing easier, I've attached the preprocessed output for that compilation unit, which I expect will reproduce the issue easily (remove the .txt extension):

sha512-preprocessed.c.txt

thesamesam · 2023-09-25T19:34:58Z

I can reproduce it with:

clang -O2 -march=skylake sha512.i

If I attach gdb to the process after it's been running for a little while:

0x00007f54f0f4cab2 in llvm::APInt::maskBit (bitPosition=<optimized out>) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/include/llvm/ADT/APInt.h:1900
1900        return 1ULL << whichBit(bitPosition);
(gdb) bt
#0  0x00007f54f0f4cab2 in llvm::APInt::maskBit (bitPosition=<optimized out>) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/include/llvm/ADT/APInt.h:1900
#1  llvm::APInt::operator[] (bitPosition=<optimized out>, this=<optimized out>) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/include/llvm/ADT/APInt.h:1017
#2  llvm::APInt::isSignBitSet (this=0x7ffefaa604e0) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/include/llvm/ADT/APInt.h:319
#3  llvm::KnownBits::isNegative (this=0x7ffefaa604d0) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/include/llvm/Support/KnownBits.h:96
#4  llvm::KnownBits::computeForAddSub (Add=<optimized out>, NSW=NSW@entry=false, LHS=..., RHS=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/Support/KnownBits.cpp:72
#5  0x00007f54f1c5bfa8 in llvm::TargetLowering::SimplifyDemandedBits (this=0x565551705ff0, Op=..., OriginalDemandedBits=..., OriginalDemandedElts=..., Known=..., TLO=..., Depth=2, AssumeSingleUse=false)
    at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:2740
#6  0x00007f54f1c5e41d in llvm::TargetLowering::SimplifyDemandedBits (this=this@entry=0x565551705ff0, Op=..., OriginalDemandedBits=..., OriginalDemandedElts=..., Known=..., TLO=..., Depth=1,
    AssumeSingleUse=false) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:1225
#7  0x00007f54f502dc6d in llvm::X86TargetLowering::SimplifyDemandedBitsForTargetNode (this=0x565551705ff0, Op=..., OriginalDemandedBits=..., OriginalDemandedElts=..., Known=..., TLO=..., Depth=0)
    at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/Target/X86/X86ISelLowering.cpp:44365
#8  0x00007f54f1c60533 in llvm::TargetLowering::SimplifyDemandedBits (this=0x565551705ff0, Op=..., OriginalDemandedBits=..., OriginalDemandedElts=..., Known=..., TLO=..., Depth=0, AssumeSingleUse=false)
    at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:2754
#9  0x00007f54f1c6bf57 in llvm::TargetLowering::SimplifyDemandedBits (this=0x565551705ff0, Op=..., DemandedBits=..., Known=..., TLO=..., Depth=0, AssumeSingleUse=false)
    at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:646
#10 0x00007f54f1c6c0f2 in llvm::TargetLowering::SimplifyDemandedBits (this=this@entry=0x565551705ff0, Op=..., DemandedBits=..., DCI=...)
    at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:608
#11 0x00007f54f5135377 in combineVectorShiftImm (N=<optimized out>, DAG=..., DCI=..., Subtarget=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:1139
#12 0x00007f54f19d47c4 in (anonymous namespace)::DAGCombiner::combine (this=this@entry=0x7ffefaa63130, N=N@entry=0x56555191d900)
    at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:2049
#13 0x00007f54f19d62f9 in (anonymous namespace)::DAGCombiner::Run (AtLevel=llvm::AfterLegalizeDAG, this=0x7ffefaa63130)
    at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:1827
#14 llvm::SelectionDAG::Combine (this=<optimized out>, Level=Level@entry=llvm::AfterLegalizeDAG, AA=<optimized out>, OptLevel=<optimized out>)
    at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:27592
#15 0x00007f54f1bfbb63 in llvm::SelectionDAGISel::CodeGenAndEmitDAG (this=0x565551829cb0) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:925
#16 0x00007f54f1c00fc0 in llvm::SelectionDAGISel::SelectAllBasicBlocks (this=0x565551829cb0, Fn=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1700
#17 0x00007f54f1c027d6 in llvm::SelectionDAGISel::runOnMachineFunction (this=this@entry=0x565551829cb0, mf=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:482
#18 0x00007f54f4fd4619 in (anonymous namespace)::X86DAGToDAGISel::runOnMachineFunction (this=0x565551829cb0, MF=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp:191
#19 0x00007f54f1547cd4 in llvm::MachineFunctionPass::runOnFunction (this=0x565551829cb0, F=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/CodeGen/MachineFunctionPass.cpp:91
#20 0x00007f54f11bdb13 in llvm::FPPassManager::runOnFunction (this=0x565551826a40, F=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/IR/LegacyPassManager.cpp:1435
#21 0x00007f54f11bdd51 in llvm::FPPassManager::runOnModule (this=0x565551826a40, M=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/IR/LegacyPassManager.cpp:1481
#22 0x00007f54f11be7f4 in (anonymous namespace)::MPPassManager::runOnModule (M=..., this=<optimized out>) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/IR/LegacyPassManager.cpp:1550
#23 llvm::legacy::PassManagerImpl::run (this=0x5655517a7120, M=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/IR/LegacyPassManager.cpp:535
#24 0x00007f54fb97774a in (anonymous namespace)::EmitAssemblyHelper::RunCodegenPipeline (DwoOS=<synthetic pointer>std::unique_ptr<llvm::ToolOutputFile> = {...},
    OS=std::unique_ptr<llvm::raw_pwrite_stream> = {...}, Action=clang::Backend_EmitObj, this=0x7ffefaa64b60) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/CodeGen/BackendUtil.cpp:1115
#25 (anonymous namespace)::EmitAssemblyHelper::EmitAssembly (OS=std::unique_ptr<llvm::raw_pwrite_stream> = {...}, Action=clang::Backend_EmitObj, this=0x7ffefaa64b60)
    at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/CodeGen/BackendUtil.cpp:1137
#26 clang::EmitBackendOutput (Diags=..., HeaderOpts=..., CGOpts=..., TOpts=..., LOpts=..., TDesc=..., M=M@entry=0x5655513bb820, Action=clang::Backend_EmitObj, VFS=...,
    OS=std::unique_ptr<llvm::raw_pwrite_stream> = {...}) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/CodeGen/BackendUtil.cpp:1299
#27 0x00007f54fbe13f15 in clang::BackendConsumer::HandleTranslationUnit (this=0x5655513b64d0, C=...) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/CodeGen/CodeGenAction.cpp:386
#28 0x00007f54fa41aa55 in clang::ParseAST (S=..., PrintStats=false, SkipFunctionBodies=<optimized out>) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Parse/ParseAST.cpp:176
#29 0x00007f54fca51be9 in clang::FrontendAction::Execute (this=this@entry=0x5655513b6b60) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Frontend/FrontendAction.cpp:1059
#30 0x00007f54fc9de17b in clang::CompilerInstance::ExecuteAction (this=this@entry=0x5655513ada00, Act=...) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Frontend/CompilerInstance.cpp:1053
#31 0x00007f54fcaebb2b in clang::ExecuteCompilerInvocation (Clang=Clang@entry=0x5655513ada00) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:272
#32 0x0000565550c63035 in cc1_main (Argv=..., Argv0=0x5655513a4f70 "/usr/lib/llvm/17/bin/clang-17", MainAddr=MainAddr@entry=0x565550c5c270 <GetExecutablePath[abi:cxx11](char const*, bool)>)
    at /usr/src/debug/sys-devel/clang-17.0.1/clang/tools/driver/cc1_main.cpp:249
#33 0x0000565550c5bcab in ExecuteCC1Tool (ArgV=..., ToolContext=...) at /usr/src/debug/sys-devel/clang-17.0.1/clang/tools/driver/driver.cpp:366
#34 0x00007f54fc5d826d in llvm::function_ref<int (llvm::SmallVectorImpl<char const*>&)>::operator()(llvm::SmallVectorImpl<char const*>&) const (params#0=..., this=<optimized out>)
    at /usr/lib/llvm/17/include/llvm/ADT/STLFunctionalExtras.h:68
#35 operator() (__closure=0x7ffefaa65f40) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Driver/Job.cpp:440
#36 llvm::function_ref<void()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef> >, std::string*, bool*) const::<lambda()> >(intptr_t) (
    callable=callable@entry=140733103628048) at /usr/lib/llvm/17/include/llvm/ADT/STLFunctionalExtras.h:45
#37 0x00007f54f0f19b5e in llvm::function_ref<void ()>::operator()() const (this=<synthetic pointer>) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/include/llvm/ADT/STLFunctionalExtras.h:68
#38 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (this=this@entry=0x7ffefaa65ef0, Fn=...) at /usr/src/debug/sys-devel/llvm-17.0.1/llvm/lib/Support/CrashRecoveryContext.cpp:426
#39 0x00007f54fc5dac70 in clang::driver::CC1Command::Execute (this=0x565551345560, Redirects=..., ErrMsg=<optimized out>, ExecutionFailed=<optimized out>)
    at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Driver/Job.cpp:440
#40 0x00007f54fc598def in clang::driver::Compilation::ExecuteCommand (this=0x5655513a7550, C=..., FailingCommand=@0x7ffefaa66470: 0x0, LogOnly=<optimized out>)
    at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Driver/Compilation.cpp:199
#41 0x00007f54fc5995f6 in clang::driver::Compilation::ExecuteJobs (this=this@entry=0x5655513a7550, Jobs=..., FailingCommands=..., LogOnly=LogOnly@entry=false)
    at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Driver/Compilation.cpp:253
#42 0x00007f54fc5a87a4 in clang::driver::Driver::ExecuteCompilation (this=this@entry=0x7ffefaa668f0, C=..., FailingCommands=...) at /usr/src/debug/sys-devel/clang-17.0.1/clang/lib/Driver/Driver.cpp:1903
#43 0x0000565550c600b0 in clang_main (Argc=<optimized out>, Argv=<optimized out>, ToolContext=...) at /usr/src/debug/sys-devel/clang-17.0.1/clang/tools/driver/driver.cpp:542
#44 0x0000565550c59cc7 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/sys-devel/clang-17.0.1/x/y/clang-abi_x86_64.amd64/tools/driver/clang-driver.cpp:15
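
A single backtrace cannot tell a true infinite loop from very slow progress, so it can help to take a few samples some seconds apart and compare the innermost frames. The following is only a sketch of that idea: the function name `sample_backtraces` and its defaults are made up for illustration, and it assumes gdb is installed and that you are allowed to attach to the process.

```shell
#!/bin/sh
# sample_backtraces PID [COUNT] [DELAY]
# Hypothetical helper: attach gdb in batch mode COUNT times, DELAY seconds
# apart, and print the innermost 12 frames each time.
sample_backtraces() {
    pid=$1
    count=${2:-3}
    delay=${3:-5}
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== sample $i ==="
        # -batch makes gdb run the -ex commands, then detach and exit.
        gdb -p "$pid" -batch -ex "bt 12" 2>/dev/null
        if [ "$i" -lt "$count" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
}

# Example invocation (not run here; the process name is taken from the trace above):
#   sample_backtraces "$(pgrep -n clang-17)" 3 5
```

If every sample shows the same DAGCombiner / SimplifyDemandedBits chain but with changing node addresses or depths, that points at a combine that keeps doing work rather than a tight loop in one function.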

thesamesam · 2023-09-26T01:52:01Z

I've reduced it two ways.

Using cvise on the C reproducer:

#!/bin/sh
set -x

# The clang-16 one should build fine (and quickly, but don't bother checking that yet).
clang-16 -O2 -Werror=return-type -march=skylake -c sha512.c.i -S -o /dev/null || exit 1

timeout 45s clang-17 -O2 -Werror=return-type -march=skylake -c sha512.c.i -S -o /dev/null
ret=$?

case ${ret} in
        0)
                # If it built fine, it's uninteresting.
                exit 1
                ;;
        124)
                # It timed out, yay!
                exit 0
                ;;
        *)
                # It failed in some way but not a timeout, not interesting.
                exit 1
                ;;
esac

cvise gives the following, which takes 1m5s with clang 17 on a fast machine (it completes in 0.087s with clang 16); hopefully that is representative enough:

typedef unsigned int PRUint32;
typedef struct SHA256ContextStr SHA256Context;
typedef struct {
} mp_int;
struct SHA256ContextStr {
    union {
        PRUint32 w[64];
    } u;
};
static const PRUint32 K256[64] __attribute__((aligned(16))) = {
           0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,     0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,     0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3,     0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,     0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc,     0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,     0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7,     0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,     0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13,     0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,     0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3,     0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,     0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5,     0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,     0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208,     0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2 };
void SHA256_Begin(SHA256Context *ctx) {
    {
              ctx->u.w[27] = ((((ctx->u.w[27 - 2] >> 17) | (ctx->u.w[27 - 2] << ((8 * sizeof ctx->u.w[27 - 2]) - 17))) ^ ((ctx->u.w[27 - 2] >> 19) | (ctx->u.w[27 - 2] << ((8 * sizeof ctx->u.w[27 - 2]) - 19))) ^ (ctx->u.w[27 - 2] >> 10)) + ctx->u.w[27 - 7] + (((ctx->u.w[27 - 15] >> 7) | (ctx->u.w[27 - 15] << ((8 * sizeof ctx->u.w[27 - 15]) - 7))) ^ ((ctx->u.w[27 - 15] >> 18) | (ctx->u.w[27 - 15] << ((8 * sizeof ctx->u.w[27 - 15]) - 18))) ^ (ctx->u.w[27 - 15] >> 3)) + ctx->u.w[27 - 16]);
              ctx->u.w[28] = ((((ctx->u.w[28 - 2] >> 17) | (ctx->u.w[28 - 2] << ((8 * sizeof ctx->u.w[28 - 2]) - 17))) ^ ((ctx->u.w[28 - 2] >> 19) | (ctx->u.w[28 - 2] << ((8 * sizeof ctx->u.w[28 - 2]) - 19))) ^ (ctx->u.w[28 - 2] >> 10)) + ctx->u.w[28 - 7] + (((ctx->u.w[28 - 15] >> 7) | (ctx->u.w[28 - 15] << ((8 * sizeof ctx->u.w[28 - 15]) - 7))) ^ ((ctx->u.w[28 - 15] >> 18) | (ctx->u.w[28 - 15] << ((8 * sizeof ctx->u.w[28 - 15]) - 18))) ^ (ctx->u.w[28 - 15] >> 3)) + ctx->u.w[28 - 16]);
              ctx->u.w[29] = ((((ctx->u.w[29 - 2] >> 17) | (ctx->u.w[29 - 2] << ((8 * sizeof ctx->u.w[29 - 2]) - 17))) ^ ((ctx->u.w[29 - 2] >> 19) | (ctx->u.w[29 - 2] << ((8 * sizeof ctx->u.w[29 - 2]) - 19))) ^ (ctx->u.w[29 - 2] >> 10)) + ctx->u.w[29 - 7] + (((ctx->u.w[29 - 15] >> 7) | (ctx->u.w[29 - 15] << ((8 * sizeof ctx->u.w[29 - 15]) - 7))) ^ ((ctx->u.w[29 - 15] >> 18) | (ctx->u.w[29 - 15] << ((8 * sizeof ctx->u.w[29 - 15]) - 18))) ^ (ctx->u.w[29 - 15] >> 3)) + ctx->u.w[29 - 16]);
              ctx->u.w[30] = ((((ctx->u.w[30 - 2] >> 17) | (ctx->u.w[30 - 2] << ((8 * sizeof ctx->u.w[30 - 2]) - 17))) ^ ((ctx->u.w[30 - 2] >> 19) | (ctx->u.w[30 - 2] << ((8 * sizeof ctx->u.w[30 - 2]) - 19))) ^ (ctx->u.w[30 - 2] >> 10)) + ctx->u.w[30 - 7] + (((ctx->u.w[30 - 15] >> 7) | (ctx->u.w[30 - 15] << ((8 * sizeof ctx->u.w[30 - 15]) - 7))) ^ ((ctx->u.w[30 - 15] >> 18) | (ctx->u.w[30 - 15] << ((8 * sizeof ctx->u.w[30 - 15]) - 18))) ^ (ctx->u.w[30 - 15] >> 3)) + ctx->u.w[30 - 16]);
              ctx->u.w[31] = ((((ctx->u.w[31 - 2] >> 17) | (ctx->u.w[31 - 2] << ((8 * sizeof ctx->u.w[31 - 2]) - 17))) ^ ((ctx->u.w[31 - 2] >> 19) | (ctx->u.w[31 - 2] << ((8 * sizeof ctx->u.w[31 - 2]) - 19))) ^ (ctx->u.w[31 - 2] >> 10)) + ctx->u.w[31 - 7] + (((ctx->u.w[31 - 15] >> 7) | (ctx->u.w[31 - 15] << ((8 * sizeof ctx->u.w[31 - 15]) - 7))) ^ ((ctx->u.w[31 - 15] >> 18) | (ctx->u.w[31 - 15] << ((8 * sizeof ctx->u.w[31 - 15]) - 18))) ^ (ctx->u.w[31 - 15] >> 3)) + ctx->u.w[31 - 16]);
              ctx->u.w[32] = ((((ctx->u.w[32 - 2] >> 17) | (ctx->u.w[32 - 2] << ((8 * sizeof ctx->u.w[32 - 2]) - 17))) ^ ((ctx->u.w[32 - 2] >> 19) | (ctx->u.w[32 - 2] << ((8 * sizeof ctx->u.w[32 - 2]) - 19))) ^ (ctx->u.w[32 - 2] >> 10)) + ctx->u.w[32 - 7] + (((ctx->u.w[32 - 15] >> 7) | (ctx->u.w[32 - 15] << ((8 * sizeof ctx->u.w[32 - 15]) - 7))) ^ ((ctx->u.w[32 - 15] >> 18) | (ctx->u.w[32 - 15] << ((8 * sizeof ctx->u.w[32 - 15]) - 18))) ^ (ctx->u.w[32 - 15] >> 3)) + ctx->u.w[32 - 16]);
              ctx->u.w[33] = ((((ctx->u.w[33 - 2] >> 17) | (ctx->u.w[33 - 2] << ((8 * sizeof ctx->u.w[33 - 2]) - 17))) ^ ((ctx->u.w[33 - 2] >> 19) | (ctx->u.w[33 - 2] << ((8 * sizeof ctx->u.w[33 - 2]) - 19))) ^ (ctx->u.w[33 - 2] >> 10)) + ctx->u.w[33 - 7] + (((ctx->u.w[33 - 15] >> 7) | (ctx->u.w[33 - 15] << ((8 * sizeof ctx->u.w[33 - 15]) - 7))) ^ ((ctx->u.w[33 - 15] >> 18) | (ctx->u.w[33 - 15] << ((8 * sizeof ctx->u.w[33 - 15]) - 18))) ^ (ctx->u.w[33 - 15] >> 3)) + ctx->u.w[33 - 16]);
              ctx->u.w[36] = ((((ctx->u.w[36 - 2] >> 17) | (ctx->u.w[36 - 2] << ((8 * sizeof ctx->u.w[36 - 2]) - 17))) ^ ((ctx->u.w[36 - 2] >> 19) | (ctx->u.w[36 - 2] << ((8 * sizeof ctx->u.w[36 - 2]) - 19))) ^ (ctx->u.w[36 - 2] >> 10)) + ctx->u.w[36 - 7] + (((ctx->u.w[36 - 15] >> 7) | (ctx->u.w[36 - 15] << ((8 * sizeof ctx->u.w[36 - 15]) - 7))) ^ ((ctx->u.w[36 - 15] >> 18) | (ctx->u.w[36 - 15] << ((8 * sizeof ctx->u.w[36 - 15]) - 18))) ^ (ctx->u.w[36 - 15] >> 3)) + ctx->u.w[36 - 16]);
              ctx->u.w[37] = ((((ctx->u.w[37 - 2] >> 17) | (ctx->u.w[37 - 2] << ((8 * sizeof ctx->u.w[37 - 2]) - 17))) ^ ((ctx->u.w[37 - 2] >> 19) | (ctx->u.w[37 - 2] << ((8 * sizeof ctx->u.w[37 - 2]) - 19))) ^ (ctx->u.w[37 - 2] >> 10)) + ctx->u.w[37 - 7] + (((ctx->u.w[37 - 15] >> 7) | (ctx->u.w[37 - 15] << ((8 * sizeof ctx->u.w[37 - 15]) - 7))) ^ ((ctx->u.w[37 - 15] >> 18) | (ctx->u.w[37 - 15] << ((8 * sizeof ctx->u.w[37 - 15]) - 18))) ^ (ctx->u.w[37 - 15] >> 3)) + ctx->u.w[37 - 16]);
              ctx->u.w[38] = ((((ctx->u.w[38 - 2] >> 17) | (ctx->u.w[38 - 2] << ((8 * sizeof ctx->u.w[38 - 2]) - 17))) ^ ((ctx->u.w[38 - 2] >> 19) | (ctx->u.w[38 - 2] << ((8 * sizeof ctx->u.w[38 - 2]) - 19))) ^ (ctx->u.w[38 - 2] >> 10)) + ctx->u.w[38 - 7] + (((ctx->u.w[38 - 15] >> 7) | (ctx->u.w[38 - 15] << ((8 * sizeof ctx->u.w[38 - 15]) - 7))) ^ ((ctx->u.w[38 - 15] >> 18) | (ctx->u.w[38 - 15] << ((8 * sizeof ctx->u.w[38 - 15]) - 18))) ^ (ctx->u.w[38 - 15] >> 3)) + ctx->u.w[38 - 16]);
              ctx->u.w[39] = ((((ctx->u.w[39 - 2] >> 17) | (ctx->u.w[39 - 2] << ((8 * sizeof ctx->u.w[39 - 2]) - 17))) ^ ((ctx->u.w[39 - 2] >> 19) | (ctx->u.w[39 - 2] << ((8 * sizeof ctx->u.w[39 - 2]) - 19))) ^ (ctx->u.w[39 - 2] >> 10)) + ctx->u.w[39 - 7] + (((ctx->u.w[39 - 15] >> 7) | (ctx->u.w[39 - 15] << ((8 * sizeof ctx->u.w[39 - 15]) - 7))) ^ ((ctx->u.w[39 - 15] >> 18) | (ctx->u.w[39 - 15] << ((8 * sizeof ctx->u.w[39 - 15]) - 18))) ^ (ctx->u.w[39 - 15] >> 3)) + ctx->u.w[39 - 16]);
              ctx->u.w[40] = ((((ctx->u.w[40 - 2] >> 17) | (ctx->u.w[40 - 2] << ((8 * sizeof ctx->u.w[40 - 2]) - 17))) ^ ((ctx->u.w[40 - 2] >> 19) | (ctx->u.w[40 - 2] << ((8 * sizeof ctx->u.w[40 - 2]) - 19))) ^ (ctx->u.w[40 - 2] >> 10)) + ctx->u.w[40 - 7] + (((ctx->u.w[40 - 15] >> 7) | (ctx->u.w[40 - 15] << ((8 * sizeof ctx->u.w[40 - 15]) - 7))) ^ ((ctx->u.w[40 - 15] >> 18) | (ctx->u.w[40 - 15] << ((8 * sizeof ctx->u.w[40 - 15]) - 18))) ^ (ctx->u.w[40 - 15] >> 3)) + ctx->u.w[40 - 16]);
              ctx->u.w[41] = ((((ctx->u.w[41 - 2] >> 17) | (ctx->u.w[41 - 2] << ((8 * sizeof ctx->u.w[41 - 2]) - 17))) ^ ((ctx->u.w[41 - 2] >> 19) | (ctx->u.w[41 - 2] << ((8 * sizeof ctx->u.w[41 - 2]) - 19))) ^ (ctx->u.w[41 - 2] >> 10)) + ctx->u.w[41 - 7] + (((ctx->u.w[41 - 15] >> 7) | (ctx->u.w[41 - 15] << ((8 * sizeof ctx->u.w[41 - 15]) - 7))) ^ ((ctx->u.w[41 - 15] >> 18) | (ctx->u.w[41 - 15] << ((8 * sizeof ctx->u.w[41 - 15]) - 18))) ^ (ctx->u.w[41 - 15] >> 3)) + ctx->u.w[41 - 16]);
              ctx->u.w[43] = ((((ctx->u.w[43 - 2] >> 17) | (ctx->u.w[43 - 2] << ((8 * sizeof ctx->u.w[43 - 2]) - 17))) ^ ((ctx->u.w[43 - 2] >> 19) | (ctx->u.w[43 - 2] << ((8 * sizeof ctx->u.w[43 - 2]) - 19))) ^ (ctx->u.w[43 - 2] >> 10)) + ctx->u.w[43 - 7] + (((ctx->u.w[43 - 15] >> 7) | (ctx->u.w[43 - 15] << ((8 * sizeof ctx->u.w[43 - 15]) - 7))) ^ ((ctx->u.w[43 - 15] >> 18) | (ctx->u.w[43 - 15] << ((8 * sizeof ctx->u.w[43 - 15]) - 18))) ^ (ctx->u.w[43 - 15] >> 3)) + ctx->u.w[43 - 16]);
              ctx->u.w[45] = ((((ctx->u.w[45 - 2] >> 17) | (ctx->u.w[45 - 2] << ((8 * sizeof ctx->u.w[45 - 2]) - 17))) ^ ((ctx->u.w[45 - 2] >> 19) | (ctx->u.w[45 - 2] << ((8 * sizeof ctx->u.w[45 - 2]) - 19))) ^ (ctx->u.w[45 - 2] >> 10)) + ctx->u.w[45 - 7] + (((ctx->u.w[45 - 15] >> 7) | (ctx->u.w[45 - 15] << ((8 * sizeof ctx->u.w[45 - 15]) - 7))) ^ ((ctx->u.w[45 - 15] >> 18) | (ctx->u.w[45 - 15] << ((8 * sizeof ctx->u.w[45 - 15]) - 18))) ^ (ctx->u.w[45 - 15] >> 3)) + ctx->u.w[45 - 16]);
              ctx->u.w[46] = ((((ctx->u.w[46 - 2] >> 17) | (ctx->u.w[46 - 2] << ((8 * sizeof ctx->u.w[46 - 2]) - 17))) ^ ((ctx->u.w[46 - 2] >> 19) | (ctx->u.w[46 - 2] << ((8 * sizeof ctx->u.w[46 - 2]) - 19))) ^ (ctx->u.w[46 - 2] >> 10)) + ctx->u.w[46 - 7] + (((ctx->u.w[46 - 15] >> 7) | (ctx->u.w[46 - 15] << ((8 * sizeof ctx->u.w[46 - 15]) - 7))) ^ ((ctx->u.w[46 - 15] >> 18) | (ctx->u.w[46 - 15] << ((8 * sizeof ctx->u.w[46 - 15]) - 18))) ^ (ctx->u.w[46 - 15] >> 3)) + ctx->u.w[46 - 16]);
              ctx->u.w[47] = ((((ctx->u.w[47 - 2] >> 17) | (ctx->u.w[47 - 2] << ((8 * sizeof ctx->u.w[47 - 2]) - 17))) ^ ((ctx->u.w[47 - 2] >> 19) | (ctx->u.w[47 - 2] << ((8 * sizeof ctx->u.w[47 - 2]) - 19))) ^ (ctx->u.w[47 - 2] >> 10)) + ctx->u.w[47 - 7] + (((ctx->u.w[47 - 15] >> 7) | (ctx->u.w[47 - 15] << ((8 * sizeof ctx->u.w[47 - 15]) - 7))) ^ ((ctx->u.w[47 - 15] >> 18) | (ctx->u.w[47 - 15] << ((8 * sizeof ctx->u.w[47 - 15]) - 18))) ^ (ctx->u.w[47 - 15] >> 3)) + ctx->u.w[47 - 16]);
              ctx->u.w[48] = ((((ctx->u.w[48 - 2] >> 17) | (ctx->u.w[48 - 2] << ((8 * sizeof ctx->u.w[48 - 2]) - 17))) ^ ((ctx->u.w[48 - 2] >> 19) | (ctx->u.w[48 - 2] << ((8 * sizeof ctx->u.w[48 - 2]) - 19))) ^ (ctx->u.w[48 - 2] >> 10)) + ctx->u.w[48 - 7] + (((ctx->u.w[48 - 15] >> 7) | (ctx->u.w[48 - 15] << ((8 * sizeof ctx->u.w[48 - 15]) - 7))) ^ ((ctx->u.w[48 - 15] >> 18) | (ctx->u.w[48 - 15] << ((8 * sizeof ctx->u.w[48 - 15]) - 18))) ^ (ctx->u.w[48 - 15] >> 3)) + ctx->u.w[48 - 16]);
              ctx->u.w[49] = ((((ctx->u.w[49 - 2] >> 17) | (ctx->u.w[49 - 2] << ((8 * sizeof ctx->u.w[49 - 2]) - 17))) ^ ((ctx->u.w[49 - 2] >> 19) | (ctx->u.w[49 - 2] << ((8 * sizeof ctx->u.w[49 - 2]) - 19))) ^ (ctx->u.w[49 - 2] >> 10)) + ctx->u.w[49 - 7] + (((ctx->u.w[49 - 15] >> 7) | (ctx->u.w[49 - 15] << ((8 * sizeof ctx->u.w[49 - 15]) - 7))) ^ ((ctx->u.w[49 - 15] >> 18) | (ctx->u.w[49 - 15] << ((8 * sizeof ctx->u.w[49 - 15]) - 18))) ^ (ctx->u.w[49 - 15] >> 3)) + ctx->u.w[49 - 16]);
              ctx->u.w[50] = ((((ctx->u.w[50 - 2] >> 17) | (ctx->u.w[50 - 2] << ((8 * sizeof ctx->u.w[50 - 2]) - 17))) ^ ((ctx->u.w[50 - 2] >> 19) | (ctx->u.w[50 - 2] << ((8 * sizeof ctx->u.w[50 - 2]) - 19))) ^ (ctx->u.w[50 - 2] >> 10)) + ctx->u.w[50 - 7] + (((ctx->u.w[50 - 15] >> 7) | (ctx->u.w[50 - 15] << ((8 * sizeof ctx->u.w[50 - 15]) - 7))) ^ ((ctx->u.w[50 - 15] >> 18) | (ctx->u.w[50 - 15] << ((8 * sizeof ctx->u.w[50 - 15]) - 18))) ^ (ctx->u.w[50 - 15] >> 3)) + ctx->u.w[50 - 16]);
              ctx->u.w[51] = ((((ctx->u.w[51 - 2] >> 17) | (ctx->u.w[51 - 2] << ((8 * sizeof ctx->u.w[51 - 2]) - 17))) ^ ((ctx->u.w[51 - 2] >> 19) | (ctx->u.w[51 - 2] << ((8 * sizeof ctx->u.w[51 - 2]) - 19))) ^ (ctx->u.w[51 - 2] >> 10)) + ctx->u.w[51 - 7] + (((ctx->u.w[51 - 15] >> 7) | (ctx->u.w[51 - 15] << ((8 * sizeof ctx->u.w[51 - 15]) - 7))) ^ ((ctx->u.w[51 - 15] >> 18) | (ctx->u.w[51 - 15] << ((8 * sizeof ctx->u.w[51 - 15]) - 18))) ^ (ctx->u.w[51 - 15] >> 3)) + ctx->u.w[51 - 16]);
              ctx->u.w[52] = ((((ctx->u.w[52 - 2] >> 17) | (ctx->u.w[52 - 2] << ((8 * sizeof ctx->u.w[52 - 2]) - 17))) ^ ((ctx->u.w[52 - 2] >> 19) | (ctx->u.w[52 - 2] << ((8 * sizeof ctx->u.w[52 - 2]) - 19))) ^ (ctx->u.w[52 - 2] >> 10)) + ctx->u.w[52 - 7] + (((ctx->u.w[52 - 15] >> 7) | (ctx->u.w[52 - 15] << ((8 * sizeof ctx->u.w[52 - 15]) - 7))) ^ ((ctx->u.w[52 - 15] >> 18) | (ctx->u.w[52 - 15] << ((8 * sizeof ctx->u.w[52 - 15]) - 18))) ^ (ctx->u.w[52 - 15] >> 3)) + ctx->u.w[52 - 16]);
              ctx->u.w[53] = ((((ctx->u.w[53 - 2] >> 17) | (ctx->u.w[53 - 2] << ((8 * sizeof ctx->u.w[53 - 2]) - 17))) ^ ((ctx->u.w[53 - 2] >> 19) | (ctx->u.w[53 - 2] << ((8 * sizeof ctx->u.w[53 - 2]) - 19))) ^ (ctx->u.w[53 - 2] >> 10)) + ctx->u.w[53 - 7] + (((ctx->u.w[53 - 15] >> 7) | (ctx->u.w[53 - 15] << ((8 * sizeof ctx->u.w[53 - 15]) - 7))) ^ ((ctx->u.w[53 - 15] >> 18) | (ctx->u.w[53 - 15] << ((8 * sizeof ctx->u.w[53 - 15]) - 18))) ^ (ctx->u.w[53 - 15] >> 3)) + ctx->u.w[53 - 16]);
              ctx->u.w[54] = ((((ctx->u.w[54 - 2] >> 17) | (ctx->u.w[54 - 2] << ((8 * sizeof ctx->u.w[54 - 2]) - 17))) ^ ((ctx->u.w[54 - 2] >> 19) | (ctx->u.w[54 - 2] << ((8 * sizeof ctx->u.w[54 - 2]) - 19))) ^ (ctx->u.w[54 - 2] >> 10)) + ctx->u.w[54 - 7] + (((ctx->u.w[54 - 15] >> 7) | (ctx->u.w[54 - 15] << ((8 * sizeof ctx->u.w[54 - 15]) - 7))) ^ ((ctx->u.w[54 - 15] >> 18) | (ctx->u.w[54 - 15] << ((8 * sizeof ctx->u.w[54 - 15]) - 18))) ^ (ctx->u.w[54 - 15] >> 3)) + ctx->u.w[54 - 16]);
              ctx->u.w[55] = ((((ctx->u.w[55 - 2] >> 17) | (ctx->u.w[55 - 2] << ((8 * sizeof ctx->u.w[55 - 2]) - 17))) ^ ((ctx->u.w[55 - 2] >> 19) | (ctx->u.w[55 - 2] << ((8 * sizeof ctx->u.w[55 - 2]) - 19))) ^ (ctx->u.w[55 - 2] >> 10)) + ctx->u.w[55 - 7] + (((ctx->u.w[55 - 15] >> 7) | (ctx->u.w[55 - 15] << ((8 * sizeof ctx->u.w[55 - 15]) - 7))) ^ ((ctx->u.w[55 - 15] >> 18) | (ctx->u.w[55 - 15] << ((8 * sizeof ctx->u.w[55 - 15]) - 18))) ^ (ctx->u.w[55 - 15] >> 3)) + ctx->u.w[55 - 16]);
              ctx->u.w[56] = ((((ctx->u.w[56 - 2] >> 17) | (ctx->u.w[56 - 2] << ((8 * sizeof ctx->u.w[56 - 2]) - 17))) ^ ((ctx->u.w[56 - 2] >> 19) | (ctx->u.w[56 - 2] << ((8 * sizeof ctx->u.w[56 - 2]) - 19))) ^ (ctx->u.w[56 - 2] >> 10)) + ctx->u.w[56 - 7] + (((ctx->u.w[56 - 15] >> 7) | (ctx->u.w[56 - 15] << ((8 * sizeof ctx->u.w[56 - 15]) - 7))) ^ ((ctx->u.w[56 - 15] >> 18) | (ctx->u.w[56 - 15] << ((8 * sizeof ctx->u.w[56 - 15]) - 18))) ^ (ctx->u.w[56 - 15] >> 3)) + ctx->u.w[56 - 16]);
              ctx->u.w[57] = ((((ctx->u.w[57 - 2] >> 17) | (ctx->u.w[57 - 2] << ((8 * sizeof ctx->u.w[57 - 2]) - 17))) ^ ((ctx->u.w[57 - 2] >> 19) | (ctx->u.w[57 - 2] << ((8 * sizeof ctx->u.w[57 - 2]) - 19))) ^ (ctx->u.w[57 - 2] >> 10)) + ctx->u.w[57 - 7] + (((ctx->u.w[57 - 15] >> 7) | (ctx->u.w[57 - 15] << ((8 * sizeof ctx->u.w[57 - 15]) - 7))) ^ ((ctx->u.w[57 - 15] >> 18) | (ctx->u.w[57 - 15] << ((8 * sizeof ctx->u.w[57 - 15]) - 18))) ^ (ctx->u.w[57 - 15] >> 3)) + ctx->u.w[57 - 16]);
              ctx->u.w[58] = ((((ctx->u.w[58 - 2] >> 17) | (ctx->u.w[58 - 2] << ((8 * sizeof ctx->u.w[58 - 2]) - 17))) ^ ((ctx->u.w[58 - 2] >> 19) | (ctx->u.w[58 - 2] << ((8 * sizeof ctx->u.w[58 - 2]) - 19))) ^ (ctx->u.w[58 - 2] >> 10)) + ctx->u.w[58 - 7] + (((ctx->u.w[58 - 15] >> 7) | (ctx->u.w[58 - 15] << ((8 * sizeof ctx->u.w[58 - 15]) - 7))) ^ ((ctx->u.w[58 - 15] >> 18) | (ctx->u.w[58 - 15] << ((8 * sizeof ctx->u.w[58 - 15]) - 18))) ^ (ctx->u.w[58 - 15] >> 3)) + ctx->u.w[58 - 16]);
              ctx->u.w[59] = ((((ctx->u.w[59 - 2] >> 17) | (ctx->u.w[59 - 2] << ((8 * sizeof ctx->u.w[59 - 2]) - 17))) ^ ((ctx->u.w[59 - 2] >> 19) | (ctx->u.w[59 - 2] << ((8 * sizeof ctx->u.w[59 - 2]) - 19))) ^ (ctx->u.w[59 - 2] >> 10)) + ctx->u.w[59 - 7] + (((ctx->u.w[59 - 15] >> 7) | (ctx->u.w[59 - 15] << ((8 * sizeof ctx->u.w[59 - 15]) - 7))) ^ ((ctx->u.w[59 - 15] >> 18) | (ctx->u.w[59 - 15] << ((8 * sizeof ctx->u.w[59 - 15]) - 18))) ^ (ctx->u.w[59 - 15] >> 3)) + ctx->u.w[59 - 16]);
              ctx->u.w[60] = ((((ctx->u.w[60 - 2] >> 17) | (ctx->u.w[60 - 2] << ((8 * sizeof ctx->u.w[60 - 2]) - 17))) ^ ((ctx->u.w[60 - 2] >> 19) | (ctx->u.w[60 - 2] << ((8 * sizeof ctx->u.w[60 - 2]) - 19))) ^ (ctx->u.w[60 - 2] >> 10)) + ctx->u.w[60 - 7] + (((ctx->u.w[60 - 15] >> 7) | (ctx->u.w[60 - 15] << ((8 * sizeof ctx->u.w[60 - 15]) - 7))) ^ ((ctx->u.w[60 - 15] >> 18) | (ctx->u.w[60 - 15] << ((8 * sizeof ctx->u.w[60 - 15]) - 18))) ^ (ctx->u.w[60 - 15] >> 3)) + ctx->u.w[60 - 16]);
    }
}
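
The interestingness test above hinges on `timeout` exiting with status 124 when the time limit is hit, which is what GNU coreutils `timeout` does when `--preserve-status` is not used. A minimal sketch of that three-way classification, with `sleep` and `false` standing in for the compiler (the `classify` function is hypothetical and only for illustration):

```shell
#!/bin/sh
# Hypothetical helper: run a command under a 1-second limit and report
# how it ended. Assumes GNU coreutils `timeout`.
classify() {
    timeout 1s "$@" >/dev/null 2>&1
    case $? in
        0)   echo "finished" ;;   # completed within the limit
        124) echo "timed out" ;;  # killed by timeout: the "interesting" case
        *)   echo "failed" ;;     # any other failure, e.g. a compile error
    esac
}

classify sleep 0   # stand-in for a fast compile
classify sleep 5   # stand-in for the hanging compile
classify false     # stand-in for a compile that errors out
```

Treating every non-124 failure as uninteresting matters during reduction: otherwise the reducer can drift towards inputs that merely fail to compile instead of inputs that are slow.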

Using llvm-reduce, following https://discourse.llvm.org/t/suggestions-to-debug-a-forever-running-clang-compile-command/60420/6:

$ clang-17 -O2 -c sha512.c.i -march=skylake -emit-llvm -S
$ llc sha512.c.ll # confirm this hangs
$ llvm-reduce --test=test2.sh sha512.c.ll

with test2.sh being:

#!/usr/bin/env bash
timeout 30s llc "$@"

ret=$?

case ${ret} in
        0)
                # If it built fine, it's uninteresting.
                exit 1
                ;;
        124)
                # It timed out, yay!
                exit 0
                ;;
        *)
                # It failed in some way but not a timeout, not interesting.
                exit 1
                ;;
esac

This kind of worked. The result takes ~30s for llc on a fast machine with LLVM 17 (0.049s with LLVM 16), which I think is illustrative enough and should show that the full thing will take orders of magnitude longer. I know it's not ideal, though.

; ModuleID = '<bc file>'
source_filename = "sha512.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #0

; Function Attrs: nounwind sspstrong memory(argmem: readwrite) uwtable
define void @SHA256_Compress_Generic(ptr noundef %ctx) #1 {
entry:
  %0 = load i32, ptr null, align 4
  %1 = tail call i32 asm "bswap $0", "=r,0,~{dirflag},~{fpsr},~{flags}"(i32 %0) #5
  %arrayidx14 = getelementptr inbounds [64 x i32], ptr %ctx, i64 0, i64 3
  %2 = load i32, ptr %arrayidx14, align 4
  %3 = tail call i32 asm "bswap $0", "=r,0,~{dirflag},~{fpsr},~{flags}"(i32 %2) #5
  %4 = insertelement <2 x i32> zeroinitializer, i32 %1, i64 1
  %5 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %4, <2 x i32> %4, <2 x i32> <i32 15, i32 15>)
  %6 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %4, <2 x i32> %4, <2 x i32> <i32 13, i32 13>)
  %7 = xor <2 x i32> %5, %6
  %8 = lshr <2 x i32> %4, zeroinitializer
  %9 = xor <2 x i32> %7, %8
  %10 = insertelement <2 x i32> zeroinitializer, i32 %3, i64 0
  %11 = shufflevector <2 x i32> zeroinitializer, <2 x i32> %10, <2 x i32> <i32 1, i32 2>
  %12 = add <2 x i32> %11, %9
  %13 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %12, <2 x i32> %12, <2 x i32> <i32 15, i32 15>)
  %14 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %12, <2 x i32> %12, <2 x i32> <i32 13, i32 13>)
  %15 = xor <2 x i32> %13, %14
  %16 = lshr <2 x i32> %12, zeroinitializer
  %17 = xor <2 x i32> %15, %16
  %18 = add <2 x i32> %4, %17
  %19 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %18, <2 x i32> %18, <2 x i32> <i32 15, i32 15>)
  %20 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %18, <2 x i32> %18, <2 x i32> <i32 13, i32 13>)
  %21 = xor <2 x i32> %19, %20
  %22 = lshr <2 x i32> %18, <i32 10, i32 10>
  %23 = xor <2 x i32> %21, %22
  %24 = add <2 x i32> %4, %23
  %25 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %24, <2 x i32> %24, <2 x i32> <i32 15, i32 15>)
  %26 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %24, <2 x i32> %24, <2 x i32> <i32 13, i32 13>)
  %27 = xor <2 x i32> %25, %26
  %28 = lshr <2 x i32> %24, <i32 10, i32 10>
  %29 = xor <2 x i32> %27, %28
  %30 = shufflevector <2 x i32> %4, <2 x i32> %12, <2 x i32> <i32 1, i32 2>
  %31 = add <2 x i32> %30, %29
  %32 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %31, <2 x i32> %31, <2 x i32> <i32 15, i32 15>)
  %33 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %31, <2 x i32> %31, <2 x i32> <i32 13, i32 13>)
  %34 = xor <2 x i32> %32, %33
  %35 = lshr <2 x i32> %31, <i32 10, i32 10>
  %36 = xor <2 x i32> %34, %35
  %37 = shufflevector <2 x i32> %12, <2 x i32> zeroinitializer, <2 x i32> <i32 1, i32 2>
  %38 = add <2 x i32> %37, %36
  %arrayidx918 = getelementptr inbounds [64 x i32], ptr %ctx, i64 0, i64 33
  store <2 x i32> %38, ptr %arrayidx918, align 4
  %arrayidx1012 = getelementptr inbounds [64 x i32], ptr %ctx, i64 0, i64 35
  %39 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %38, <2 x i32> %38, <2 x i32> <i32 15, i32 15>)
  %40 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %38, <2 x i32> %38, <2 x i32> <i32 13, i32 13>)
  %41 = xor <2 x i32> %39, %40
  %42 = lshr <2 x i32> %38, <i32 10, i32 10>
  %43 = xor <2 x i32> %41, %42
  %44 = add <2 x i32> %37, %43
  store <2 x i32> zeroinitializer, ptr %arrayidx1012, align 4
  %arrayidx1106 = getelementptr inbounds [64 x i32], ptr %ctx, i64 0, i64 37
  %45 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %44, <2 x i32> %44, <2 x i32> <i32 15, i32 15>)
  %46 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %44, <2 x i32> %44, <2 x i32> <i32 13, i32 13>)
  %47 = xor <2 x i32> %45, %46
  %48 = lshr <2 x i32> %44, <i32 10, i32 10>
  %49 = xor <2 x i32> %47, %48
  %50 = lshr <2 x i32> %24, zeroinitializer
  %51 = add <2 x i32> %50, %49
  store <2 x i32> %51, ptr %arrayidx1106, align 4
  %arrayidx1200 = getelementptr inbounds [64 x i32], ptr %ctx, i64 0, i64 39
  %52 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %51, <2 x i32> %51, <2 x i32> <i32 15, i32 15>)
  %53 = tail call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %51, <2 x i32> %51, <2 x i32> <i32 13, i32 13>)
  %54 = xor <2 x i32> %52, %53
  %55 = lshr <2 x i32> %51, <i32 10, i32 10>
  %56 = xor <2 x i32> %54, %55
  %57 = shufflevector <2 x i32> %38, <2 x i32> zeroinitializer, <2 x i32> <i32 poison, i32 0>
  %58 = insertelement <2 x i32> %57, i32 0, i64 0
  %59 = add <2 x i32> %58, %56
  store <2 x i32> %59, ptr %arrayidx1200, align 4
  ret void

; uselistorder directives
  uselistorder <2 x i32> %4, { 7, 0, 1, 6, 5, 4, 3, 2 }
  uselistorder <2 x i32> %38, { 6, 5, 4, 3, 2, 1, 0 }
}

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i64 @llvm.bswap.i64(i64) #2

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i32 @llvm.fshl.i32(i32, i32, i32) #2

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i32 @llvm.umin.i32(i32, i32) #2

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i64 @llvm.fshl.i64(i64, i64, i64) #2

; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: readwrite)
declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg) #3

; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #4

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare <2 x i32> @llvm.fshl.v2i32(<2 x i32>, <2 x i32>, <2 x i32>) #2

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare <2 x i64> @llvm.bswap.v2i64(<2 x i64>) #2

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare <2 x i64> @llvm.fshl.v2i64(<2 x i64>, <2 x i64>, <2 x i64>) #2

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare <4 x i64> @llvm.bswap.v4i64(<4 x i64>) #2

; uselistorder directives
uselistorder ptr @llvm.fshl.v2i32, { 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }

attributes #0 = { nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) }
attributes #1 = { nounwind sspstrong memory(argmem: readwrite) uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "probe-stack"="inline-asm" "stack-protector-buffer-size"="8" "target-cpu"="skylake" "target-features"="+adx,+aes,+avx,+avx2,+bmi,+bmi2,+clflushopt,+cmov,+crc32,+cx16,+cx8,+f16c,+fma,+fsgsbase,+fxsr,+invpcid,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+prfchw,+rdrnd,+rdseed,+sahf,+sgx,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsavec,+xsaveopt,+xsaves" }
attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
attributes #3 = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
attributes #4 = { nocallback nofree nounwind willreturn memory(argmem: write) }
attributes #5 = { nounwind memory(none) }

thesamesam · 2023-09-26T04:57:05Z

cc @RKSimon

Bisect says af32e51:

af32e51a43fb4343f4c407bf1ee051ff78a57494 is the first bad commit
commit af32e51a43fb4343f4c407bf1ee051ff78a57494
Author: Simon Pilgrim <[email protected]>
Date:   Sat Jul 22 17:54:48 2023 +0100

    [X86] LowerRotate - manually expand rotate by splat constant patterns.

    Fixes issue identified on #63980 where the undef rotate amounts (during widening from v2i32 -> v4i32) were being constant folded to 0 when the shift amounts are created during expansion, losing the splat'd shift amounts.

 llvm/lib/Target/X86/X86ISelLowering.cpp         | 14 +++++++++++--
 llvm/test/CodeGen/X86/vector-fshl-rot-sub128.ll | 27 +++++++++----------------
 llvm/test/CodeGen/X86/vector-fshr-rot-sub128.ll | 27 +++++++++----------------
 3 files changed, 32 insertions(+), 36 deletions(-)
bisect found first bad commit

If I revert it on release/17.x, the compile finishes in a reasonable time again (although it seems to be consistently about half a second slower than with clang 16).
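
For context on what the bisected commit touches: the `@llvm.fshl.v2i32(%x, %x, <i32 15, i32 15>)` calls in the reproducer are rotates by a splat constant, because a funnel shift whose two value operands are the same is a rotate left. The following is a minimal scalar sketch of that identity, expanded into two shifts and an OR. The names `rotl32` and `schedule_mix` are made up for illustration, and this is not the code in `X86ISelLowering.cpp`.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left by a constant, written as two shifts and an OR.
// Illustrative helper only; not an LLVM API.
constexpr std::uint32_t rotl32(std::uint32_t x, unsigned c) {
  c &= 31u;              // fshl interprets the amount modulo the bit width
  if (c == 0) return x;  // avoid the undefined shift by 32 below
  return (x << c) | (x >> (32u - c));
}

// The per-lane pattern repeated throughout the IR above:
//   fshl(x, x, 15) ^ fshl(x, x, 13) ^ (x >> 10)
// It has the same shape as SHA-256's small sigma1 (rotr 17, rotr 19, shr 10).
constexpr std::uint32_t schedule_mix(std::uint32_t x) {
  return rotl32(x, 15) ^ rotl32(x, 13) ^ (x >> 10);
}

// Compile-time checks of the identities used above.
static_assert(rotl32(0x80000001u, 1) == 0x00000003u, "wraps the top bit");
static_assert(schedule_mix(1u) == 0x0000A000u, "bits 15 and 13 set");
```

The `static_assert` lines make the block self-checking at compile time, so no `main` is needed to confirm the expansion.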

llvmbot · 2023-09-26T04:57:26Z

@llvm/issue-subscribers-backend-x86

When compiling [dev-libs/nss](https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS) with clang 17.0.1 on Gentoo, the compiler appears to loop forever on the sha512.c compilation unit (reported in a [downstream Gentoo bug report](https://bugs.gentoo.org/914657)). This is not reproducible with Clang 16.

Initial testing suggests that -march=native triggers the problem (-march=skylake and -march=alderlake trigger it as well).

To make testing easier, I have attached the preprocessed output for that compilation unit, which should reproduce the issue easily (remove the .txt extension).

sha512-preprocessed.c.txt

RKSimon · 2023-10-01T13:00:56Z

Finally worked this out - we're missing a oneuse limit in combineConcatVectorOps - fix incoming
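
To illustrate what a "oneuse limit" means in a combine like this, here is a toy sketch. It assumes a hypothetical `Node` type with an explicit use count; it is not `llvm::SDNode`, and it is not the actual change made to `combineConcatVectorOps`. The idea shown is only the general gate: a fold of the form concat(op(a), op(b)) into op(concat(a, b)) is attempted only when every narrow op has exactly one user, so the narrow ops do not stay alive next to the new wide op.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy node, for illustration only (NOT llvm::SDNode).
struct Node {
  int opcode;    // which operation this node performs
  int useCount;  // how many other nodes consume this node's result
};

// True when exactly one other node uses this node's result.
bool hasOneUse(const Node &n) { return n.useCount == 1; }

// Toy gate for folding concat(op(a), op(b)) into op(concat(a, b)).
// Every sub-op must have the same opcode and exactly one use.
bool canFoldConcatOfOps(const std::vector<const Node *> &subOps) {
  if (subOps.empty())
    return false;
  for (const Node *op : subOps) {
    if (op->opcode != subOps.front()->opcode)
      return false;  // mixed opcodes cannot share one wide op
    if (!hasOneUse(*op))
      return false;  // another user would keep the narrow op alive
  }
  return true;
}
```

With two sub-ops of the same opcode, the gate accepts when both have a use count of 1 and rejects as soon as either is used elsewhere.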

thesamesam · 2023-10-03T06:55:57Z

/cherry-pick 04b403d

llvmbot · 2023-10-03T07:03:32Z

Failed to cherry-pick: 04b403d

https://github.com/llvm/llvm-project/actions/runs/6389691511

Please manually backport the fix and push it to your github fork. Once this is done, please add a comment like this:

/branch <user>/<repo>/<branch>

thesamesam · 2023-10-03T17:44:50Z

@RKSimon Could you handle the backport? Cheers.

RKSimon · 2023-10-04T12:39:35Z

/branch RKSimon/llvm-project/PR67333

llvmbot · 2023-10-04T12:49:33Z

/pull-request llvm/llvm-project-release-prs#724

We could maybe extend this by allowing the lowest subop to have multiple uses and extract the lowest subvector result of the concatenated op, but let's just get the fix in first. Fixes #67333

github-actions bot added the clang Clang issues not falling into any other category label Sep 25, 2023

StormBytePP changed the title ~~[Clang 17.0.1] [Regression] Infinite loop in compile unit~~ [clang 17.0.1] [regression] Infinite loop in compile unit Sep 25, 2023

shafik added the needs-reduction Large reproducer that should be reduced into a simpler form label Sep 25, 2023

thesamesam added the regression label Sep 25, 2023

thesamesam changed the title ~~[clang 17.0.1] [regression] Infinite loop in compile unit~~ [clang 17.0.1] [regression] Hangs when compiling NSS with -O2 -march=skylake Sep 25, 2023

thesamesam added the hang Compiler hang (infinite loop) label Sep 25, 2023

thesamesam removed the needs-reduction Large reproducer that should be reduced into a simpler form label Sep 26, 2023

thesamesam added the backend:X86 label Sep 26, 2023

RKSimon self-assigned this Sep 26, 2023

thesamesam added this to the LLVM 17.0.X Release milestone Sep 27, 2023

github-project-automation bot added this to LLVM Release Status Sep 27, 2023

github-project-automation bot moved this to Needs Triage in LLVM Release Status Sep 27, 2023

RKSimon closed this as completed in 04b403d Oct 1, 2023

RKSimon reopened this Oct 1, 2023

Endilll removed the clang Clang issues not falling into any other category label Oct 2, 2023

llvmbot added the release:cherry-pick-failed label Oct 3, 2023

llvmbot mentioned this issue Oct 4, 2023

PR for llvm/llvm-project#67333 llvm/llvm-project-release-prs#724

Merged

llvmbot removed the release:cherry-pick-failed label Oct 4, 2023

tru moved this from Needs Triage to Needs Review in LLVM Release Status Oct 5, 2023

tru closed this as completed in llvm/llvm-project-release-prs#724 Oct 10, 2023

tru moved this from Needs Review to Done in LLVM Release Status Oct 10, 2023
