Skip to content

fatal error: error in backend: Failed to evaluate function length in SEH unwind info #66912

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
kalibera opened this issue Sep 20, 2023 · 22 comments · Fixed by #67894 or llvm/llvm-project-release-prs#722

Comments

@kalibera
Copy link

I've encountered a crash of clang from LLVM 17.01 when trying to build openblas 0.3.24 for Windows/aarch64 on a Linux machine. The crash files and full output with stacktrace are attached in crash.zip

The output includes:

fatal error: error in backend: Failed to evaluate function length in SEH unwind info

#5 0x00005579fe4a4f70 llvm::report_fatal_error(llvm::Twine const&, bool) (/home/tomas/toolchain/clang/toolchain_libs/mxe/usr/x86_64-pc-linux-gnu/bin/clang+0x1c72f70)
#8 0x00005579fe2627dd llvm::MCStreamer::emitWinCFIEndProc(llvm::SMLoc) (/home/tomas/toolchain/clang/toolchain_libs/mxe/usr/x86_64-pc-linux-gnu/bin/clang+0x1a307dd)
#11 0x00005579ff1be469 llvm::AsmPrinter::emitFunctionBody() (/home/tomas/toolchain/clang/toolchain_libs/mxe/usr/x86_64-pc-linux-gnu/bin/clang+0x298c469)
#14 0x00005579fdff2eea llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/tomas/toolchain/clang/toolchain_libs/mxe/usr/x86_64-pc-linux-gnu/bin/clang+0x17c0eea)

@EugeneZelenko EugeneZelenko added mc Machine (object) code platform:windows crash Prefer [crash-on-valid] or [crash-on-invalid] and removed new issue labels Sep 20, 2023
@kalibera
Copy link
Author

cc'ing @mstorsjo

@mstorsjo
Copy link
Member

I can reproduce the error with clang 15 but not with 14, so either it's a regression, or the issue just happened to go unnoticed before.

@mstorsjo
Copy link
Member

CC @efriedma-quic

I reduced the testcase to the following:

void a();
void b() {
  for (;;)
    a();
}

Compiled with the following command: clang -target aarch64-windows -O2 -mtune=cortex-a53 -c repro.c

The issue is that there's an alignment inserted in the middle of a function, which makes it impossible for the SEH unwind info generation to measure the size of the function (as it hasn't been laid out yet, so we don't know how large the alignment gap ends up). In this case, the output is:

$ clang -target aarch64-windows -O2 -mtune=cortex-a53 -S -o - repro.c
        .globl  b                               // -- Begin function b
        .p2align        4
b:                                      // @b
.seh_proc b
// %bb.0:                               // %entry
        str     x30, [sp, #-16]!                // 8-byte Folded Spill
        .seh_save_reg_x x30, 16
        .seh_endprologue
        .p2align        4, , 8
.LBB0_1:                                // %for.cond
                                        // =>This Inner Loop Header: Depth=1
        bl      a
        b       .LBB0_1
        .seh_endfunclet
        .seh_endproc

Here, the .p2align is what's causing the breakage.

I bisected this down to the commit 7d67671:

commit 7d676714fbf2dd4bcdd15bd63d6b43b467ceccb1
Author: Nicholas Guy <[email protected]>
Date:   Tue Mar 22 16:35:12 2022 +0000

    [AArch64] Set MaxBytesForLoopAlignment for more targets
    
    Differential Revision: https://reviews.llvm.org/D122566

 llvm/lib/Target/AArch64/AArch64Subtarget.cpp        | 18 +++++++++++++++++-
 .../AArch64/aarch64-p2align-max-bytes-neoverse.ll   | 21 +++++++++++++++++----
 llvm/test/CodeGen/AArch64/merge-store-dependency.ll |  2 ++
 .../CodeGen/AArch64/preferred-function-alignment.ll |  2 +-
 4 files changed, 37 insertions(+), 6 deletions(-)

@llvmbot
Copy link
Member

llvmbot commented Sep 21, 2023

@llvm/issue-subscribers-backend-aarch64

I've encountered a crash of clang from LLVM 17.01 when trying to build openblas 0.3.24 for Windows/aarch64 on a Linux machine. The crash files and full output with stacktrace are attached in [crash.zip](https://github.com/llvm/llvm-project/files/12674259/crash.zip)

The output includes:

fatal error: error in backend: Failed to evaluate function length in SEH unwind info

#5 0x00005579fe4a4f70 llvm::report_fatal_error(llvm::Twine const&, bool) (/home/tomas/toolchain/clang/toolchain_libs/mxe/usr/x86_64-pc-linux-gnu/bin/clang+0x1c72f70)
#8 0x00005579fe2627dd llvm::MCStreamer::emitWinCFIEndProc(llvm::SMLoc) (/home/tomas/toolchain/clang/toolchain_libs/mxe/usr/x86_64-pc-linux-gnu/bin/clang+0x1a307dd)
#11 0x00005579ff1be469 llvm::AsmPrinter::emitFunctionBody() (/home/tomas/toolchain/clang/toolchain_libs/mxe/usr/x86_64-pc-linux-gnu/bin/clang+0x298c469)
#14 0x00005579fdff2eea llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/tomas/toolchain/clang/toolchain_libs/mxe/usr/x86_64-pc-linux-gnu/bin/clang+0x17c0eea)

@DavidSpickett
Copy link
Collaborator

DavidSpickett commented Sep 21, 2023

@NickGuy-Arm

Edit: Though now I realise this landed about a year ago. Maybe they'll have some thoughts anyway.

@efriedma-quic
Copy link
Collaborator

This has also been reported as #47432 and #64504. The underlying backend limitation is known, but hard to fix... it requires a complicated rearrangement of the code to delay computing the length of a function until assembler relaxation. (I think there's some discussion of this elsewhere, but I can't seem to find it.)

@efriedma-quic
Copy link
Collaborator

See discussion on https://reviews.llvm.org/D61095

@mstorsjo
Copy link
Member

This has also been reported as #47432 and #64504. The underlying backend limitation is known, but hard to fix... it requires a complicated rearrangement of the code to delay computing the length of a function until assembler relaxation. (I think there's some discussion of this elsewhere, but I can't seem to find it.)

Indeed... What's the practical way around it to recommend here? Should we avoid enabling loop alignment in the aarch64 backend, when SEH generation is enabled?

@hmartinez82
Copy link

🆙
Saw the same error when building with -mcpu=cortex-a55 in MSYS2's CLANGARM64 environment

@efriedma-quic
Copy link
Collaborator

Should we avoid enabling loop alignment in the aarch64 backend, when SEH generation is enabled?

This probably makes sense. (Given the interaction with debug info, it might make sense to just check if the target in Windows.)

@mstorsjo
Copy link
Member

Should we avoid enabling loop alignment in the aarch64 backend, when SEH generation is enabled?

This probably makes sense.

Ok, great, I'll try to make an effort to do that then.

(Given the interaction with debug info, it might make sense to just check if the target in Windows.)

Can you elaborate on what interaction with debug info you're thinking of here, so I can formulate a good comment explaining the code in the end?

mstorsjo added a commit to mstorsjo/llvm-project that referenced this issue Sep 30, 2023
This should fix llvm#66912. When emitting SEH unwind info, we need to
be able to calculate the exact length of functions before alignments
are fixed. Until that limitation is overcome, just disable all
loop alignment on Windows targets.
@mstorsjo
Copy link
Member

I've got a potential fix for this in #67894.

@kalibera
Copy link
Author

kalibera commented Oct 2, 2023

I've got a potential fix for this in #67894.

Thanks, it fixes the problem for me - openblas now compiles without crashing (tested in LLVM 17.01, with the patch applied).

@hmartinez82
Copy link

I've got a potential fix for this in #67894.

Thanks, it fixes the problem for me - openblas now compiles without crashing (tested in LLVM 17.01, with the patch applied).

@kalibera were you able to let OpenBLAS build all the kernels when building for Windows on ARM? I had to remove dynamic arch and force it to be Cortext-A53 only like here: https://github.com/msys2/MINGW-packages/pull/11954/files#diff-ff01e0a0f990bc4bae590e2d4bd2ddbfe71c0ede3550881075673905df875b66R80

@kalibera
Copy link
Author

kalibera commented Oct 2, 2023

@kalibera were you able to let OpenBLAS build all the kernels when building for Windows on ARM? I had to remove dynamic arch and force it to be Cortext-A53 only like here: https://github.com/msys2/MINGW-packages/pull/11954/files#diff-ff01e0a0f990bc4bae590e2d4bd2ddbfe71c0ede3550881075673905df875b66R80

No, I am building like you - https://svn.r-project.org/R-dev-web/trunk/WindowsBuilds/winutf8/ucrt3/toolchain_libs/mxe/src/openblas.mk. With DYNAMIC_ARCH I get "No rule to make target 'dynamic_arm64.obj'" (which is not related to this bug report).

mstorsjo added a commit that referenced this issue Oct 2, 2023
This should fix #66912. When emitting SEH unwind info, we need to be
able to calculate the exact length of functions before alignments are
fixed. Until that limitation is overcome, just disable all loop
alignment on Windows targets.
@mstorsjo
Copy link
Member

mstorsjo commented Oct 2, 2023

Reopening for backporting the fix for the regression. (The issue with loop alignment probably did exist for other -mtune parameters before - but -mtune=cortex-a53 did use to work, and in 17.x it regressed into hitting this bug.)

@mstorsjo mstorsjo removed the crash Prefer [crash-on-valid] or [crash-on-invalid] label Oct 2, 2023
@mstorsjo
Copy link
Member

mstorsjo commented Oct 2, 2023

/cherry-pick 6ae36c0

@llvmbot
Copy link
Member

llvmbot commented Oct 2, 2023

/branch llvm/llvm-project-release-prs/issue66912

@llvmbot
Copy link
Member

llvmbot commented Oct 2, 2023

/pull-request llvm/llvm-project-release-prs#722

@tru tru moved this from Needs Triage to Needs Review in LLVM Release Status Oct 3, 2023
@tru tru moved this from Needs Review to Done in LLVM Release Status Oct 10, 2023
tru pushed a commit that referenced this issue Oct 10, 2023
This should fix #66912. When emitting SEH unwind info, we need to be
able to calculate the exact length of functions before alignments are
fixed. Until that limitation is overcome, just disable all loop
alignment on Windows targets.

(cherry picked from commit 6ae36c0)
@hmartinez82 hmartinez82 marked this as a duplicate of #122707 Jan 13, 2025
@hmartinez82
Copy link

hmartinez82 commented Jan 13, 2025

Just documenting here that with 19.1.6 we can still repro this issue with as little as this code (built with clang -c):

int f(int i) {
    int result;
    __asm__ (
        ".align 5 \n"
        "add %w0, %w1, #41"
        : "=r" (result)
        : "r" (i)
        :
    );
    return result;
}

@mstorsjo
Copy link
Member

Just documenting here that with 19.1.6 we can still repro this issue with as little as this code (built with clang -c):

int f(int i) {
int result;
asm (
".align 5 \n"
"add %w0, %w1, #41"
: "=r" (result)
: "r" (i)
:
);
return result;
}

Yes, that'll probably trigger it.

The issue is twofold; if we have things within the function that makes it impossible to calculate the length of it ahead of time, like an .align directive, we hit this issue. That bit is not resolved.

The fix above, in #67894 / 6ae36c0, makes sure that we don't generate .align due to -mcpu= and similar, when targeting Windows.

Perhaps it would be better to file a new separate issue that case where you explicitly do use .align.

@hmartinez82 hmartinez82 marked this as not a duplicate of #122707 Jan 13, 2025
@hmartinez82
Copy link

hmartinez82 commented Jan 13, 2025

Done. New ticket created at #122707

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
7 participants