fatal error when building the OpenBLAS ThunderX2 and some other dynamic kernels for Windows on ARM #69454

Closed
hmartinez82 opened this issue Oct 18, 2023 · 12 comments
Labels
backend:AArch64 crash Prefer [crash-on-valid] or [crash-on-invalid] duplicate Resolved as duplicate

Comments

@hmartinez82

I made another attempt at building OpenBLAS with DYNAMIC_ARCH=ON (see https://github.com/msys2/MINGW-packages/blob/b07f75117ed0cfe6e1f000de7caaffb433cb8c80/mingw-w64-openblas/PKGBUILD#L72) for Windows on ARM after patching Clang with #67894.
Is it possible that I need an additional patch to get the complete fix, or is this something new that is specific to ThunderX2?

Clang is still crashing.

[9616/15543] Building C object kernel/CMakeFiles/kernel_THUNDERX2T99.dir/CMakeFiles/dcopy_k_THUNDERX2T99.c.obj
FAILED: kernel/CMakeFiles/kernel_THUNDERX2T99.dir/CMakeFiles/dcopy_k_THUNDERX2T99.c.obj
C:\msys64\clangarm64\bin\clang.exe  -IC:/M_P/mingw-w64-openblas/src/OpenBLAS-0.3.24/lapack-netlib/LAPACKE/include -IC:/M_P/mingw-w64-openblas/src/OpenBLAS-0.3.24 -IC:/M_P/mingw-w64-openblas/src/build-CLANGARM64-32/kernel_config/THUNDERX2T99 -Wno-unused-function -Wno-unused-variable  -DHAVE_C11 -DMS_ABI -fopenmp=libomp -DUSE_OPENMP -Wall -DF_INTERFACE_GFORT -DDYNAMIC_ARCH -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=64 -DMAX_PARALLEL_NUMBER=1 -DNO_AFFINITY -DVERSION="\"0.3.24\"" -DBUILD_SINGLE -DBUILD_DOUBLE -DBUILD_COMPLEX -DBUILD_COMPLEX16 -O3 -DNDEBUG -DBUILD_KERNEL -DTABLE_NAME=gotoblas_THUNDERX2T99  -DTS=_THUNDERX2T99 -MD -MT kernel/CMakeFiles/kernel_THUNDERX2T99.dir/CMakeFiles/dcopy_k_THUNDERX2T99.c.obj -MF kernel\CMakeFiles\kernel_THUNDERX2T99.dir\CMakeFiles\dcopy_k_THUNDERX2T99.c.obj.d -o kernel/CMakeFiles/kernel_THUNDERX2T99.dir/CMakeFiles/dcopy_k_THUNDERX2T99.c.obj -c C:/M_P/mingw-w64-openblas/src/build-CLANGARM64-32/kernel/CMakeFiles/dcopy_k_THUNDERX2T99.c
fatal error: error in backend: Failed to evaluate function length in SEH unwind info
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: C:\\msys64\\clangarm64\\bin\\clang.exe -IC:/M_P/mingw-w64-openblas/src/OpenBLAS-0.3.24/lapack-netlib/LAPACKE/include -IC:/M_P/mingw-w64-openblas/src/OpenBLAS-0.3.24 -IC:/M_P/mingw-w64-openblas/src/build-CLANGARM64-32/kernel_config/THUNDERX2T99 -Wno-unused-function -Wno-unused-variable -DHAVE_C11 -DMS_ABI -fopenmp=libomp -DUSE_OPENMP -Wall -DF_INTERFACE_GFORT -DDYNAMIC_ARCH -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=64 -DMAX_PARALLEL_NUMBER=1 -DNO_AFFINITY -DVERSION=\"0.3.24\" -DBUILD_SINGLE -DBUILD_DOUBLE -DBUILD_COMPLEX -DBUILD_COMPLEX16 -O3 -DNDEBUG -DBUILD_KERNEL -DTABLE_NAME=gotoblas_THUNDERX2T99 -DTS=_THUNDERX2T99 -MD -MT kernel/CMakeFiles/kernel_THUNDERX2T99.dir/CMakeFiles/dcopy_k_THUNDERX2T99.c.obj -MF kernel\\CMakeFiles\\kernel_THUNDERX2T99.dir\\CMakeFiles\\dcopy_k_THUNDERX2T99.c.obj.d -o kernel/CMakeFiles/kernel_THUNDERX2T99.dir/CMakeFiles/dcopy_k_THUNDERX2T99.c.obj -c C:/M_P/mingw-w64-openblas/src/build-CLANGARM64-32/kernel/CMakeFiles/dcopy_k_THUNDERX2T99.c
1.      <eof> parser at end of file
2.      Code generation
3.      Running pass 'Function Pass Manager' on module 'C:/M_P/mingw-w64-openblas/src/build-CLANGARM64-32/kernel/CMakeFiles/dcopy_k_THUNDERX2T99.c'.
4.      Running pass 'AArch64 Assembly Printer' on function '@dcopy_k_THUNDERX2T99'
Exception Code: 0xE0000046
 #0 0x00007fff9ff06334 (C:\Windows\System32\KERNELBASE.dll+0x76334)
 #1 0x00007ffef8f85ad0 llvm::CrashRecoveryContext::HandleExit(int) (C:\msys64\clangarm64\bin\libLLVM-17.dll+0x85ad0)
 #2 0x00007ffef9029ff4 llvm::sys::Process::Exit(int, bool) (C:\msys64\clangarm64\bin\libLLVM-17.dll+0x129ff4)
 #3 0x00007ff7519b6fe8 llvm::InitializeAllAsmPrinters() (C:\msys64\clangarm64\bin\clang.exe+0x6fe8)
 #4 0x00007ffef8f9323c llvm::report_fatal_error(llvm::Twine const&, bool) (C:\msys64\clangarm64\bin\libLLVM-17.dll+0x9323c)
 #5 0x00007ffef8f9311c llvm::report_fatal_error(char const*, bool) (C:\msys64\clangarm64\bin\libLLVM-17.dll+0x9311c)
 #6 0x00007ffefa5e8f4c llvm::Win64EH::ARM64UnwindEmitter::Emit(llvm::MCStreamer&) const (C:\msys64\clangarm64\bin\libLLVM-17.dll+0x16e8f4c)
 #7 0x00007ffefa5de8d4 llvm::MCStreamer::emitWinCFIEndProc(llvm::SMLoc) (C:\msys64\clangarm64\bin\libLLVM-17.dll+0x16de8d4)
 #8 0x00007ffef9938478 llvm::WinException::endFuncletImpl() (C:\msys64\clangarm64\bin\libLLVM-17.dll+0xa38478)
 #9 0x00007ffef99380a8 llvm::WinException::endFunction(llvm::MachineFunction const*) (C:\msys64\clangarm64\bin\libLLVM-17.dll+0xa380a8)
#10 0x00007ffef98d848c llvm::AsmPrinter::emitFunctionBody() (C:\msys64\clangarm64\bin\libLLVM-17.dll+0x9d848c)
clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 17.0.2
Target: aarch64-w64-windows-gnu
Thread model: posix
InstalledDir: C:/msys64/clangarm64/bin
clang: note: diagnostic msg:
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: C:/msys64/tmp/dcopy_k_THUNDERX2T99-ce9e97.c
clang: note: diagnostic msg: C:/msys64/tmp/dcopy_k_THUNDERX2T99-ce9e97.sh
clang: note: diagnostic msg:

********************
ninja: build stopped: subcommand failed.
==> ERROR: A failure occurred in build().
    Aborting...
@hmartinez82
Author

dcopy_k_THUNDERX2T99-ce9e97.zip
Here are the two files that Clang says to provide

@EugeneZelenko EugeneZelenko added backend:AArch64 llvm:asmparser crash Prefer [crash-on-valid] or [crash-on-invalid] and removed new issue llvm:asmparser labels Oct 18, 2023

@martin-frbg

Is there something I could do on the OpenBLAS side to help move this along, or some workaround that would allow me to supply a suitable estimate for whatever clang needs to compute the function length here? The error persists with 19.1.

@hmartinez82
Author

hmartinez82 commented Jan 11, 2025

Turns out that it's actually these cores that are failing to build:

  • THUNDERX2T99
  • THUNDERX3T110
  • NEOVERSEN1
  • NEOVERSEV1
  • NEOVERSEN2
  • ARMV8SVE
  • A64FX

and working with these cores:

  • CORTEXA53
  • CORTEXA57
  • CORTEXA72
  • CORTEXA73
  • CORTEXA76
  • CORTEXX1
  • THUNDERX
  • TSV110
  • EMAG8180

Here's a table of working and crashing compiler flags:

working                            | crashing
-march=armv8-a -mtune=cortex-a53   | -march=armv8.1-a -mtune=thunderx2t99
-march=armv8-a -mtune=cortex-a57   | -march=armv8.3-a -mtune=thunderx2t99
-march=armv8-a -mtune=cortex-a72   | -march=armv8.2-a -mtune=neoverse-n1
-march=armv8-a -mtune=cortex-a73   | -march=armv8.4-a+sve -mtune=cortex-x1
-march=armv8.2-a -mtune=cortex-a76 | -march=armv8.2-a+sve+bf16 -mtune=cortex-a72
-march=armv8.2-a -mtune=cortex-x1  | -march=armv8-a+sve
-march=armv8-a -mtune=thunderx     | -march=armv8.2-a+sve -mtune=a64fx
-march=armv8.2-a -mtune=tsv110     | -march=armv8.4-a+sve
-march=armv8-a                     |

@hmartinez82 hmartinez82 changed the title from "fatal error when building the OpenBLAS ThunderX2 dynamic kernel for Windows on ARM" to "fatal error when building the OpenBLAS ThunderX2 and some other dynamic kernels for Windows on ARM" Jan 11, 2025
@hmartinez82
Author

hmartinez82 commented Jan 11, 2025

@DavidSpickett @efriedma-quic Could you take another look at this?
I'm now trying this with 19.1.6, 1.5 years later:

[2/579] Building C object kernel/CMakeFiles/kernel_THUNDERX2T99.dir/CMakeFiles/sasum_k_THUNDERX2T99.c.obj
FAILED: kernel/CMakeFiles/kernel_THUNDERX2T99.dir/CMakeFiles/sasum_k_THUNDERX2T99.c.obj
C:\msys64\clangarm64\bin\clang.exe  -IC:/Dev/Github/MINGW-packages/mingw-w64-openblas/src/OpenBLAS-0.3.28/lapack-netlib/LAPACKE/include -IC:/Dev/Github/MINGW-packages/mingw-w64-openblas/src/OpenBLAS-0.3.28 -IC:/Dev/Github/MINGW-packages/mingw-w64-openblas/src/build-CLANGARM64-32/kernel_config/THUNDERX2T99 -DHAVE_C11 -DMS_ABI -fopenmp=libomp -DUSE_OPENMP -Wall -DF_INTERFACE_GFORT -DGEMM_GEMV_FORWARD -DDYNAMIC_ARCH -DDYNAMIC_LIST -DDYN_CORTEXA53 -DDYN_THUNDERX2T99 -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=64 -DMAX_PARALLEL_NUMBER=1 -DNO_AFFINITY -DVERSION="\"0.3.28\"" -DBUILD_SINGLE -DBUILD_DOUBLE -DBUILD_COMPLEX -DBUILD_COMPLEX16 -O3 -DNDEBUG -DBUILD_KERNEL -DTABLE_NAME=gotoblas_THUNDERX2T99  -DTS=_THUNDERX2T99 -MD -MT kernel/CMakeFiles/kernel_THUNDERX2T99.dir/CMakeFiles/sasum_k_THUNDERX2T99.c.obj -MF kernel\CMakeFiles\kernel_THUNDERX2T99.dir\CMakeFiles\sasum_k_THUNDERX2T99.c.obj.d -o kernel/CMakeFiles/kernel_THUNDERX2T99.dir/CMakeFiles/sasum_k_THUNDERX2T99.c.obj -c C:/Dev/Github/MINGW-packages/mingw-w64-openblas/src/build-CLANGARM64-32/kernel/CMakeFiles/sasum_k_THUNDERX2T99.c
fatal error: error in backend: Failed to evaluate function length in SEH unwind info
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: C:\\msys64\\clangarm64\\bin\\clang.exe -IC:/Dev/Github/MINGW-packages/mingw-w64-openblas/src/OpenBLAS-0.3.28/lapack-netlib/LAPACKE/include -IC:/Dev/Github/MINGW-packages/mingw-w64-openblas/src/OpenBLAS-0.3.28 -IC:/Dev/Github/MINGW-packages/mingw-w64-openblas/src/build-CLANGARM64-32/kernel_config/THUNDERX2T99 -DHAVE_C11 -DMS_ABI -fopenmp=libomp -DUSE_OPENMP -Wall -DF_INTERFACE_GFORT -DGEMM_GEMV_FORWARD -DDYNAMIC_ARCH -DDYNAMIC_LIST -DDYN_CORTEXA53 -DDYN_THUNDERX2T99 -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=64 -DMAX_PARALLEL_NUMBER=1 -DNO_AFFINITY -DVERSION=\"0.3.28\" -DBUILD_SINGLE -DBUILD_DOUBLE -DBUILD_COMPLEX -DBUILD_COMPLEX16 -O3 -DNDEBUG -DBUILD_KERNEL -DTABLE_NAME=gotoblas_THUNDERX2T99 -DTS=_THUNDERX2T99 -MD -MT kernel/CMakeFiles/kernel_THUNDERX2T99.dir/CMakeFiles/sasum_k_THUNDERX2T99.c.obj -MF kernel\\CMakeFiles\\kernel_THUNDERX2T99.dir\\CMakeFiles\\sasum_k_THUNDERX2T99.c.obj.d -o kernel/CMakeFiles/kernel_THUNDERX2T99.dir/CMakeFiles/sasum_k_THUNDERX2T99.c.obj -c C:/Dev/Github/MINGW-packages/mingw-w64-openblas/src/build-CLANGARM64-32/kernel/CMakeFiles/sasum_k_THUNDERX2T99.c
1.      <eof> parser at end of file
2.      Code generation
3.      Running pass 'Function Pass Manager' on module 'C:/Dev/Github/MINGW-packages/mingw-w64-openblas/src/build-CLANGARM64-32/kernel/CMakeFiles/sasum_k_THUNDERX2T99.c'.
4.      Running pass 'AArch64 Assembly Printer' on function '@sasum_k_THUNDERX2T99'
Exception Code: 0xE0000046
#0 0x00007ffe5a646248 (C:\Windows\System32\KERNELBASE.dll+0xb6248)
#1 0xc73d7ffd70a8d7d8
clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 19.1.6
Target: aarch64-w64-windows-gnu
Thread model: posix
InstalledDir: C:/msys64/clangarm64/bin
clang: note: diagnostic msg:
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: C:/msys64/tmp/sasum_k_THUNDERX2T99-ee3aec.c
clang: note: diagnostic msg: C:/msys64/tmp/sasum_k_THUNDERX2T99-ee3aec.sh
clang: note: diagnostic msg:

********************
ninja: build stopped: subcommand failed.

sasum_k_THUNDERX2T99-ee3aec.zip

I also got the whole file preprocessed with -E. This is the output, and indeed, there's a function called sasum_k_THUNDERX2T99 towards the end of the file.

preprocessed.zip

@martin-frbg

@hmartinez82 I'm not sure your tables are useful in this context. Most likely it is not the presence or absence of -mtune flags that matters, but simply the reuse of BLAS kernels originally written for ThunderX2T99 (with instruction sequences that clang fails to handle) for later CPUs.

@martin-frbg

The commonality here seems to be the use of .align directives in the affected assembly files. I guess we could try removing (#ifdef'ing) them to see if the code compiles, though I'm not yet sure what that would do to performance.

@hmartinez82
Author

hmartinez82 commented Jan 13, 2025

@martin-frbg That's an awesome finding. I'll create a much simpler bug report!

I can repro this issue with much simpler code, building with clang -c:

int f(int i) {
    int result;
    __asm__ (
        ".align 5 \n"
        "add %w0, %w1, #41"
        : "=r" (result)
        : "r" (i)
        :
    );
    return result;
}

@martin-frbg

I see now that one of the developer comments on the earlier #66912 (which led to the patch you mentioned in your initial message here) already put that problem down to "there's an alignment in the middle of a function that's keeping us from calculating the length, so stop including alignments for now in code we generate". So it appears to be a known limitation compared to gcc et al. The difference is that now it is hand-coded assembly that contains the "ugly" instruction, so the "don't do that then" approach inside the compiler is of no help.
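
To illustrate why an alignment in the middle of a function gets in the way of a fixed length calculation, here is a minimal C sketch of the padding arithmetic. It is not LLVM code: `padding_for` is a hypothetical helper, and the sketch assumes that `.align 5` requests a 2^5 = 32-byte boundary, as it does on AArch64 assemblers.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only: number of padding bytes
 * needed at `offset` to reach the next 2^log2_align byte boundary. */
size_t padding_for(size_t offset, unsigned log2_align) {
    size_t align = (size_t)1 << log2_align;
    /* Distance to the next multiple of `align`; 0 if already aligned. */
    return (align - (offset & (align - 1))) & (align - 1);
}
```

Under these assumptions the padding is 28 bytes at offset 4 and 0 bytes at offset 32, so the byte length of the same instruction sequence depends on where the directive ends up.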

@martin-frbg

I just need to check what drawbacks the workaround has - I expect the code will still work with the alignments protected by an #ifndef __clang__ but I have no idea how much of the performance advantage of the ThunderX2T99 implementation over the current alternatives will be lost (if any)
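
A minimal sketch of that workaround, applied to the reduced test case from this thread rather than to the real OpenBLAS kernels. `KERNEL_ALIGN` is a hypothetical macro name, not something OpenBLAS defines, and the plain-C fallback for non-AArch64 targets is there only so the snippet compiles elsewhere.

```c
/* Hypothetical macro: keep the alignment directive everywhere except clang. */
#ifndef __clang__
#define KERNEL_ALIGN ".align 5 \n"
#else
#define KERNEL_ALIGN ""
#endif

int f(int i) {
#if defined(__aarch64__)
    int result;
    __asm__ (
        KERNEL_ALIGN              /* expands to "" under clang */
        "add %w0, %w1, #41"
        : "=r" (result)
        : "r" (i)
        :
    );
    return result;
#else
    /* Portable fallback so the sketch builds on other architectures. */
    return i + 41;
#endif
}
```

As written, the guard drops the directive for every clang target. Narrowing the condition, for example to Windows builds only, would be a further design choice.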

@EugeneZelenko EugeneZelenko added the duplicate Resolved as duplicate label Jan 13, 2025
@hmartinez82
Author

> Is there something I could do on the OpenBLAS side to help move this along, or some workaround that would allow me to supply a suitable estimate for whatever clang needs to compute the function length here? The error persists with 19.1.

@martin-frbg Yes. If you move the inline assembly to a standalone .S file, then it should work. All the other .S files compile without issue.

@martin-frbg

> > Is there something I could do on the OpenBLAS side to help move this along, or some workaround that would allow me to supply a suitable estimate for whatever clang needs to compute the function length here? The error persists with 19.1.
>
> @martin-frbg Yes. If you move the inline assembly to a standalone .S file, then it should work. All the other .S files compile without issue.

Thanks - not quite what I asked for, however. Guess one needs to wait until #47432 finally gets addressed.
