Building ExecuTorch on RPi5 with Clang 14.0.6 fails due to bfloat incompatibility #8924
Comments
cc @swolchok @digantdesai - do you know?
There's conditional compilation for this in the PyTorch version of this file. The two copies need to be put back in sync or, ideally, refactored and shared, since we now have support for sharing code with PyTorch core. (And per @malfet, we should be attempting to detect at CMake time whether the compiler will actually support this stuff, rather than hardcoding compiler versions.)
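As a rough illustration of gating on detected capability rather than on a compiler version, here is a minimal C++ sketch. It is not the code from the PyTorch or ExecuTorch file under discussion. It assumes the ACLE feature-test macro `__ARM_FEATURE_BF16_VECTOR_ARITHMETIC`, which compilers define when the target supports BF16 vector intrinsics; the names `kHasBf16VectorIntrinsics` and `bf16_kernel_path` are hypothetical and exist only for this example.

```cpp
#include <string_view>

// Assumption: the ACLE macro below is defined by the compiler only when the
// selected target (e.g. -march=armv8.2-a+bf16) supports BF16 vector intrinsics.
// Testing it avoids hardcoding "Clang >= N" or "GCC >= M" version checks.
#if defined(__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
constexpr bool kHasBf16VectorIntrinsics = true;
#else
constexpr bool kHasBf16VectorIntrinsics = false;
#endif

// Hypothetical helper: reports which kernel variant this translation unit
// would compile. A real kernel would place its bfloat16x8_t code inside the
// same #if guard and keep a portable implementation in the #else branch.
constexpr std::string_view bf16_kernel_path() {
  return kHasBf16VectorIntrinsics ? std::string_view("neon-bf16")
                                  : std::string_view("portable-fallback");
}
```

A CMake-time probe, as suggested above, would serve the same purpose one level up: compile a tiny test source that uses the BF16 types and only enable the optimized path if that compile succeeds.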
I am busy with other things right now, but I am very likely the person to fix this.
@swolchok I assume this issue might be related to #8508 ("Build executorch using android ndk with optimized kernels shows unsupported architecture 'armv8.2-a+bf16' and unknown type name 'bfloat16x8_t'"), which notes: "For older Arm GCC (<13.1 IIRC) we need to use __fp16 and include <arm_fp16.h>, but for newer Arm GCC, _Float16 is available."
I'm not entirely sure what the best way to share the code for these matmul kernels is. Our current model of code sharing has two buckets:
I am having trouble with this code because it does not really make sense to put matmul kernels in a header; the only reason to do it is code sharing via bucket (2). Some options that occur to me:
a) clean up the CPUBlas code in PyTorch and put whatever we need to share in headers, even if it doesn't really make sense to put it in a header otherwise. (bucket (2) above)
I don't think anybody is in favor of (c) at this point, but I'm unsure how to choose between (a) and (b). (b) feels ickier, but I suspect it may actually be more harmless than (a). (Another consideration is that updating the PyTorch pin can be very slow. I've been waiting for 3 weeks on the current bump. I don't think that this is a difference between (a) and (b), though.)
@mergennachin / @kimishpatel / @iseeyuan, do you have any thoughts?
(a) would result in the following decision tree for sharing things:
(b) would result in the following decision tree for sharing things:
As an ongoing process, (b) sure looks less likely to cause silly things to happen.
(b) makes more sense: when the code to be shared doesn't belong in a header and we force it into one, that can lead to other consequences. If we can establish an automated way of mirroring from PyTorch core, that would be acceptable, although the absence of that won't really break us. The other thing we would have to ensure is that no one submits a PR that touches those files.
We have a CI job that can do that.
@spalatinate #10868 should fix this. can you give it a go once CI comes back green (EDIT: it's green) please?
I was able to build with cmake using CXX=clang++-14 CC=clang-14 on my personal RPi 5. closing this out. re-open if it's still broken in the same way or file another issue for different build problems.
(by the way, why use clang-14? there seems to be a clang-19 package available on my Pi)
🐛 Describe the bug
As discussed with @kirklandsign in Issue #8508, I am opening a separate one here.
I was trying to build ExecuTorch locally on my RPi5. It worked fine using the Clang compiler (version 14.0.6) and the release/0.4 branch. Now, with the release/0.5 and main branches, I am running into the error below. I guess it is related to the Clang compiler, because building ExecuTorch works just fine when I switch to g++/gcc.
Versions
Collecting environment information...
PyTorch version: 2.7.0.dev20250131+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Debian GNU/Linux 12 (bookworm) (aarch64)
GCC version: (Debian 12.2.0-14) 12.2.0
Clang version: 14.0.6
CMake version: version 3.31.6
Libc version: glibc-2.36
Python version: 3.10.0 (default, Mar 3 2022, 09:51:40) [GCC 10.2.0] (64-bit runtime)
Python platform: Linux-6.6.74+rpt-rpi-v8-aarch64-with-glibc2.36
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: ARM
Model name: Cortex-A76
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 4
Socket(s): -
Cluster(s): 1
Stepping: r4p1
CPU(s) scaling MHz: 100%
CPU max MHz: 2400,0000
CPU min MHz: 1500,0000
BogoMIPS: 108,00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
L1d cache: 256 KiB (4 instances)
L1i cache: 256 KiB (4 instances)
L2 cache: 2 MiB (4 instances)
L3 cache: 2 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] executorch==0.6.0a0+542480c
[pip3] numpy==2.2.3
[pip3] torch==2.7.0.dev20250131+cpu
[pip3] torchao==0.10.0+git7d879462
[pip3] torchaudio==2.6.0.dev20250131
[pip3] torchgen==0.0.1
[pip3] torchsr==1.0.4
[pip3] torchvision==0.22.0.dev20250131
[conda] executorch 0.6.0a0+542480c pypi_0 pypi
[conda] numpy 2.2.3 pypi_0 pypi
[conda] torch 2.7.0.dev20250131+cpu pypi_0 pypi
[conda] torchao 0.10.0+git7d879462 pypi_0 pypi
[conda] torchaudio 2.6.0.dev20250131 pypi_0 pypi
[conda] torchgen 0.0.1 pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchvision 0.22.0.dev20250131 pypi_0 pypi
cc @larryliu0820 @lucylq