Skip to content

[SYCL][NativeCPU] Build libclc target-independently. #17408

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Apr 9, 2025

Conversation

hvdijk
Copy link
Contributor

@hvdijk hvdijk commented Mar 12, 2025

This change creates a Clang NativeCPU target to be used for creating LLVM IR for NativeCPU, builds libclc with this target, and does the initial processing of NativeCPU device code with this target.

This allows the libclc to be independent of the concrete host target, allowing libclc to be re-used across different host targets, and across different ABIs for the same host target.

This change creates a Clang NativeCPU target to be used for creating
LLVM IR for NativeCPU, builds libclc with this target, and does the
initial processing of NativeCPU device code with this target.

This allows the libclc to be independent of the concrete host target,
allowing libclc to be re-used across different host targets, and across
different ABIs for the same host target.
@hvdijk hvdijk requested review from a team as code owners March 12, 2025 13:33
@hvdijk hvdijk requested a review from frasercrmck March 12, 2025 13:33
@hvdijk hvdijk temporarily deployed to WindowsCILock March 12, 2025 14:02 — with GitHub Actions Inactive
@hvdijk hvdijk temporarily deployed to WindowsCILock March 12, 2025 14:02 — with GitHub Actions Inactive
@bader
Copy link
Contributor

bader commented Mar 12, 2025

How is native_cpu different from any other virtual target that clang already supports?

@hvdijk
Copy link
Contributor Author

hvdijk commented Mar 12, 2025

How is native_cpu different from any other virtual target that clang already supports?

There are two main targets that take most properties from the host like NativeCPU: NVPTX and SPIR. SPIR was what I went for initially and I found that that contains a lot of logic that does not work for NativeCPU. All the driver handling to handle SPIR-V files does not apply to NativeCPU, there are issues with the non-zero default address space of SPIR, the default spir_func ABI being incompatible with the cdecl ABI for host architectures causes LLVM to assume a lot of calls as unreachable, we do not want to define the same predefined macros, and we want to be able to embed SPIR-V and NativeCPU device code in the same binary which requires a separate triple. NVPTX would avoid some of these issues, but not all, and has its own set of issues with builtins that we do not support, needing a specific GPU, and I think it may cause maintenance issues as we should not NVPTX maintainers to take NativeCPU into account when making changes to their target. The only way around that I saw was to add a new target.

For completeness: the other targets have at least some level of support in LLVM, at least by being recognized for llvm::Triple. To keep this PR simpler and avoid problems with LLVM pulldowns, I kept this one strictly limited to Clang, so that is an additional difference, but that is one that does not affect how the target works.

@sarnex
Copy link
Contributor

sarnex commented Mar 12, 2025

Maybe a dumb question but can we not just use the exact host target? Do we need a special one that is essentially a wrapper around the host target?

@hvdijk
Copy link
Contributor Author

hvdijk commented Mar 12, 2025

Using the exact host target is what we do now. It results in libclc being compiled exactly for host, with target-dependent bitcode that cannot be used with other host targets or with other host ABIs. For instance, on x86_64, -mavx affects the vector ABI, so currently, when libclc is built without -mavx, and a SYCL application is built with -mavx, things break. With this PR, that works.

Likewise, when cross-compiling from e.g. x86_64 to AArch64, the host compiler will only contain a libclc and libdevice for x86_64, not for AArch64, and additional options (which are not yet available) are needed to find the AArch64 version of libclc and libdevice. With this PR, that is no longer an issue, because the same libclc and libdevice works for both.

@bader
Copy link
Contributor

bader commented Mar 12, 2025

LLVM IR by design is target dependent (e.g. https://www.youtube.com/watch?v=8RVRQrf15yY talk starts a few things to watch out for when changing the target for LLVM IR). I guess your patch works because you tuned new NativeCPU target to handle retargeting of existing libclc code for the host architectures you tested.
Can we guarantee that future libclc code changes won't make LLVM produced for NativeCPU target incompatible with the host?
Is NativeCPU IR portable to the hosts other than x86 and ARM (if LLVM IR doesn't not guarantee that)?
This sounds like a footgun which can be a headache for the maintainers.

@frasercrmck, do you think this PR is an upstream-able solution?

* shuffle_abi tested the pass that has been removed, so the test can go too.
* Other tests are adjusted as appropriate.
* Remove AArch64 special case in libclc.
@hvdijk hvdijk temporarily deployed to WindowsCILock March 12, 2025 22:57 — with GitHub Actions Inactive
@hvdijk
Copy link
Contributor Author

hvdijk commented Mar 12, 2025

LLVM IR by design is target dependent (e.g. https://www.youtube.com/watch?v=8RVRQrf15yY talk starts a few things to watch out for when changing the target for LLVM IR).

I am aware, and this PR does not aim to make NativeCPU LLVM IR target-independent in general. It is intentional that almost all properties are picked up from the host, and that means that anything that depends on those properties will result in target-dependent IR. But that is not the sort of code that we have in libclc.

I guess your patch works because you tuned new NativeCPU target to handle retargeting of existing libclc code for the host architectures you tested. Can we guarantee that future libclc code changes won't make LLVM produced for NativeCPU target incompatible with the host?

This does result in LLVM IR that has not been translated in accordance with the platform ABI, so in that sense, it already is incompatible with host. But that is okay, these are not points where we need to comply with the platform ABI: this is all internal to LLVM. This affects functions defined in LLVM, called by LLVM, where it works as long as LLVM is internally consistent. The point where we need to be sufficiently compatible is where we enter the device code, and where we call external library functions. The former we can handle because all kernels are wrapped in a work item loop and any compatibility concerns can be addressed there (currently I am not aware of any issue), and the latter we don't do other than through LLVM builtins. Those LLVM builtins are only processed once we are back using the "real" target.

If future libclc changes do turn out to cause problems for specific functions, we can write NativeCPU-specific versions of them. We do this already and this PR actually reduces the need for NativeCPU-specific versions: we currently have a custom fma implementation in there because the generic version results in a compiler crash. After this PR, I plan to submit a follow-up PR to remove that custom fma implementation because the generic version now just works.

Is NativeCPU IR portable to the hosts other than x86 and ARM (if LLVM IR doesn't not guarantee that)?

We already include riscv64 in our testing as well; this works for that too.

This sounds like a footgun which can be a headache for the maintainers.

Note that it is in my capacity as a NativeCPU maintainer that I am submitting this. :)

* Hopefully explain better how the NativeCPU compilation process works.
* Add a check to clang/test/Driver/sycl-native-cpu.cpp that we suppress
  warnings about module target mismatch.
@hvdijk
Copy link
Contributor Author

hvdijk commented Mar 25, 2025

ping @intel/dpcpp-tools-reviewers @intel/dpcpp-cfe-reviewers

@hvdijk hvdijk temporarily deployed to WindowsCILock March 25, 2025 12:32 — with GitHub Actions Inactive
@hvdijk
Copy link
Contributor Author

hvdijk commented Apr 1, 2025

ping @intel/dpcpp-tools-reviewers

@hvdijk
Copy link
Contributor Author

hvdijk commented Apr 7, 2025

ping again @intel/dpcpp-tools-reviewers

@hvdijk
Copy link
Contributor Author

hvdijk commented Apr 8, 2025

Updated to resolve a conflict with the latest pulldown. @intel/dpcpp-tools-reviewers @sarnex It's getting close to a month, the updates to deal with conflicts are a hassle, plus this is holding up other PRs, could I please get this reviewed?

Copy link
Contributor

@sarnex sarnex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tools lgtm

@hvdijk hvdijk temporarily deployed to WindowsCILock April 8, 2025 14:53 — with GitHub Actions Inactive
@hvdijk hvdijk temporarily deployed to WindowsCILock April 8, 2025 15:57 — with GitHub Actions Inactive
@hvdijk hvdijk temporarily deployed to WindowsCILock April 8, 2025 15:57 — with GitHub Actions Inactive
@hvdijk
Copy link
Contributor Author

hvdijk commented Apr 9, 2025

@intel/llvm-gatekeepers This is ready to be merged, thanks

@kbenzie kbenzie merged commit e811c53 into intel:sycl Apr 9, 2025
24 checks passed
hvdijk added a commit to hvdijk/dpcppllvm that referenced this pull request Jul 23, 2025
In intel#17408, NativeCPU became a target in order to be able to pick the ABI
for its own libclc functions consistently, without having targets affect
this. This was, and is, required to be able to use libclc independent of
target and target options. However, it breaks some calls into libc.
Therefore, this PR allows the calling convention to be explicitly
specified, ensures it is specified for any libclc functions, and ensures
it is not specified for any libc functions.

Fixes the SYCL-E2E acos, cmath, and exp-std-complex tests.
hvdijk added a commit to hvdijk/dpcppllvm that referenced this pull request Jul 25, 2025
In intel#17408, NativeCPU became a target in order to be able to pick the ABI
for its own libclc functions consistently, without having targets affect
this. This was, and is, required to be able to use libclc independent of
target and target options. However, it breaks some calls into libc.
Therefore, this PR allows the calling convention to be explicitly
specified, ensures it is specified for any libclc functions, and ensures
it is not specified for any libc functions.

Fixes the SYCL-E2E acos, cmath, and exp-std-complex tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants