-
Notifications
You must be signed in to change notification settings - Fork 797
[SYCL][NativeCPU] Build libclc target-independently. #17408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This change creates a Clang NativeCPU target to be used for creating LLVM IR for NativeCPU, builds libclc with this target, and does the initial processing of NativeCPU device code with this target. This allows the libclc to be independent of the concrete host target, allowing libclc to be re-used across different host targets, and across different ABIs for the same host target.
How is |
There are two main targets that take most properties from the host like NativeCPU: NVPTX and SPIR. SPIR was what I went for initially and I found that that contains a lot of logic that does not work for NativeCPU. All the driver handling to handle SPIR-V files does not apply to NativeCPU, there are issues with the non-zero default address space of SPIR, the default For completeness: the other targets have at least some level of support in LLVM, at least by being recognized for |
Maybe a dumb question but can we not just use the exact host target? Do we need a special one that is essentially a wrapper around the host target? |
Using the exact host target is what we do now. It results in libclc being compiled exactly for host, with target-dependent bitcode that cannot be used with other host targets or with other host ABIs. For instance, on x86_64, Likewise, when cross-compiling from e.g. x86_64 to AArch64, the host compiler will only contain a libclc and libdevice for x86_64, not for AArch64, and additional options (which are not yet available) are needed to find the AArch64 version of libclc and libdevice. With this PR, that is no longer an issue, because the same libclc and libdevice works for both. |
LLVM IR by design is target dependent (e.g. https://www.youtube.com/watch?v=8RVRQrf15yY talk starts a few things to watch out for when changing the target for LLVM IR). I guess your patch works because you tuned new NativeCPU target to handle retargeting of existing libclc code for the host architectures you tested. @frasercrmck, do you think this PR is an upstream-able solution? |
* shuffle_abi tested the pass that has been removed, so the test can go too. * Other tests are adjusted as appropriate. * Remove AArch64 special case in libclc.
I am aware, and this PR does not aim to make NativeCPU LLVM IR target-independent in general. It is intentional that almost all properties are picked up from the host, and that means that anything that depends on those properties will result in target-dependent IR. But that is not the sort of code that we have in libclc.
This does result in LLVM IR that has not been translated in accordance with the platform ABI, so in that sense, it already is incompatible with host. But that is okay, these are not points where we need to comply with the platform ABI: this is all internal to LLVM. This affects functions defined in LLVM, called by LLVM, where it works as long as LLVM is internally consistent. The point where we need to be sufficiently compatible is where we enter the device code, and where we call external library functions. The former we can handle because all kernels are wrapped in a work item loop and any compatibility concerns can be addressed there (currently I am not aware of any issue), and the latter we don't do other than through LLVM builtins. Those LLVM builtins are only processed once we are back using the "real" target. If future libclc changes do turn out to cause problems for specific functions, we can write NativeCPU-specific versions of them. We do this already and this PR actually reduces the need for NativeCPU-specific versions: we currently have a custom
We already include riscv64 in our testing as well; this works for that too.
Note that it is in my capacity as a NativeCPU maintainer that I am submitting this. :) |
* Hopefully explain better how the NativeCPU compilation process works. * Add a check to clang/test/Driver/sycl-native-cpu.cpp that we suppress warnings about module target mismatch.
…r on the correct line
ping @intel/dpcpp-tools-reviewers @intel/dpcpp-cfe-reviewers |
ping @intel/dpcpp-tools-reviewers |
ping again @intel/dpcpp-tools-reviewers |
Updated to resolve a conflict with the latest pulldown. @intel/dpcpp-tools-reviewers @sarnex It's getting close to a month, the updates to deal with conflicts are a hassle, plus this is holding up other PRs, could I please get this reviewed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tools lgtm
…) now needs to check for normalized triple.
@intel/llvm-gatekeepers This is ready to be merged, thanks |
In intel#17408, NativeCPU became a target in order to be able to pick the ABI for its own libclc functions consistently, without having targets affect this. This was, and is, required to be able to use libclc independent of target and target options. However, it breaks some calls into libc. Therefore, this PR allows the calling convention to be explicitly specified, ensures it is specified for any libclc functions, and ensures it is not specified for any libc functions. Fixes the SYCL-E2E acos, cmath, and exp-std-complex tests.
In intel#17408, NativeCPU became a target in order to be able to pick the ABI for its own libclc functions consistently, without having targets affect this. This was, and is, required to be able to use libclc independent of target and target options. However, it breaks some calls into libc. Therefore, this PR allows the calling convention to be explicitly specified, ensures it is specified for any libclc functions, and ensures it is not specified for any libc functions. Fixes the SYCL-E2E acos, cmath, and exp-std-complex tests.
This change creates a Clang NativeCPU target to be used for creating LLVM IR for NativeCPU, builds libclc with this target, and does the initial processing of NativeCPU device code with this target.
This allows the libclc to be independent of the concrete host target, allowing libclc to be re-used across different host targets, and across different ABIs for the same host target.