Description
The current SYCL run-time implementation relies on SPIRV. This is a problem if you have a device that doesn't support SPIRV yet and instead relies on the old OpenCL/SPIR built-ins, e.g. get_global_id rather than GlobalInvocationId. It is a particular problem for us because one of our ideal use cases is to output LLVM-IR for the device-side code and feed it into our other tools, which use the SPIR/OpenCL C built-ins; currently the IR contains SPIRV calls that we have to work around on our end. It may also be worth supporting both in the long term, for the case where you want to target multiple devices on one platform and one of them does not support SPIRV.
So an initial question: is this relevant to the current SYCL implementation, or is it only relevant to our aims at Xilinx and can therefore be ignored (which is fair enough)?
If it is relevant, the next question is how best to implement it. I have come up with some preliminary ideas to get the ball rolling:
1. The solution we currently have (and are still working on) is to include OpenCL C built-in functions inside the cl::__spirv namespace and define them conditionally based on a preprocessor macro. At the moment we have only added the ones we are interested in or that the implementation uses, as a minimalist approach; simply including opencl-c.h inside a namespace is another option. An LLVM pass then removes the namespace from the mangled name, so you end up with the identical SPIR built-in mangling on the device. So far this seems to avoid host conflicts with user functions of the same name. Device conflicts are a little trickier: after the pass has removed the namespace mangling from all the built-ins, it also has to rename any user-defined function whose mangled name now clashes with a built-in.
   The pass could perhaps be enabled or disabled (so that it doesn't run for SPIRV) based on the same define that conditionally includes the OpenCL/SPIR built-ins. The define could work similarly to __SYCL_DEVICE_ONLY__, in that the driver defines it based on some input arguments, or the user could specify it at compile time.
   In theory the pass could be part of the Reflower, but at that point it may be putting too much responsibility on the Reflower. Also, having the old OpenCL built-ins in the cl::__spirv namespace isn't really ideal when they aren't actually SPIRV, as it's misleading; putting them in a cl::__spir namespace, and perhaps a SPIR folder, is a better end goal.
   Note: The original idea was to avoid adding any extra SPIR/OpenCL built-ins to the run-time API, and to impact the compiler as little as possible, by having a pass that simply converts SPIRV intrinsics to SPIR intrinsics. However, while some of the cl::__spirv built-ins are trivial to convert this way (math.hpp), others are more complex (spirv_vars.hpp), and optimization phases can affect how they are translated to SPIR manglings in the pass. So, unfortunately, adding the OpenCL C built-ins in some form is a requirement of this solution, at least if we want to keep the pass simple and maintainable (tying the pass too closely to SPIRV components of the run-time, which may change frequently, seems like a nightmare).
2. A variation of the above, except that we define intermediate/placeholder calls that a later pass replaces with SPIR or SPIRV built-ins. This concept of placeholder calls for swapping between built-ins may not be great, as it probably falls prey to the same criticisms as the accessor class currently does (string comparisons, brittleness).
3. Another mangling approach along the lines of the above, but based on some variation of llvm/lib/Target/AMDGPU/AMDGPULibFunc.h/cpp implemented for SYCL (a SYCL.h/.cpp; this is also similar to the Reflower, so perhaps it's just idea 2 with extra steps). A placeholder function is used and mapped to either SPIR or SPIRV based on the target triple or a flag. I'm unsure how feasible this approach is, as certain things in SPIRV don't behave the same as in SPIR: e.g. get_global_id is a function call, whereas GlobalInvocationId in SPIRV seems more akin to a built-in variable like CUDA's threadIdx, so the resulting IR is different.
4. A Sema-based approach, much like SemaSYCL currently works: placeholder built-ins/calls whose AST is modified by a TreeTransform into either a SPIR- or SPIRV-style built-in.
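The core renaming step of idea 1, turning a built-in declared in cl::__spirv back into its plain SPIR mangling, can be sketched as a string rewrite on Itanium-mangled names. This is a minimal C++17 sketch under stated assumptions, not our actual pass: the helper name stripSpirvNamespace is hypothetical, it only handles non-template free functions declared directly in cl::__spirv, and it refuses (returns std::nullopt) any name whose parameter list contains an S, because removing two namespace components shifts the indices of Itanium substitutions (S_, S0_, ...). In an LLVM pass the same rewrite would be applied to the name of each Function in the Module.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: strip the cl::__spirv namespace from an Itanium-mangled
// function name, e.g. _ZN2cl7__spirv13get_global_idEj -> _Z13get_global_idj.
// Assumptions: the function is a non-template free function declared directly
// in cl::__spirv, and its parameter list contains no substitutions.
std::optional<std::string> stripSpirvNamespace(const std::string &mangled) {
  const std::string prefix = "_ZN2cl7__spirv";
  if (mangled.compare(0, prefix.size(), prefix) != 0)
    return std::nullopt; // not a cl::__spirv function: leave it alone

  // Parse the <source-name> that follows: a decimal length, then the identifier.
  std::size_t pos = prefix.size();
  const std::size_t lenStart = pos;
  while (pos < mangled.size() &&
         std::isdigit(static_cast<unsigned char>(mangled[pos])))
    ++pos;
  if (pos == lenStart)
    return std::nullopt; // no length: nested name has an unexpected form

  const std::size_t nameLen = std::stoul(mangled.substr(lenStart, pos - lenStart));
  if (nameLen >= mangled.size() - pos)
    return std::nullopt; // length runs past the end of the string
  const std::size_t nameEnd = pos + nameLen;

  // The nested name must close ('E') right after the identifier; anything else
  // means extra namespaces, classes or template arguments, which we skip.
  if (mangled[nameEnd] != 'E')
    return std::nullopt;

  const std::string params = mangled.substr(nameEnd + 1);
  // Conservative: any 'S' might be a substitution whose index would need
  // renumbering once the two namespace components are gone, so bail out.
  if (params.find('S') != std::string::npos)
    return std::nullopt;

  // Re-emit as an unqualified name: _Z <length><identifier> <parameters>.
  return "_Z" + mangled.substr(lenStart, nameEnd - lenStart) + params;
}
```

For example, _ZN2cl7__spirv13get_global_idEj becomes _Z13get_global_idj, the SPIR mangling of get_global_id(uint). Covering overloads whose parameters use substitutions (vector overloads, for instance) would need a real demangler or explicit renumbering of the S indices, which is part of why the pass can get fiddly.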
I welcome any ideas, input or feedback on this, as there are without a doubt better ways of doing this, and flaws in the cursory ideas above. In these approaches I was a little tunnel-visioned on our specific needs (e.g. we're not that bothered about compilation speed; we just need the correctly mangled SPIR built-in names in the final LLVM-IR so we can link against our own libraries). I'm also happy to clarify any of the above if the brief descriptions/stream of thoughts are unclear!
As an aside: a generalized way to offload/map to a specific device's built-ins could be an interesting direction. Alexey also mentioned some interest in a unified format for SPIR intrinsics in LLVM-IR, as SPIR 1.2/2.0 is somewhat lacking. One example of a SPIR problem, mentioned in idea 1, is that SPIR 1.2/2.0 manglings can conflict with user-defined functions of the same name if they aren't handled correctly.
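The user-function conflict mentioned in idea 1 and in this aside (a user-defined get_global_id(unsigned int) in the global namespace mangles to exactly _Z13get_global_idj) can be handled by moving the user function aside before the built-in takes its name. A minimal C++17 sketch, assuming the module is modelled as a plain set of function names and the built-in renames (e.g. from a namespace-stripping step) are supplied as a map; resolveBuiltinConflicts and the .user suffix are hypothetical. The order matters because in LLVM itself, as far as I know, Value::setName on a clashing name silently makes the new name unique instead of taking it.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model of a device module: just the set of its function names.
// `renames` maps each cl::__spirv built-in's mangled name to its plain SPIR
// mangling; it is assumed to be one-to-one.
// Returns a map of user functions that were moved aside: old name -> new name.
std::map<std::string, std::string>
resolveBuiltinConflicts(std::set<std::string> &functions,
                        const std::map<std::string, std::string> &renames) {
  std::map<std::string, std::string> userRenames;
  for (const auto &[builtin, spirName] : renames) {
    if (functions.count(builtin) == 0)
      continue; // this built-in is not used in the module

    if (functions.count(spirName) != 0) {
      // A user function already owns the SPIR name: pick a fresh name for it
      // (the ".user" suffix is an arbitrary, hypothetical choice).
      std::string fresh = spirName + ".user";
      for (int i = 1; functions.count(fresh) != 0; ++i)
        fresh = spirName + ".user" + std::to_string(i);
      functions.erase(spirName);
      functions.insert(fresh);
      userRenames[spirName] = fresh;
    }

    // Now the built-in can take the SPIR name without a clash.
    functions.erase(builtin);
    functions.insert(spirName);
  }
  return userRenames;
}
```

The returned map records which user functions were moved, which could be useful for diagnostics; in a real module, call sites reference the Function object rather than its name, so they follow the rename automatically.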