You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem? Please describe
In CUDA, a kernel functor can be copied to a CUDA symbol, which can be marked __const__. This enables the compiler to perform various optimisations including LICM & CSE. DPC++ does not currently support this feature; kernel arguments are always passed into global device memory. This means that significant optimisation opportunities are lost.
An example of the use of CUDA Symbols from Kokkos:
// Copy functor asynchronously from there to constant memory on the devicecudaMemcpyToSymbolAsync(kokkos_impl_cuda_constant_memory_buffer, staging,
sizeof(DriverType), 0, cudaMemcpyHostToDevice,
cudaStream_t(cuda_instance->m_stream));
// Invoke the driver function on the device
(base_t::
get_kernel_func())<<<grid, block, shmem, cuda_instance->m_stream>>>();
We have observed up to 20% improvement in kernel performance from Kokkos' CUDA backend when using symbols, and expect similar improvements in Kokkos' SYCL backend.
Describe the solution you would like
It would be useful for a SYCL user to be able to mark specific kernel functions to have their arguments loaded into a CUDA symbol in constant memory. This feature could be enabled for the SYCL CUDA backend via changes to the Clang driver & SYCL Runtime, and the addition of a new kernel attribute.
Kernel Class
A new attribute on the kernel class (e.g. sycl::kernel_constant_mem) would inform the Clang driver that the given kernel should read parameters from a CUDA symbol in the constant memory space as opposed to standard kernel parameters in the global memory space.
Driver
The driver should take kernel functions marked sycl::kernel_constant_mem and replace e.g. ld.param instructions with ld.const instructions which point to offsets in a single constant memory symbol.
LLVM Pass
The LLVM pass which performs this conversion should occur as early as possible. Specifically, it should occur before optimisation passes which depend on the const-ness of the kernel parameters.
The driver is also responsible for allocating this single constant device symbol with a stable name which is known to the SYCL runtime.
An additional flag to the compiler (e.g. -fsycl-use-constant-symbols) informs the compiler whether to perform this LLVM pass & symbol allocation.
SYCL Runtime
If the -fsycl-use-constant-symbols flag was enabled, and the SYCL Runtime launches a kernel marked with sycl::kernel_constant_mem, it should first copy the flattened functor to the constant CUDA symbol allocated by the driver. The runtime need not perform any modifications to the kernel, nor does it need to store duplicate 'constant' & 'non-constant' versions, because this is handled by the compiler.
Notes
The proposed solution involves a single constant memory allocation which consumes the entirety of the device's constant memory. When the runtime handles multiple sycl::kernel_constant_mem kernels, the constant symbol is reused.
The text was updated successfully, but these errors were encountered:
That sounds like a useful feature.
It would be preferable to use a compile-time kernel property à la#4280 or a decorating function, instead of adding yet another attribute.
Is your feature request related to a problem? Please describe
In CUDA, a kernel functor can be copied to a CUDA symbol, which can be marked __const__. This enables the compiler to perform various optimisations including LICM & CSE. DPC++ does not currently support this feature; kernel arguments are always passed into global device memory. This means that significant optimisation opportunities are lost.
An example of the use of CUDA Symbols from Kokkos:
We have observed up to 20% improvement in kernel performance from Kokkos' CUDA backend when using symbols, and expect similar improvements in Kokkos' SYCL backend.
Describe the solution you would like
It would be useful for a SYCL user to be able to mark specific kernel functions to have their arguments loaded into a CUDA symbol in constant memory. This feature could be enabled for the SYCL CUDA backend via changes to the Clang driver & SYCL Runtime, and the addition of a new kernel attribute.
Kernel Class
A new attribute on the kernel class (e.g.
sycl::kernel_constant_mem
) would inform the Clang driver that the given kernel should read parameters from a CUDA symbol in the constant memory space as opposed to standard kernel parameters in the global memory space.Driver
The driver should take kernel functions marked
sycl::kernel_constant_mem
and replace e.g.ld.param
instructions withld.const
instructions which point to offsets in a single constant memory symbol.LLVM Pass
The LLVM pass which performs this conversion should occur as early as possible. Specifically, it should occur before optimisation passes which depend on the const-ness of the kernel parameters.
The driver is also responsible for allocating this single constant device symbol with a stable name which is known to the SYCL runtime.
An additional flag to the compiler (e.g.
-fsycl-use-constant-symbols
) informs the compiler whether to perform this LLVM pass & symbol allocation.SYCL Runtime
If the
-fsycl-use-constant-symbols
flag was enabled, and the SYCL Runtime launches a kernel marked withsycl::kernel_constant_mem
, it should first copy the flattened functor to the constant CUDA symbol allocated by the driver. The runtime need not perform any modifications to the kernel, nor does it need to store duplicate 'constant' & 'non-constant' versions, because this is handled by the compiler.Notes
The proposed solution involves a single constant memory allocation which consumes the entirety of the device's constant memory. When the runtime handles multiple
sycl::kernel_constant_mem
kernels, the constant symbol is reused.The text was updated successfully, but these errors were encountered: