
[SYCL][RTC] Use program manager #16316


Merged 9 commits on Jan 13, 2025

@jopperm jopperm commented Dec 10, 2024

The idea is to assign a per-compilation prefix (e.g., rtc_42$) to all offload entries in the sycl_device_binaries data structure before feeding them into ProgramManager::addImages, resulting in unique kernel_ids as far as the PM is concerned. When querying the PM for the device images to construct the kernel bundle, I look for kernel IDs starting with the current prefix, which should reliably return only the device images corresponding to the current compilation request.

Note that the actual kernel names don't change, i.e. __sycl_kernel_foo keeps that name even though the PM might know it as rtc_42$__sycl_kernel_foo. The prefix is stored inside the bundle, and prepended to the requested kernel name in the ext_oneapi_[has|get]_kernel(string) methods.
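
To illustrate the prefixing scheme described above, here is a minimal, self-contained sketch. `ToyRegistry` and `ToyBundle` are hypothetical stand-ins, not the runtime's `ProgramManager` or kernel bundle classes; only the `rtc_<n>$` prefix format and the idea of prepending the prefix on name lookups are taken from the description.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the program manager's bookkeeping.
struct ToyRegistry {
  std::map<std::string, int> ImageOfEntry; // prefixed entry name -> image index
  int NextImage = 0;

  // Registers one device image; every entry name is stored with the prefix.
  int addImage(const std::string &Prefix,
               const std::vector<std::string> &KernelNames) {
    assert(!Prefix.empty() && "each compilation needs its own prefix");
    const int Image = NextImage++;
    for (const std::string &Name : KernelNames)
      ImageOfEntry[Prefix + Name] = Image;
    return Image;
  }

  // Returns the images that own at least one entry starting with Prefix.
  // Keys sharing a prefix are contiguous in the sorted map.
  std::set<int> imagesFor(const std::string &Prefix) const {
    std::set<int> Result;
    for (auto It = ImageOfEntry.lower_bound(Prefix);
         It != ImageOfEntry.end() &&
         It->first.compare(0, Prefix.size(), Prefix) == 0;
         ++It)
      Result.insert(It->second);
    return Result;
  }
};

// Hypothetical bundle: remembers its prefix and prepends it on name lookups,
// so callers keep using the unprefixed kernel name.
struct ToyBundle {
  std::string Prefix;
  const ToyRegistry *Registry;

  bool hasKernel(const std::string &Name) const {
    return Registry->ImageOfEntry.count(Prefix + Name) != 0;
  }
};
```

In this sketch, the trailing `$` keeps one prefix from matching another (`rtc_1$` does not match entries registered under `rtc_10$`), so two compilations that define the same kernel name stay separate.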

Kernel objects are also obtained via the program manager. Compared to creating the UR kernel from the selected device image's UR program directly, this approach ensures eliminated arguments are handled correctly. Hence, I was able to drop the previously mandatory -fno-sycl-dead-args-optimization from the pipeline.
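
For readers unfamiliar with dead-argument elimination: the device compiler may drop unused kernel arguments, and the runtime then has to skip them when setting arguments. The sketch below only illustrates that idea. It assumes the mask is a per-argument boolean vector in which `true` means "eliminated"; `survivingArgs` is a hypothetical helper, not a runtime function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper. Returns the host-side indices of the arguments that
// survive elimination; the position in the result is the device-side index.
// Assumption: Mask[i] == true means host argument i was eliminated, and an
// empty mask means nothing was eliminated.
std::vector<std::size_t> survivingArgs(const std::vector<bool> &Mask,
                                       std::size_t NumHostArgs) {
  assert((Mask.empty() || Mask.size() == NumHostArgs) &&
         "mask must be empty or cover every host argument");
  std::vector<std::size_t> Result;
  for (std::size_t I = 0; I < NumHostArgs; ++I) {
    const bool Eliminated = !Mask.empty() && Mask[I];
    if (!Eliminated)
      Result.push_back(I);
  }
  return Result;
}
```

A kernel created without access to the mask would treat every host argument as present, which is the mismatch the description says is avoided by going through the program manager.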

The compilation pipeline and extended kernel bundle now support multiple device images. To test this, I added support for the -fsycl-device-code-split= option, and applied it to one of the compilations in the E2E test.

The persistent cache is circumvented for now for the sycl_jit language (for lack of a suitable on-disk format), but should be brought back in the future.

@jopperm jopperm self-assigned this Dec 10, 2024
@jopperm jopperm changed the title [SYCL][RTC] Experimental use of program manager to build device images [SYCL][RTC] Experimental use of program manager Dec 16, 2024
Signed-off-by: Julian Oppermann <[email protected]>
@jopperm jopperm marked this pull request as ready for review December 17, 2024 08:37
@jopperm jopperm requested review from a team as code owners December 17, 2024 08:37
@jopperm jopperm changed the title [SYCL][RTC] Experimental use of program manager [SYCL][RTC] Use program manager Dec 17, 2024
@sommerlukas sommerlukas left a comment

Looks good apart from one minor nit.

jopperm commented Dec 20, 2024

> Do we still need this comment?

No, I'll drop it!

Signed-off-by: Julian Oppermann <[email protected]>
@cperkinsintel cperkinsintel left a comment

cool!

Signed-off-by: Julian Oppermann <[email protected]>

jopperm commented Jan 8, 2025

A bit of additional context for the last commit which marks the constructed kernel bundle as interop:

// Mark this bundle explicitly as "interop" to ensure that its kernels are
// enqueued with the info from the kernel object passed by the application,
// cf. `enqueueImpKernel` in `commands.cpp`. While runtime-compiled kernels
// loaded via the program manager have `kernel_id`s, they can't be looked up
// from the (unprefixed) kernel name.
MIsInterop = true;

I'm doing this to get into the second branch here (there are more occurrences of this pattern in the codebase):

// Use kernel_bundle if available unless it is interop.
// Interop bundles can't be used in the first branch, because the kernels
// in interop kernel bundles (if any) do not have kernel_id
// and can therefore not be looked up, but since they are self-contained
// they can simply be launched directly.
if (KernelBundleImplPtr && !KernelBundleImplPtr->isInterop()) {
  kernel_id KernelID =
      detail::ProgramManager::getInstance().getSYCLKernelID(KernelName);
  kernel SyclKernel =
      KernelBundleImplPtr->get_kernel(KernelID, KernelBundleImplPtr);
  SyclKernelImpl = detail::getSyclObjImpl(SyclKernel);
  Kernel = SyclKernelImpl->getHandleRef();
  DeviceImageImpl = SyclKernelImpl->getDeviceImage();
  Program = DeviceImageImpl->get_ur_program_ref();
  EliminatedArgMask = SyclKernelImpl->getKernelArgMask();
  KernelMutex = SyclKernelImpl->getCacheMutex();
} else if (nullptr != MSyclKernel) {
  assert(MSyclKernel->get_info<info::kernel::context>() ==
         Queue->get_context());
  Kernel = MSyclKernel->getHandleRef();
  Program = MSyclKernel->getProgramRef();
  // Non-cacheable kernels use mutexes from kernel_impls.
  // TODO this can still result in a race condition if multiple SYCL
  // kernels are created with the same native handle. To address this,
  // we need to either store and use a ur_native_handle_t -> mutex map or
  // reuse and return existing SYCL kernels from make_native to avoid
  // their duplication in such cases.
  KernelMutex = &MSyclKernel->getNoncacheableEnqueueMutex();
  EliminatedArgMask = MSyclKernel->getKernelArgMask();
} else {

We can't take the first branch because we can't look up the correct kernel_id from the kernel name without adding the RTC-specific prefix. At first glance, it doesn't seem feasible to also use the prefix in the context of kernel objects, handler, commands, etc., because, IIUC, names there are tied to the actual function names, and hence shouldn't be modified. (The overall idea in this PR is to use unique names wrt. the PM's bookkeeping, but not to change the compiled programs.)
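
The branch selection discussed above can be condensed into a small sketch. `LaunchPath` and `choosePath` are hypothetical names introduced only for illustration; the quoted runtime code additionally fetches handles, mutexes, and argument masks in each branch.

```cpp
// Which source of kernel information the enqueue logic ends up using.
enum class LaunchPath { BundleLookup, KernelObject, Other };

// Hypothetical condensation of the quoted condition: a non-interop bundle
// triggers a lookup by kernel name, while an interop bundle falls through
// to the kernel object passed by the application.
LaunchPath choosePath(bool HasBundle, bool BundleIsInterop,
                      bool HasKernelObject) {
  if (HasBundle && !BundleIsInterop)
    return LaunchPath::BundleLookup;
  if (HasKernelObject)
    return LaunchPath::KernelObject;
  return LaunchPath::Other;
}
```

Marking the runtime-compiled bundle as interop corresponds to `choosePath(true, true, true)`, which selects the kernel object and so avoids the name-based lookup that would miss the prefixed `kernel_id`.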

Note that the kernel bundle constructed before this PR was also marked as interop (by choice of the parent constructor), hence my commit just restores the previous situation. I'm not 100% sure I understand all implications of the bundle being interop or not, so please let me know if I'm doing this wrong 🙏

@sergey-semenov sergey-semenov left a comment

The approach looks good to me.

@sommerlukas sommerlukas merged commit 49fd770 into intel:sycl Jan 13, 2025
17 checks passed
sommerlukas pushed a commit that referenced this pull request Mar 14, 2025
Adds limited support for device globals in runtime-compiled SYCL code.
The application interacts with the globals via three new methods on
`kernel_bundle`:

```c++
bool ext_oneapi_has_device_global(const std::string &name);
void *ext_oneapi_get_device_global_address(const std::string &name, const device &dev); // return a USM pointer suitable for queue::memcpy etc.
size_t ext_oneapi_get_device_global_size(const std::string &name);
```

This PR uses the same trick as #16316, i.e. prepending a
kernel-bundle-specific prefix to the names of device globals to make
them distinguishable for the program manager.

Limitations:
- Device globals inside a namespace are unsupported due to insufficient
name mangling.
- Device globals with the `device_image_scope` property cannot be
read/written from the host, because the runtime currently cannot expose
USM pointers for them. A workaround is using explicit kernels to
read/write the global's value into a USM buffer.

---------

Signed-off-by: Julian Oppermann <[email protected]>
4 participants