diff --git a/sycl/doc/design/MappingHostAddressesToDeviceEntities.md b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md new file mode 100644 index 0000000000000..3d22c3bf53392 --- /dev/null +++ b/sycl/doc/design/MappingHostAddressesToDeviceEntities.md @@ -0,0 +1,377 @@ +# Mapping host variables to compiler-generated info + +[SYCL 2020][sycl-2020-spec] specification and some extensions such as +[SYCL_INTEL_device_global][device-global-ext-spec] imply that the implementation +has the capability to somehow map addresses of a host objects to their +counterparts in device programs. + +[sycl-2020-spec]: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html +[device-global-ext-spec]: <../extensions/proposed/sycl_ext_oneapi_device_global.asciidoc> + +For example, in order to implement specialization constants on top of SPIR-V, we +need to be able to map addresses of `specialization_id` variables to numeric +IDs of corresponding specialization constants at SPIR-V level. + +Another example is device global [implementation][device-global-design], where +in order to communicate a value of `device_global` variable between host and +device we need to map its host address to a symbolic name/identifier and some +other info like the size of an underlying type of a device global, which is used +at PI layer and below. + +[device-global-design]: + +This design document describes a generic way how to map the address of any SYCL +object defined in a namespace scope to its unique symbolic ID. Please note that +this document doesn't try to map the address to something other than a unique +symbolic ID: other required information is usually generated by the device +compiler and communicated to the runtime by device image properties. Unique +symbolic ID which can be obtained from mapping mechanism described in this +design document could be used as a key in those properties to propagate +additional information using existing mechanisms. + +So, the overall process is: +- (optionally) device compiler generates property set/s which provide mapping + "unique symbolic ID" -> "various information required by DPC++ RT". + Note: The presence and the format of those property set is defined case by + case for each feature +- device or host compiler generates mapping + "address of a host variable" -> "unique symbolic ID" (as described below by + this document) +- DPC++ RT uses these two mappings to obtain required information + +This design document describes two approaches for how the mapping of +"address of a host variable" -> "unique symbolic ID" can be generated: +the first one with integration footer and another one with modification of the +DPC++ host compiler. + +Both approaches have their pros and cons and they are expected to be implemented +and exist in the implementation at the same time. Only one of them will be +used at a time, depending on whether a 3rd-party host compiler is used or not. + +Integration footer can be used with 3rd-party host compilers. This, however +requires appending to a translation unit provided by a user, which could affect +debug information. Since there are no compilers that support appending a file at +the end (similar to `-include`), appending is done by generating a temporary +input file using concatenation of the original input and integration footer. + +Such replacement of the main translation unit causes the following issues: +- debug information about the source file might be incorrect, leading to + problems with gdb `l` command and code coverage tools +- checksum of host and device source files becomes different which causes device + code debugging to be completely broken in some environments (such as MS Visual + Studio, for example) + +Modifying DPC++ host compiler allows to avoid issues with debuggers and code +coverage tools, but that is not an option if a user wants to compile host part +of an app with a 3rd-party host compiler. + +The sections below describe the implementation design of both approaches in more +detail. Note that there are few components which should be modified regardless +of which approach is in use. + +## Common front-end part + +DPC++ FE should support the following attribute: +`[[__sycl_detail__::uniquely_identifiable_object(kind)]]`. The attribute accepts +a string literal and should be applied to types (like `device_global` or +`specialization_id`). + +Presence of the attribute instructs the compiler to perform the following +things: +- emit `sycl-unique-id` LLVM IR attribute on each definition of a variable of + type marked with `[[__sycl_detail__::uniquely_identifiable_object(kind)]]` + attribute. `sycl-unique-id` LLVM IR attribute should be accompanied by a + unique string identifier of the variable it is attached to. The rules for + creating this string are the same as for `__builtin_sycl_unique_stable_id` and + the same algorithm can be used when generating the string for the attribute +- emit `sycl-uid-kind` LLVM IR attribute alongside `sycl-unique-id`, which + contains the `kind` string passed via + `[[__sycl_detail__::uniquely_identifiable_object(kind)]]` attribute + +To illustrate, here is a SYCL code snippet: + +``` +template +class + // Note: the attribute usage will be guarded by macro to be only applied when + // DPC++ compiler is used to avoid generating warnings. That is described + // later in the doc + [[__sycl_detail__::uniquely_identifiable_object("specialization_id")]] + specialization_id { + // ... +}; + +specialization_id spec_const(38); + +// some code which uses spec_const within a SYCL Kernel Function +``` + +After processed by DPC++ compiler, it will result in the following LLVM IR: + +``` +%class.specialization_id = type { i32 } +@spec_const = dso_local global %class.specialization_id { i32 38 } #0 + +attributes #0 = { "sycl-unique-id"="string returned by __builtin_sycl_unique_id(spec_const)" "sycl-uid-kind"="specialization_id" } +``` + +The new attribute should not be used for any semantic checking and its +sole purpose is to generate necessary LLVM IR attributes. If some feature +requires some semantic checks, then a separate attribute should be introduced +to perform them: for example see `[[__sycl_detail__::device_global]]` in +[device global design doc][device-global-design]. + +Note about `kind` argument: it should not be parsed by the compiler in any way +and it should be simply propagated as-is through the compiler stack to be used +later at runtime. + +When DPC++ compiler is used as both host and device compiler, then the attribute +should be respected by both host and device compiler passes and LLVM IR +attributes should appear in LLVM IR for both host and device code. When DPC++ +compiler is only used as a device compiler, then we don't expect the attribute +to be handled on host. + +Another thing we need from DPC++ FE host compiler is to define a special macro, +which will allow to distinguish it from other host compilers. That is needed to +apply the aforementioned attribute conditionally to avoid spamming users with +warnings about unknown attributes. + +**NOTE:** Alternatively we could simply set a macro which tells us whether or +not integration footer is enabled in the compiler driver instead of creating +a special macro for differentiating our own host compiler. + +The suggested macro name is `__INTEL_SYCL_HOST_COMPILER__`. It should be defined +when the compiler is invoked in SYCL host mode (`-fsycl-is-host` `-cc1` flag). + +## Common headers part + +Header files should be modified by adding the new attributes to types +declarations, objects of which we will need in our mapping. Again, +`device_global` and `specialization_id` are examples here: + +``` +template +class +#if defined(__SYCL_DEVICE_ONLY__) || defined(__INTEL_SYCL_HOST_COMPILER__) + [[__sycl_detail__::uniquely_identifiable_object("specialization_id")]] +#endif +specialization_id { +// ... +}; +``` + +## Common runtime part + +The runtime should implement the following function, which will be called from +a code generated by the compiler (see the next section): + +``` +void __register_uniquely_identifiable_object( + void *Address, const char* UniqueID, const char *Kind); +``` + +The function accepts the following arguments: +- `Address` is an address of a variable, which exists in an application on host +- `UniqueID` is a unique symbolic ID, which corresponds to that variable +- `Kind` is a string which corresponds to `kind` argument passed to + `[[__sycl_detail__::uniquely_identifiable_object(kind)]]` attribute attached + to the type of the variable identified by `Address`. It can be used to + distinguish different entities like `specialization_id` and `device_global`: + for example they could be stored in different maps to speed up certain + operations with them. + +The compiler guarantees that the function will be called zero or more times +(depending on the amount of uniquely identifiable objects found in a program) +_before_ application's `main()` function and _before_ any other global +constructor defined in the same translation unit: this is needed to allow usages +of `specialization_id` and `device_global` variables from user-defined global +constructors. + +## Compiler driver part + +The compiler driver is the component which is responsible for selecting the +approach we are taking and the decision is made based on whether or not +3rd-party host compiler is in use. + +If `-fsycl-host-compiler` option is present, the compiler driver chooses the +integration footer approach: +- it supplies device compilation step with `-fsycl-int-footer` option to + instruct device compiler to emit an integration footer +- it appends the integration footer to user-provided translation unit before + passing it to a host compiler + +Otherwise, if `-fsycl-host-compiler` is not present, then the compiler driver +chooses another approach by simply doing nothing related to integration footer: +- `-fsycl-int-footer` is **not** passed to device compiler +- user-provided translation unit is passes as-is to host compiler + +## Integration footer approach + +When this approach is used, not only is an extra file (integration footer) +generated, but the integration header is also modified: FE compiler generates a +definition of a namespace scope variable of type +`__sycl_device_global_registration` whose sole purpose is to run its constructor +before the application's `main()` (and any other global constructor defined in +a user-provided translation unit) function: + +``` +namespace sycl::detail { +namespace { + +class __sycl_device_global_registration { + public: + __sycl_device_global_registration() noexcept; +}; +__sycl_device_global_registration __sycl_device_global_registrar; + +} // namespace (unnamed) +} // namespace sycl::detail +``` + +Examples below are written for the following code snippet: + +``` +#include + +static sycl::device_global Foo; +namespace inner { + sycl::device_global Bar; +} // namespace inner + +// ... +``` + +The integration footer generated by the compiler contains the definition of the +constructor, which calls a function in the DPC++ runtime, which registers +needed mappings: + +``` +namespace sycl::detail { +namespace { + +__sycl_device_global_registration::__sycl_device_global_registration() noexcept { + __register_uniquely_identifiable_object( + &::Foo, + /* same string returned from __builtin_sycl_unique_stable_id(::Foo) */, + "device_global"); + __register_uniquely_identifiable_object( + &::inner::Bar, + /* same string returned from __builtin_sycl_unique_stable_id(::inner::Bar) */, + "device_global"); +} + +} // namespace (unnamed) +} // namespace sycl::detail +``` + +Note: the integration footer is only populated with the registration object when +integration footer is enabled. Body of the registration object constructor can +be empty if there are no uniquely identifiable objects found in a translation +unit and FE is free to completely omit registration object generation in that +case as well. + +### Handling shadowed variables + +The example above shows a simple case where the user's device global variables +can all be uniquely referenced via fully qualified lookup (e.g. +`::inner::Bar`). However, it is possible for users to construct applications +where this is not the case, for example: + +``` +sycl::device_global FuBar; +namespace { + sycl::device_global FuBar; +} +``` + +In this example, the `FuBar` variable in the global namespace shadows a +variable with the same name in the unnamed namespace. The integration footer +can reference the variable in the global namespace as `::FuBar`, but there is +no way to reference the variable in the unnamed namespace using fully qualified +lookup. + +Such programs are still legal, though. The integration footer can support +cases like this by defining a shim function that returns a reference to the +shadowed device global: + +``` +namespace { +namespace __sycl_detail { + +static constexpr decltype(FuBar) &__shim_1() { + return FuBar; // References 'FuBar' in the unnamed namespace +} + +} // namespace __sycl_detail +} // namespace (unnamed) + +namespace sycl::detail { + +__sycl_device_global_registration::__sycl_device_global_registration() noexcept { + __register_uniquely_identifiable_object( + &::FuBar, + /* same string returned from __builtin_sycl_unique_stable_id(::FuBar) */, + "device_global"); + __register_uniquely_identifiable_object( + &::__sycl_detail::__shim_1(), + /* same string returned from __builtin_sycl_unique_stable_id(::(unnamed)::FuBar) */, + "device_global"); +} + +} // namespace sycl::detail +``` + +The `__shim_1()` function is defined in the same namespace as the second +`FuBar` device global, so it can reference the variable through unqualified +name lookup. Furthermore, the name of the shim function is globally unique, so +it is guaranteed not to be shadowed by any other name in the translation unit. +This problem with variable shadowing is also a problem for the integration +footer we use for specialization constants. See the [specialization constant +design document][spec-constants-design] for more details on this topic. + +[spec-constants-design]: + +## Using a modified DPC++ as single source compiler + +With this approach, we simply schedule one more pass in the optimization +pipeline, which should be executed regardless of the optimization level, because +it is required for proper functioning of some features. + +The pass has functionality similar to the integration footer, i.e. it emits a +global constructor which in turn calls `__register_uniquely_identifiable_object` +to provide the runtime with required mapping information. + +Unlike with the integration footer approach, no separate file is being +generated. This preserves all source files mapping and checksums to be in place +and correct. + +Generated constructor function should have internal linkage to avoid possible +names clashes and multiple definition errors later at link stage. + +Generated constructor contains a call to +`__register_uniquely_identifiable_object` for each global variable which has +`sycl-unique-id` and `sycl-uid-kind` attributes, passing values of those +attributes into the corresponding arguments of the function. + +### Handling shadowed variables + +Unlike with the integration footer the problem with shadowed variables doesn't +really exists with the modified DPC++ host compiler, because it is compiler's +responsibility to uniquely identify shadowed variables at LLVM IR level +and we are simply re-using what is already there. + +For example, for the following code snippet: + +``` +sycl::device_global FuBar; +namespace { + sycl::device_global FuBar; +} +``` + +The following IR is generated by our host compiler: + +``` +@FuBar = dso_local global %"class.cl::sycl::ext::oneapi::device_global" zeroinitializer, align 8 +@_ZN12_GLOBAL__N_15FuBarE = internal global %"class.cl::sycl::ext::oneapi::device_global" zeroinitializer, align 8 +``` diff --git a/sycl/doc/index.rst b/sycl/doc/index.rst index f67434e7faaa4..cd9fdd5cdaf66 100644 --- a/sycl/doc/index.rst +++ b/sycl/doc/index.rst @@ -28,6 +28,7 @@ Design Documents for the oneAPI DPC++ Compiler Clang Documentation Clang API Reference design/CompilerAndRuntimeDesign + design/MappingHostAddressesToDeviceEntities design/KernelParameterPassing design/PluginInterface design/SpecializationConstants