-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL][DOC] Design document for new mechanism of host -> device objects mapping #5910
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
b4a90d3
e6938d8
323579e
7373409
6c7a9be
9758b95
5a1c630
5b69048
ca7857d
90d8cd1
aff7c4b
0d57670
b83994c
2f43211
e963911
96c0e37
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,377 @@ | ||
# Mapping host variables to compiler-generated info | ||
|
||
[SYCL 2020][sycl-2020-spec] specification and some extensions such as | ||
[SYCL_INTEL_device_global][device-global-ext-spec] imply that the implementation | ||
has the capability to somehow map addresses of a host objects to their | ||
counterparts in device programs. | ||
|
||
[sycl-2020-spec]: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html | ||
[device-global-ext-spec]: <../extensions/proposed/sycl_ext_oneapi_device_global.asciidoc> | ||
|
||
For example, in order to implement specialization constants on top of SPIR-V, we | ||
need to be able to map addresses of `specialization_id` variables to numeric | ||
IDs of corresponding specialization constants at SPIR-V level. | ||
|
||
Another example is device global [implementation][device-global-design], where | ||
in order to communicate a value of `device_global` variable between host and | ||
device we need to map its host address to a symbolic name/identifier and some | ||
other info like the size of an underlying type of a device global, which is used | ||
at PI layer and below. | ||
|
||
[device-global-design]: <DeviceGlobal.md> | ||
|
||
This design document describes a generic way how to map the address of any SYCL | ||
object defined in a namespace scope to its unique symbolic ID. Please note that | ||
this document doesn't try to map the address to something other than a unique | ||
symbolic ID: other required information is usually generated by the device | ||
compiler and communicated to the runtime by device image properties. Unique | ||
symbolic ID which can be obtained from mapping mechanism described in this | ||
design document could be used as a key in those properties to propagate | ||
additional information using existing mechanisms. | ||
|
||
So, the overall process is: | ||
- (optionally) device compiler generates property set/s which provide mapping | ||
"unique symbolic ID" -> "various information required by DPC++ RT". | ||
Note: The presence and the format of those property set is defined case by | ||
case for each feature | ||
- device or host compiler generates mapping | ||
"address of a host variable" -> "unique symbolic ID" (as described below by | ||
this document) | ||
- DPC++ RT uses these two mappings to obtain required information | ||
|
||
This design document describes two approaches for how the mapping of | ||
"address of a host variable" -> "unique symbolic ID" can be generated: | ||
the first one with integration footer and another one with modification of the | ||
DPC++ host compiler. | ||
|
||
Both approaches have their pros and cons and they are expected to be implemented | ||
and exist in the implementation at the same time. Only one of them will be | ||
used at a time, depending on whether a 3rd-party host compiler is used or not. | ||
|
||
Integration footer can be used with 3rd-party host compilers. This, however | ||
requires appending to a translation unit provided by a user, which could affect | ||
debug information. Since there are no compilers that support appending a file at | ||
the end (similar to `-include`), appending is done by generating a temporary | ||
input file using concatenation of the original input and integration footer. | ||
|
||
Such replacement of the main translation unit causes the following issues: | ||
- debug information about the source file might be incorrect, leading to | ||
problems with gdb `l` command and code coverage tools | ||
- checksum of host and device source files becomes different which causes device | ||
code debugging to be completely broken in some environments (such as MS Visual | ||
Studio, for example) | ||
|
||
Modifying DPC++ host compiler allows to avoid issues with debuggers and code | ||
coverage tools, but that is not an option if a user wants to compile host part | ||
of an app with a 3rd-party host compiler. | ||
|
||
The sections below describe the implementation design of both approaches in more | ||
detail. Note that there are few components which should be modified regardless | ||
of which approach is in use. | ||
|
||
## Common front-end part | ||
|
||
DPC++ FE should support the following attribute: | ||
`[[__sycl_detail__::uniquely_identifiable_object(kind)]]`. The attribute accepts | ||
a string literal and should be applied to types (like `device_global` or | ||
`specialization_id`). | ||
|
||
Presence of the attribute instructs the compiler to perform the following | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you add examples of usage and IR? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added in 90d8cd1 |
||
things: | ||
- emit `sycl-unique-id` LLVM IR attribute on each definition of a variable of | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This line is also a bit hard to understand. WDYT about - |
||
type marked with `[[__sycl_detail__::uniquely_identifiable_object(kind)]]` | ||
attribute. `sycl-unique-id` LLVM IR attribute should be accompanied by a | ||
unique string identifier of the variable it is attached to. The rules for | ||
creating this string are the same as for `__builtin_sycl_unique_stable_id` and | ||
the same algorithm can be used when generating the string for the attribute | ||
- emit `sycl-uid-kind` LLVM IR attribute alongside `sycl-unique-id`, which | ||
contains the `kind` string passed via | ||
`[[__sycl_detail__::uniquely_identifiable_object(kind)]]` attribute | ||
|
||
To illustrate, here is a SYCL code snippet: | ||
|
||
``` | ||
template <typename T> | ||
class | ||
// Note: the attribute usage will be guarded by macro to be only applied when | ||
// DPC++ compiler is used to avoid generating warnings. That is described | ||
// later in the doc | ||
[[__sycl_detail__::uniquely_identifiable_object("specialization_id")]] | ||
specialization_id { | ||
// ... | ||
}; | ||
|
||
specialization_id<int> spec_const(38); | ||
AlexeySachkov marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
// some code which uses spec_const within a SYCL Kernel Function | ||
``` | ||
|
||
After processed by DPC++ compiler, it will result in the following LLVM IR: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You mean LLVM IR for device? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I mean for both host and device: there is a paragraph a bit below, which says that. Please let me know how to better rearrange and/or rephrase that to make it more clear There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe we should mention at the very beginning of attributes description that they trigger emission of new data for both host and device code. This a unique situation among SYCL attributes, so I kind of assumed that it will happen only for device code. |
||
|
||
``` | ||
%class.specialization_id = type { i32 } | ||
@spec_const = dso_local global %class.specialization_id { i32 38 } #0 | ||
|
||
attributes #0 = { "sycl-unique-id"="string returned by __builtin_sycl_unique_id(spec_const)" "sycl-uid-kind"="specialization_id" } | ||
``` | ||
|
||
The new attribute should not be used for any semantic checking and its | ||
sole purpose is to generate necessary LLVM IR attributes. If some feature | ||
requires some semantic checks, then a separate attribute should be introduced | ||
to perform them: for example see `[[__sycl_detail__::device_global]]` in | ||
[device global design doc][device-global-design]. | ||
|
||
Note about `kind` argument: it should not be parsed by the compiler in any way | ||
and it should be simply propagated as-is through the compiler stack to be used | ||
later at runtime. | ||
|
||
When DPC++ compiler is used as both host and device compiler, then the attribute | ||
should be respected by both host and device compiler passes and LLVM IR | ||
attributes should appear in LLVM IR for both host and device code. When DPC++ | ||
compiler is only used as a device compiler, then we don't expect the attribute | ||
to be handled on host. | ||
|
||
Another thing we need from DPC++ FE host compiler is to define a special macro, | ||
elizabethandrews marked this conversation as resolved.
Show resolved
Hide resolved
|
||
which will allow to distinguish it from other host compilers. That is needed to | ||
apply the aforementioned attribute conditionally to avoid spamming users with | ||
elizabethandrews marked this conversation as resolved.
Show resolved
Hide resolved
|
||
warnings about unknown attributes. | ||
|
||
**NOTE:** Alternatively we could simply set a macro which tells us whether or | ||
not integration footer is enabled in the compiler driver instead of creating | ||
a special macro for differentiating our own host compiler. | ||
|
||
The suggested macro name is `__INTEL_SYCL_HOST_COMPILER__`. It should be defined | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't know if we can use 'INTEL' in the name if we're proposing this for syclos, especially if we want to upstream this at some point. I'm also not sure if we need senior management input for using INTEL in macro name. I guess its less of a concern if its not a documented macro, but it might be better to just have a macro which indicates footer is included or something. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've just remembered that there is a AFAIK, it is not yet supported, but now it seems to be the time There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think we should use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Well, the spec says literally nothing about the number of CFE runs and based on the spec wording, DPC++ seems to be a single source compiler:
Anyway, I'm perfectly fine with using another macro introduced by ourselves, we just need to be able to detect that we are in integration footer mode, which will imply that we are in 3rd-party host compiler mode, which will mean do not use an attribute to avoid the warning There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My understanding of spec matches @AlexeySachkov's. Unless 'single-source compiler' has standardized meaning I am unaware of, it sounds like
If we can't use this, WDYT about just having a macro INTEGRATION_FOOTER defined in footer. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
We need this macro available in regular SYCL headers, so I think that we will end up with setting something internal ( |
||
when the compiler is invoked in SYCL host mode (`-fsycl-is-host` `-cc1` flag). | ||
|
||
## Common headers part | ||
|
||
Header files should be modified by adding the new attributes to types | ||
declarations, objects of which we will need in our mapping. Again, | ||
`device_global` and `specialization_id` are examples here: | ||
|
||
``` | ||
template <typename T> | ||
class | ||
#if defined(__SYCL_DEVICE_ONLY__) || defined(__INTEL_SYCL_HOST_COMPILER__) | ||
[[__sycl_detail__::uniquely_identifiable_object("specialization_id")]] | ||
#endif | ||
specialization_id { | ||
// ... | ||
}; | ||
``` | ||
|
||
## Common runtime part | ||
|
||
The runtime should implement the following function, which will be called from | ||
a code generated by the compiler (see the next section): | ||
|
||
``` | ||
void __register_uniquely_identifiable_object( | ||
void *Address, const char* UniqueID, const char *Kind); | ||
``` | ||
|
||
The function accepts the following arguments: | ||
- `Address` is an address of a variable, which exists in an application on host | ||
- `UniqueID` is a unique symbolic ID, which corresponds to that variable | ||
- `Kind` is a string which corresponds to `kind` argument passed to | ||
`[[__sycl_detail__::uniquely_identifiable_object(kind)]]` attribute attached | ||
to the type of the variable identified by `Address`. It can be used to | ||
distinguish different entities like `specialization_id` and `device_global`: | ||
for example they could be stored in different maps to speed up certain | ||
operations with them. | ||
|
||
The compiler guarantees that the function will be called zero or more times | ||
(depending on the amount of uniquely identifiable objects found in a program) | ||
_before_ application's `main()` function and _before_ any other global | ||
constructor defined in the same translation unit: this is needed to allow usages | ||
of `specialization_id` and `device_global` variables from user-defined global | ||
constructors. | ||
|
||
## Compiler driver part | ||
|
||
The compiler driver is the component which is responsible for selecting the | ||
approach we are taking and the decision is made based on whether or not | ||
3rd-party host compiler is in use. | ||
|
||
If `-fsycl-host-compiler` option is present, the compiler driver chooses the | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What will happen if There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I honestly don't know what is the algorithm of determining an absolute path to a 3rd-party host compiler specified through that argument, but I suggest that we always assume that any compiler passed through the flag is a 3rd-party one: I simply don't think that it worth to implement compiler detection here, because even There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I don't know how this option works either. Maybe It is worth to double check how does it work in corner case I mentioned. Maybe it would be good to mention how this works as well. |
||
integration footer approach: | ||
- it supplies device compilation step with `-fsycl-int-footer` option to | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @mdtoguchi - possible change to how |
||
instruct device compiler to emit an integration footer | ||
- it appends the integration footer to user-provided translation unit before | ||
passing it to a host compiler | ||
|
||
Otherwise, if `-fsycl-host-compiler` is not present, then the compiler driver | ||
chooses another approach by simply doing nothing related to integration footer: | ||
- `-fsycl-int-footer` is **not** passed to device compiler | ||
- user-provided translation unit is passes as-is to host compiler | ||
|
||
## Integration footer approach | ||
|
||
When this approach is used, not only is an extra file (integration footer) | ||
generated, but the integration header is also modified: FE compiler generates a | ||
definition of a namespace scope variable of type | ||
`__sycl_device_global_registration` whose sole purpose is to run its constructor | ||
before the application's `main()` (and any other global constructor defined in | ||
a user-provided translation unit) function: | ||
|
||
``` | ||
namespace sycl::detail { | ||
namespace { | ||
|
||
class __sycl_device_global_registration { | ||
AlexeySachkov marked this conversation as resolved.
Show resolved
Hide resolved
|
||
public: | ||
__sycl_device_global_registration() noexcept; | ||
}; | ||
__sycl_device_global_registration __sycl_device_global_registrar; | ||
|
||
} // namespace (unnamed) | ||
} // namespace sycl::detail | ||
``` | ||
|
||
Examples below are written for the following code snippet: | ||
|
||
``` | ||
#include <sycl/sycl.hpp> | ||
|
||
static sycl::device_global<int> Foo; | ||
namespace inner { | ||
sycl::device_global<double[2]> Bar; | ||
} // namespace inner | ||
|
||
// ... | ||
``` | ||
|
||
The integration footer generated by the compiler contains the definition of the | ||
constructor, which calls a function in the DPC++ runtime, which registers | ||
needed mappings: | ||
|
||
``` | ||
namespace sycl::detail { | ||
namespace { | ||
|
||
__sycl_device_global_registration::__sycl_device_global_registration() noexcept { | ||
__register_uniquely_identifiable_object( | ||
&::Foo, | ||
/* same string returned from __builtin_sycl_unique_stable_id(::Foo) */, | ||
"device_global"); | ||
Fznamznon marked this conversation as resolved.
Show resolved
Hide resolved
|
||
__register_uniquely_identifiable_object( | ||
&::inner::Bar, | ||
/* same string returned from __builtin_sycl_unique_stable_id(::inner::Bar) */, | ||
"device_global"); | ||
} | ||
|
||
} // namespace (unnamed) | ||
} // namespace sycl::detail | ||
``` | ||
|
||
Note: the integration footer is only populated with the registration object when | ||
integration footer is enabled. Body of the registration object constructor can | ||
be empty if there are no uniquely identifiable objects found in a translation | ||
unit and FE is free to completely omit registration object generation in that | ||
case as well. | ||
|
||
### Handling shadowed variables | ||
|
||
The example above shows a simple case where the user's device global variables | ||
can all be uniquely referenced via fully qualified lookup (e.g. | ||
`::inner::Bar`). However, it is possible for users to construct applications | ||
where this is not the case, for example: | ||
|
||
``` | ||
sycl::device_global<int> FuBar; | ||
namespace { | ||
sycl::device_global<int> FuBar; | ||
} | ||
``` | ||
|
||
In this example, the `FuBar` variable in the global namespace shadows a | ||
variable with the same name in the unnamed namespace. The integration footer | ||
can reference the variable in the global namespace as `::FuBar`, but there is | ||
no way to reference the variable in the unnamed namespace using fully qualified | ||
lookup. | ||
|
||
Such programs are still legal, though. The integration footer can support | ||
cases like this by defining a shim function that returns a reference to the | ||
shadowed device global: | ||
|
||
``` | ||
namespace { | ||
namespace __sycl_detail { | ||
|
||
static constexpr decltype(FuBar) &__shim_1() { | ||
return FuBar; // References 'FuBar' in the unnamed namespace | ||
} | ||
|
||
} // namespace __sycl_detail | ||
} // namespace (unnamed) | ||
|
||
namespace sycl::detail { | ||
|
||
__sycl_device_global_registration::__sycl_device_global_registration() noexcept { | ||
__register_uniquely_identifiable_object( | ||
&::FuBar, | ||
/* same string returned from __builtin_sycl_unique_stable_id(::FuBar) */, | ||
"device_global"); | ||
__register_uniquely_identifiable_object( | ||
&::__sycl_detail::__shim_1(), | ||
/* same string returned from __builtin_sycl_unique_stable_id(::(unnamed)::FuBar) */, | ||
"device_global"); | ||
} | ||
|
||
} // namespace sycl::detail | ||
``` | ||
|
||
The `__shim_1()` function is defined in the same namespace as the second | ||
`FuBar` device global, so it can reference the variable through unqualified | ||
name lookup. Furthermore, the name of the shim function is globally unique, so | ||
it is guaranteed not to be shadowed by any other name in the translation unit. | ||
This problem with variable shadowing is also a problem for the integration | ||
footer we use for specialization constants. See the [specialization constant | ||
design document][spec-constants-design] for more details on this topic. | ||
|
||
[spec-constants-design]: <SYCL2020-SpecializationConstants.md> | ||
|
||
## Using a modified DPC++ as single source compiler | ||
|
||
With this approach, we simply schedule one more pass in the optimization | ||
pipeline, which should be executed regardless of the optimization level, because | ||
it is required for proper functioning of some features. | ||
|
||
The pass has functionality similar to the integration footer, i.e. it emits a | ||
global constructor which in turn calls `__register_uniquely_identifiable_object` | ||
to provide the runtime with required mapping information. | ||
|
||
Unlike with the integration footer approach, no separate file is being | ||
generated. This preserves all source files mapping and checksums to be in place | ||
and correct. | ||
|
||
Generated constructor function should have internal linkage to avoid possible | ||
names clashes and multiple definition errors later at link stage. | ||
|
||
Generated constructor contains a call to | ||
`__register_uniquely_identifiable_object` for each global variable which has | ||
`sycl-unique-id` and `sycl-uid-kind` attributes, passing values of those | ||
AlexeySachkov marked this conversation as resolved.
Show resolved
Hide resolved
|
||
attributes into the corresponding arguments of the function. | ||
|
||
### Handling shadowed variables | ||
|
||
Unlike with the integration footer the problem with shadowed variables doesn't | ||
really exists with the modified DPC++ host compiler, because it is compiler's | ||
responsibility to uniquely identify shadowed variables at LLVM IR level | ||
and we are simply re-using what is already there. | ||
|
||
For example, for the following code snippet: | ||
|
||
``` | ||
sycl::device_global<int> FuBar; | ||
namespace { | ||
sycl::device_global<int> FuBar; | ||
} | ||
``` | ||
|
||
The following IR is generated by our host compiler: | ||
|
||
``` | ||
@FuBar = dso_local global %"class.cl::sycl::ext::oneapi::device_global" zeroinitializer, align 8 | ||
@_ZN12_GLOBAL__N_15FuBarE = internal global %"class.cl::sycl::ext::oneapi::device_global" zeroinitializer, align 8 | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we specify what 'some other info' is. This is bit vague
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added an example in 5b69048, but I don't think that we need to put all the details into this doc:
device_global
implementation design is covered by a separate document