Skip to content

[SYCL][DOC] Design document for new mechanism of host -> device objects mapping #5910

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
377 changes: 377 additions & 0 deletions sycl/doc/design/MappingHostAddressesToDeviceEntities.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,377 @@
# Mapping host variables to compiler-generated info

[SYCL 2020][sycl-2020-spec] specification and some extensions such as
[SYCL_INTEL_device_global][device-global-ext-spec] imply that the implementation
has the capability to somehow map addresses of a host objects to their
counterparts in device programs.

[sycl-2020-spec]: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html
[device-global-ext-spec]: <../extensions/proposed/sycl_ext_oneapi_device_global.asciidoc>

For example, in order to implement specialization constants on top of SPIR-V, we
need to be able to map addresses of `specialization_id` variables to numeric
IDs of corresponding specialization constants at SPIR-V level.

Another example is device global [implementation][device-global-design], where
in order to communicate a value of `device_global` variable between host and
device we need to map its host address to a symbolic name/identifier and some
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we specify what 'some other info' is. This is bit vague

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added an example in 5b69048, but I don't think that we need to put all the details into this doc: device_global implementation design is covered by a separate document

other info like the size of an underlying type of a device global, which is used
at PI layer and below.

[device-global-design]: <DeviceGlobal.md>

This design document describes a generic way how to map the address of any SYCL
object defined in a namespace scope to its unique symbolic ID. Please note that
this document doesn't try to map the address to something other than a unique
symbolic ID: other required information is usually generated by the device
compiler and communicated to the runtime by device image properties. Unique
symbolic ID which can be obtained from mapping mechanism described in this
design document could be used as a key in those properties to propagate
additional information using existing mechanisms.

So, the overall process is:
- (optionally) device compiler generates property set/s which provide mapping
"unique symbolic ID" -> "various information required by DPC++ RT".
Note: The presence and the format of those property set is defined case by
case for each feature
- device or host compiler generates mapping
"address of a host variable" -> "unique symbolic ID" (as described below by
this document)
- DPC++ RT uses these two mappings to obtain required information

This design document describes two approaches for how the mapping of
"address of a host variable" -> "unique symbolic ID" can be generated:
the first one with integration footer and another one with modification of the
DPC++ host compiler.

Both approaches have their pros and cons and they are expected to be implemented
and exist in the implementation at the same time. Only one of them will be
used at a time, depending on whether a 3rd-party host compiler is used or not.

Integration footer can be used with 3rd-party host compilers. This, however
requires appending to a translation unit provided by a user, which could affect
debug information. Since there are no compilers that support appending a file at
the end (similar to `-include`), appending is done by generating a temporary
input file using concatenation of the original input and integration footer.

Such replacement of the main translation unit causes the following issues:
- debug information about the source file might be incorrect, leading to
problems with gdb `l` command and code coverage tools
- checksum of host and device source files becomes different which causes device
code debugging to be completely broken in some environments (such as MS Visual
Studio, for example)

Modifying DPC++ host compiler allows to avoid issues with debuggers and code
coverage tools, but that is not an option if a user wants to compile host part
of an app with a 3rd-party host compiler.

The sections below describe the implementation design of both approaches in more
detail. Note that there are few components which should be modified regardless
of which approach is in use.

## Common front-end part

DPC++ FE should support the following attribute:
`[[__sycl_detail__::uniquely_identifiable_object(kind)]]`. The attribute accepts
a string literal and should be applied to types (like `device_global` or
`specialization_id`).

Presence of the attribute instructs the compiler to perform the following
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add examples of usage and IR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 90d8cd1

things:
- emit `sycl-unique-id` LLVM IR attribute on each definition of a variable of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[[__sycl_detail__::device_global]] results in sycl-unique-id as well right? What if both attributes are present? Should we remove this CodeGen functionality from [[__sycl_detail__::device_global]]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, [[__sycl_detail__::device_global]] will be limited to semantic checks only: I will update device global design doc to incorporate the change there

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is also a bit hard to understand. WDYT about -“emitsycl-unique-idLLVM IR attribute to the definition of any variable whose type is marked with the[[sycl_detail::uniquely_identifiable_object(kind)]] attribute.”

type marked with `[[__sycl_detail__::uniquely_identifiable_object(kind)]]`
attribute. `sycl-unique-id` LLVM IR attribute should be accompanied by a
unique string identifier of the variable it is attached to. The rules for
creating this string are the same as for `__builtin_sycl_unique_stable_id` and
the same algorithm can be used when generating the string for the attribute
- emit `sycl-uid-kind` LLVM IR attribute alongside `sycl-unique-id`, which
contains the `kind` string passed via
`[[__sycl_detail__::uniquely_identifiable_object(kind)]]` attribute

To illustrate, here is a SYCL code snippet:

```
template <typename T>
class
// Note: the attribute usage will be guarded by macro to be only applied when
// DPC++ compiler is used to avoid generating warnings. That is described
// later in the doc
[[__sycl_detail__::uniquely_identifiable_object("specialization_id")]]
specialization_id {
// ...
};

specialization_id<int> spec_const(38);

// some code which uses spec_const within a SYCL Kernel Function
```

After processed by DPC++ compiler, it will result in the following LLVM IR:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean LLVM IR for device?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean for both host and device: there is a paragraph a bit below, which says that. Please let me know how to better rearrange and/or rephrase that to make it more clear

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should mention at the very beginning of attributes description that they trigger emission of new data for both host and device code. This a unique situation among SYCL attributes, so I kind of assumed that it will happen only for device code.


```
%class.specialization_id = type { i32 }
@spec_const = dso_local global %class.specialization_id { i32 38 } #0

attributes #0 = { "sycl-unique-id"="string returned by __builtin_sycl_unique_id(spec_const)" "sycl-uid-kind"="specialization_id" }
```

The new attribute should not be used for any semantic checking and its
sole purpose is to generate necessary LLVM IR attributes. If some feature
requires some semantic checks, then a separate attribute should be introduced
to perform them: for example see `[[__sycl_detail__::device_global]]` in
[device global design doc][device-global-design].

Note about `kind` argument: it should not be parsed by the compiler in any way
and it should be simply propagated as-is through the compiler stack to be used
later at runtime.

When DPC++ compiler is used as both host and device compiler, then the attribute
should be respected by both host and device compiler passes and LLVM IR
attributes should appear in LLVM IR for both host and device code. When DPC++
compiler is only used as a device compiler, then we don't expect the attribute
to be handled on host.

Another thing we need from DPC++ FE host compiler is to define a special macro,
which will allow to distinguish it from other host compilers. That is needed to
apply the aforementioned attribute conditionally to avoid spamming users with
warnings about unknown attributes.

**NOTE:** Alternatively we could simply set a macro which tells us whether or
not integration footer is enabled in the compiler driver instead of creating
a special macro for differentiating our own host compiler.

The suggested macro name is `__INTEL_SYCL_HOST_COMPILER__`. It should be defined
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we can use 'INTEL' in the name if we're proposing this for syclos, especially if we want to upstream this at some point. I'm also not sure if we need senior management input for using INTEL in macro name. I guess its less of a concern if its not a documented macro, but it might be better to just have a macro which indicates footer is included or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just remembered that there is a __SYCL_SINGLE_SOURCE__ macro documented in the spec, which we should use here.

AFAIK, it is not yet supported, but now it seems to be the time

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should use __SYCL_SINGLE_SOURCE__ for this purpose. The intent of that macro is to indicate a compiler that produces both host and device code in a single pass (i.e. the CFE runs just once, and then both host and device code are produced from that common IR). This is not the case for DPC++.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent of that macro is to indicate a compiler that produces both host and device code in a single pass (i.e. the CFE runs just once, and then both host and device code are produced from that common IR).

Well, the spec says literally nothing about the number of CFE runs and based on the spec wording, DPC++ seems to be a single source compiler:

which produces host as well as device binary;

Anyway, I'm perfectly fine with using another macro introduced by ourselves, we just need to be able to detect that we are in integration footer mode, which will imply that we are in 3rd-party host compiler mode, which will mean do not use an attribute to avoid the warning

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of spec matches @AlexeySachkov's. Unless 'single-source compiler' has standardized meaning I am unaware of, it sounds like __SYCL_SINGLE_SOURCE__ works here

SYCL_SINGLE_SOURCE is defined to 1 if the source file is being compiled with a SYCL single-source compiler which produces host as well as device binary;

If we can't use this, WDYT about just having a macro INTEGRATION_FOOTER defined in footer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can't use this, WDYT about just having a macro INTEGRATION_FOOTER defined in footer.

We need this macro available in regular SYCL headers, so I think that we will end up with setting something internal (__MACRO_NAME__) in the driver

when the compiler is invoked in SYCL host mode (`-fsycl-is-host` `-cc1` flag).

## Common headers part

Header files should be modified by adding the new attributes to types
declarations, objects of which we will need in our mapping. Again,
`device_global` and `specialization_id` are examples here:

```
template <typename T>
class
#if defined(__SYCL_DEVICE_ONLY__) || defined(__INTEL_SYCL_HOST_COMPILER__)
[[__sycl_detail__::uniquely_identifiable_object("specialization_id")]]
#endif
specialization_id {
// ...
};
```

## Common runtime part

The runtime should implement the following function, which will be called from
a code generated by the compiler (see the next section):

```
void __register_uniquely_identifiable_object(
void *Address, const char* UniqueID, const char *Kind);
```

The function accepts the following arguments:
- `Address` is an address of a variable, which exists in an application on host
- `UniqueID` is a unique symbolic ID, which corresponds to that variable
- `Kind` is a string which corresponds to `kind` argument passed to
`[[__sycl_detail__::uniquely_identifiable_object(kind)]]` attribute attached
to the type of the variable identified by `Address`. It can be used to
distinguish different entities like `specialization_id` and `device_global`:
for example they could be stored in different maps to speed up certain
operations with them.

The compiler guarantees that the function will be called zero or more times
(depending on the amount of uniquely identifiable objects found in a program)
_before_ application's `main()` function and _before_ any other global
constructor defined in the same translation unit: this is needed to allow usages
of `specialization_id` and `device_global` variables from user-defined global
constructors.

## Compiler driver part

The compiler driver is the component which is responsible for selecting the
approach we are taking and the decision is made based on whether or not
3rd-party host compiler is in use.

If `-fsycl-host-compiler` option is present, the compiler driver chooses the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if -fsycl-host-compiler=clang++ is passed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly don't know what is the algorithm of determining an absolute path to a 3rd-party host compiler specified through that argument, but I suggest that we always assume that any compiler passed through the flag is a 3rd-party one: I simply don't think that it worth to implement compiler detection here, because even clang++ can point to the upstream clang, for example and it doesn't have customizations we need.

Copy link
Contributor

@Fznamznon Fznamznon Apr 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly don't know what is the algorithm of determining an absolute path to a 3rd-party host compiler specified through that argument, but I suggest that we always assume that any compiler passed through the flag is a 3rd-party one

I don't know how this option works either. Maybe It is worth to double check how does it work in corner case I mentioned. Maybe it would be good to mention how this works as well.

integration footer approach:
- it supplies device compilation step with `-fsycl-int-footer` option to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mdtoguchi - possible change to how -fsycl-int-footer is passed

instruct device compiler to emit an integration footer
- it appends the integration footer to user-provided translation unit before
passing it to a host compiler

Otherwise, if `-fsycl-host-compiler` is not present, then the compiler driver
chooses another approach by simply doing nothing related to integration footer:
- `-fsycl-int-footer` is **not** passed to device compiler
- user-provided translation unit is passes as-is to host compiler

## Integration footer approach

When this approach is used, not only is an extra file (integration footer)
generated, but the integration header is also modified: FE compiler generates a
definition of a namespace scope variable of type
`__sycl_device_global_registration` whose sole purpose is to run its constructor
before the application's `main()` (and any other global constructor defined in
a user-provided translation unit) function:

```
namespace sycl::detail {
namespace {

class __sycl_device_global_registration {
public:
__sycl_device_global_registration() noexcept;
};
__sycl_device_global_registration __sycl_device_global_registrar;

} // namespace (unnamed)
} // namespace sycl::detail
```

Examples below are written for the following code snippet:

```
#include <sycl/sycl.hpp>

static sycl::device_global<int> Foo;
namespace inner {
sycl::device_global<double[2]> Bar;
} // namespace inner

// ...
```

The integration footer generated by the compiler contains the definition of the
constructor, which calls a function in the DPC++ runtime, which registers
needed mappings:

```
namespace sycl::detail {
namespace {

__sycl_device_global_registration::__sycl_device_global_registration() noexcept {
__register_uniquely_identifiable_object(
&::Foo,
/* same string returned from __builtin_sycl_unique_stable_id(::Foo) */,
"device_global");
__register_uniquely_identifiable_object(
&::inner::Bar,
/* same string returned from __builtin_sycl_unique_stable_id(::inner::Bar) */,
"device_global");
}

} // namespace (unnamed)
} // namespace sycl::detail
```

Note: the integration footer is only populated with the registration object when
integration footer is enabled. Body of the registration object constructor can
be empty if there are no uniquely identifiable objects found in a translation
unit and FE is free to completely omit registration object generation in that
case as well.

### Handling shadowed variables

The example above shows a simple case where the user's device global variables
can all be uniquely referenced via fully qualified lookup (e.g.
`::inner::Bar`). However, it is possible for users to construct applications
where this is not the case, for example:

```
sycl::device_global<int> FuBar;
namespace {
sycl::device_global<int> FuBar;
}
```

In this example, the `FuBar` variable in the global namespace shadows a
variable with the same name in the unnamed namespace. The integration footer
can reference the variable in the global namespace as `::FuBar`, but there is
no way to reference the variable in the unnamed namespace using fully qualified
lookup.

Such programs are still legal, though. The integration footer can support
cases like this by defining a shim function that returns a reference to the
shadowed device global:

```
namespace {
namespace __sycl_detail {

static constexpr decltype(FuBar) &__shim_1() {
return FuBar; // References 'FuBar' in the unnamed namespace
}

} // namespace __sycl_detail
} // namespace (unnamed)

namespace sycl::detail {

__sycl_device_global_registration::__sycl_device_global_registration() noexcept {
__register_uniquely_identifiable_object(
&::FuBar,
/* same string returned from __builtin_sycl_unique_stable_id(::FuBar) */,
"device_global");
__register_uniquely_identifiable_object(
&::__sycl_detail::__shim_1(),
/* same string returned from __builtin_sycl_unique_stable_id(::(unnamed)::FuBar) */,
"device_global");
}

} // namespace sycl::detail
```

The `__shim_1()` function is defined in the same namespace as the second
`FuBar` device global, so it can reference the variable through unqualified
name lookup. Furthermore, the name of the shim function is globally unique, so
it is guaranteed not to be shadowed by any other name in the translation unit.
This problem with variable shadowing is also a problem for the integration
footer we use for specialization constants. See the [specialization constant
design document][spec-constants-design] for more details on this topic.

[spec-constants-design]: <SYCL2020-SpecializationConstants.md>

## Using a modified DPC++ as single source compiler

With this approach, we simply schedule one more pass in the optimization
pipeline, which should be executed regardless of the optimization level, because
it is required for proper functioning of some features.

The pass has functionality similar to the integration footer, i.e. it emits a
global constructor which in turn calls `__register_uniquely_identifiable_object`
to provide the runtime with required mapping information.

Unlike with the integration footer approach, no separate file is being
generated. This preserves all source files mapping and checksums to be in place
and correct.

Generated constructor function should have internal linkage to avoid possible
names clashes and multiple definition errors later at link stage.

Generated constructor contains a call to
`__register_uniquely_identifiable_object` for each global variable which has
`sycl-unique-id` and `sycl-uid-kind` attributes, passing values of those
attributes into the corresponding arguments of the function.

### Handling shadowed variables

Unlike with the integration footer the problem with shadowed variables doesn't
really exists with the modified DPC++ host compiler, because it is compiler's
responsibility to uniquely identify shadowed variables at LLVM IR level
and we are simply re-using what is already there.

For example, for the following code snippet:

```
sycl::device_global<int> FuBar;
namespace {
sycl::device_global<int> FuBar;
}
```

The following IR is generated by our host compiler:

```
@FuBar = dso_local global %"class.cl::sycl::ext::oneapi::device_global" zeroinitializer, align 8
@_ZN12_GLOBAL__N_15FuBarE = internal global %"class.cl::sycl::ext::oneapi::device_global" zeroinitializer, align 8
```
1 change: 1 addition & 0 deletions sycl/doc/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ Design Documents for the oneAPI DPC++ Compiler
Clang Documentation <https://intel.github.io/llvm-docs/clang>
Clang API Reference <https://intel.github.io/llvm-docs/clang_doxygen>
design/CompilerAndRuntimeDesign
design/MappingHostAddressesToDeviceEntities
design/KernelParameterPassing
design/PluginInterface
design/SpecializationConstants
Expand Down