Skip to content

Commit fd2bd6e

Browse files
authored
[SYCL][Doc] Add FPGA properties to device global spec (#4675)
Although these properties are intended mostly for FPGA users, there is no prohibition against using them for other devices. Therefore, we describe them in the main device global spec, rather than creating a separate add-on spec for FPGA.
1 parent 7575853 commit fd2bd6e

File tree

1 file changed

+228
-9
lines changed

1 file changed

+228
-9
lines changed

sycl/doc/extensions/DeviceGlobal/SYCL_INTEL_device_global.asciidoc

Lines changed: 228 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,7 @@
1919

2020
// This is necessary for asciidoc, but not for asciidoctor
2121
:cpp: C++
22+
:dpcpp: DPC++
2223

2324
== Introduction
2425
In OpenCL 2.0 and later, a user is able to allocate program
@@ -55,8 +56,8 @@ products.
5556

5657
== Version
5758

58-
Built On: 2021-08-01 +
59-
Revision: 2
59+
Built On: 2021-09-30 +
60+
Revision: 3
6061

6162
== Contact
6263

@@ -186,13 +187,9 @@ The allocation of type _T_ is zero-initialized on each device prior to the first
186187

187188
`device_global` may only be declared with static storage duration at namespace scope or class scope. If a `device_global` is declared with any other storage duration or scope, the program is ill-formed.
188189

189-
Properties may be specified for a `device_global` to provide semantic modification or optimization hint information to the compiler. Specific properties are defined in other extensions, but example uses of a property (with a "no copy" attribute described by another extension) are:
190-
191-
[source,c++]
192-
----
193-
device_global<MyClass, property_list_t<no_copy::value_t>> dm1;
194-
device_global<int[4], property_list_t<no_copy::value_t>> dm2;
195-
----
190+
Properties may be specified for a `device_global` to provide semantic
191+
modification or optimization hint information to the compiler. See the section
192+
below for a list of the properties that are allowed.
196193

197194
[NOTE]
198195
====
@@ -398,6 +395,216 @@ Available only if `sycl::is_property_of_v<propertyT, sycl::ext::oneapi::device_g
398395

399396
|===
400397

398+
=== Properties for device global variables
399+
400+
The `device_global` class supports several compile-time-constant properties.
401+
If specified, these properties are included in the `PropertyListT` template
402+
parameter as shown in this example:
403+
404+
[source,c++]
405+
----
406+
using namespace sycl::ext::oneapi;
407+
408+
device_global<MyClass, property_list_t<device_image_scope::value_t>> dm1;
409+
device_global<int[4], property_list_t<host_access::value_t<host_access::access::read>> dm2;
410+
----
411+
412+
The following code synopsis shows the set of supported properties, and the
413+
following table describes their effect.
414+
415+
[source,c++]
416+
----
417+
namespace sycl::ext::oneapi {
418+
419+
struct device_image_scope {
420+
using value_t = property_value<device_image_scope>;
421+
};
422+
423+
struct host_access {
424+
enum class access: /*unspecified*/ {
425+
read,
426+
write,
427+
read_write,
428+
none
429+
};
430+
template<access A>
431+
using value_t = property_value<host_access, std::integral_constant<access, A>>;
432+
433+
struct init_mode {
434+
enum class trigger: /*unspecified*/ {
435+
reprogram,
436+
reset
437+
};
438+
template<trigger T>
439+
using value_t = property_value<init_mode, std::integral_constant<trigger, T>>;
440+
};
441+
442+
struct implement_in_csr {
443+
template <bool Enable>
444+
using value_t = property_value<implement_in_csr, std::bool_constant<Enable>>;
445+
};
446+
447+
448+
inline constexpr device_image_scope::value_t device_image_scope_v;
449+
450+
template<host_access::access A>
451+
inline constexpr host_access::value_t<A> host_access_v;
452+
453+
template<init_mode::trigger T>
454+
inline constexpr init_mode::value_t<T> init_mode_v;
455+
456+
template<bool Enable>
457+
inline constexpr implement_in_csr::value_t<Enable> implement_in_csr_v;
458+
459+
} // namespace sycl::ext::oneapi
460+
----
461+
462+
[frame="topbot",options="header"]
463+
|===
464+
|Property |Description
465+
466+
a|
467+
[source,c++]
468+
----
469+
device_image_scope
470+
----
471+
a|
472+
This property is most useful for kernels that are submitted to an FPGA device,
473+
but it may be used with any kernel. Normally, a single instance of a device
474+
global variable is allocated for each device, and that instance is shared by
475+
all kernels that are submitted to the device, regardless of which _device
476+
image_ contains the kernel. When this property is specified, it is an
477+
assertion by the user that the device global is referenced only from kernels
478+
that are contained by the same _device image_. An implementation may be able
479+
to optimize accesses to the device global when this property is specified
480+
(especially on an FPGA device), but the user must be aware of which _device
481+
image_ contains the kernels that use the variable.
482+
483+
A device global that is decorated with this property may not be accessed from
484+
kernels that reside in different _device images_, either by direct reference
485+
to the variable or indirectly by passing the variable's address to another
486+
kernel. The implementation is required to diagnose an error if the kernels
487+
that directly access a variable do not all reside in the same _device image_,
488+
however no diagnostic is required for an indirect access from another _device
489+
image_.
490+
491+
When a device global is decorated with this property, the implementation
492+
re-initializes it whenever the _device image_ is loaded onto the device. As a
493+
result, the application can only be guaranteed that a device global retains its
494+
value between kernel invocations if it understands when the _device image_ is
495+
loaded onto the device. For an FPGA, this happens whenever the device is
496+
reprogrammed. Other devices typically load the _device image_ once before the
497+
first invocation of any kernel in that _device image_, and then it remains
498+
loaded onto the device until the program terminates.
499+
500+
The application may copy to or from a device global even before any kernel in
501+
the _device image_ is submitted to the device. Doing so causes the device
502+
global to be initialized immediately before the copy happens. (Typically, the
503+
copy operation causes the _device image_ to be loaded onto the device also.)
504+
As a result, copying from a device global returns the initial value if the
505+
_device image_ that contains the variable is not currently loaded onto the
506+
device.
507+
508+
a|
509+
[source,c++]
510+
----
511+
host_access
512+
----
513+
a|
514+
This property provides an assertion by the user telling the implementation
515+
whether the host code copies to or from the device global. As a result, the
516+
implementation may be able to perform certain optimizations. Although this
517+
property may be used with any device, it is generally only beneficial when used
518+
on FPGA devices.
519+
520+
The following values are supported:
521+
522+
* `read`: The user asserts that the host code may copy from (read) the
523+
variable, but it will never copy to (write) it. For an FPGA device, only a
524+
read port is exposed.
525+
* `write`: The user asserts that the host code may copy to (write) the
526+
variable, but it never copy from (read) it. For an FPGA device, only a write
527+
port is exposed.
528+
* `none`: The user asserts that the host code will never copy to or copy
529+
from the variable. For an FPGA device, no external ports are exposed.
530+
* `read_write`: The user provides no assertions, and the host code may either
531+
copy to or copy from the variable. This is the default. For an FPGA device,
532+
a read/write port is exposed.
533+
534+
a|
535+
[source,c++]
536+
----
537+
init_mode
538+
----
539+
a|
540+
This property is only meaningful when used with an FPGA device. It is ignored
541+
for other devices. The following values are supported:
542+
543+
* `reprogram`: Initialization is performed by reprogramming the device. This
544+
may require more frequent reprogramming but may reduce area.
545+
* `reset`: Initialization is performed by sending a reset signal to the device.
546+
This may increase area but may reduce reprogramming frequency.
547+
548+
If the `init_mode` property is not specified, the default behavior is
549+
equivalent to one of the values listed above, but the choice is implementation
550+
defined.
551+
552+
a|
553+
[source,c++]
554+
----
555+
implement_in_csr
556+
----
557+
a|
558+
This property is only meaningful when used with an FPGA device. It is ignored
559+
for other devices. The following values are supported:
560+
561+
* `true`: Access to this memory is done through a CSR interface shared with
562+
kernel arguments.
563+
* `false`: Access to this memory is done through a dedicated interface.
564+
565+
If the `implement_in_csr` property is not specified, the default behavior is
566+
equivalent to one of the values listed above, but the choice is implementation
567+
defined.
568+
569+
|===
570+
571+
[NOTE]
572+
====
573+
As stated above, the user must understand which _device image_ contains a
574+
kernel in order to use the `device_image_scope` property. Each implementation
575+
may have its own rules that determine when two kernels are bundled together
576+
into the same _device image_. For {dpcpp} two kernels _K1_ and _K2_ will be
577+
bundled into the same _device image_ when both of the following conditions are
578+
satisfied:
579+
580+
* The translation unit containing _K1_ and the translation unit containing _K2_
581+
must both be compiled with `-fsycl-targets=X
582+
-fsycl-assume-all-kernels-run-on-targets` where the target `X` is the same in
583+
both compilations. (A list of targets may also be specified such as
584+
`-fsycl-targets=X,Y`. In this case the list must be the same in both
585+
compilations.)
586+
587+
* The application must be linked with `-fsycl-device-code-split` such that the
588+
kernels _K1_ and _K2_ are not split into different _device images_. For
589+
example, if _K1_ and _K2_ reside in the same translation unit,
590+
`-fsycl-device-code-split=per_source` will guarantee that they are bundled
591+
together in the same _device image_. If they reside in different translation
592+
units, `-fsycl-device-code-split=none` will guarantee that they reside in the
593+
same _device image_.
594+
595+
In addition, the following factors also affect how kernels are bundled into
596+
_device images_:
597+
598+
* Kernels that are online-compiled using `sycl::kernel_bundle` may reside in
599+
different _device images_ if they are compiled from different `kernel_bundle`
600+
objects.
601+
602+
* A kernel that uses specialization constants may have a new instance in a new
603+
_device image_ each time the application sets a new value for the
604+
specialization constant. However, this happens only if the device supports
605+
native specialization constants, which is not the case for FPGA devices.
606+
====
607+
401608
=== Relax language restrictions for SYCL device functions
402609

403610
SYCL 2020 restrictions must be relaxed to allow `device_global` to be used within
@@ -650,6 +857,9 @@ void copy(const std::remove_all_extents_t<T> *src,
650857
----
651858
| `T` must be device copyable.
652859

860+
Not available if `PropertyListT` contains the `host_access` property with
861+
`read` or `none` assertions.
862+
653863
Copies _count_ elements of type `std::remove_all_extents_t<T>` from the pointer _src_ to the `device_global` _dest_, starting at _startIndex_ elements of _dest_. _src_ may be either a host or USM pointer.
654864
a|
655865
[source, c++]
@@ -662,6 +872,9 @@ void copy(const device_global<T, PropertyListT>& src,
662872
----
663873
| `T` must be device copyable.
664874

875+
Not available if `PropertyListT` contains the `host_access` property with
876+
`write` or `none` assertions.
877+
665878
Copies _count_ elements of type `std::remove_all_extents_t<T>` from the `device_global` _src_ to the pointer _dest_, starting at _startIndex_ elements of _src_. _dest_ may be either a host or USM pointer.
666879

667880
a|
@@ -673,6 +886,9 @@ void memcpy(device_global<T, PropertyListT>& dest,
673886
----
674887
|`T` must be device copyable.
675888

889+
Not available if `PropertyListT` contains the `host_access` property with
890+
`read` or `none` assertions.
891+
676892
Copies _count_ bytes from the pointer _src_ to the `device_global` _dest_, starting at _offset_ bytes. _src_ may be either a host or USM pointer.
677893

678894
a|
@@ -685,6 +901,9 @@ void memcpy(void *dest,
685901
----
686902
|`T` must be device copyable.
687903

904+
Not available if `PropertyListT` contains the `host_access` property with
905+
`write` or `none` assertions.
906+
688907
Copies _count_ bytes from the `device_global` _src_ to the pointer _dest_, starting at _offset_ bytes. _dest_ may be either a host or USM pointer.
689908
|====
690909
--

0 commit comments

Comments
 (0)