19
19
20
20
// This is necessary for asciidoc, but not for asciidoctor
21
21
:cpp: C++
22
+ :dpcpp: DPC++
22
23
23
24
== Introduction
24
25
In OpenCL 2.0 and later, a user is able to allocate program
@@ -55,8 +56,8 @@ products.
55
56
56
57
== Version
57
58
58
- Built On: 2021-08-01 +
59
- Revision: 2
59
+ Built On: 2021-09-30 +
60
+ Revision: 3
60
61
61
62
== Contact
62
63
@@ -186,13 +187,9 @@ The allocation of type _T_ is zero-initialized on each device prior to the first
186
187
187
188
`device_global` may only be declared with static storage duration at namespace scope or class scope. If a `device_global` is declared with any other storage duration or scope, the program is ill-formed.
188
189
189
- Properties may be specified for a `device_global` to provide semantic modification or optimization hint information to the compiler. Specific properties are defined in other extensions, but example uses of a property (with a "no copy" attribute described by another extension) are:
190
-
191
- [source,c++]
192
- ----
193
- device_global<MyClass, property_list_t<no_copy::value_t>> dm1;
194
- device_global<int[4], property_list_t<no_copy::value_t>> dm2;
195
- ----
190
+ Properties may be specified for a `device_global` to provide semantic
191
+ modification or optimization hint information to the compiler. See the section
192
+ below for a list of the properties that are allowed.
196
193
197
194
[NOTE]
198
195
====
@@ -398,6 +395,216 @@ Available only if `sycl::is_property_of_v<propertyT, sycl::ext::oneapi::device_g
398
395
399
396
|===
400
397
398
+ === Properties for device global variables
399
+
400
+ The `device_global` class supports several compile-time-constant properties.
401
+ If specified, these properties are included in the `PropertyListT` template
402
+ parameter as shown in this example:
403
+
404
+ [source,c++]
405
+ ----
406
+ using namespace sycl::ext::oneapi;
407
+
408
+ device_global<MyClass, property_list_t<device_image_scope::value_t>> dm1;
409
+ device_global<int[4], property_list_t<host_access::value_t<host_access::access::read>> dm2;
410
+ ----
411
+
412
+ The following code synopsis shows the set of supported properties, and the
413
+ following table describes their effect.
414
+
415
+ [source,c++]
416
+ ----
417
+ namespace sycl::ext::oneapi {
418
+
419
+ struct device_image_scope {
420
+ using value_t = property_value<device_image_scope>;
421
+ };
422
+
423
+ struct host_access {
424
+ enum class access: /*unspecified*/ {
425
+ read,
426
+ write,
427
+ read_write,
428
+ none
429
+ };
430
+ template<access A>
431
+ using value_t = property_value<host_access, std::integral_constant<access, A>>;
432
+
433
+ struct init_mode {
434
+ enum class trigger: /*unspecified*/ {
435
+ reprogram,
436
+ reset
437
+ };
438
+ template<trigger T>
439
+ using value_t = property_value<init_mode, std::integral_constant<trigger, T>>;
440
+ };
441
+
442
+ struct implement_in_csr {
443
+ template <bool Enable>
444
+ using value_t = property_value<implement_in_csr, std::bool_constant<Enable>>;
445
+ };
446
+
447
+
448
+ inline constexpr device_image_scope::value_t device_image_scope_v;
449
+
450
+ template<host_access::access A>
451
+ inline constexpr host_access::value_t<A> host_access_v;
452
+
453
+ template<init_mode::trigger T>
454
+ inline constexpr init_mode::value_t<T> init_mode_v;
455
+
456
+ template<bool Enable>
457
+ inline constexpr implement_in_csr::value_t<Enable> implement_in_csr_v;
458
+
459
+ } // namespace sycl::ext::oneapi
460
+ ----
461
+
462
+ [frame="topbot",options="header"]
463
+ |===
464
+ |Property |Description
465
+
466
+ a|
467
+ [source,c++]
468
+ ----
469
+ device_image_scope
470
+ ----
471
+ a|
472
+ This property is most useful for kernels that are submitted to an FPGA device,
473
+ but it may be used with any kernel. Normally, a single instance of a device
474
+ global variable is allocated for each device, and that instance is shared by
475
+ all kernels that are submitted to the device, regardless of which _device
476
+ image_ contains the kernel. When this property is specified, it is an
477
+ assertion by the user that the device global is referenced only from kernels
478
+ that are contained by the same _device image_. An implementation may be able
479
+ to optimize accesses to the device global when this property is specified
480
+ (especially on an FPGA device), but the user must be aware of which _device
481
+ image_ contains the kernels that use the variable.
482
+
483
+ A device global that is decorated with this property may not be accessed from
484
+ kernels that reside in different _device images_, either by direct reference
485
+ to the variable or indirectly by passing the variable's address to another
486
+ kernel. The implementation is required to diagnose an error if the kernels
487
+ that directly access a variable do not all reside in the same _device image_,
488
+ however no diagnostic is required for an indirect access from another _device
489
+ image_.
490
+
491
+ When a device global is decorated with this property, the implementation
492
+ re-initializes it whenever the _device image_ is loaded onto the device. As a
493
+ result, the application can only be guaranteed that a device global retains its
494
+ value between kernel invocations if it understands when the _device image_ is
495
+ loaded onto the device. For an FPGA, this happens whenever the device is
496
+ reprogrammed. Other devices typically load the _device image_ once before the
497
+ first invocation of any kernel in that _device image_, and then it remains
498
+ loaded onto the device until the program terminates.
499
+
500
+ The application may copy to or from a device global even before any kernel in
501
+ the _device image_ is submitted to the device. Doing so causes the device
502
+ global to be initialized immediately before the copy happens. (Typically, the
503
+ copy operation causes the _device image_ to be loaded onto the device also.)
504
+ As a result, copying from a device global returns the initial value if the
505
+ _device image_ that contains the variable is not currently loaded onto the
506
+ device.
507
+
508
+ a|
509
+ [source,c++]
510
+ ----
511
+ host_access
512
+ ----
513
+ a|
514
+ This property provides an assertion by the user telling the implementation
515
+ whether the host code copies to or from the device global. As a result, the
516
+ implementation may be able to perform certain optimizations. Although this
517
+ property may be used with any device, it is generally only beneficial when used
518
+ on FPGA devices.
519
+
520
+ The following values are supported:
521
+
522
+ * `read`: The user asserts that the host code may copy from (read) the
523
+ variable, but it will never copy to (write) it. For an FPGA device, only a
524
+ read port is exposed.
525
+ * `write`: The user asserts that the host code may copy to (write) the
526
+ variable, but it never copy from (read) it. For an FPGA device, only a write
527
+ port is exposed.
528
+ * `none`: The user asserts that the host code will never copy to or copy
529
+ from the variable. For an FPGA device, no external ports are exposed.
530
+ * `read_write`: The user provides no assertions, and the host code may either
531
+ copy to or copy from the variable. This is the default. For an FPGA device,
532
+ a read/write port is exposed.
533
+
534
+ a|
535
+ [source,c++]
536
+ ----
537
+ init_mode
538
+ ----
539
+ a|
540
+ This property is only meaningful when used with an FPGA device. It is ignored
541
+ for other devices. The following values are supported:
542
+
543
+ * `reprogram`: Initialization is performed by reprogramming the device. This
544
+ may require more frequent reprogramming but may reduce area.
545
+ * `reset`: Initialization is performed by sending a reset signal to the device.
546
+ This may increase area but may reduce reprogramming frequency.
547
+
548
+ If the `init_mode` property is not specified, the default behavior is
549
+ equivalent to one of the values listed above, but the choice is implementation
550
+ defined.
551
+
552
+ a|
553
+ [source,c++]
554
+ ----
555
+ implement_in_csr
556
+ ----
557
+ a|
558
+ This property is only meaningful when used with an FPGA device. It is ignored
559
+ for other devices. The following values are supported:
560
+
561
+ * `true`: Access to this memory is done through a CSR interface shared with
562
+ kernel arguments.
563
+ * `false`: Access to this memory is done through a dedicated interface.
564
+
565
+ If the `implement_in_csr` property is not specified, the default behavior is
566
+ equivalent to one of the values listed above, but the choice is implementation
567
+ defined.
568
+
569
+ |===
570
+
571
+ [NOTE]
572
+ ====
573
+ As stated above, the user must understand which _device image_ contains a
574
+ kernel in order to use the `device_image_scope` property. Each implementation
575
+ may have its own rules that determine when two kernels are bundled together
576
+ into the same _device image_. For {dpcpp} two kernels _K1_ and _K2_ will be
577
+ bundled into the same _device image_ when both of the following conditions are
578
+ satisfied:
579
+
580
+ * The translation unit containing _K1_ and the translation unit containing _K2_
581
+ must both be compiled with `-fsycl-targets=X
582
+ -fsycl-assume-all-kernels-run-on-targets` where the target `X` is the same in
583
+ both compilations. (A list of targets may also be specified such as
584
+ `-fsycl-targets=X,Y`. In this case the list must be the same in both
585
+ compilations.)
586
+
587
+ * The application must be linked with `-fsycl-device-code-split` such that the
588
+ kernels _K1_ and _K2_ are not split into different _device images_. For
589
+ example, if _K1_ and _K2_ reside in the same translation unit,
590
+ `-fsycl-device-code-split=per_source` will guarantee that they are bundled
591
+ together in the same _device image_. If they reside in different translation
592
+ units, `-fsycl-device-code-split=none` will guarantee that they reside in the
593
+ same _device image_.
594
+
595
+ In addition, the following factors also affect how kernels are bundled into
596
+ _device images_:
597
+
598
+ * Kernels that are online-compiled using `sycl::kernel_bundle` may reside in
599
+ different _device images_ if they are compiled from different `kernel_bundle`
600
+ objects.
601
+
602
+ * A kernel that uses specialization constants may have a new instance in a new
603
+ _device image_ each time the application sets a new value for the
604
+ specialization constant. However, this happens only if the device supports
605
+ native specialization constants, which is not the case for FPGA devices.
606
+ ====
607
+
401
608
=== Relax language restrictions for SYCL device functions
402
609
403
610
SYCL 2020 restrictions must be relaxed to allow `device_global` to be used within
@@ -650,6 +857,9 @@ void copy(const std::remove_all_extents_t<T> *src,
650
857
----
651
858
| `T` must be device copyable.
652
859
860
+ Not available if `PropertyListT` contains the `host_access` property with
861
+ `read` or `none` assertions.
862
+
653
863
Copies _count_ elements of type `std::remove_all_extents_t<T>` from the pointer _src_ to the `device_global` _dest_, starting at _startIndex_ elements of _dest_. _src_ may be either a host or USM pointer.
654
864
a|
655
865
[source, c++]
@@ -662,6 +872,9 @@ void copy(const device_global<T, PropertyListT>& src,
662
872
----
663
873
| `T` must be device copyable.
664
874
875
+ Not available if `PropertyListT` contains the `host_access` property with
876
+ `write` or `none` assertions.
877
+
665
878
Copies _count_ elements of type `std::remove_all_extents_t<T>` from the `device_global` _src_ to the pointer _dest_, starting at _startIndex_ elements of _src_. _dest_ may be either a host or USM pointer.
666
879
667
880
a|
@@ -673,6 +886,9 @@ void memcpy(device_global<T, PropertyListT>& dest,
673
886
----
674
887
|`T` must be device copyable.
675
888
889
+ Not available if `PropertyListT` contains the `host_access` property with
890
+ `read` or `none` assertions.
891
+
676
892
Copies _count_ bytes from the pointer _src_ to the `device_global` _dest_, starting at _offset_ bytes. _src_ may be either a host or USM pointer.
677
893
678
894
a|
@@ -685,6 +901,9 @@ void memcpy(void *dest,
685
901
----
686
902
|`T` must be device copyable.
687
903
904
+ Not available if `PropertyListT` contains the `host_access` property with
905
+ `write` or `none` assertions.
906
+
688
907
Copies _count_ bytes from the `device_global` _src_ to the pointer _dest_, starting at _offset_ bytes. _dest_ may be either a host or USM pointer.
689
908
|====
690
909
--
0 commit comments