diff --git a/sycl/doc/Assert.md b/sycl/doc/Assert.md
new file mode 100644
index 0000000000000..12b074c258665
--- /dev/null
+++ b/sycl/doc/Assert.md
@@ -0,0 +1,423 @@
+# Assert feature
+
+**IMPORTANT**: This document is a draft.
+
+Using the standard C++ `assert` API ("assertions") is an important debugging
+technique widely used by developers. This document describes the design of
+supporting assertions within SYCL device code.
+The basic approach we chose is delivering device-side assertions as a call to
+`std::abort()` at host-side.
+
+As usual, device-side assertions can be disabled by defining the `NDEBUG` macro at
+compile time.
+
+## Use-case example
+
+```c++
+#include
+#include
+
+using namespace sycl;
+
+void user_func(item<2> Item) {
+  assert((Item[0] % 2) && "Nil");
+}
+
+int main() {
+  queue Q;
+  Q.submit([&] (handler& CGH) {
+    CGH.parallel_for(range<2>{N, M}, [=](item<2> It) {
+      do_smth();
+      user_func(It);
+      do_smth_else();
+    });
+  });
+  Q.wait();
+  std::cout << "One shouldn't see this message.";
+  return 0;
+}
+```
+
+In this use-case every work-item with an even index along dimension 0 will trigger
+an assertion failure. An assertion failure should trigger a call to `std::abort()` at
+the host as described in the
+[extension](extensions/Assert/SYCL_ONEAPI_ASSERT.asciidoc).
+Even though multiple failures of the same or different assertions can happen in
+multiple work-items, the implementation is required to deliver at least one
+assertion. The assertion failure message is printed to `stderr` by the DPCPP
+Runtime or the underlying backend.
+
+When multiple kernels are enqueued and more than one fails an assertion, at least
+one assertion should be reported.
+
+
+## User requirements
+
+From the user's point of view there are the following requirements:
+
+| # | Title | Description | Importance |
+| - | ----- | ----------- | ---------- |
+| 1 | Abort DPC++ application | Abort host application when assert function is called and print a message about assertion | Must have |
+| 2 | Print assert message | Assert function should print message to stderr at host | Must have |
+| 3 | Stop under debugger | When debugger is attached, break at assertion point | Highly desired |
+| 4 | Reliability | Assert failure should be reported regardless of kernel deadlock | Highly desired |
+
+Implementations without enough capabilities to implement the fourth requirement are
+allowed to realize the fallback approach described below, which does not
+guarantee assertion failure delivery to host, but is still useful in many
+practical cases.
+
+
+## Terms
+
+ - Device-side Runtime - runtime library supplied by the Native Device Compiler
+   and running on the device.
+ - Native Device Compiler - compiler which generates device-native binary image
+   based on input SPIR-V image.
+ - Low-level Runtime - the backend/runtime behind DPCPP Runtime attached via the
+   Plugin Interface.
+
+
+## How it works?
+
+The `assert(expr)` macro ends up in a call to `__devicelib_assert_fail`. This function
+is part of [Device library extension](extensions/C-CXX-StandardLibrary/DeviceLibExtensions.rst#cl_intel_devicelib_cassert).
+
+The format of the assert message is unspecified, but it will always include the
+text of the failing expression, the values of the standard macros `__FILE__` and
+`__LINE__`, and the value of the standard variable `__func__`. If the failing
+assert comes from an `nd_range` `parallel_for` it will also include the global
+ID and the local ID of the failing work item.
+
+The implementation of this function is supplied by the Native Device Compiler for
+the safe approach or by the DPCPP Compiler for the fallback one.
+
+In order to distinguish which implementation to use, the DPCPP Runtime checks for
+the `PI_INTEL_DEVICELIB_CASSERT` extension. If the extension isn't available, then
+the fallback implementation is used.
+
+
+## Safe approach
+
+This is the preferred approach and implementations should use it when possible.
+It guarantees assertion failure notification delivery to the host regardless of
+the behavior of the kernel which hit the assertion. If the backend supports the
+safe approach, it must report this capability to the DPCPP Runtime via the
+`PI_INTEL_DEVICELIB_CASSERT` extension query.
+
+The Native Device Compiler is responsible for providing implementation of
+`__devicelib_assert_fail` which completely hides details of communication
+between the device code and the Low-Level Runtime from the SYCL device compiler
+and runtime. The Low-Level Runtime is responsible for:
+ - detecting if assert failure took place;
+ - flushing assert message to `stderr` on host.
+
+The following sequence of events describes how user code gets notified:
+ - Device side:
+   1. Assert fails in device-code in kernel
+      // It's not defined if GPU thread stops execution
+      // Other GPU threads are left untouched
+   2. Specialized version of `__devicelib_assert_fail` is called
+   3. Device immediately signals to host (Low-Level Runtime)
+ - Host side:
+   1. The assert failure gets detected by Low-Level Runtime
+   2. Low-Level Runtime prints assert failure message to `stderr`
+   3. Low-Level Runtime calls `abort()`
+
+
+## Fallback approach
+
+If the Device-side Runtime doesn't support `__devicelib_assert_fail` (as reported
+via the `PI_INTEL_DEVICELIB_CASSERT` extension query) then the fallback approach
+is used. The approach doesn't require any support from the Device-side Runtime or
+the Native Device Compiler, nor does it require any from the Low-level Runtime.
+
+Within this approach, a mutable program scope variable is introduced. This
+variable stores a flag which says if an assert failure was encountered.
Fallback
+implementation of `__devicelib_assert_fail` atomically raises the flag so that
+DPCPP Runtime is able to detect assert failure after kernel finishes.
+
+The following sequence of events describes how user code gets notified:
+ - Device side:
+   1. Assert fails in device-code in kernel
+   2. Fallback version of `__devicelib_assert_fail` is called
+   3. Assert information is stored into program-scope variable
+   4. Kernel continues running
+ - Host side:
+   1. A copy 'kernel' is enqueued as dependent on the user's kernel to get
+      the value of the assert failure flag.
+   2. A host-task is enqueued to check the value of the assert failure flag.
+   3. The host task calls `abort()` if the assert failure flag is set.
+
+DPCPP Runtime will automatically check if assertions are enabled in the kernel
+being run, and won't enqueue the auxiliary kernels if assertions are not
+enabled. So there is no host-side runtime overhead when assertions are not
+enabled.
+
+Illustrating this with an example, let's assume the user enqueues three kernels:
+ - `Kernel #1`, uses assert
+ - `Kernel #2`, uses assert
+ - `Kernel #3`, uses assert and depends on `Kernel #1`
+
+The resulting graph will look like this:
+![graph](images/assert-fallback-graph.svg)
+
+### Interface to program scope variable
+
+Multiple translation units could be compiled/linked into a single device binary
+image. All of them should have an `extern` declaration of the program scope variable
+available. The definition of the variable is only available within devicelib in the
+same binary image where the fallback `__devicelib_assert_fail` resides.
+
+The variable has the following structure and
+declaration:
+
+```c++
+namespace cl {
+namespace sycl {
+namespace detail {
+struct AssertHappened {
+  int Flag = 0;
+};
+}
+}
+}
+
+#ifdef __SYCL_DEVICE_ONLY__
+extern SYCL_GLOBAL_VAR AssertHappened AssertHappenedMem;
+#endif
+```
+
+Here, `SYCL_GLOBAL_VAR` is a macro which wraps a special attribute to allow for
+a mutable program-scope variable.
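The flag-raising step of the fallback `__devicelib_assert_fail` can be illustrated with a small host-only sketch. This is not the devicelib implementation: `AssertHappenedSketch`, `raiseAssertFlag` and `reportAssertOnce` are hypothetical names, and `std::atomic` only stands in for whatever device-side atomic operation the fallback uses. It shows one property, assuming a compare-and-swap is used: only the first failing caller raises the flag and prints the message.

```cpp
#include <atomic>
#include <cstdio>

// Hypothetical host-side stand-in for the program-scope assert flag.
struct AssertHappenedSketch {
  std::atomic<int> Flag{0};
};

// Atomically moves the flag from 0 to 1.
// Returns true only for the first caller that raises it.
inline bool raiseAssertFlag(AssertHappenedSketch &AH) {
  int Expected = 0;
  return AH.Flag.compare_exchange_strong(Expected, 1);
}

// Prints the assert message once, for the caller that raised the flag.
// Returns true if this call was the one that reported the failure.
inline bool reportAssertOnce(AssertHappenedSketch &AH, const char *Expr,
                             const char *File, int Line) {
  if (!raiseAssertFlag(AH))
    return false; // another failure was already recorded
  std::fprintf(stderr, "%s:%d: Assertion `%s` failed.\n", File, Line, Expr);
  return true;
}
```

In this sketch a second failing assertion leaves the flag untouched and prints nothing, which matches the requirement that at least one assertion is reported.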
+
+The reference to the extern variable is resolved during online linking against
+the fallback devicelib.
+
+### Online-linking fallback `__devicelib_assert_fail`
+
+Online linking against the fallback implementation of `__devicelib_assert_fail` is
+performed only when assertions are enabled and the Device-side Runtime doesn't provide
+an implementation of `__devicelib_assert_fail`.
+
+In DPCPP headers one can check whether assert is enabled by testing the `NDEBUG`
+macro with `#ifdef`. In the DPCPP Runtime Library this knowledge is obtained from
+the device binary image descriptor's property sets.
+
+Each device image is supplied with an array of property sets. For a description
+of property sets see `struct pi_device_binary_struct` in
+[`pi.h`](https://github.com/intel/llvm/blob/sycl/sycl/include/CL/sycl/detail/pi.h#L692).
+
+A distinct property set `SYCL/assert used` is added. In this set a property
+with the name of the kernel is added whenever the kernel uses assert. The use of
+assert is detected by a specific LLVM IR pass invoked by the `sycl-post-link`
+tool which runs on linked device code, i.e. after linking with the `libsycl-crt`
+library which defines the assert function. The pass builds the complete call graph
+for a kernel, and checks if there's a call to `__devicelib_assert_fail` anywhere
+in the graph. If found, `sycl-post-link` adds the property for the kernel.
+
+The same is done for all indirectly callable functions (marked with a specific
+attribute) found in the linked device code. Those are functions whose pointers
+can be taken and passed around in device code. If the call graph for any such
+function has a call to `__devicelib_assert_fail`, then all kernels in the module
+are conservatively marked as using asserts.
+
+The added property is used for:
+ - deciding if online-linking against fallback devicelib is required;
+ - deciding if there's a need to enqueue the program scope variable copier kernel
+   and checker host-task.
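The call-graph check described above can be sketched in plain C++ as a reachability search. This is only an illustration under stated assumptions: the real detection is an LLVM IR pass run by `sycl-post-link`, whereas `CallGraph`, `reaches` and `kernelsUsingAssert` are hypothetical names operating on a simple map of function names. The conservative handling of indirectly callable functions is not modelled here.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: function name -> names of functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first search: is Target reachable from From through calls?
inline bool reaches(const CallGraph &G, const std::string &From,
                    const std::string &Target) {
  std::set<std::string> Visited;
  std::vector<std::string> Stack{From};
  while (!Stack.empty()) {
    std::string Fn = Stack.back();
    Stack.pop_back();
    if (Fn == Target)
      return true;
    if (!Visited.insert(Fn).second)
      continue; // already explored, also guards against recursion
    auto It = G.find(Fn);
    if (It == G.end())
      continue; // no known callees
    for (const std::string &Callee : It->second)
      Stack.push_back(Callee);
  }
  return false;
}

// Returns the kernels that would get an entry in the assert-used property set.
inline std::set<std::string>
kernelsUsingAssert(const CallGraph &G, const std::vector<std::string> &Kernels) {
  std::set<std::string> Result;
  for (const std::string &K : Kernels)
    if (reaches(G, K, "__devicelib_assert_fail"))
      Result.insert(K);
  return Result;
}
```

A kernel that calls `user_func`, which calls `__assert_fail`, which in turn calls `__devicelib_assert_fail`, ends up in the result set; a kernel whose call graph never reaches that function does not.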
+
+Suppose the following example user code:
+```c++
+void user_func(int X) {
+  assert(X && "X is nil");
+}
+
+int main() {
+  queue Q(...);
+  Q.submit([&] (handler& CGH) {
+    CGH.single_task([=] () {
+      do_smth();
+      user_func(0);
+      do_smth_else();
+    });
+  });
+  ...
+}
+```
+
+The following LLVM IR pseudo code will be generated after linking against
+fallback implementation of devicelib:
+```
+@AssertHappenedMem = global AssertHappened
+
+/// user's code
+void user_func(int X) {
+  if (!(X && "X is nil")) {
+    __assert_fail(...);
+  }
+}
+
+kernel(...) {
+  do_smth();
+  user_func(0);
+  do_smth_else();
+}
+
+/// __assert_fail belongs to Linux version of devicelib
+void __assert_fail(...) {
+  ...
+  __devicelib_assert_fail(...);
+}
+
+void __devicelib_assert_fail(Expr, File, Line, GlobalID, LocalID) {
+  ...
+  volatile int *Ptr = (volatile int *)&AssertHappenedMem.Flag;
+  int Expected = 0;
+  int Desired = 1;
+
+  if (atomic_CAS(Ptr, Expected, Desired))
+    printf("Assertion `%s' failed in %s at line %i. GlobalID: %i, LocalID: %i",
+           Expr, File, Line, GlobalID, LocalID);
+}
+```
+
+#### Compiling with assert enabled/disabled
+
+Consider the following example sources:
+```c++
+// impl.cpp
+using namespace sycl;
+int calculus(int X) {
+  assert(X && "Invalid value");
+  return X * 2;
+}
+
+void enqueueKernel(queue &Q, buffer &B) {
+  Q.submit([&](handler &H) {
+    auto Acc = B.get_access(H);
+    H.parallel_for(/* range */, [=](item It) {
+      assert(Acc[It]);
+      // ...
+    });
+  });
+}
+
+// main.cpp
+// ...
+using namespace sycl;
+
+SYCL_EXTERNAL int calculus(int);
+void enqueueKernel(queue&, buffer&);
+
+void workload() {
+  queue Q;
+  buffer B;
+
+  Q.submit([&](handler &H) {
+    auto Acc = B.get_access(H);
+    H.parallel_for(/* range */, [=](item It) {
+      int X = calculus(0); // should fail assertion
+      assert(X && "Nil in result");
+      Acc[It] = X;
+    });
+  });
+
+  enqueueKernel(Q, B);
+  ...
+}
+```
+
+These two files are compiled into a single binary application.
There are four
+possible states of the `NDEBUG` macro definition:
+
+| # | `impl.cpp` | `main.cpp` |
+| - | ---------- | ---------- |
+| 1 | defined | defined |
+| 2 | defined | undefined |
+| 3 | undefined | defined |
+| 4 | undefined | undefined |
+
+The state of the `NDEBUG` macro definition determines the set of assertions which can
+fail.
+
+### Raising assert failure flag and reading it on host
+
+In DPCPP headers one can check whether assert is enabled by testing the `NDEBUG`
+macro with `#ifdef`. However, in order to support the multi translation unit use-case,
+relying on the definition of the `NDEBUG` macro is not allowed.
+
+*Note: Multi translation unit use-case here is the one with `SYCL_EXTERNAL`
+function compiled with assertions enabled and used in a kernel but the kernel
+is compiled with assertions disabled.*
+
+There are two commands used for reading the assert failure flag: the copy kernel and
+the checker host task. The copy kernel will copy `AssertHappenedMem` to host and
+the host-task will check the `Flag` value and call `abort()` as needed. The kernel and
+host task are enqueued together with a kernel only when the corresponding device
+binary image for this kernel indicates that it may use (maybe indirectly) the
+`assert` in its code.
+
+All translation units provided by the user should have a declaration of the
+assert flag read function available:
+```c++
+int __devicelib_assert_read(void);
+```
+Also, the [AssertHappened](#prog-scope-var-decl) structure type should be
+available for the copier kernel.
+
+The definition is only provided within devicelib along with the
+`__devicelib_assert_fail` function which raises the flag.
+
+Reading of the assert failure flag is performed with the help of an auxiliary kernel
+which is enqueued as dependent on the user's one. The flag state is checked later
+in host-task.
This is achieved with approximately the following changes:
+
+```c++
+class AssertFlagCopier;
+#ifdef __SYCL_DEVICE_ONLY__
+int __devicelib_assert_read(void);
+#endif
+
+class queue {
+  template <typename T> event submit(T CGF) {
+    event Event = submit_impl(CGF);
+    std::string KernelName = /* get kernel name from calls to parallel_for, etc. */;
+    // assert required
+    if (!get_device()->assert_fail_supported() && isAssertUsed(KernelName)) {
+      // __devicelib_assert_fail isn't supported by Device-side Runtime
+      // Linking against fallback impl of __devicelib_assert_fail is performed
+      // by program manager class
+      AssertHappened *AH = new AssertHappened;
+      buffer *Buffer = new buffer{1, AH};
+
+      // read flag value
+      event CopierEv = submit_impl([&](handler &CGH) {
+        CGH.depends_on(Event);
+
+        auto Acc = Buffer->get_access(CGH);
+
+        CGH.single_task([=] {
+#ifdef __SYCL_DEVICE_ONLY__
+          Acc[0].Flag = __devicelib_assert_read();
+#endif
+        });
+      });
+
+      // check flag state
+      submit_impl([=](handler &CGH) {
+        CGH.depends_on(CopierEv);
+
+        CGH.codeplay_host_task([=] {
+          if (AH->Flag)
+            abort();
+
+          delete Buffer;
+          delete AH;
+        });
+      });
+    }
+    return Event;
+  }
+};
+```
+
diff --git a/sycl/doc/extensions/Assert/SYCL_ONEAPI_ASSERT.asciidoc b/sycl/doc/extensions/Assert/SYCL_ONEAPI_ASSERT.asciidoc
new file mode 100644
index 0000000000000..24004b525d37d
--- /dev/null
+++ b/sycl/doc/extensions/Assert/SYCL_ONEAPI_ASSERT.asciidoc
@@ -0,0 +1,170 @@
+= SYCL_EXT_ONEAPI_ASSERT
+
+:source-highlighter: coderay
+:coderay-linenums-mode: table
+
+// This section needs to be after the document title.
+:doctype: book
+:toc2:
+:toc: left
+:encoding: utf-8
+:lang: en
+
+:blank: pass:[ +]
+
+// Set the default source code type in this document to C++,
+// for syntax highlighting purposes. This is needed because
+// docbook uses c++ and html5 uses cpp.
+:language: {basebackend@docbook:c++:cpp}
+
+// This is necessary for asciidoc, but not for asciidoctor
+:cpp: C++
+
+== Notice
+
+IMPORTANT: This specification is a draft.
+
+Copyright (c) 2021 Intel Corporation. All rights reserved.
+
+NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are
+trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc.
+used by permission by Khronos.
+
+NOTE: This document is better viewed when rendered as html with asciidoctor.
+GitHub does not render image icons.
+
+== Dependencies
+
+This extension is written against the SYCL 2020 specification, Revision 3.
+
+== Status
+
+Working Draft
+
+This is a preview extension specification, intended to provide early access to
+a feature for review and community feedback. When the feature matures, this
+specification may be released as a formal extension.
+
+Because the interfaces defined by this specification are not final and are
+subject to change they are not intended to be used by shipping software
+products.
+
+== Introduction
+
+This extension adds the ability for device code to call the C++ `assert()`
+macro. The behavior of `assert()` in device code is similar to its behavior in
+host code. If the asserted condition is false, a message is printed to `stderr`
+and then the program aborts with `std::abort()`.
+
+The format of the assert message is unspecified, but it will always include the
+text of the failing expression, the values of the standard macros `+__FILE__+`
+and `+__LINE__+`, and the value of the standard variable `+__func__+`. If the
+failing assert comes from an `nd_range` `parallel_for` it will also include the
+global ID and the local ID of the failing work item.
+
+Some devices implement `assert()` natively while others use a fallback
+implementation, and the two implementations provide different guarantees. The
+native implementation is most similar to the way `assert()` works on the host.
If
+an assertion fails in the native implementation, the assertion message is
+immediately printed to `stderr` and the program terminates by calling
+`std::abort()`. If an assertion fails with the fallback implementation, the
+failing `assert()` returns back to its caller and the device code must continue
+executing (without deadlocking) until the kernel completes. The implementation
+prints the assertion message to `stderr` and terminates with `std::abort()` only
+after the kernel completes execution. An application can determine which of the
+two mechanisms a device uses by testing the device aspect
+`aspect::ext_oneapi_native_assert`.
+
+The `assert()` macro is defined in system include headers, not in SYCL headers.
+On most systems these are the `<assert.h>` and/or `<cassert>` header files.
+The user can disable assertions in device code by defining the `NDEBUG`
+preprocessor macro prior to including either of
+these headers.
+
+Following is an example use-case:
+
+[source]
+----
+#include
+#include
+
+using namespace sycl;
+
+void user_func(item<2> Item) {
+  assert((Item[0] % 2) && "Nil");
+}
+
+int main() {
+  queue Q;
+  Q.submit([&] (handler& CGH) {
+    CGH.parallel_for(range<2>{N, M}, [=](item<2> It) {
+      do_smth();
+      user_func(It);
+      do_smth_else();
+    });
+  });
+  Q.wait();
+  std::cout << "One shouldn't see this message.";
+  return 0;
+}
+----
+
+== Feature test macro
+
+This extension provides a feature-test macro as described in the core SYCL
+specification section 6.3.3 "Feature test macros". Therefore, an implementation
+supporting this extension must predefine the macro `SYCL_EXT_ONEAPI_ASSERT` to
+one of the values defined in the table below. Applications can test for the
+existence of this macro to determine if the implementation supports this
+feature, or applications can test the macro’s value to determine which of the
+extension’s APIs the implementation supports.
+
+[%header,cols="1,5"]
+|===
+|Value |Description
+|1 |Initial extension version. Base features are supported.
+|===
+
+== Extension to `enum class aspect`
+
+[source]
+----
+namespace sycl {
+enum class aspect {
+  ext_oneapi_native_assert
+};
+}
+----
+
+If a device has the `ext_oneapi_native_assert` aspect, then its Device-Side
+Runtime is capable of native support of `assert`. That is, the safe implementation
+is used. If the device doesn't have the aspect, then the fallback implementation is
+used.
+
+== Version
+
+Built On: {docdate}
+
+Revision: 1
+
+== Issues
+
+None.
+
+== Revision History
+
+[cols="5,15,15,70"]
+[grid="rows"]
+[options="header"]
+|========================================
+|Rev|Date|Author|Changes
+|1|2021-04-08|Sergey Kanaev, Gregory M Lueck |*Initial public working draft*
+|========================================
+
+//************************************************************************
+//Other formatting suggestions:
+//
+//* Use *bold* text for host APIs, or [source] syntax highlighting.
+//* Use +mono+ text for device APIs, or [source] syntax highlighting.
+//* Use +mono+ text for extension names, types, or enum values.
+//* Use _italics_ for parameters.
+//************************************************************************
diff --git a/sycl/doc/extensions/C-CXX-StandardLibrary/DeviceLibExtensions.rst b/sycl/doc/extensions/C-CXX-StandardLibrary/DeviceLibExtensions.rst
index 8b8b98d7a12bb..1c370e57ad89c 100644
--- a/sycl/doc/extensions/C-CXX-StandardLibrary/DeviceLibExtensions.rst
+++ b/sycl/doc/extensions/C-CXX-StandardLibrary/DeviceLibExtensions.rst
@@ -33,6 +33,9 @@ Example of a message:
 .. code:
    foo.cpp:42: void foo(int): global id: [0,0,0], local id: [0,0,0] Assertion `buf[wiID] == 0 && "Invalid value"` failed.
+See also: assert_extension_.
+.. _assert_extension: ../Assert/SYCL_ONEAPI_ASSERT.asciidoc
+
 cl_intel_devicelib_math
 ==========================
diff --git a/sycl/doc/extensions/README.md b/sycl/doc/extensions/README.md
index d1a24751d730e..5866d9e5caf38 100755
--- a/sycl/doc/extensions/README.md
+++ b/sycl/doc/extensions/README.md
@@ -41,6 +41,7 @@ DPC++ extensions status:
 | [SYCL_INTEL_group_sort](GroupAlgorithms/SYCL_INTEL_group_sort.asciidoc) | Proposal | |
 | [Invoke SIMD](InvokeSIMD/InvokeSIMD.asciidoc) | Proposal | |
 | [Uniform](Uniform/Uniform.asciidoc) | Proposal | |
+| [Assert](Assert/SYCL_ONEAPI_ASSERT.asciidoc) | Proposal | |
 Legend:
diff --git a/sycl/doc/images/assert-fallback-graph.svg b/sycl/doc/images/assert-fallback-graph.svg
new file mode 100644
index 0000000000000..fadf4a07ba1c0
--- /dev/null
+++ b/sycl/doc/images/assert-fallback-graph.svg
@@ -0,0 +1,3 @@
+
+
+
[SVG markup not shown. Text labels in the figure: "User's kernel #1", "User's kernel #2", "User's kernel #3", three "Copy assert failure flag" nodes and three "Host-task with check for the value of assert failure flag" nodes.]
diff --git a/sycl/doc/index.rst b/sycl/doc/index.rst
index 8089d12230730..9be7037fbd959 100644
--- a/sycl/doc/index.rst
+++ b/sycl/doc/index.rst
@@ -32,3 +32,5 @@ Developing oneAPI DPC++ Compiler
    KernelProgramCache
    GlobalObjectsInRuntime
    LinkedAllocations
+   Assert
+