[SYCL] [DOC] Prepare design-document for assert feature #3461
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,144 @@ | ||
# Assert feature | ||
|
||
**IMPORTANT**: This document is a draft. | ||
|
||
While debugging kernel code, the user may place assertions in it. | ||
On the host, a failed assertion is expected to abort the application. | ||
For device-side assertions we chose to report the failure through an | ||
asynchronous exception, in order to allow for extensibility. | ||
|
||
The user is free to disable assertions by defining the `NDEBUG` macro at | ||
compile time. | ||
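
A minimal host-side sketch of the effect of `NDEBUG`, using only standard C++: when the macro is defined before `<cassert>` is included, `assert` expands to a no-op and its argument is never evaluated. The helper name `assert_argument_was_evaluated` is made up for illustration, and the sketch says nothing about how device code is compiled.

```cpp
// Defining NDEBUG before including <cassert> turns assert() into a no-op.
#define NDEBUG
#include <cassert>

// Hypothetical helper: reports whether the assert argument was evaluated.
bool assert_argument_was_evaluated() {
  bool Evaluated = false;
  // With NDEBUG defined, the expression below is discarded by the
  // preprocessor, so the assignment never runs.
  assert((Evaluated = true));
  return Evaluated;
}

// Restore normal assert() behaviour for any code that follows.
#undef NDEBUG
#include <cassert>
```

Because the argument is dropped entirely, assertion expressions should not carry side effects that the kernel relies on.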
|
||
|
||
## Use-case example | ||
|
||
``` | ||
#include <CL/sycl.hpp> | ||
#include <cassert> | ||
#include <cstdlib> | ||
#include <iostream> | ||
|
||
using namespace cl::sycl; | ||
|
||
auto ErrorHandler = [] (exception_list Exs) { | ||
for (exception_ptr const& E : Exs) { | ||
try { | ||
std::rethrow_exception(E); | ||
} | ||
catch (event_error const& Ex) { | ||
std::cout << "Exception - " << Ex.what(); // assertion failed | ||
std::abort(); | ||
} | ||
} | ||
}; | ||
|
||
void user_func(item<2> Item) { | ||
assert((Item[0] % 2) && "Nil"); | ||
} | ||
|
||
int main() { | ||
queue Q(ErrorHandler); | ||
Q.submit([&] (handler& CGH) { | ||
CGH.parallel_for<class TheKernel>(range<2>{N, M}, [=](item<2> It) { | ||
do_smth(); | ||
user_func(It); | ||
do_smth_else(); | ||
}); | ||
}); | ||
Q.wait_and_throw(); | ||
std::cout << "One shouldn't see this message."; | ||
return 0; | ||
} | ||
``` | ||
|
||
In this use case, every work-item with an even index in dimension 0 (`Item[0]`) | ||
triggers an assertion failure. Assertion failures should be reported via | ||
asynchronous exceptions. If an asynchronous exception handler is set, the | ||
failure is reported with a `cl::sycl::event_error` exception. Otherwise, the | ||
SYCL Runtime should trigger an abort. | ||
At least one failed assertion should be reported. | ||
|
||
When multiple kernels are enqueued and several of them fail an assertion, at | ||
least one assertion failure should be reported. | ||
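
The reporting rule described here can be sketched on the host with standard C++ only. This is a mock, not the SYCL Runtime: `AssertFailure`, `AsyncHandler` and `reportAssertFailure` are hypothetical names, and the abort action is passed in as a parameter so that the sketch can be exercised without terminating the process.

```cpp
#include <cstdlib>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for cl::sycl::event_error.
struct AssertFailure : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for a SYCL asynchronous exception handler.
using AsyncHandler =
    std::function<void(const std::vector<std::exception_ptr> &)>;

// Reports collected assertion messages; returns true if anything was reported.
// Only the first failure is forwarded, which satisfies "at least one".
bool reportAssertFailure(
    const std::vector<std::string> &Messages, const AsyncHandler &Handler,
    const std::function<void()> &Abort = [] { std::abort(); }) {
  if (Messages.empty())
    return false; // no assertion failed
  if (!Handler) {
    Abort(); // no handler set: the runtime aborts
    return true;
  }
  std::vector<std::exception_ptr> List{
      std::make_exception_ptr(AssertFailure(Messages.front()))};
  Handler(List);
  return true;
}
```

Forwarding a single failure keeps the sketch short; a runtime could equally well forward every collected failure.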
|
||
## User requirements | ||
|
||
From the user's point of view, the requirements are as follows: | ||
|
||
| # | Title | Description | Importance | | ||
| - | ----- | ----------- | ---------- | | ||
| 1 | Handle assertion failure | Signal about assertion failure via SYCL asynchronous exception | Must have | | ||
| 2 | Print assert message | Assert function should print message to stderr at host | Must have | | ||
| 3 | Stop under debugger | When debugger is attached, break at assertion point | Highly desired | | ||
| 4 | Reliability | Assert failure should be reported regardless of kernel deadlock | Highly desired | | ||
|
||
## Contents of `cl::sycl::event_error` | ||
|
||
`cl::sycl::event_error::what()` should return the same assertion failure message | ||
as the one that is currently printed. | ||
|
||
Other than that, the interface of `cl::sycl::event_error` should look like this: | ||
``` | ||
class event_error : public runtime_error { | ||
public: | ||
event_error() = default; | ||
|
||
event_error(const char *Msg, cl_int Err) | ||
: event_error(string_class(Msg), Err) {} | ||
|
||
event_error(const string_class &Msg, cl_int Err) : runtime_error(Msg, Err) {} | ||
|
||
/// Returns global ID with the dimension provided | ||
int globalId(int Dim) const; | ||
|
||
/// Returns local ID with the dimension provided | ||
int localId(int Dim) const; | ||
}; | ||
``` | ||
|
||
Regardless of whether an asynchronous exception handler is set, the SYCL Runtime | ||
has an action to perform. To do so, information about the assert failure must | ||
be propagated from the device side to the SYCL Runtime. This should be done via | ||
calls to `clGetEventInfo` for the OpenCL backend and `zeEventQueryStatus` for | ||
the Level-Zero backend. | ||
|
||
## Terms | ||
|
||
- Device-side Runtime - the part of device code that is supplied by the | ||
Device-side Compiler. | ||
- Low-level Runtime - the backend/runtime behind the DPCPP Runtime. | ||
- Device-side Compiler - the compiler that generates device-native bitcode from | ||
an input SPIR-V image. | ||
- Accessor metadata - the parts of the accessor representation on the device | ||
side: pointer, ranges, offset. | ||
|
||
## How it works | ||
|
||
For the time being, the `assert(expr)` macro ends up as a call to the | ||
`__devicelib_assert_fail` function. This function is part of the [Device library extension](doc/extensions/C-CXX-StandardLibrary/DeviceLibExtensions.rst#cl_intel_devicelib_cassert). | ||
Device code already contains a call to this function. Currently, a device binary | ||
is always linked against the fallback implementation. | ||
The Device-side compiler/linker provides its own implementation of | ||
`__devicelib_assert_fail` and prefers it over the fallback one. | ||
|
||
If the Device-side Runtime supports `__devicelib_assert_fail`, then the | ||
Low-level Runtime is responsible for: | ||
- detecting whether an assert failure took place; | ||
- flushing the assert message to `stderr` on the host. | ||
|
||
When an assert failure is detected, the Low-level Runtime reports it to the | ||
DPCPP Runtime at synchronization points. | ||
|
||
Refer to the [OpenCL](doc/extensions/Assert/opencl.md) and [Level-Zero](doc/extensions/Assert/level-zero.md) | ||
extensions. | ||
|
||
If the Device-side Runtime doesn't support `__devicelib_assert_fail`, then a | ||
buffer-based approach is used instead. This approach requires no support from | ||
either the Device-side Runtime or the Low-level Runtime. | ||
|
||
Within this approach, a dedicated assert buffer is allocated and an implicit | ||
kernel argument is introduced. The argument is an accessor with | ||
`discard_read_write` or `discard_write` access mode. Accessor metadata is stored | ||
in a program-scope variable. This makes it possible to refer to the accessor | ||
without modifying every user function. The fallback implementation of | ||
`__devicelib_assert_fail` restores the accessor metadata from the program-scope | ||
variable and writes assert information to the assert buffer. Atomic operations | ||
are used so that existing information is not overwritten. | ||
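
One way the "do not overwrite existing information" rule could be realized is a compare-and-swap on a state flag in the assert buffer, so that only the first failing work-item records its data. The sketch below is plain host C++ with `std::atomic`, not device code. The `AssertBuffer` layout, the global `AssertBufferPtr` (standing in for the program-scope variable) and `fallbackAssertFail` are assumptions made for illustration and do not reflect the actual fallback implementation.

```cpp
#include <atomic>
#include <cstddef>
#include <cstring>

// Hypothetical layout of the assert buffer.
struct AssertBuffer {
  std::atomic<int> State{0}; // 0 = empty, 1 = being written, 2 = complete
  char Message[128] = {};
  std::size_t GlobalId = 0;
};

// Stand-in for the program-scope variable holding accessor metadata.
AssertBuffer *AssertBufferPtr = nullptr;

// Hypothetical fallback: records the failure only if the buffer is empty.
// Returns true if this call was the one that wrote the buffer.
bool fallbackAssertFail(const char *Expr, std::size_t GlobalId) {
  AssertBuffer *Buf = AssertBufferPtr;
  if (!Buf)
    return false;
  int Expected = 0;
  // Only one caller can move State from 0 to 1; all others leave the
  // already recorded data intact.
  if (!Buf->State.compare_exchange_strong(Expected, 1))
    return false;
  std::strncpy(Buf->Message, Expr, sizeof(Buf->Message) - 1);
  Buf->GlobalId = GlobalId;
  Buf->State.store(2); // publish: the record is complete
  return true;
}
```

The separate "being written" state lets a reader distinguish a complete record from one that is still in progress.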
|
||
Storing accessor metadata to the program-scope variable and restoring it from | ||
there is performed with the help of builtins. Implementations of these builtins | ||
are substituted by the frontend. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
# Overview | ||
|
||
This extension enables detection of a kernel's assert failure. | ||
|
||
# New enum value | ||
|
||
The `ze_result_t` enumeration should be augmented with a `ZE_RESULT_ABORTED` | ||
enum element. This enum value indicates an assert failure detected on the | ||
device side. | ||
|
||
# Changed API | ||
|
||
``` | ||
ze_event_handle_t Event; // event of a previously submitted kernel | ||
ze_result_t Result = zeEventQueryStatus(Event); | ||
``` | ||
|
||
If a kernel fails an assertion, `zeEventQueryStatus` should return | ||
`ZE_RESULT_ABORTED`. | ||

Review comment: I don't think this is possible to achieve in an asynchronous / non-blocking way in L0. We don't have any communication between kernel and event, so we can't signal events with "assert happened" information. If we use a global / program-wide assert buffer, each kernel will be using the same "assert happened" flag; we do not have fine-grained control to determine which kernel, and which connected event, fired the assert. Fences could be used, allowing synchronization at command-queue level rather than kernel level: any kernel that causes an assert while executing in the command queue can then make fence synchronization return an error: https://spec.oneapi.com/level-zero/latest/core/PROG.html#fences

Review comment: Is it still possible in OpenCL?

Review comment: Could you please provide more details about using fences?

Review comment: Fences are described in the L0 spec. They are similar to events, but directly connected to command queues: https://spec.oneapi.com/level-zero/latest/core/PROG.html#fences In OpenCL the submission model is different: each enqueue is independent, and a single kernel is submitted (queued) at a time. L0 operates on command lists that may contain multiple kernels; once a command list is submitted to HW, we can't control when a kernel in the whole sequence is started or completed. OpenCL handles kernels with printf in a blocking way: enqueueNDRangeKernel with printf becomes a blocking call, so we have fine control over when a specific kernel is completed. We can do the same for the assert() message: the output event will be created when the kernel has already finished. In L0 this is not possible, as we would have to synchronize the whole command list.
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
# Overview | ||
|
||
This extension enables detection of a kernel's assert failure. | ||
|
||
# New error code | ||
|
||
`CL_ASSERT_FAILURE` is added to indicate a detected assert failure at | ||
device-side. | ||
|
||
# Changed API | ||
|
||
``` | ||
cl_event Event; // event of a previously submitted kernel | ||
cl_int Result; | ||
size_t ResultSize; | ||
|
||
clGetEventInfo(Event, CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(Result), &Result, &ResultSize); | ||
``` | ||
|
||
If the kernel failed an assertion, `clGetEventInfo` should store | ||
`CL_ASSERT_FAILURE` in `Result`. | ||
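
A sketch of how a runtime might interpret the queried status. It is self-contained and does not include the OpenCL headers: `cl_int` is re-declared locally, `CL_COMPLETE` has its standard value 0, and the numeric value given to `CL_ASSERT_FAILURE` is a placeholder, since this extension does not assign one. `KernelOutcome` and `classifyStatus` are hypothetical names. The only OpenCL behaviour relied upon is that a negative execution status denotes abnormal termination.

```cpp
#include <cstdint>

// Local stand-ins so the sketch compiles without <CL/cl.h>.
using cl_int = std::int32_t;
constexpr cl_int CL_COMPLETE = 0;            // standard OpenCL value
constexpr cl_int CL_ASSERT_FAILURE = -10000; // placeholder, not from any spec

enum class KernelOutcome { Pending, Completed, AssertFailed, OtherError };

// Maps the status written by clGetEventInfo to an outcome.
KernelOutcome classifyStatus(cl_int Status) {
  if (Status == CL_ASSERT_FAILURE)
    return KernelOutcome::AssertFailed;
  if (Status < 0)
    return KernelOutcome::OtherError; // negative status: abnormal termination
  if (Status == CL_COMPLETE)
    return KernelOutcome::Completed;
  return KernelOutcome::Pending; // CL_QUEUED, CL_SUBMITTED or CL_RUNNING
}
```

Checking for the assert code before the generic negative-status branch keeps assertion failures distinguishable from other errors.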
|