@@ -313,12 +313,47 @@ computing the private memory size. As range rounding only applies to basic
kernels (parametrized by a `sycl::range`), local internalization is not affected
by the range rounding transformation.
+ ### Reductions
+
+ Kernel fusion of reductions is partially supported. To preserve the legality of
+ the fused kernel, i.e., the requirement that the fused kernel perform the same
+ work as the graph of kernels being fused, only the following reduction
+ strategies can be fused at the time of writing:
+
+ - `group_reduce_and_last_wg_detection`
+ - `local_atomic_and_atomic_cross_wg`
+ - `range_basic`
+ - `group_reduce_and_atomic_cross_wg`
+ - `local_mem_tree_and_atomic_cross_wg`
+
+ Other strategies require implicit inter-work-group synchronization, which
+ kernel fusion does not support.
+
+ Users may encounter errors, e.g., fusion being aborted or incorrect results due
+ to race conditions or other causes, when using the `sycl::reduction`
+ interface. The SYCL runtime chooses the algorithm based on the reduction
+ operator, data type and hardware capabilities, so a strategy cannot be
+ selected through the regular interface. In this case, users can instead call
+ `sycl::detail::reduction_parallel_for` to force a strategy that fusion
+ supports. The reductions implementation in
+ [`sycl/reduction.hpp`](../../include/sycl/reduction.hpp) may give users
+ insight into which kind of reduction to use for their purposes:
+
+ ```c++
+ q.submit([&](sycl::handler &cgh) {
+   sycl::accessor in(dataBuf, cgh, sycl::read_only);
+   auto sum = sycl::reduction(sumBuf, cgh, sycl::plus<>{});
+   // Force the supported 'group_reduce_and_last_wg_detection' strategy
+   sycl::detail::reduction_parallel_for<sycl::detail::auto_name,
+       sycl::detail::strategy::group_reduce_and_last_wg_detection>(...);
+ });
+ ```
+
### Unsupported SYCL constructs
The following SYCL API constructs are currently not officially supported for
kernel fusion and should be considered untested/unsupported:
- - Reductions
- `sycl::stream`
- Specialization constants and `sycl::kernel_handler`
- Images (`sycl::unsampled_image` and `sycl::sampled_image`)
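
As an illustration of the Reductions section added in this diff: judging only by their names, the `*_atomic_cross_wg` strategies appear to reduce within each work-group and then combine the partial results with atomic updates, which would explain why they need no synchronization between work-groups. The sketch below is a host-only analogy in standard C++, not SYCL code and not the runtime's implementation. The function `reduce_groups_then_atomic` and its parameters are hypothetical names introduced for this example.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical host-side analogy of a "group reduce, then atomic combine"
// scheme. Each slice of `data` plays the role of one work-group.
int reduce_groups_then_atomic(const std::vector<int> &data,
                              std::size_t group_size) {
  if (group_size == 0)
    group_size = 1; // Guard against an empty group size.

  std::atomic<int> result{0};
  const std::size_t num_groups = (data.size() + group_size - 1) / group_size;

  // Visit the groups in reverse order to stress that the scheme assumes no
  // particular ordering between groups.
  for (std::size_t g = num_groups; g-- > 0;) {
    const std::size_t begin = g * group_size;
    const std::size_t end = std::min(begin + group_size, data.size());

    // Group-local reduction: touches only this group's slice.
    const int partial =
        std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(begin),
                        data.begin() + static_cast<std::ptrdiff_t>(end), 0);

    // Cross-group combine: a single atomic update per group, so no group
    // ever has to wait for another one.
    result.fetch_add(partial, std::memory_order_relaxed);
  }
  return result.load();
}
```

Because each group only publishes its partial result atomically, the final value does not depend on the order in which groups run. A scheme that needs a second pass over all partial results would instead have to wait for every group to finish, which is the kind of inter-work-group synchronization the section says fusion does not support.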