diff --git a/sycl/doc/UsersManual.md b/sycl/doc/UsersManual.md
index 8a0ac1be8c725..a8ba4715ff645 100644
--- a/sycl/doc/UsersManual.md
+++ b/sycl/doc/UsersManual.md
@@ -1,55 +1,117 @@
 # Users Manual

-The DPC++ Compiler contains many options to generate the desired binaries for
-your application.
+This is the list of SYCL specific options supported by the compiler and some
+examples.

-## SYCL specific command line options
+Options marked as [DEPRECATED] are going to be removed in some future updates.
+Options marked as [EXPERIMENTAL] are expected to be used only in limited cases
+and are not recommended for use in a production environment.
+
+## Generic options

 **`-fsycl`**

- General enabling option for SYCL compilation mode. This option enables
- offload compilation for a given target using the `-fsycl-targets` option.
- When the `-fsycl-targets` option is not provided, the default triple is
- `spir64-unknown-unknown-sycldevice`.
+ General enabling option for SYCL compilation and linking mode. List of
+ targets can be specified with `-fsycl-targets`. This is a fundamental option
+ for any SYCL compilation. All other SYCL specific options require it.
+
+**`-fsycl-targets=[,...,]`**
+
+ Enables ahead of time (AOT) compilation for specified device targets. T is
+ a compiler target triple string, representing a target device architecture.
+ You can specify more than one target, comma separated. Default just in time
+ (JIT) compilation target can be added to the list to produce a combination
+ of AOT and JIT code in the resulting fat binary.
+ The following triples are supported by default:
+ * spir64-unknown-unknown-sycldevice - this is the default generic SPIR-V
+ target;
+ * spir64_x86_64-unknown-unknown-sycldevice - generate code ahead of time
+ for x86_64 CPUs;
+ * spir64_fpga-unknown-unknown-sycldevice - generate code ahead of time for
+ Intel FPGA;
+ * spir64_gen-unknown-unknown-sycldevice - generate code ahead of time for
+ Intel Processor Graphics;
+ Available in special build configuration:
+ * nvptx64-nvidia-cuda-sycldevice - generate code ahead of time for CUDA
+ target;
+
+## Language options
+
+**`-sycl-std=`** [EXPERIMENTAL]

-**`-fsycl-targets=`**
+ SYCL language standard to compile for. Possible values:
+ * 121 - SYCL 1.2.1
+ * 2020 - SYCL 2020
+ It doesn't guarantee specific standard compliance, but some selected
+ compiler features change behavior.
+ It is under development and is not recommended for use in a production
+ environment.
+ Default value is 2020.

- A comma separated list of triples to specify the device target(s) to
- generate code for. This option is only valid when used with `-fsycl`.
+**`-f[no-]sycl-unnamed-lambda`**

-### Target toolchain options.
+ Enables/Disables unnamed SYCL lambda kernels support.
+ Disabled by default.

-**`-Xsycl-target-backend= `**
+## Optimization options

- Pass to the SYCL based backend identified by .
+**`-f[no-]sycl-early-optimizations`**

-**`-Xsycl-target-backend `**
+ Enables (or disables) intermediate representation optimization pipeline
+ before translation to SPIR-V. Has effect only if optimizations are turned
+ on by standard compiler options (-O1 or higher).
+ Enabled by default.

- Pass to the SYCL based target backend.
+**`-f[no-]sycl-dead-args-optimization`**

-**`-Xsycl-target-frontend= `**
+ Enables (or disables) LLVM IR dead argument elimination pass to remove
+ unused arguments for the kernel functions before translation to SPIR-V.
+ Currently has effect only on spir64\* targets.
+ Disabled by default.
- Pass to the SYCL based target frontend identified by .
+**`-f[no-]sycl-id-queries-fit-in-int`**

-**`-Xsycl-target-frontend `**
+ Assume/Do not assume that SYCL ID queries fit within MAX_INT. It assumes
+ that these values fit within MAX_INT:
+ * id class get() member function and operator[]
+ * item class get_id() member function and operator[]
+ * nd_item class get_global_id()/get_global_linear_id() member functions
+ Enabled by default.

- Pass to the SYCL based target frontend.
+## Target toolchain options

-**`-Xsycl-target-linker= `**
+**`-Xsycl-target-backend= "options"`**
+**`-Xs "options"`**

- Pass to the SYCL based target linker identified by .
+ Pass "options" to the backend of target device compiler, specified by
+ triple T. The backend of device compiler generates target machine code from
+ intermediate representation. This option can be used to tune code
+ generation for a specific target. The "options" are used during AOT
+ compilation. For JIT compilation "options" are saved in a fat binary and
+ used when code is JITed during runtime.
+ -Xs is a shortcut to pass "options" to all backends specified via the
+ '-fsycl-targets' option (or the default one).

-**`-Xsycl-target-linker `**
+**`-Xsycl-target-frontend= "options"`**

- Pass to the SYCL based target linker.
+ Pass "options" to the frontend of target device compiler, specified by
+ triple T. This option can be used to control intermediate representation
+ generation during offline or online compilation.

-### Link options
+**`-Xsycl-target-linker= "options"`**
+
+ Pass "options" to the device code linker, when linking multiple device
+ object modules. T is a specific target device triple.
+
+## Link options

 **`-fsycl-link`**

- Generate partially linked device object to be used with the host link.
+ Link device object modules and wrap those into a host-compatible object
+ module that can be linked later by any standard host linker into the final
+ fat binary.
-**`-fsycl-link-targets=`**
+**`-fsycl-link-targets=`** [DEPRECATED]

  Specify comma-separated list of triples SYCL offloading targets to produce
  linked device images. Used in a link step to link device code for given
@@ -57,7 +119,7 @@ your application.
  of the common prefix taken from the -o option and the triple string. Does
  not produce fat binary and must be used together with -fsycl.

-**`-fsycl-add-targets=`**
+**`-fsycl-add-targets=`** [DEPRECATED]

  Add arbitrary device images to the fat binary being linked

@@ -68,7 +130,7 @@ your application.
  image for the target triple it is paired with, and offload bundler is
  invoked to do the actual bundling.

-**`-foffload-static-lib=`**
+**`-foffload-static-lib=`** [DEPRECATED]

  Link with fat static library.

@@ -81,95 +143,81 @@ your application.
  specified with `-foffload-static-lib` are treated as host libraries and are
  only used during the final host link.

-**`-foffload-whole-static-lib=`**
+**`-foffload-whole-static-lib=`** [DEPRECATED]

  Similar to `-foffload-static-lib` but uses the whole archive when performing
  the device code extraction. This is helpful when creating shared objects
  from fat static archives.

-**`-fsycl-device-code-split=`**
+**`-fsycl-device-code-split=`**

- Perform SYCL device code split. There are three possible values for this
- option:
- - per_kernel - a separate device code module is created for each SYCL
- kernel. Each device code module will contain a kernel and all its
- dependencies, i.e. called functions and used variables.
- - per_source - a separate device code module is created for each source
- (translation unit). Each device code module will contain a bunch of kernels
- grouped on per-source basis and all their dependencies, i.e. all used
- variables and called functions, including the `SYCL_EXTERNAL` macro-marked
- functions from other translation units.
- - off - no device code split.
- NOTE: By default device code split is 'off' - all kernels go into a
- single module.
+ Specifies SYCL device code module assembly. Mode is one of the following:
+ * per_kernel - creates a separate device code module for each SYCL kernel.
+ Each device code module will contain a kernel and all its dependencies,
+ such as called functions and used variables.
+ * per_source - creates a separate device code module for each source
+ (translation unit). Each device code module will contain a bunch of
+ kernels grouped on a per-source basis and all their dependencies, such as
+ all used variables and called functions, including the `SYCL_EXTERNAL`
+ macro-marked functions from other translation units.
+ * off - creates a single module for all kernels.
+ * auto - the compiler will use a heuristic to select the best way of
+ splitting device code. This is the default mode.

-**`-fsycl-device-code-split`**
+**`-f[no-]sycl-device-lib=[,,...]`**

- Perform SYCL device code split in the per_source mode, i.e. create a
- separate device code module for each source (translation unit).
+ Enables/disables linking of the device libraries. Supported libraries:
+ libm-fp32, libm-fp64, libc, all. Use of 'all' will enable/disable all of
+ the device libraries.

-### Intel FPGA specific options
+## Intel FPGA specific options

 **`-fintelfpga`**

- Perform ahead of time compilation for Intel FPGA, which relies on the
- external tool `aoc` being available in the `PATH`.
-
- This option is roughly equivalent to
- `-fsycl-targets=spir64_fpga-unknown-unknown-sycldevice -g -MMD -lOpenCL`.
-
- It is incompatible with `-fsycl-targets=...`; if ahead of time compilation
- is needed for multiple backends (e.g. Intel FPGA, Intel GPU, etc.), the
- alternative form based on
- `-fsycl-targets=spir64_fpga-unknown-unknown-sycldevice` should be used
- instead.
+ Perform ahead of time compilation for Intel FPGA. It sets the target to
+ FPGA and turns on the debug options that are needed to generate FPGA
+ reports.
+ It is a functionally equivalent shortcut to
+ `-fsycl-targets=spir64_fpga-unknown-unknown-sycldevice -g -MMD` on Linux
+ and `-fsycl-targets=spir64_fpga-unknown-unknown-sycldevice -Zi -MMD` on
+ Windows.

-**`-fsycl-link=`**
+**`-fsycl-link=`**

- Generate partially linked device and host object to be used at various
- stages of compilation. Takes the device binary(s) generated from a `-fsycl`
- enabled compilation and wrap to create a host linkable object. This option
- is enabled only in ahead of time compilation mode fore FPGA (i.e. when
- `-fintelfpga` is set).
+ Controls FPGA target binary output format. Same as -fsycl-link, but
+ optional output can be one of the following:
+ * early - generate html reports and an intermediate object file that avoids
+ a full Quartus compile. Usually takes minutes to generate. Link can later
+ be resumed from this point using -fsycl-link=image.
+ * image - generate a bitstream which is ready to be linked and used on an
+ FPGA board. Usually takes hours to generate.

 **`-reuse-exe=`**

- Speed up FPGA aoc compile if the device code in is unchanged.
+ Speed up FPGA backend compilation if the device code in is
+ unchanged. If it's safe to do so the compiler will re-use the device binary
+ embedded within it. This can be used to minimize or avoid long Quartus
+ compile times for FPGA targets when the device code is unchanged.

-### Other options
+## Other options

 **`-fsycl-device-only`**

- Compile only SYCL device code.
+ Compile only the device part of the code and ignore the host part.

-**`-fsycl-use-bitcode`**
+**`-f[no-]sycl-use-bitcode`** [EXPERIMENTAL]

  Emit SYCL device code in LLVM-IR bitcode format. When disabled, SPIR-V is
- emitted. Default is true.
-
-**`-fno-sycl-use-bitcode`**
-
- Use SPIR-V instead of LLVM bitcode in fat objects.
-
-**`-sycl-std=`**
-
- SYCL language standard to compile for.
-
-**`-fsycl-help`**
-
- Emit help information from all of the offline compilation tools.
-
-**`-fsycl-help=`**
-
- Emit help information from the offline compilation tool associated with the
- given architecture argument. Supported architectures: `x86_64`, `fpga` and
- `gen`.
+ emitted.
+ Enabled by default.

-**`-fsycl-unnamed-lambda`**
+**`-fsycl-help[=backend]`**

- Allow unnamed SYCL lambda kernels.
+ Emit help information from the device compiler backend. Backend can be one of
+ the following: "x86_64", "fpga", "gen", or "all". Specifying "all" is the
+ same as specifying -fsycl-help with no argument and emits help for all
+ backends.

-## SYCL device code compilation
+# Example: SYCL device code compilation

 To invoke SYCL device compiler set `-fsycl-device-only` flag.

@@ -184,36 +232,3 @@ By default the output format for SYCL device is LLVM bytecode.
 ```console
 $ clang++ -fsycl-device-only -fno-sycl-use-bitcode sycl-app.cpp -o sycl-app.spv
 ```
-
-## Static archives with SYCL device code
-
-The DPC++ Compiler contains support to create and use static archives that
-contain device enabled fat objects.
-
-### Build your objects
-
-```console
-$ clang++ -fsycl sycl-app1.cpp sycl-app2.cpp -c
-```
-
-### Create the static archive
-
-Build the static archive in the same manner as you would any other normal
-static archive, using the objects that were created using the above step.
-
-```console
-$ ar cr libsyclapp.a sycl-app1.o sycl-app2.o
-```
-
-### Use the static archive
-
-Once you have created the archive, you can use it when creating your final
-application. The fat archives are treated differently than a regular archive
-so the option `-foffload-static-lib` is used to signify the needed behavior.
-
-```console
-$ clang++ -fsycl sycl-main.cpp -foffload-static-lib=libsyclapp.a
-```
-
-Use of `-foffload-static-lib` is required or the library will be treated as
-a normal archive.
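
Reviewer note: the new `-fsycl-targets` text says the default JIT target can be listed next to AOT targets, but the patch has no example of a combined list. The dry-run sketch below only prints a command line and never invokes the compiler. The helper names `join_triples` and `sycl_cmd` and the file name `sycl-app.cpp` are hypothetical; the triples are copied from the patch. Whether a particular toolchain build accepts this combination is an assumption, not something the patch confirms.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) a clang++ command line that
# requests both the generic JIT target and an AOT target.
# join_triples and sycl_cmd are hypothetical helper names.

# Join all arguments with commas, e.g. "a b" -> "a,b".
# The body runs in a subshell so the IFS change does not leak out.
join_triples() (
    IFS=,
    printf '%s' "$*"
)

# Print a compile command for a source file and a list of target triples.
# The output name is the source name with its .cpp suffix removed.
sycl_cmd() {
    src=$1
    shift
    printf 'clang++ -fsycl -fsycl-targets=%s %s -o %s\n' \
        "$(join_triples "$@")" "$src" "${src%.cpp}"
}

# Generic SPIR-V (JIT) target plus AOT for x86_64 CPUs.
sycl_cmd sycl-app.cpp \
    spir64-unknown-unknown-sycldevice \
    spir64_x86_64-unknown-unknown-sycldevice
```

Printing the command instead of running it keeps the sketch usable on machines without a SYCL toolchain; the printed line can be pasted into a shell where `clang++` with SYCL support is available.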