From 5fd8cacb4354f86284dde4a277c0cad2aa2df62e Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Mon, 1 Mar 2021 23:39:28 +0300 Subject: [PATCH 01/24] WIP --- sycl/doc/SYCL2020-SpecializationConstants.md | 765 +++++++++++++++++++ 1 file changed, 765 insertions(+) create mode 100644 sycl/doc/SYCL2020-SpecializationConstants.md diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md new file mode 100644 index 0000000000000..f38afbcc4fe1e --- /dev/null +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -0,0 +1,765 @@ +# Specialization constants + +Specialization constants are implemented in accordance with how they are defined +by SYCL 2020 specification: [SYCL registry][sycl-registry], +[direct link to the specification][sycl-2020-spec]. + +[sycl-registry]: https://www.khronos.org/registry/SYCL/ +[sycl-2020-spec]: https://www.khronos.org/registry/SYCL/specs/sycl-2020/pdf/sycl-2020.pdf + +TODO: feature overview? code example? + +## Design + +[SYCL 2020][sycl-2020-spec] defines specialization constant as: + +> A constant variable where the value is not known until compilation of the +> SYCL kernel function. +> +> Glossary + +Therefore, implementation is based on [SPIR-V speficiation][spirv-spec] support +for [Specialization][spirv-specialization]. + +[spirv-spec]: https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html +[spirv-specialization]: https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#SpecializationSection + +However, the specification also states the following: + +> It is expected that many implementations will use an intermediate language +> representation ... such as SPIR-V, and the intermediate language will have +> native support for specialization constants. However, implementations that do +> not have such native support must still support specialization constants in +> some other way. +> +> Section 4.11.12.2. Specialization constant support + +Having that said, the following should be implemented: + +1. We need to ensure that in generated SPIR-V, calls to +`get_specialization_constant` are replaced with corresponding instructions for +referencing specialization constants. + +2. SYCL provides a mechanism to specify default values of specialization +constants, which should be reflected in generated SPIR-V. This part is +especially tricky, because this happens in host part of the SYCL program, which +means that without special handling it won't even be visible to device compiler. + +3. We need to ensure that DPC++ RT properly set specialization constants used in +the program: SYCL uses non-type template parameters to identify specialization +constants in the program, while at SPIR-V and OpenCL level, each specialization +constant is defined by its numerical ID, which means that we need to maintain +some mapping from SYCL identifiers to a numeric identifiers to be able to set +specialization constats. Moreover, at SPIR-V level composite specialization +constants do not have separate ID and can only be set by setting value to each +member of a composite, which means that we have 1:n mapping between SYCL +identifiers and numeric IDs of specialization constants. + +4. When AOT compilation is used or target is a CUDA device (where NVPTX +intermediate representation is used), we need to somehow emulate support for +specialization constants. + +The following sections describe how each item is implemented and which +components are responsible for what. The rest of design document is split info +two parts: +- Support for native specialization constants: items (1), (2) and (3) +- Emulation of specialization constants: item (4) + +Note: emulation part re-uses a lot of things described in native support +section, so if you want to get familiar with emulation in all details, it is +recommended to read native support section first. + +### Support for native specialization constants + +#### DPC++ Headers + +DPC++ Headers provide required definitions of `specialization_id` and +`kernel_handler` classes as well as of many other classes and methods. + +`kernel_handler::get_specialization_constant` method, which provides an access +to specialization constants within device code performs the following tasks: +- It provides a mapping from non-type template parameter, which is used as a + specialization constant identifier in SYCL/DPC++ source file to a symbolic ID + of the constant, which is used by the compiler. +- It provides a special markup, which allows the compiler to detect + specialization constants in the device code and properly handle them. + + +``` +namespace sycl { + +namespace detail { + +template +struct specialization_id_name_generator {}; + +} // namespace detail + +// It is possible that `DefaultValue` will be marked as `const` +template +T __sycl_getScalar2020SpecConstantValue(const char *SymbolicID, void *DefaultValue, void *RTBuffer); +template +T __sycl_getComposite2020SpecConstantValue(const char *SymbolicID, void *DefaultValue, void *RTBuffer); + +class kernel_handler { +public: + template + typename std::remove_reference_t::type get_specialization_constant() { +#ifdef __SYCL_DEVICE_ONLY__ + return get_on_device(); +#else + // some fallback implementation in case this code is launched on host +#endif __SYCL_DEVICE_ONLY__ + } + +private: +#ifdef __SYCL_DEVICE_ONLY__ + template::type> + // enable_if T is a scalar type + T get_on_device() { + const char *SymbolicID = __builtin_unique_stable_name(detail::specialization_id_name_generator); + return __sycl_getScalar2020SpecConstantValue(SymbolicID, &S, Ptr); + } + + template::type> + // enable_if T is a composite type + T get_on_device() { + const char *SymbolicID = __builtin_unique_stable_name(detail::specialization_id_name_generator); + return __sycl_getComposite2020SpecConstantValue(SymbolicID, &S, Ptr); + } +#endif // __SYCL_DEVICE_ONLY__ + + byte *Ptr = nullptr; +}; + +} // namespace sycl +``` + +Here [`__builtin_unique_stable_name`][builtin-unique-stable-name] +is a compiler built-in used to translate types to unique strings, which are +used as symbolic IDs of specialization constants. + +[builtin-unique-stable-name]: https://github.com/intel/llvm/blob/sycl/clang/docs/LanguageExtensions.rst#__builtin_unique_stable_name + +`__sycl_getScalar2020SpecConstantValue` and +`__sycl_getComposite2020SpecConstant` are functions with special names - they +are declared in the headers but never defined. Calls to them are recognized by +a special LLVM pass later and this is aforementioned special markup required for +the compiler. +Those intrinsics accept three parameters: +1. Symbolic ID of a specialization constant. Even though at SPIR-V level + specialization constants are identified by numeric IDs, we can't use them + here, because: + - Those IDs can't be generated by runtime, because they need to be encoded + into resulting SPIR-V device image + - Those IDs can't be generated by front-end compiler, because it only sees a + single translation unit at a time and therefore it can't assign unique IDs + to specialization constants from different translation units. + + Therefore, the decision was made to use symbolic IDs as interface between the + compiler and runtime to connect SYCL identifiers of specialization constants + with SPIR-V identifiers of specialization constants. + +2. Default value of the specialization constant. + It is expected that at LLVM IR level the argument will contain a pointer to + a global variable with the initializer, which should be used as the default + value of the specialization constants. + +3. Pointer to a buffer, which will be used if native specialization constants + are not available. This pointer is described later in the section + corresponding to emulation of specialization constants. + +Compilation and subsequent linkage of the device code results in a number of +`__sycl_getScalar2020SpecConstantValue` and +`__sycl_getComposite2020SpecConstantValue` calls. Before generating a device +binary, each linked device code LLVM IR module undergoes processing by +`sycl-post-link` tool which can run LLVM IR passes before passing the module +onto the SPIR-V translator. + +#### DPC++ Compiler: sycl-post-link tool + +As it is stated above, the only place where we can properly handle +specialization constants is somewhere during or after linking device code from +different translation units, so it happens in `sycl-post-link` tool. + +There is a `SpecConstantsPass` LLVM IR pass which: +1. Assigns numeric IDs to specialization constants found in the linked module. +2. Brings IR to the form expected by the SPIR-V translator. +3. Collects and provides \ =\> \ + mapping, which is later being used by DPC++ RT to set specialization constant + values provided by user. + +##### Assignment of numeric IDs to specialization constants + +This task is achieved by maintaining a map, which holds a list of numeric IDs +for each encountered symbolic ID of a specialization constant. Those IDs are +used to identify the specialization constants at SPIR-V level. + +As noted above one symbolic ID can several numeric IDs assigned to it - such 1:N +mapping comes from the fact that at SPIR-V level, composite specialization +constants don't have dedicated IDs and they are being identified and specialized +through their scalar leafs and corresponding numeric IDs. + +For example, the following code: +``` +struct Nested { + float a, b; +}; +struct A { + int x; + Nested n; +}; + +specialization_id id_int; +specialization_id id_A; +// ... + [=](kernel_handler h) { + h.get_specialization_constant(); + h.get_specialization_constant(); + } +``` + +Will result in the following numeric IDs assignment: +``` +// since `id_int` is a simple arithmetic specialization constant, we only +// have a single numeric ID associated with its symbolic ID +unique_symbolic_id_for_id_int -> { 0 } +// `id_A` is a composite with three leafs (scalar members, including ones +// located in nested composite types), which results in three numeric IDs +// associated with the same symbolic ID +unique_symbolic_id_for_id_A -> { 1, 2, 3 } +``` + +As it is shown in the example above, if a composite specialization constant +contains another composite within it, that nested composite is also being +"flattened" and its leafs are considered to be leafs of the parent +specialization constants. This done by depth-first search through the composite +elements. + +##### Transformation of LLVM IR to SPIR-V friendly IR form + +SPIR-V friendly IR form is a special representation of LLVM IR, where some +function are named in particular way in order to be recognizable by the SPIR-V +translator to convert them into corresponding SPIR-V instructions later. +The format is documented [here][spirv-friendly-ir]. + +[spirv-friendly-ir]: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/blob/master/docs/SPIRVRepresentationInLLVM.rst + +For specialization constant, we need to generate the following constructs: +``` +template // T is arithmetic type +T __spirv_SpecConstant(int numericID, T default_value); + +template // T is composite type, +// Elements are arithmetic or composite types +T __spirv_SpecConstantComposite(Elements... elements); +``` + +Particularly, `SpecConstantsPass` translates calls to the +`T __sycl_getScalar2020SpecConstantValue(const char *SymbolicID, void *DefaultValue, char *RTBuffer)` +intrinsic into calls to `T __spirv_SpecConstant(int ID, T default_val)`. +And for `T __sycl_getComposite2020SpecConstantValue(const chat *SybmolicID, void *DefaultValue, char *RTBuffer)` +it generates number of `T __spirv_SpecConstant(int ID, T default_val)` calls for +each leaf of the composite type, plus number of +`T __spirv_SpecConstantComposite(Elements... elements)` for each composite type +(including the outermost one). + +Example of LLVM IR transformation can be found below, input LLVM IR: +``` +%struct.POD = type { [2 x %struct.A], <2 x i32> } +%struct.A = type { i32, float } + +@gold_scalar_default = global %class.specialization_id { i32 42 } +@gold_default = global %class.specialization_id { %struct.POD { [2 x %struct.A] [%struct.A { i32 1, float 2.000000e+00 }, %struct.A { i32 2, float 3.000000e+00 }], <2 x i32> } } + + +; the second argument of intrinsics below are simplified a bit +; in real-life LLVM IR it looks like: +; i8* bitcast (%class.specialization_id* @gold_scalar_default to i8* +%gold_scalar = call i32 __sycl_getScalar2020SpecConstantValue ("gold_scalar_identifier", @gold_scalar_default, i8* %buffer) +%gold = call %struct.POD __sycl_getComposite2020SpecConstantValue ("gold_identifier", @gold_default, i8* %default) +``` + +LLVM IR generated by `SpecConstantsPass`: +``` +%gold_scalar = call i32 __spirv_SpecConstant(i32 0, i32 42) + +%gold_POD_A0_x = call i32 __spirv_SpecConstant(i32 1, i32 1) +%gold_POD_A0_y = call float __spirv_SpecConstant(i32 2, float 2.000000e+00) + +%gold_POD_A0 = call %struct.A __spirv_SpecConstantComposite(i32 %gold_POD_A0_x, float %gold_POD_A0_y) + +%gold_POD_A1_x = call i32 __spirv_SpecConstant(i32 3, i32 2) +%gold_POD_A1_y = call float __spirv_SpecConstant(i32 4, float 3.000000e+00) + +%gold_POD_A1 = call %struct.A __spirv_SpecConstantComposite(i32 %gold_POD_A1_x, float %gold_POD_A1_y) + +%gold_POD_A = call [2 x %struct.A] __spirv_SpecConstantComposite(%struct.A %gold_POD_A0, %struct.A %gold_POD_A1) + +%gold_POD_b0 = call i32 __spirv_SpecConstant(i32 4, i32 44) +%gold_POD_b1 = call i32 __spirv_SpecConstant(i32 6, i32 44) +%gold_POD_b = call <2 x i32> __spirv_SpecConstant(i32 %gold_POD_b0, i32 %gold_POD_b1) + +%gold = call %struct.POD __spirv_SpecConstantComposite([2 x %struct.A] %gold_POD_A, <2 x i32> %gold_POD_b) +``` + +##### Collecting spec constants info and communicating it to DPC++ RT + +For each encountered specialization constants `sycl-post-link` emits a property, +which encodes information required by DPC++ RT to set the value of a +specialization constant through corresponding API. + +This information is communicated through "SYCL/specialization constants" +property set: +``` +// Device binary image property. +// If the type size of the property value is fixed and is no greater than +// 64 bits, then ValAddr is 0 and the value is stored in the ValSize field. +// Example - PI_PROPERTY_TYPE_UINT32, which is 32-bit +struct _pi_device_binary_property_struct { + char *Name; // null-terminated property name + void *ValAddr; // address of property value + uint32_t Type; // _pi_property_type + uint64_t ValSize; // size of property value in bytes +}; +// Named array of properties. +struct _pi_device_binary_property_set_struct { + char *Name; // the name + pi_device_binary_property PropertiesBegin; // array start + pi_device_binary_property PropertiesEnd; // array end +}; +struct pi_device_binary_struct { +... + // Array of property sets; e.g. specialization constants symbol-int ID map is + // propagated to runtime with this mechanism. + pi_device_binary_property_set PropertySetsBegin; + pi_device_binary_property_set PropertySetsEnd; +}; +``` + +So, within a single set we have a separate property for each specialization +constant with name corresponding to its symbolic ID. + +Each such property contains an array of tuples (descriptors) +\. This descriptor might be overcomplicated for +simple arithmetic spec constants, but it is still used for them in order to +unify internal representation of scalar and composite spec constants and +simplify their handling in DPC++ RT. +This descriptor is needed, because at DPC++ RT level, composite constants are +set by user as a byte array and we have to break it down to the leaf members of +the composite and set a value for each leaf as for a separate scalar +specialization constant. + +For simple scalar specialization constants the array will only contain a single +descriptor representing the constant itself. For composite specialization +constants the array will contain several descriptors for each leaf of the +composite type. + +The descriptor contains the following fields: +- ID of a composite constant leaf, i.e. ID of a scalar specialization constant, + which is a part of a composite type or ID of a constant itself if it is a + scalar. +- Offset from the beginning of composite, which points to the location of a + scalar value within the composite, i.e. the position where scalar + specialization constant resides within the byte array supplied by the user. + For scalar specialization constants it will always be 0. +- Size of the scalar specialization constant + +For example, the following code: +``` +struct Nested { + float a, b; +}; +struct A { + int x; + Nested n; +}; + +specialization_id id_int; +specialization_id id_A; +// ... + [=](kernel_handler h) { + h.get_specialization_constant(); + h.get_specialization_constant(); + } +``` + +Will result in the following property set generated: +``` +property_set { + Name = "SYCL/specialization constants", + properties: [ + property { + Name: "id_int_symbolic_ID", + ValAddr: points to byte array [{0, 0, 4}], + Type: PI_PROPERTY_TYPE_BYTE_ARRAY, + Size: sizeof(byte array above) + }, + property { + Name: "id_A_symbolic_ID", + ValAddr: points to byte array [{1, 0, 4}, {2, 4, 4}, {3, 8, 4}], + Type: PI_PROPERTY_TYPE_BYTE_ARRAY, + Size: sizeof(byte array above) + }, + ] +} +``` + +#### DPC++ runtime + +For each device binary compiler generates a map +\ =\> \ ("ID map"). DPC++ +runtime imports that map when loading device binaries. +It also maintains another map \ =\> \ +("value map") per `sycl::kernel_bundle` object. The value map is updated upon +`kernel_bundler::set_specialization_constant(val)` and +`handler::set_specialization_constant(val)` calls from the app. + +In order for runtime to access the right property, it need to compute the +symbolic ID of a specialization constant based on user-provided inputs, such +as non-type template argument passed to `set_specialization_constant` argument. +DPC++ Headers section describes how symbolic IDs are generated and the same +trick is used within `set_specialization_constant` method: +``` +template +void set_specialization_constant( + typename std::remove_reference_t::type value) { + const char *SymbolicID = +#if __has_builint(__builtin_unique_stable_name) + __builtin_unique_stable_name(detail::specialization_id_name_generator); +#else + // without the builtin we can't get the symbolic ID of the constant + ""; +#endif + // remember the value of the specialization constant + SpecConstantValuesMap[SymbolicID] = value; +} +``` + +The major downside of that approach is that it can't be used with any +third-party host compiler, because it uses a specific built-in function to +generate symbolic IDs of specialization constants. Good solution would be to +employ integration header here, i.e. we could provide some class template +specializations which will return symbolic IDs - the same approach as we use +for communicating OpenCL kernel names from the compiler to the runtime. + +For the following user code: +``` +specalization_id id_int; +// ... + [=](kernel_handler h) { + h.get_specialization_constant(); + } +``` + +The following integration header would be produced: +``` +// fallback +template +class specialization_constant_info { + static const char *getName() { return ""; } +}; + +// forward declaration +extern specialization_id id_int; + +// specialization +template<> +class specialization_constant_info { + static const char *getName() { + return "result of __builtin_unique_stable_name(detail::specialization_id_name_generator) encoded here"; + } +}; +``` + +And it would be used by DPC++ RT in the following way: +``` +template +void set_specialization_constant( + typename std::remove_reference_t::type value) { + const char *SymbolicID = specialiation_constant_info::getName(); + // remember the value of the specialization constant + SpecConstantValuesMap[SymbolicID] = value; +} +``` + +Such trick would allow use to compile host part of the app with any third-party +compiler that supports C++17, but the problem here is that SYCL 2020 spec states +the following: + +> Specialization constants must be declared using the `specialization_id` class, +> and the declaration must be outside of kernel scope using static storage +> duration. The declaration must be in either namespace scope or class scope. + +`class` scope `static` variables are not forward-declarable, which means that +the approach with integration header is not available for us here. + +Before invoking JIT compilation of a program, the runtime "flushes" +specialization constants: it iterates through the value map and invokes + +``` +pi_result piextProgramSetSpecializationConstant(pi_program prog, + pi_uint32 spec_id, + size_t spec_size, + const void *spec_value); +``` + +Plugin Interface function for descriptor of each property: `spec_id` and +`spec_size` are taken from the descriptor, `spec_value` is calculated based on +address of the specialization constant provided by user and `offset` field of +the descriptor. + + +#### SPIRV-LLVM-Translator + +Given the `__spirv_SpecConstant` intrinsic calls produced by the +`SpecConstants` pass: +``` +; Function Attrs: alwaysinline +define dso_local spir_func i32 @get() local_unnamed_addr #0 { + ; args are "ID" and "default value": + %1 = tail call spir_func i32 @_Z20__spirv_SpecConstantii(i32 42, i32 0) + ret i32 %1 +} + +%struct.A = type { i32, float } + +; Function Attrs: alwaysinline +define dso_local spir_func void @get2(%struct.A* sret %ret.ptr) local_unnamed_addr #0 { + ; args are "ID" and "default value": + %1 = tail call spir_func i32 @_Z20__spirv_SpecConstantii(i32 43, i32 0) + %2 = tail call spir_func float @_Z20__spirv_SpecConstantif(i32 44, float 0.000000e+00) + %ret = tail call spir_func %struct.A @_Z29__spirv_SpecConstantCompositeif(%1, %2) + store %struct.A %ret, %struct.A* %ret.ptr + ret void +} +``` + +the translator will generate `OpSpecConstant` SPIR-V instructions with proper +`SpecId` decorations: + +``` + OpDecorate %i32 SpecId 42 ; ID + %i32 = OpSpecConstant %int 0 ; Default value + OpDecorate %i32 SpecId 43 ; ID of the 1st member + OpDecorate %float SpecId 44 ; ID of the 2nd member + %A.i32 = OpSpecConstant %int.type 0 ; 1st member with default value + %A.float = OpSpecConstant %float.type 0.0 ; 2nd member with default value + %struct = OpSpecConstantComposite %struct.type %A.i32 %A.float ; Composite doens't need IDs or default value + %1 = OpTypeFunction %int + + %get = OpFunction %int None %1 + %2 = OpLabel + OpReturnValue %i32 + OpFunctionEnd + %1 = OpTypeFunction %struct.type + + %get2 = OpFunction %struct.type None %struct + %2 = OpLabel + OpReturnValue %struct + OpFunctionEnd +``` + +### Emulation of specialization constants + +Emulation of specialization constants is performed by converting them into +kernel arguments. + +Overall idea is that DPC++ runtimes packs all specialization constants into a +single buffer, which is passed as an extra implicit kernel argument. Then the +compiler instead of lowering `__sycl_get*2020SpecConstantValue` intrinsics into +SPIR-V friendly IR replaces it with extracting an element from that buffer. + +"All" specialization constants here means complete list of specialization +constants encountered in an application or a shared library which is being +compiled: that list is computed by `sycl-post-link` tool and communicated to +the runtime through device image properties like it is described in "Support for +native specialization constants" section. + +#### DPC++ Headers + +The same DPC++ Headers are used for native and emulated specialization constants +and their design is decribed in the corresponding sub-section of "Support for +native specialization constants" section. + +However, that part of the document doesn't describe the third argument of +`__sycl_get*2020SpecConstantValue` intrinsics: it is a pointer to a runtime +buffer, which holds values of all specialization constants and should be used +to retrieve their values in device code. + +This pointer is stored within `kernel_handler` object and it is initialized only +if our target doesn't support native specialization constants. +Since `kernel_handler` object is not captured by SYCL kernel funtion, it means +that we are not able to employ some header-only solution here and need help of +the compiler. + +DPC++ FE searches for functions marked with `sycl_kernel` attribute to handle +them and turn into entry points of device code. + + +#### DPC++ FE + +When we compile code for target which doesn't support native specialization +constants, DPC++ FE should look for `kernel_handler` argument in functions +marked as `sycl_kernel`. If such argument is present, it means that this kernel +can access specialization constants and therefore we need to: +- generate one more kernel argument for passing a buffer with specialization + constants values. +- create `kernel_handler` object + + **TODO**: this item should be done for native specialization constants as + well, probably need to refactor the document to outline common parts into a + separate section. +- initialize that `kernel_handler` object with newly created kernel argument +- pass that `kernel_handler` object to user-provided SYCL kernel function + +So, having the following as the input: +``` +template +__attribute__((sycl_kernel)) void +kernel_single_task(const KernelType &KernelFunc, kernel_handler kh) { + KernelFunc(kh); +} +``` +DPC++ FE shoud tranform it into something like: + +``` +__kernel void KernelName(args_for_lambda_init, ..., char *specialization_constants_buffer) { + KernelType LocalLambdaClone = { args_for_lambda_init }; // We already do this + kernel_handler LocalKernelHandler; + LocalKernelHandler.__init_specialization_constants_buffer(specialization_constants_buffer); + // Re-used body of "sycl_kernel" function: + { + LocalLambdaClone(LocalKernelHandler); + } +} +``` + +Besides that transformation, DPC++ FE should also provide information about that +new kernel argument through integration header + +The new kernel argument `specialization_constants_buffer` should have +corresponding entry in the `kernel_signatures` structure in the integration +header. The param kind for this argument should be +`kernel_param_kind_t:specialization_constants_buffer`. + +Example: +``` + const kernel_param_desc_t kernel_signatures[] = { + //--- _ZTSN2cl4sycl6detail19__pf_kernel_wrapperIZZ4mainENK3$_0clERNS0_7handlerEE6init_aEE + { kernel_param_kind_t::kind_std_layout, 8, 0 }, + { kernel_param_kind_t::kind_accessor, 4062, 8 }, + { kernel_param_kind_t::kind_specialization_constants_buffer, /*parameter_size_in_bytes= */ 8, /*offset_in_lambda=*/0}, + }; + +``` + +Offset for this argument is zero since it has no any connected captured +variable. + +#### DPC++ Compiler: sycl-post-link tool + +When native specialization constants are not available, we need to lower +`__sycl_get*2020SpecializationConstant` intrinsic into some load from the +additional kernel argument, which points to a buffer with all specialization +constant values. + +We assume that both DPC++ compiler and runtime know the layout of that buffer so +the compiler can correctly access particular constants from it and the runtime +is able to properly fill the buffer with values of those specialization +constants. + +The layout is defined as follows: all specialization constants are sorted by +their numeric IDs (i.e. the order of they discovery by sycl-post-link tool) and +stored within a buffer one after each other without any paddings. So, the +specialization constant with ID `N` is located within a buffer at offset, which +is equal to sum of sizes of all specialization constants with ID less than `N`. + +For example, if we have the following specialization constants discovered in the +following order: +``` +struct custom_type { int a; double b; } +specialization_id id_double; +specialization_id id_custom; +specialization_id id_int; +``` +`id_double` will be located at the beginning of the buffer, because it is the +first discovered specialization constant (ID = 0). `id_custom` (ID = 1) will be +located at the offset 8, because we have a single specialization constant with +the ID < 1 and its size is 8 bytes. `id_int` (ID = 2) will be located at the +offset 20, which is computed as `sizeof(id_double) + sizeof(id_custom)`. + +When specialization constants emulation is requested, `sycl-post-link` replaces +calls to `__sycl_get*SpecializationConstant` intrinsics with the following +LLVM IR pattern: +``` +%gep = i8, i8* %arg_three_of_sycl_intrinsic_call, i64 [offset] +; We use the third argument of the __sycl_get*SpecializationConstant intrinsic +; as a pointer to where all specialization constants are stored +; [offset] here is a placeholder for some literal integer value computed by +; the pass based on the ID of the requested specialization constant as described +; above +%cast = bitcast i8* %gep to [return-type]* +; [return-type] here is a placeholder for the actual type of the requested +; specialization constant +%load = load [return-type], [return-type]* %cast +; %load is the resulting value, which should replace all uses of the original +; call to __sycl_get*SpecializationConstant intrinsic +``` + +**TODO**: elaborate on handling of composite types. + +##### Collecting spec constants info and communicating it to DPC++ RT + +As in the processing of native specialization constants, `sycl-post-link` emits +some information in device image properties, which is required by DPC++ runtime +to properly handle emulation of specialization constants. + +`sycl-post-link` provides two property sets when specializtion constants are +emulated: +1. Mapping from Symbolic ID to offset +2. Mapping from Symbolic ID to the default value + +The first mapping can be subsituted with the property set generated for native +specialization constants, but it is still provided in order to simplify the +runtime part, i.e. it allows to avoid calculating those offsets at runtime by +re-using ones calculated by the compiler. + +**TODO**: is it possible to have both native and emulated specialization +constants within a single device image? + +The second mapping is required and it allows the runtime to properly set default +values of specialization constants. + +**TODO**: document exact property set names and properties structure + +#### DPC++ Compiler: Generation of OpenCL kernel + +Optional `kernel_handler` SYCL kernel function argument should be created by +front-end and passed to SYCL kernel function if it is expected there. + +So, the following SYCL code +``` +specialization_id id_int; +class WithSpecConst; +class WithoutSpecConst; +// ... +/* ... */.single_task([=](kernel_handler h) { + auto v = h.get_specialization_constant(); + // ... +}); +/* ... */.single_task([=]() { + // ... +}); +``` + +Should produce something like this (pseudo-code): +``` +void WithSpecConstOpenCLKernel(/* ... */) { + kernel_handler h; + WithSpecConstSYCLKernelFunction(/* ... */, h); +} +void WithoutSpecConstOpenCLKernel(/* ... */) { + WithoutSpecConstSYCLKernelFunction(/* ... */); +} +``` From de4ba031110934696d555a2f91875585b61834a0 Mon Sep 17 00:00:00 2001 From: Vlad Romanov Date: Fri, 12 Mar 2021 18:09:26 +0300 Subject: [PATCH 02/24] intrinsics -> functions --- sycl/doc/SYCL2020-SpecializationConstants.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index f38afbcc4fe1e..da0892d8c4b54 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -146,7 +146,7 @@ used as symbolic IDs of specialization constants. are declared in the headers but never defined. Calls to them are recognized by a special LLVM pass later and this is aforementioned special markup required for the compiler. -Those intrinsics accept three parameters: +Those functions accept three parameters: 1. Symbolic ID of a specialization constant. Even though at SPIR-V level specialization constants are identified by numeric IDs, we can't use them here, because: From 20d58266dc9627fe48fac4384f90193129b0f652 Mon Sep 17 00:00:00 2001 From: Vlad Romanov Date: Mon, 15 Mar 2021 16:21:02 +0300 Subject: [PATCH 03/24] Clarify what FE should do in different cases --- sycl/doc/SYCL2020-SpecializationConstants.md | 27 ++++++++++---------- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index da0892d8c4b54..0ce9150dd73cd 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -599,19 +599,22 @@ them and turn into entry points of device code. #### DPC++ FE -When we compile code for target which doesn't support native specialization -constants, DPC++ FE should look for `kernel_handler` argument in functions -marked as `sycl_kernel`. If such argument is present, it means that this kernel -can access specialization constants and therefore we need to: +DPC++ FE should look for `kernel_handler` argument in a function marked with +`sycl_kernel" attribute. If such argument is present, it means that this kernel +can access specialization constants and therefore FE needs to do the following: + +If native specialization constants are supported: +- create `kernel_handler` object +- use default constructor to initialize it +- pass that `kernel_handler` object to user-provided SYCL kernel function + +If native specialization constants are not supported: - generate one more kernel argument for passing a buffer with specialization - constants values. + constants values - create `kernel_handler` object - - **TODO**: this item should be done for native specialization constants as - well, probably need to refactor the document to outline common parts into a - separate section. - initialize that `kernel_handler` object with newly created kernel argument - pass that `kernel_handler` object to user-provided SYCL kernel function +- Provide information about new kernel argument through the integration header So, having the following as the input: ``` @@ -621,7 +624,8 @@ kernel_single_task(const KernelType &KernelFunc, kernel_handler kh) { KernelFunc(kh); } ``` -DPC++ FE shoud tranform it into something like: +For the target which has native support for specialization constatns DPC++ FE +shoud tranform it into something like: ``` __kernel void KernelName(args_for_lambda_init, ..., char *specialization_constants_buffer) { @@ -635,9 +639,6 @@ __kernel void KernelName(args_for_lambda_init, ..., char *specialization_constan } ``` -Besides that transformation, DPC++ FE should also provide information about that -new kernel argument through integration header - The new kernel argument `specialization_constants_buffer` should have corresponding entry in the `kernel_signatures` structure in the integration header. The param kind for this argument should be From 6e1a3ee76a3a1f15f2677596c815a2bfc410bae1 Mon Sep 17 00:00:00 2001 From: Vlad Romanov Date: Tue, 16 Mar 2021 15:20:32 +0300 Subject: [PATCH 04/24] fixed wording --- sycl/doc/SYCL2020-SpecializationConstants.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index 0ce9150dd73cd..6d1288dc0cc79 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -624,8 +624,8 @@ kernel_single_task(const KernelType &KernelFunc, kernel_handler kh) { KernelFunc(kh); } ``` -For the target which has native support for specialization constatns DPC++ FE -shoud tranform it into something like: +For the target which doesn't have native support for specialization constatns +DPC++ FE shoud tranform it into something like: ``` __kernel void KernelName(args_for_lambda_init, ..., char *specialization_constants_buffer) { @@ -639,7 +639,7 @@ __kernel void KernelName(args_for_lambda_init, ..., char *specialization_constan } ``` -The new kernel argument `specialization_constants_buffer` should have +Also the new kernel argument `specialization_constants_buffer` should have corresponding entry in the `kernel_signatures` structure in the integration header. The param kind for this argument should be `kernel_param_kind_t:specialization_constants_buffer`. From aabd68469f02a1a993181938a054d9f1d3b44186 Mon Sep 17 00:00:00 2001 From: Vlad Romanov Date: Thu, 18 Mar 2021 16:52:18 +0300 Subject: [PATCH 05/24] apply comments --- sycl/doc/SYCL2020-SpecializationConstants.md | 39 +++++++++++--------- 1 file changed, 21 insertions(+), 18 deletions(-) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index 6d1288dc0cc79..ab7356c107783 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -13,7 +13,7 @@ TODO: feature overview? code example? [SYCL 2020][sycl-2020-spec] defines specialization constant as: -> A constant variable where the value is not known until compilation of the +> A constant variable where the value is not known until invocation of the > SYCL kernel function. > > Glossary @@ -45,15 +45,15 @@ constants, which should be reflected in generated SPIR-V. This part is especially tricky, because this happens in host part of the SYCL program, which means that without special handling it won't even be visible to device compiler. -3. We need to ensure that DPC++ RT properly set specialization constants used in -the program: SYCL uses non-type template parameters to identify specialization -constants in the program, while at SPIR-V and OpenCL level, each specialization -constant is defined by its numerical ID, which means that we need to maintain -some mapping from SYCL identifiers to a numeric identifiers to be able to set -specialization constats. Moreover, at SPIR-V level composite specialization -constants do not have separate ID and can only be set by setting value to each -member of a composite, which means that we have 1:n mapping between SYCL -identifiers and numeric IDs of specialization constants. +3. We need to ensure that DPC++ RT properly sets specialization constants used +in the program: SYCL uses non-type template parameters to identify +specialization constants in the program, while at SPIR-V and OpenCL level, each +specialization constant is defined by its numerical ID, which means that we +need to maintain some mapping from SYCL identifiers to a numeric identifiers to +be able to set specialization constants. Moreover, at SPIR-V level composite +specialization constants do not have separate ID and can only be set by setting +value to each member of a composite, which means that we have 1:n mapping +between SYCL identifiers and numeric IDs of specialization constants. 4. When AOT compilation is used or target is a CUDA device (where NVPTX intermediate representation is used), we need to somehow emulate support for @@ -184,10 +184,13 @@ different translation units, so it happens in `sycl-post-link` tool. There is a `SpecConstantsPass` LLVM IR pass which: 1. Assigns numeric IDs to specialization constants found in the linked module. -2. Brings IR to the form expected by the SPIR-V translator. +2. Brings IR to the form expected by the SPIR-V translator (format of the + expected IR is covered in "Transformation of LLVM IR to SPIR-V friendly IR + form" section) 3. Collects and provides \ =\> \ mapping, which is later being used by DPC++ RT to set specialization constant - values provided by user. + values provided by user(section "Collecting spec constants info and + communicating it to DPC++ RT" provides more info on that) ##### Assignment of numeric IDs to specialization constants @@ -195,10 +198,10 @@ This task is achieved by maintaining a map, which holds a list of numeric IDs for each encountered symbolic ID of a specialization constant. Those IDs are used to identify the specialization constants at SPIR-V level. -As noted above one symbolic ID can several numeric IDs assigned to it - such 1:N -mapping comes from the fact that at SPIR-V level, composite specialization -constants don't have dedicated IDs and they are being identified and specialized -through their scalar leafs and corresponding numeric IDs. +As noted above one symbolic ID can have several numeric IDs assigned to it - +such 1:N mapping comes from the fact that at SPIR-V level, composite +specialization constants don't have dedicated IDs and they are being identified +and specialized through their scalar leafs and corresponding numeric IDs. For example, the following code: ``` @@ -716,12 +719,12 @@ As in the processing of native specialization constants, `sycl-post-link` emits some information in device image properties, which is required by DPC++ runtime to properly handle emulation of specialization constants. -`sycl-post-link` provides two property sets when specializtion constants are +`sycl-post-link` provides two property sets when specialization constants are emulated: 1. Mapping from Symbolic ID to offset 2. Mapping from Symbolic ID to the default value -The first mapping can be subsituted with the property set generated for native +The first mapping can be substituted with the property set generated for native specialization constants, but it is still provided in order to simplify the runtime part, i.e. it allows to avoid calculating those offsets at runtime by re-using ones calculated by the compiler. From a0832d5d83ba1411522a02dc5c8cc5b95aca1fa1 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Fri, 26 Mar 2021 10:23:17 +0300 Subject: [PATCH 06/24] Document new mapping mechanism Added overview of new mapping design, detailed description for each component TBD. --- sycl/doc/SYCL2020-SpecializationConstants.md | 221 ++++++++++++------- 1 file changed, 144 insertions(+), 77 deletions(-) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index ab7356c107783..b98cc6cf1d013 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -5,26 +5,26 @@ by SYCL 2020 specification: [SYCL registry][sycl-registry], [direct link to the specification][sycl-2020-spec]. [sycl-registry]: https://www.khronos.org/registry/SYCL/ -[sycl-2020-spec]: https://www.khronos.org/registry/SYCL/specs/sycl-2020/pdf/sycl-2020.pdf +[sycl-2020-spec]: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html TODO: feature overview? code example? -## Design +## Design objectives -[SYCL 2020][sycl-2020-spec] defines specialization constant as: +SYCL 2020 [defines specialization constant][sycl-2020-spec-constant-glossary] +as: -> A constant variable where the value is not known until invocation of the +> A constant variable where the value is not known until compilation of the > SYCL kernel function. > -> Glossary - -Therefore, implementation is based on [SPIR-V speficiation][spirv-spec] support -for [Specialization][spirv-specialization]. +> [Glossary][sycl-2020-glossary] -[spirv-spec]: https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html -[spirv-specialization]: https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#SpecializationSection +[sycl-2020-spec-constant-glossary]: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#specialization-constant +[sycl-2020-glossary]: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#glossary -However, the specification also states the following: +And implementation is based on [SPIR-V speficiation][spirv-spec] support +for [Specialization][spirv-specialization]. However, the specification also +states the following: > It is expected that many implementations will use an intermediate language > representation ... such as SPIR-V, and the intermediate language will have @@ -32,70 +32,149 @@ However, the specification also states the following: > not have such native support must still support specialization constants in > some other way. > -> Section 4.11.12.2. Specialization constant support +> [Section 4.11.12.2. Specialization constant support][sycl-2020-4-11-12-2] + +[spirv-spec]: https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html +[spirv-specialization]: https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#SpecializationSection +[sycl-2020-4-11-12-2]: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_specialization_constant_support Having that said, the following should be implemented: 1. We need to ensure that in generated SPIR-V, calls to `get_specialization_constant` are replaced with corresponding instructions for -referencing specialization constants. +referencing SPIR-V specialization constants. 2. SYCL provides a mechanism to specify default values of specialization -constants, which should be reflected in generated SPIR-V. This part is +constants, which should be reflected in the generated SPIR-V. This part is especially tricky, because this happens in host part of the SYCL program, which -means that without special handling it won't even be visible to device compiler. +means that without special handling it won't even be visible to the device +compiler. 3. We need to ensure that DPC++ RT properly sets specialization constants used -in the program: SYCL uses non-type template parameters to identify -specialization constants in the program, while at SPIR-V and OpenCL level, each +in the program: SYCL spec uses non-type template parameters to identify +specialization constants in the program, while at SPIR-V and OpenCL levels, each specialization constant is defined by its numerical ID, which means that we -need to maintain some mapping from SYCL identifiers to a numeric identifiers to -be able to set specialization constants. Moreover, at SPIR-V level composite -specialization constants do not have separate ID and can only be set by setting -value to each member of a composite, which means that we have 1:n mapping -between SYCL identifiers and numeric IDs of specialization constants. +need to maintain some mapping from SYCL identifiers to a numeric identifiers in +order to be able to set specialization constants. Moreover, at SPIR-V level +composite specialization constants do not have separate ID and can only be set +by setting value to each member of a composite, which means that we have `1:n` +mapping between SYCL identifiers and numeric IDs of specialization constants. 4. When AOT compilation is used or target is a CUDA device (where NVPTX intermediate representation is used), we need to somehow emulate support for specialization constants. -The following sections describe how each item is implemented and which -components are responsible for what. The rest of design document is split info -two parts: -- Support for native specialization constants: items (1), (2) and (3) -- Emulation of specialization constants: item (4) +## Design -Note: emulation part re-uses a lot of things described in native support -section, so if you want to get familiar with emulation in all details, it is -recommended to read native support section first. +As stated above, native specialization constants support is based on +corresponding SPIR-V functionality, while emulation is supposed to be +implemented through transforming specialization constants into kernel arguments. + +In DPC++ Headers/DPC++ RT we don't know a lot of necessary information about +specialization constants, like: which numeric ID is used for particular +specialization constant (since we support `SYCL_EXTERNAL`, those IDs can only +be allocated by the compiler during link stage) or which kernel argument is used +to pass particular specialization constant (because they are not explicitly +captured by SYCL kernel functions and regular mechanism for kernel arguments +handling can't be used here). + +Therefore, we can't have headers-only implementation and the crucial part of +design is how to organize mapping mechanism between SYCL identifiers for +specialization constants (`specialization_id`s) and low-level identifiers +(numeric IDs in SPIR-V or kernel arguments). + +That mapping mechanism is particularly tricky, because of some additional +complexity coming from SYCL 2020 specification: +- `specialization_id` variables, which are used as specialization constant + identifiers (being non-type template parameters of some methods) can't be + forward-declared in general case (for example, if defined as `static`), which + means that we can't use integration header to attach some information to them + through some C++ templates tricks (like it is done for kernel arguments or + kernel names, for example). +- they also can be declared as `static` or just non-`inline` `constexpr`, which + means that they have internal linkage and can't be referenced from other + translation units, which means that we can't for example create a new + translation unit which contains some mapping from `specialization_id` address + to some desired info. + +Based on those limitations, the following design is proposed: +- DPC++ RT uses special function: + ``` + namespace detail { + template + const char *get_spec_constant_symbolic_ID(); + } + ``` + Which is only declared, but not defined in there and used to retrieve required + information like numeric ID of a specialization constant. +- Definition of that function template are provided by DPC++ FE in form of + _integration footer_: the compiler generates a piece of C++ code which is + injected at the end of the translation unit: + ``` + namespace detail { + // assuming user defined and used the following specialization_id: + // constexpr specialiation_id int_const; + // class Wrapper { + // public: + // static constexpr specialization_id float_const; + // }; + + template<> + const char *get_spec_constant_symbolic_ID() { + return "unique_name_for_int_const"; + } + template<> + const char *get_spec_constant_symbolic_ID() { + return "unique_name_for_Wrapper_float_const"; + } + } + ``` -### Support for native specialization constants + Those symbolic IDs are used to identify device image properties corresponding + to those specialization constants, which store additional information (like + numeric SPIR-V ID of a constant) needed for DPC++ RT. +- That integration footer is automatically embedded by the compiler at the end + of user-provided translation unit by driver. -#### DPC++ Headers +Summarizing, overall design looks like: -DPC++ Headers provide required definitions of `specialization_id` and -`kernel_handler` classes as well as of many other classes and methods. +DPC++ Headers provide special markup, which used by the compiler to detect +presence of specialization constants and properly handle them. -`kernel_handler::get_specialization_constant` method, which provides an access -to specialization constants within device code performs the following tasks: -- It provides a mapping from non-type template parameter, which is used as a - specialization constant identifier in SYCL/DPC++ source file to a symbolic ID - of the constant, which is used by the compiler. -- It provides a special markup, which allows the compiler to detect - specialization constants in the device code and properly handle them. +DPC++ FE handles `kernel_handler` SYCL kernel function argument, creates +additional kernel arguments to pass specialization constants through buffer if +necessary (if native support is not available) and generates integration footer. +`sycl-post-link` transforms device code to either generate proper SPIR-V with +specialization constants (when native support is available) or to generate +correct access to corresponding kernel arguments (which are used when native +support is not available); also the tool generates some device image properties +with all information needed for DPC++ RT (like which numeric SPIR-V ID was +assigned to which symbolic ID). -``` -namespace sycl { +With help of `clang-offload-wrapper` tool, those device image properties are +embedded into the application together with device code and used by DPC++ RT +while handling specialization constants during application execution: it either +calls corresponding PI API to set a value of corresponding specialization +constant or it fills a special buffer with values of specialization constants +and passes it as kernel argument to emulate support of specialization constants. -namespace detail { +Sections below describe each component in details. -template -struct specialization_id_name_generator {}; +### DPC++ Headers + +DPC++ Headers provide required definitions of `specialization_id` and +`kernel_handler` classes as well as of many other classes and methods. -} // namespace detail +`kernel_handler::get_specialization_constant` method, which provides an access +to specialization constants within device code implements an interface between +DPC++ Headers and the compiler (`sycl-post-link` tool): it contains a special +markup, which allows the compiler to detect specialization constants in the +device code and properly handle them. -// It is possible that `DefaultValue` will be marked as `const` +``` +namespace sycl { +// TODO: Add `const` to `DefaultValue` and `RTBuffer`? template T __sycl_getScalar2020SpecConstantValue(const char *SymbolicID, void *DefaultValue, void *RTBuffer); template @@ -103,10 +182,10 @@ T __sycl_getComposite2020SpecConstantValue(const char *SymbolicID, void *Defa class kernel_handler { public: - template - typename std::remove_reference_t::type get_specialization_constant() { + template + typename std::remove_reference_t::type get_specialization_constant() { #ifdef __SYCL_DEVICE_ONLY__ - return get_on_device(); + return get_on_device(); #else // some fallback implementation in case this code is launched on host #endif __SYCL_DEVICE_ONLY__ @@ -114,18 +193,18 @@ public: private: #ifdef __SYCL_DEVICE_ONLY__ - template::type> + template::type> // enable_if T is a scalar type T get_on_device() { - const char *SymbolicID = __builtin_unique_stable_name(detail::specialization_id_name_generator); - return __sycl_getScalar2020SpecConstantValue(SymbolicID, &S, Ptr); + auto ID = __builtin_unqiue_ID(SpecName); + return __sycl_getScalar2020SpecConstantValue(ID, &S, Ptr); } - template::type> + template::type> // enable_if T is a composite type T get_on_device() { - const char *SymbolicID = __builtin_unique_stable_name(detail::specialization_id_name_generator); - return __sycl_getComposite2020SpecConstantValue(SymbolicID, &S, Ptr); + auto ID = __builtin_unqiue_ID(SpecName); + return __sycl_getComposite2020SpecConstantValue(ID, &S, Ptr); } #endif // __SYCL_DEVICE_ONLY__ @@ -135,30 +214,18 @@ private: } // namespace sycl ``` -Here [`__builtin_unique_stable_name`][builtin-unique-stable-name] -is a compiler built-in used to translate types to unique strings, which are -used as symbolic IDs of specialization constants. - -[builtin-unique-stable-name]: https://github.com/intel/llvm/blob/sycl/clang/docs/LanguageExtensions.rst#__builtin_unique_stable_name +Here `__builtin_unique_ID` is a new compiler built-in which is supposed to +generate unique symbolic IDs for specialization constants. `__sycl_getScalar2020SpecConstantValue` and `__sycl_getComposite2020SpecConstant` are functions with special names - they are declared in the headers but never defined. Calls to them are recognized by a special LLVM pass later and this is aforementioned special markup required for the compiler. + Those functions accept three parameters: -1. Symbolic ID of a specialization constant. Even though at SPIR-V level - specialization constants are identified by numeric IDs, we can't use them - here, because: - - Those IDs can't be generated by runtime, because they need to be encoded - into resulting SPIR-V device image - - Those IDs can't be generated by front-end compiler, because it only sees a - single translation unit at a time and therefore it can't assign unique IDs - to specialization constants from different translation units. - - Therefore, the decision was made to use symbolic IDs as interface between the - compiler and runtime to connect SYCL identifiers of specialization constants - with SPIR-V identifiers of specialization constants. +1. Symbolic ID of specialization constant. + TODO: do we need more details here? 2. Default value of the specialization constant. It is expected that at LLVM IR level the argument will contain a pointer to @@ -176,7 +243,7 @@ binary, each linked device code LLVM IR module undergoes processing by `sycl-post-link` tool which can run LLVM IR passes before passing the module onto the SPIR-V translator. -#### DPC++ Compiler: sycl-post-link tool +### DPC++ Compiler: sycl-post-link tool As it is stated above, the only place where we can properly handle specialization constants is somewhere during or after linking device code from @@ -189,10 +256,10 @@ There is a `SpecConstantsPass` LLVM IR pass which: form" section) 3. Collects and provides \ =\> \ mapping, which is later being used by DPC++ RT to set specialization constant - values provided by user(section "Collecting spec constants info and + values provided by user (section "Collecting spec constants info and communicating it to DPC++ RT" provides more info on that) -##### Assignment of numeric IDs to specialization constants +#### Assignment of numeric IDs to specialization constants This task is achieved by maintaining a map, which holds a list of numeric IDs for each encountered symbolic ID of a specialization constant. Those IDs are From c252e6f80b84b22836501aae6c35229c9dde6289 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Wed, 31 Mar 2021 21:14:54 +0300 Subject: [PATCH 07/24] Update the rest of the document according to the new design This is still WIP, see TODOs in the document --- sycl/doc/SYCL2020-SpecializationConstants.md | 587 ++++++++++--------- 1 file changed, 296 insertions(+), 291 deletions(-) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index b98cc6cf1d013..87a63a3bf64a1 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -81,7 +81,7 @@ handling can't be used here). Therefore, we can't have headers-only implementation and the crucial part of design is how to organize mapping mechanism between SYCL identifiers for specialization constants (`specialization_id`s) and low-level identifiers -(numeric IDs in SPIR-V or kernel arguments). +(numeric IDs in SPIR-V or information about corresponding kernel arguments). That mapping mechanism is particularly tricky, because of some additional complexity coming from SYCL 2020 specification: @@ -89,15 +89,15 @@ complexity coming from SYCL 2020 specification: identifiers (being non-type template parameters of some methods) can't be forward-declared in general case (for example, if defined as `static`), which means that we can't use integration header to attach some information to them - through some C++ templates tricks (like it is done for kernel arguments or - kernel names, for example). + through some C++ templates tricks (like it is done for regular kernel + arguments or kernel names, for example). - they also can be declared as `static` or just non-`inline` `constexpr`, which means that they have internal linkage and can't be referenced from other translation units, which means that we can't for example create a new translation unit which contains some mapping from `specialization_id` address to some desired info. -Based on those limitations, the following design is proposed: +Based on those limitations, the following mapping design is proposed: - DPC++ RT uses special function: ``` namespace detail { @@ -109,7 +109,7 @@ Based on those limitations, the following design is proposed: information like numeric ID of a specialization constant. - Definition of that function template are provided by DPC++ FE in form of _integration footer_: the compiler generates a piece of C++ code which is - injected at the end of the translation unit: + injected at the end of a translation unit: ``` namespace detail { // assuming user defined and used the following specialization_id: @@ -133,33 +133,44 @@ Based on those limitations, the following design is proposed: Those symbolic IDs are used to identify device image properties corresponding to those specialization constants, which store additional information (like numeric SPIR-V ID of a constant) needed for DPC++ RT. -- That integration footer is automatically embedded by the compiler at the end +- That integration footer is automatically appended by the compiler at the end of user-provided translation unit by driver. +Another significant part of the design is how specialization constants support +is emulated: as briefly mentioned before, the general approach is to transform +specialization constants into kernel arguments. In fact all specialization +constants used within a program a bundler together and stored into a single +buffer, which is passed as implicit kernel argument. The layout of that buffer +is well-defined and known to both the compiler and the runtime, so when user +sets the value of a specialization constant, that value is being copied into +particular place within that buffer and once the constant is requested in +device code, the compiler generates a load from the same place of the buffer. + Summarizing, overall design looks like: DPC++ Headers provide special markup, which used by the compiler to detect presence of specialization constants and properly handle them. DPC++ FE handles `kernel_handler` SYCL kernel function argument, creates -additional kernel arguments to pass specialization constants through buffer if -necessary (if native support is not available) and generates integration footer. +additional kernel argument to pass specialization constants through buffer if +necessary (i.e. if native support is not available) and generates integration +footer. -`sycl-post-link` transforms device code to either generate proper SPIR-V with -specialization constants (when native support is available) or to generate -correct access to corresponding kernel arguments (which are used when native -support is not available); also the tool generates some device image properties -with all information needed for DPC++ RT (like which numeric SPIR-V ID was -assigned to which symbolic ID). +`sycl-post-link` transforms device code to either generate proper SPIR-V +Friendly IR with specialization constants (when native support is available) or +to generate correct access to corresponding kernel argument (which are used +when native support is not available); also the tool generates some device image +properties with all information needed for DPC++ RT (like which numeric SPIR-V +ID was assigned to which symbolic ID). With help of `clang-offload-wrapper` tool, those device image properties are embedded into the application together with device code and used by DPC++ RT while handling specialization constants during application execution: it either -calls corresponding PI API to set a value of corresponding specialization -constant or it fills a special buffer with values of specialization constants -and passes it as kernel argument to emulate support of specialization constants. +calls corresponding PI API to set a value of a specialization constant or it +fills a special buffer with values of specialization constants and passes it as +kernel argument to emulate support of specialization constants. -Sections below describe each component in details. +Sections below describe each component in more details. ### DPC++ Headers @@ -251,15 +262,17 @@ different translation units, so it happens in `sycl-post-link` tool. There is a `SpecConstantsPass` LLVM IR pass which: 1. Assigns numeric IDs to specialization constants found in the linked module. -2. Brings IR to the form expected by the SPIR-V translator (format of the - expected IR is covered in "Transformation of LLVM IR to SPIR-V friendly IR - form" section) +2. Transforms IR to either: + a. The form expected by the SPIR-V translator (format of the + expected IR is covered in "Transformation of LLVM IR to SPIR-V friendly IR + form" section). + b. The form which is used for emulating specialization constants. 3. Collects and provides \ =\> \ mapping, which is later being used by DPC++ RT to set specialization constant values provided by user (section "Collecting spec constants info and communicating it to DPC++ RT" provides more info on that) -#### Assignment of numeric IDs to specialization constants +#### 1. Assignment of numeric IDs to specialization constants This task is achieved by maintaining a map, which holds a list of numeric IDs for each encountered symbolic ID of a specialization constant. Those IDs are @@ -306,7 +319,7 @@ contains another composite within it, that nested composite is also being specialization constants. This done by depth-first search through the composite elements. -##### Transformation of LLVM IR to SPIR-V friendly IR form +#### 2.a Transformation of LLVM IR to SPIR-V friendly IR form SPIR-V friendly IR form is a special representation of LLVM IR, where some function are named in particular way in order to be recognizable by the SPIR-V @@ -373,7 +386,75 @@ LLVM IR generated by `SpecConstantsPass`: %gold = call %struct.POD __spirv_SpecConstantComposite([2 x %struct.A] %gold_POD_A, <2 x i32> %gold_POD_b) ``` -##### Collecting spec constants info and communicating it to DPC++ RT +#### 2.b Transformation of LLVM IR for emulating specialization constants + +In case we are not targeting SPIR-V, we don't have a native support for +specialization constants and have to emulate them somehow. As stated above, it +is done by converting specialization constants into kernel arguments: they all +bundled together and put into a single buffer. + +`SpecConstatnsPass` should generate proper accesses to that buffer when +specialization constants are used: this is done by replacing special +`__sycl_getScalar2020SpecConstantValue` and +`__sycl_getComposite2020SpecConstantValue` functions with accesses to their +third argument, which contains a pointer to the buffer with values of all +specialization constants. That access looks like a sequence of the following +LLVM IR instruction `getelementptr` from the buffer pointer by calculated, +offset, then `bitcast` to pointer to proper return type (because the buffer +pointer is just an "untyped" `i8 *`) and `load`. An example of that LLVM IR: +``` +; an example for: +; specialization_id id_double; +; [=](kernel_handler h) { +; h.get_specialization_constant(); + +; __sycl_getScalar2020SpecConstantValue(@SymbolicID, %DefaultValue, i8 *%RTBuffer) +; is being replaced with + +%gep = getelementptr i8, i8* %RTBuffer, i32 [calculated-offset-for-@SymbolicID] +%bitcast = bitcase i8* %gep to double* +%load = load double, double* %bitcast + +; %load is the resulting value, which is used further instead of a result of +; call to __sycl_getScalar2020SpecConstantValue +``` + +The layout of that buffer is defined as follows: all specialization constants +are placed there one after another in ascending order of their numeric IDs +assigned to them by `SpecConstantPass` previously. + +For example, the following code: +``` +struct Nested { + float a, b; +}; +struct A { + int x; + Nested n; +}; + +specialization_id id_int; +specialization_id id_A; +specialization_id id_Nested; +// ... + [=](kernel_handler h) { + h.get_specialization_constant(); + h.get_specialization_constant(); + h.get_specialization_constant(); + } +``` + +Will result in the following buffer layout, i.e. offsets of each specialization +constant in that buffer: +``` +[ + 0, // for id_int, the first constant is at the beginning of the buffer + 4, // sizeof(int) == 4, the second constant is located right after the fisrt one + 16, // sizeof(int) + sizezof(A) == 4, the same approach for the third constant +] +``` + +#### 3. Collecting spec constants info and communicating it to DPC++ RT For each encountered specialization constants `sycl-post-link` emits a property, which encodes information required by DPC++ RT to set the value of a @@ -475,97 +556,217 @@ property_set { } ``` -#### DPC++ runtime +The property set described above is mainly intended to be used when native +specialization constants are available, but it will be also used for emulation +of specialization constants: SPIR-V IDs and sizes of specialization constants +will be used to calculate offset of each specialization constant within a +buffer, which is used to propagate them to kernel through kernel arguments. -For each device binary compiler generates a map -\ =\> \ ("ID map"). DPC++ -runtime imports that map when loading device binaries. -It also maintains another map \ =\> \ -("value map") per `sycl::kernel_bundle` object. The value map is updated upon -`kernel_bundler::set_specialization_constant(val)` and -`handler::set_specialization_constant(val)` calls from the app. +Additionally, another property set will be generated to support emulated +specialization constants, which will contain a single property with default +values of all specialization constants in the same form as they will be +propagated from host to device through kernel arguments, i.e. this property will +simply contain a blob, which can be used by DPC++ RT to either pre-initialize +the whole buffer for specialization constants with their default value or to +extract default value of a particular specialization constant out of it. -In order for runtime to access the right property, it need to compute the -symbolic ID of a specialization constant based on user-provided inputs, such -as non-type template argument passed to `set_specialization_constant` argument. -DPC++ Headers section describes how symbolic IDs are generated and the same -trick is used within `set_specialization_constant` method: +For example, the following code: ``` -template -void set_specialization_constant( - typename std::remove_reference_t::type value) { - const char *SymbolicID = -#if __has_builint(__builtin_unique_stable_name) - __builtin_unique_stable_name(detail::specialization_id_name_generator); -#else - // without the builtin we can't get the symbolic ID of the constant - ""; -#endif - // remember the value of the specialization constant - SpecConstantValuesMap[SymbolicID] = value; +struct Nested { + constexpr Nested(float a, float b) : a(a), b(b) {} + float a, b; +}; +struct A { + constexpr A(int x, float a, b) : x(x), n(a, b) {} + int x; + Nested n; +}; + +specialization_id id_int(42); +specialization_id id_A(1, 2.0, 3.0); +specialization_id id_Nested(4.0, 5.0); +// ... + [=](kernel_handler h) { + h.get_specialization_constant(); + h.get_specialization_constant(); + h.get_specialization_constant(); + } +``` + +The following property set will be generated: +``` +property_set { + Name = "SYCL/specialization constants default values", + properties: [ + property { + Name: "all", + ValAddr: points to byte array [ + 42, // id_int + 1, 2.0, 3.0, // id_A + 4.0, 5.0 // id_Nested + ], + Type: PI_PROPERTY_TYPE_BYTE_ARRAY, + Size: sizeof(byte array above) + } + ] } ``` -The major downside of that approach is that it can't be used with any -third-party host compiler, because it uses a specific built-in function to -generate symbolic IDs of specialization constants. Good solution would be to -employ integration header here, i.e. we could provide some class template -specializations which will return symbolic IDs - the same approach as we use -for communicating OpenCL kernel names from the compiler to the runtime. +### DPC++ Compiler: front-end -For the following user code: +DPC++ FE is responsible for several things related to specialization constants: +1. Handling of `kernel_handler` SYCL kernel function argument. +2. Communicating to DPC++ RT which kernel argument should be used for passing + buffer with specialization constants values when they are emulated. +3. Communicating to DPC++ RT mapping between `specialization_id`s and + corresponding symbolic IDs through integration footer. + +`kernel_handler` is defined by SYCL 2020 specification as interface for +retrieving specialization constant values in SYCL kernel functions, but it +actually used only in emulation mode: since native specialization constant are +directly lowered into corresponding SPIR-V instructions, no additional handling +is needed. However, in order to get a value of a specialization constant which +was passed through a buffer, we need to have a pointer to that buffer: as it is +shown in DPC++ Headers section of the document, pointer to that buffer is stored +within `kernel_handler` object and passed to `__sycl_get*2020SpecConstantValue` +function. + +According to the [compiler design][compiler-and-runtime-design], DPC++ FE wraps +SYCL kernel functions into OpenCL kernels and when `kernel_handler` object is +passed as an argument to SYCL kernel function, DPC++ FE should re-create that +object within the wrapper function and initialize it from implicitly created +OpenCL kernel argument, if necessary - if we aim for emulating specialization +constants, which is the case when we do not target SPIR-V (happens in AOT and +for CUDA). + +[compiler-and-runtime-design]: https://github.com/intel/llvm/blob/sycl/sycl/doc/CompilerAndRuntimeDesign.md#lowering-of-lambda-function-objects-and-named-function-objects + +Considering the following input to DPC++ FE: ``` -specalization_id id_int; -// ... - [=](kernel_handler h) { - h.get_specialization_constant(); +template + __attribute__((sycl_kernel)) void + kernel_single_task(const KernelType &KernelFunc, kernel_handler kh) { + KernelFunc(kh); } ``` -The following integration header would be produced: +It should be transformed into something like this: ``` -// fallback -template -class specialization_constant_info { - static const char *getName() { return ""; } +__kernel void oclKernel(args_for_lambda_init, ..., specialization_constants_buffer) { + KernelType LocalLambdaClone = { args_for_lambda_init }; // We already do this + kernel_handler LocalKernelHandler; + LocalKernelHandler.__init_specialization_constants_buffer(specialization_constants_buffer); + // for simplicity we could have just used + // kernel_handler LocalKernelHandler = { args_for_kernel_handler_init }; + // here, but we assume that kernel_handler might be used for more than just + // accessing specialization constants and therefore there could be other + // initialization parameters which also could be conditional + // Even now we don't need to always initialize the kernel_handler object + // Re-used body of "sycl_kernel" function: + { + LocalLambdaClone(LocalKernelHandler); + } +} +``` + +As mentioned above, creation of `specialization_constants_buffer` kernel +argument and initialization of `LocalKernelHandler` object with it only happens +if we are not targeting SPIR-V, i.e. when we compile code for a target without +native support for specialization constants. + +If that new argument was added, it is communicated to DPC++ through regular +integration header mechanism, i.e. it is added as new entry to +`kernel_signatures` structure there with parameter kind set to a new +enumeration value `kernel_param_kind_t::kind_specialization_constants_buffer`. + +Those were descriptions of tasks (1) and (2) of DPC++ FE. Task (3) is to help +DPC++ RT to connect user-provided `specialization_id` variable with +corresponding symbolic ID of a specialization constant when +`handler::set_specialization_constant` is invoked. + +As noted above, we can't use regular integration header here, because in general +case, `specialization_id` variables can't be forward-declared. Therefore, we are +using integration footer approach, which for the following code snippet: +``` +struct A { + float a, b; }; -// forward declaration -extern specialization_id id_int; +specialization_id id_int; +specialization_id id_A; +// ... +[&](handler &cgh) { + cgh.set_specialization_constant(42); + cgh.set_specialization_constant({3.14, 3.14}); +} +``` + +Will look like: -// specialization +``` +namespace detail { +// generic declaration +template +struct get_symbolic_id_helper{}; + +// specializations for each specialization constant: +// we can refer to all those specialization_id variables, because integration +// footer was _appended_ to the user-provided translation unit template<> -class specialization_constant_info { - static const char *getName() { - return "result of __builtin_unique_stable_name(detail::specialization_id_name_generator) encoded here"; +struct get_symbolic_id_helper { + static const char *get_symbolic_id() { + return "result of __builtin_unique_ID(id_int) encoded here"; } }; + +template<> +struct get_symbolic_id_helper { + static const char *get_symbolic_id() { + return "result of __builtin_unique_ID(A) encoded here"; + } +}; + +} // namespace detail + +// TODO: elaborate why we have to include handler implementation here +#include ``` -And it would be used by DPC++ RT in the following way: +NOTE: By direct using `__builtin_unique_ID` in DPC++ Headers we could avoid +generating integration footer at all, but since the host part of the program can +be compiled with a third-party C++ 17-compatible compiler, which is unaware of +the clang-specific built-ins, it can result in build errors. + +### DPC++ runtime + +For each device binary compiler generates a map +\ =\> \ ("ID map"). DPC++ +runtime imports that map when loading device binaries. +It also maintains another map \ =\> \ +("value map") per `sycl::kernel_bundle` object. The value map is updated upon +`kernel_bundler::set_specialization_constant(val)` and +`handler::set_specialization_constant(val)` calls from the app. + +In order for runtime to access the right property, it need to compute the +symbolic ID of a specialization constant based on user-provided inputs, such +as non-type template argument passed to `set_specialization_constant` argument. +DPC++ Headers section describes how symbolic IDs are generated and the same +trick is used within `set_specialization_constant` method: ``` template void set_specialization_constant( typename std::remove_reference_t::type value) { - const char *SymbolicID = specialiation_constant_info::getName(); + const char *SymbolicID = detail::get_symbolic_id_helper::get_symbolic_id(); // remember the value of the specialization constant SpecConstantValuesMap[SymbolicID] = value; } ``` -Such trick would allow use to compile host part of the app with any third-party -compiler that supports C++17, but the problem here is that SYCL 2020 spec states -the following: - -> Specialization constants must be declared using the `specialization_id` class, -> and the declaration must be outside of kernel scope using static storage -> duration. The declaration must be in either namespace scope or class scope. - -`class` scope `static` variables are not forward-declarable, which means that -the approach with integration header is not available for us here. - Before invoking JIT compilation of a program, the runtime "flushes" -specialization constants: it iterates through the value map and invokes +specialization constants: + +If native specialization constants are supported by target device, the runtime +iterates through the value map and invokes ``` pi_result piextProgramSetSpecializationConstant(pi_program prog, @@ -579,8 +780,17 @@ Plugin Interface function for descriptor of each property: `spec_id` and address of the specialization constant provided by user and `offset` field of the descriptor. +If native specialization constants are not supported by target device, then +the runtime calculates the location (offset) of each specialization constant in +corresponding runtime buffer and copied user-provided value into that location. + +**TODO**: buffer creation +**TODO**: lifetime of the buffer +**TODO**: offset calculation +**TODO**: handling of default values +**TODO**: setting buffer as kernel argument -#### SPIRV-LLVM-Translator +### SPIRV-LLVM-Translator Given the `__spirv_SpecConstant` intrinsic calls produced by the `SpecConstants` pass: @@ -629,208 +839,3 @@ the translator will generate `OpSpecConstant` SPIR-V instructions with proper OpReturnValue %struct OpFunctionEnd ``` - -### Emulation of specialization constants - -Emulation of specialization constants is performed by converting them into -kernel arguments. - -Overall idea is that DPC++ runtimes packs all specialization constants into a -single buffer, which is passed as an extra implicit kernel argument. Then the -compiler instead of lowering `__sycl_get*2020SpecConstantValue` intrinsics into -SPIR-V friendly IR replaces it with extracting an element from that buffer. - -"All" specialization constants here means complete list of specialization -constants encountered in an application or a shared library which is being -compiled: that list is computed by `sycl-post-link` tool and communicated to -the runtime through device image properties like it is described in "Support for -native specialization constants" section. - -#### DPC++ Headers - -The same DPC++ Headers are used for native and emulated specialization constants -and their design is decribed in the corresponding sub-section of "Support for -native specialization constants" section. - -However, that part of the document doesn't describe the third argument of -`__sycl_get*2020SpecConstantValue` intrinsics: it is a pointer to a runtime -buffer, which holds values of all specialization constants and should be used -to retrieve their values in device code. - -This pointer is stored within `kernel_handler` object and it is initialized only -if our target doesn't support native specialization constants. -Since `kernel_handler` object is not captured by SYCL kernel funtion, it means -that we are not able to employ some header-only solution here and need help of -the compiler. - -DPC++ FE searches for functions marked with `sycl_kernel` attribute to handle -them and turn into entry points of device code. - - -#### DPC++ FE - -DPC++ FE should look for `kernel_handler` argument in a function marked with -`sycl_kernel" attribute. If such argument is present, it means that this kernel -can access specialization constants and therefore FE needs to do the following: - -If native specialization constants are supported: -- create `kernel_handler` object -- use default constructor to initialize it -- pass that `kernel_handler` object to user-provided SYCL kernel function - -If native specialization constants are not supported: -- generate one more kernel argument for passing a buffer with specialization - constants values -- create `kernel_handler` object -- initialize that `kernel_handler` object with newly created kernel argument -- pass that `kernel_handler` object to user-provided SYCL kernel function -- Provide information about new kernel argument through the integration header - -So, having the following as the input: -``` -template -__attribute__((sycl_kernel)) void -kernel_single_task(const KernelType &KernelFunc, kernel_handler kh) { - KernelFunc(kh); -} -``` -For the target which doesn't have native support for specialization constatns -DPC++ FE shoud tranform it into something like: - -``` -__kernel void KernelName(args_for_lambda_init, ..., char *specialization_constants_buffer) { - KernelType LocalLambdaClone = { args_for_lambda_init }; // We already do this - kernel_handler LocalKernelHandler; - LocalKernelHandler.__init_specialization_constants_buffer(specialization_constants_buffer); - // Re-used body of "sycl_kernel" function: - { - LocalLambdaClone(LocalKernelHandler); - } -} -``` - -Also the new kernel argument `specialization_constants_buffer` should have -corresponding entry in the `kernel_signatures` structure in the integration -header. The param kind for this argument should be -`kernel_param_kind_t:specialization_constants_buffer`. - -Example: -``` - const kernel_param_desc_t kernel_signatures[] = { - //--- _ZTSN2cl4sycl6detail19__pf_kernel_wrapperIZZ4mainENK3$_0clERNS0_7handlerEE6init_aEE - { kernel_param_kind_t::kind_std_layout, 8, 0 }, - { kernel_param_kind_t::kind_accessor, 4062, 8 }, - { kernel_param_kind_t::kind_specialization_constants_buffer, /*parameter_size_in_bytes= */ 8, /*offset_in_lambda=*/0}, - }; - -``` - -Offset for this argument is zero since it has no any connected captured -variable. - -#### DPC++ Compiler: sycl-post-link tool - -When native specialization constants are not available, we need to lower -`__sycl_get*2020SpecializationConstant` intrinsic into some load from the -additional kernel argument, which points to a buffer with all specialization -constant values. - -We assume that both DPC++ compiler and runtime know the layout of that buffer so -the compiler can correctly access particular constants from it and the runtime -is able to properly fill the buffer with values of those specialization -constants. - -The layout is defined as follows: all specialization constants are sorted by -their numeric IDs (i.e. the order of they discovery by sycl-post-link tool) and -stored within a buffer one after each other without any paddings. So, the -specialization constant with ID `N` is located within a buffer at offset, which -is equal to sum of sizes of all specialization constants with ID less than `N`. - -For example, if we have the following specialization constants discovered in the -following order: -``` -struct custom_type { int a; double b; } -specialization_id id_double; -specialization_id id_custom; -specialization_id id_int; -``` -`id_double` will be located at the beginning of the buffer, because it is the -first discovered specialization constant (ID = 0). `id_custom` (ID = 1) will be -located at the offset 8, because we have a single specialization constant with -the ID < 1 and its size is 8 bytes. `id_int` (ID = 2) will be located at the -offset 20, which is computed as `sizeof(id_double) + sizeof(id_custom)`. - -When specialization constants emulation is requested, `sycl-post-link` replaces -calls to `__sycl_get*SpecializationConstant` intrinsics with the following -LLVM IR pattern: -``` -%gep = i8, i8* %arg_three_of_sycl_intrinsic_call, i64 [offset] -; We use the third argument of the __sycl_get*SpecializationConstant intrinsic -; as a pointer to where all specialization constants are stored -; [offset] here is a placeholder for some literal integer value computed by -; the pass based on the ID of the requested specialization constant as described -; above -%cast = bitcast i8* %gep to [return-type]* -; [return-type] here is a placeholder for the actual type of the requested -; specialization constant -%load = load [return-type], [return-type]* %cast -; %load is the resulting value, which should replace all uses of the original -; call to __sycl_get*SpecializationConstant intrinsic -``` - -**TODO**: elaborate on handling of composite types. - -##### Collecting spec constants info and communicating it to DPC++ RT - -As in the processing of native specialization constants, `sycl-post-link` emits -some information in device image properties, which is required by DPC++ runtime -to properly handle emulation of specialization constants. - -`sycl-post-link` provides two property sets when specialization constants are -emulated: -1. Mapping from Symbolic ID to offset -2. Mapping from Symbolic ID to the default value - -The first mapping can be substituted with the property set generated for native -specialization constants, but it is still provided in order to simplify the -runtime part, i.e. it allows to avoid calculating those offsets at runtime by -re-using ones calculated by the compiler. - -**TODO**: is it possible to have both native and emulated specialization -constants within a single device image? - -The second mapping is required and it allows the runtime to properly set default -values of specialization constants. - -**TODO**: document exact property set names and properties structure - -#### DPC++ Compiler: Generation of OpenCL kernel - -Optional `kernel_handler` SYCL kernel function argument should be created by -front-end and passed to SYCL kernel function if it is expected there. - -So, the following SYCL code -``` -specialization_id id_int; -class WithSpecConst; -class WithoutSpecConst; -// ... -/* ... */.single_task([=](kernel_handler h) { - auto v = h.get_specialization_constant(); - // ... -}); -/* ... */.single_task([=]() { - // ... -}); -``` - -Should produce something like this (pseudo-code): -``` -void WithSpecConstOpenCLKernel(/* ... */) { - kernel_handler h; - WithSpecConstSYCLKernelFunction(/* ... */, h); -} -void WithoutSpecConstOpenCLKernel(/* ... */) { - WithoutSpecConstSYCLKernelFunction(/* ... */); -} -``` From 82882b9a91580d741020f2bfe694cf53fab00824 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Fri, 2 Apr 2021 10:57:50 +0300 Subject: [PATCH 08/24] Apply code review comments Fixed typos, updated description of the device image property with default values of specialization constants. --- sycl/doc/SYCL2020-SpecializationConstants.md | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index 87a63a3bf64a1..a661ea63c28f0 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -139,7 +139,7 @@ Based on those limitations, the following mapping design is proposed: Another significant part of the design is how specialization constants support is emulated: as briefly mentioned before, the general approach is to transform specialization constants into kernel arguments. In fact all specialization -constants used within a program a bundler together and stored into a single +constants used within a program are bundled together and stored into a single buffer, which is passed as implicit kernel argument. The layout of that buffer is well-defined and known to both the compiler and the runtime, so when user sets the value of a specialization constant, that value is being copied into @@ -566,14 +566,19 @@ Additionally, another property set will be generated to support emulated specialization constants, which will contain a single property with default values of all specialization constants in the same form as they will be propagated from host to device through kernel arguments, i.e. this property will -simply contain a blob, which can be used by DPC++ RT to either pre-initialize -the whole buffer for specialization constants with their default value or to -extract default value of a particular specialization constant out of it. +simply contain a blob that for each specialization constant of type `A` +represents an object of type `A` constructed with values passed to +`specialization_id` constructor; those values are ordered in ascending order of +numeric SPIR-V IDs assigned to corresponding specialization constants. + +This blob can be used by DPC++ RT to either pre-initialize the whole buffer for +specialization constants with their default value or to extract default value of +a particular specialization constant out of it. For example, the following code: ``` struct Nested { - constexpr Nested(float a, float b) : a(a), b(b) {} + constexpr Nested(float a, float b) : a(a + 1.0), b(b + 1.0) {} float a, b; }; struct A { @@ -603,7 +608,7 @@ property_set { ValAddr: points to byte array [ 42, // id_int 1, 2.0, 3.0, // id_A - 4.0, 5.0 // id_Nested + 5.0, 6.0 // id_Nested ], Type: PI_PROPERTY_TYPE_BYTE_ARRAY, Size: sizeof(byte array above) From edb1586bd90efed3a8d2bff66abb1e3184bb0168 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Fri, 2 Apr 2021 11:02:55 +0300 Subject: [PATCH 09/24] Align description of integration footer content Details of proposed mapping mechanism didn't match between the overview and DPC++ FE sections: aligned them to use function template in both cases. Also added `inline` keyword to `get_spec_constant_symbolic_ID` function template. --- sycl/doc/SYCL2020-SpecializationConstants.md | 24 ++++++++------------ 1 file changed, 10 insertions(+), 14 deletions(-) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index a661ea63c28f0..d0a92803e2170 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -102,7 +102,7 @@ Based on those limitations, the following mapping design is proposed: ``` namespace detail { template - const char *get_spec_constant_symbolic_ID(); + inline const char *get_spec_constant_symbolic_ID(); } ``` Which is only declared, but not defined in there and used to retrieve required @@ -120,11 +120,11 @@ Based on those limitations, the following mapping design is proposed: // }; template<> - const char *get_spec_constant_symbolic_ID() { + inline const char *get_spec_constant_symbolic_ID() { return "unique_name_for_int_const"; } template<> - const char *get_spec_constant_symbolic_ID() { + inline const char *get_spec_constant_symbolic_ID() { return "unique_name_for_Wrapper_float_const"; } } @@ -712,24 +712,20 @@ Will look like: namespace detail { // generic declaration template -struct get_symbolic_id_helper{}; +inline const char *get_spec_constant_symbolic_ID(); // specializations for each specialization constant: // we can refer to all those specialization_id variables, because integration // footer was _appended_ to the user-provided translation unit template<> -struct get_symbolic_id_helper { - static const char *get_symbolic_id() { - return "result of __builtin_unique_ID(id_int) encoded here"; - } -}; +inline const char *get_spec_constant_symbolic_ID() { + return "result of __builtin_unique_ID(id_int) encoded here"; +} template<> -struct get_symbolic_id_helper { - static const char *get_symbolic_id() { - return "result of __builtin_unique_ID(A) encoded here"; - } -}; +inline const char *get_spec_constant_symbolic_ID() { + return "result of __builtin_unique_ID(A) encoded here"; +} } // namespace detail From ce4788a2f1c1556042f9cc356a6eea4e1e528aba Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Fri, 2 Apr 2021 12:25:43 +0300 Subject: [PATCH 10/24] Add description/requirements for __builtin_unique_ID --- sycl/doc/SYCL2020-SpecializationConstants.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index d0a92803e2170..5e1872ad04c2c 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -625,6 +625,7 @@ DPC++ FE is responsible for several things related to specialization constants: buffer with specialization constants values when they are emulated. 3. Communicating to DPC++ RT mapping between `specialization_id`s and corresponding symbolic IDs through integration footer. +4. It provides `__builtin_unique_ID` implementation. `kernel_handler` is defined by SYCL 2020 specification as interface for retrieving specialization constant values in SYCL kernel functions, but it @@ -738,6 +739,15 @@ generating integration footer at all, but since the host part of the program can be compiled with a third-party C++ 17-compatible compiler, which is unaware of the clang-specific built-ins, it can result in build errors. +`__builtin_unique_ID` is defined as follows: it accepts a variable and returns +a C-string (`const char *`), which: +- if the variable has external linkage, the string must be consistent in all + translation units that reference this same variable. +- if the variable has internal linkage, the string must be unique across all + translation units. +- return string must be the same if the built-in was called twice for the same + variable within a single translation unit. + ### DPC++ runtime For each device binary compiler generates a map From beed9ad0b37e23a28f3cb4c1adb32e70b654e99e Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Mon, 5 Apr 2021 17:04:21 +0300 Subject: [PATCH 11/24] Update integration footer description Expanded the example, fixed some bugs in it. Added clarification that `get_spec_constant_symbolic_ID` helper is generated for each definition of `specialization_id` variable. --- sycl/doc/SYCL2020-SpecializationConstants.md | 36 ++++++++++++++++---- 1 file changed, 29 insertions(+), 7 deletions(-) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index 5e1872ad04c2c..0269e1d7ab6b1 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -698,12 +698,19 @@ struct A { float a, b; }; -specialization_id id_int; -specialization_id id_A; +constexpr specialization_id id_int; +constexpr specialization_id id_A; +constexpr inline specialization_id id_double; +constexpr inline specialization_id id_float; // ... [&](handler &cgh) { cgh.set_specialization_constant(42); - cgh.set_specialization_constant({3.14, 3.14}); + cgh.get_specialization_constant(); + // ... + [=](kernel_handler h) { + h.get_specialization_constant(); + h.get_specialization_constant(); + } } ``` @@ -719,21 +726,36 @@ inline const char *get_spec_constant_symbolic_ID(); // we can refer to all those specialization_id variables, because integration // footer was _appended_ to the user-provided translation unit template<> -inline const char *get_spec_constant_symbolic_ID() { +inline const char *get_spec_constant_symbolic_ID() { return "result of __builtin_unique_ID(id_int) encoded here"; } template<> -inline const char *get_spec_constant_symbolic_ID() { - return "result of __builtin_unique_ID(A) encoded here"; +inline const char *get_spec_constant_symbolic_ID() { + return "result of __builtin_unique_ID(id_A) encoded here"; +} + +template<> +inline const char *get_spec_constant_symbolic_ID() { + return "result of __builtin_unique_ID(id_double) encoded here"; +} + +template<> +inline const char *get_spec_constant_symbolic_ID() { + return "result of __builtin_unique_ID(id_float) encoded here"; } } // namespace detail // TODO: elaborate why we have to include handler implementation here -#include +#include ``` +Note that `get_spec_constant_symbolic_ID` specialization are generated for each +definition of `specialization_id` object regardless of its uses within SYCL +kernel functions: those IDs are used by DPC++ RT as well even for those spec +constants, which are never accessed on device. + NOTE: By direct using `__builtin_unique_ID` in DPC++ Headers we could avoid generating integration footer at all, but since the host part of the program can be compiled with a third-party C++ 17-compatible compiler, which is unaware of From 9faccb58fd4f5576d3d09fc00136fbf51768578e Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Wed, 7 Apr 2021 15:07:32 +0300 Subject: [PATCH 12/24] Update signatures of __sycl_get*2020SpecConstantValue Added `const` to all arguments --- sycl/doc/SYCL2020-SpecializationConstants.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index 0269e1d7ab6b1..02ccb42371629 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -185,11 +185,10 @@ device code and properly handle them. ``` namespace sycl { -// TODO: Add `const` to `DefaultValue` and `RTBuffer`? template -T __sycl_getScalar2020SpecConstantValue(const char *SymbolicID, void *DefaultValue, void *RTBuffer); +T __sycl_getScalar2020SpecConstantValue(const char *SymbolicID, const void *DefaultValue, const void *RTBuffer); template -T __sycl_getComposite2020SpecConstantValue(const char *SymbolicID, void *DefaultValue, void *RTBuffer); +T __sycl_getComposite2020SpecConstantValue(const char *SymbolicID, const void *DefaultValue, const void *RTBuffer); class kernel_handler { public: @@ -339,9 +338,9 @@ T __spirv_SpecConstantComposite(Elements... elements); ``` Particularly, `SpecConstantsPass` translates calls to the -`T __sycl_getScalar2020SpecConstantValue(const char *SymbolicID, void *DefaultValue, char *RTBuffer)` +`T __sycl_getScalar2020SpecConstantValue(const char *SymbolicID, const void *DefaultValue, const char *RTBuffer)` intrinsic into calls to `T __spirv_SpecConstant(int ID, T default_val)`. -And for `T __sycl_getComposite2020SpecConstantValue(const chat *SybmolicID, void *DefaultValue, char *RTBuffer)` +And for `T __sycl_getComposite2020SpecConstantValue(const chat *SybmolicID, const void *DefaultValue, const char *RTBuffer)` it generates number of `T __spirv_SpecConstant(int ID, T default_val)` calls for each leaf of the composite type, plus number of `T __spirv_SpecConstantComposite(Elements... elements)` for each composite type From 0a02d81baf619e9b99821efa6c247fe69155ba25 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Wed, 7 Apr 2021 15:27:56 +0300 Subject: [PATCH 13/24] Add missing constexpr --- sycl/doc/SYCL2020-SpecializationConstants.md | 22 ++++++++++---------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index 02ccb42371629..85caf2f76983a 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -292,8 +292,8 @@ struct A { Nested n; }; -specialization_id id_int; -specialization_id id_A; +constexpr specialization_id id_int; +constexpr specialization_id id_A; // ... [=](kernel_handler h) { h.get_specialization_constant(); @@ -403,7 +403,7 @@ offset, then `bitcast` to pointer to proper return type (because the buffer pointer is just an "untyped" `i8 *`) and `load`. An example of that LLVM IR: ``` ; an example for: -; specialization_id id_double; +; constexpr specialization_id id_double; ; [=](kernel_handler h) { ; h.get_specialization_constant(); @@ -432,9 +432,9 @@ struct A { Nested n; }; -specialization_id id_int; -specialization_id id_A; -specialization_id id_Nested; +constexpr specialization_id id_int; +constexpr specialization_id id_A; +constexpr specialization_id id_Nested; // ... [=](kernel_handler h) { h.get_specialization_constant(); @@ -525,8 +525,8 @@ struct A { Nested n; }; -specialization_id id_int; -specialization_id id_A; +constexpr specialization_id id_int; +constexpr specialization_id id_A; // ... [=](kernel_handler h) { h.get_specialization_constant(); @@ -586,9 +586,9 @@ struct A { Nested n; }; -specialization_id id_int(42); -specialization_id id_A(1, 2.0, 3.0); -specialization_id id_Nested(4.0, 5.0); +constexpr specialization_id id_int(42); +constexpr specialization_id id_A(1, 2.0, 3.0); +constexpr specialization_id id_Nested(4.0, 5.0); // ... [=](kernel_handler h) { h.get_specialization_constant(); From 1d933ca0c2a25e48344ba66d0976b08c47a5f52d Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Wed, 7 Apr 2021 16:05:56 +0300 Subject: [PATCH 14/24] Cleanup outdated things --- sycl/doc/SYCL2020-SpecializationConstants.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index 85caf2f76983a..76c9858f922c4 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -788,7 +788,7 @@ trick is used within `set_specialization_constant` method: template void set_specialization_constant( typename std::remove_reference_t::type value) { - const char *SymbolicID = detail::get_symbolic_id_helper::get_symbolic_id(); + const char *SymbolicID = detail::get_spec_constant_symbolic_ID(); // remember the value of the specialization constant SpecConstantValuesMap[SymbolicID] = value; } From 4aa87da0ff03b8e41201e233733f9c0fc350a4ca Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Wed, 7 Apr 2021 17:27:44 +0300 Subject: [PATCH 15/24] Add a bit more details into runtime section --- sycl/doc/SYCL2020-SpecializationConstants.md | 24 +++++++++++++------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index 76c9858f922c4..9669e27f774b7 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -797,8 +797,8 @@ void set_specialization_constant( Before invoking JIT compilation of a program, the runtime "flushes" specialization constants: -If native specialization constants are supported by target device, the runtime -iterates through the value map and invokes +If native specialization constants are supported by the target device, the +runtime iterates through the value map and invokes ``` pi_result piextProgramSetSpecializationConstant(pi_program prog, @@ -812,15 +812,23 @@ Plugin Interface function for descriptor of each property: `spec_id` and address of the specialization constant provided by user and `offset` field of the descriptor. -If native specialization constants are not supported by target device, then +If native specialization constants are not supported by the target device, then the runtime calculates the location (offset) of each specialization constant in corresponding runtime buffer and copied user-provided value into that location. -**TODO**: buffer creation -**TODO**: lifetime of the buffer -**TODO**: offset calculation -**TODO**: handling of default values -**TODO**: setting buffer as kernel argument +That buffer should be allocated for each `kernel_bundler` or `queue::submit` and +it should be set as a kernel argument, if corresponding `kernel_signature` +contains `kernel_param_kind_t::kind_specialization_constants_buffer`. + +Offsets into that buffer are calculated based on "SYCL/specialization constants" +property set, i.e. all properties from there are sorted in ascending order of +their numeric IDs and offset for each specialization constant is calculated as +sum of sizes of all other specialization constants with smaller numeric ID. + +In order to properly set default values of specialization constants, +"SYCL/specialization constants default values" property set is used: its content +is used to either fully or partially initialize the buffer with specialization +constant values. ### SPIRV-LLVM-Translator From 0e756c1c3147b2a0b5d701037a2b868716c63346 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Wed, 7 Apr 2021 17:39:12 +0300 Subject: [PATCH 16/24] Add feature overview and an example --- sycl/doc/SYCL2020-SpecializationConstants.md | 60 +++++++++++++++++++- 1 file changed, 59 insertions(+), 1 deletion(-) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index 9669e27f774b7..597dcde616344 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -7,7 +7,65 @@ by SYCL 2020 specification: [SYCL registry][sycl-registry], [sycl-registry]: https://www.khronos.org/registry/SYCL/ [sycl-2020-spec]: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html -TODO: feature overview? code example? +> Specialization constants represent constants whose values can be set +> dynamically during execution of the SYCL application. The values of these +> constants are fixed when a SYCL kernel function is invoked, and they do not +> change during the execution of the kernel. However, the application is able to +> set a new value for a specialization constants each time a kernel is invoked, +> so the values can be tuned differently for each invocation. +> +> [Section 4.9.5 Specialization constants][sycl-2020-4-9-5] + +[sycl-2020-4-9-5]: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_specialization_constants + +Example usage: + +``` +#include +using namespace sycl; + +using coeff_t = std::array, 3>; + +// Read coefficients from somewhere. +coeff_t get_coefficients(); + +// Identify the specialization constant. +constexpr specialization_id coeff_id; + +void do_conv(buffer in, buffer out) { + queue myQueue; + + myQueue.submit([&](handler &cgh) { + accessor in_acc { in, cgh, read_only }; + accessor out_acc { out, cgh, write_only }; + + // Set the coefficient of the convolution as constant. + // This will build a specific kernel the coefficient available as literals. + cgh.set_specialization_constant(get_coefficients()); + + cgh.parallel_for( + in.get_range(), [=](item<2> item_id, kernel_handler h) { + float acc = 0; + coeff_t coeff = h.get_specialization_constant(); + for (int i = -1; i <= 1; i++) { + if (item_id[0] + i < 0 || item_id[0] + i >= in_acc.get_range()[0]) + continue; + for (int j = -1; j <= 1; j++) { + if (item_id[1] + j < 0 || item_id[1] + j >= in_acc.get_range()[1]) + continue; + // The underlying JIT can see all the values of the array returned + // by coeff.get(). + acc += coeff[i + 1][j + 1] * + in_acc[item_id[0] + i][item_id[1] + j]; + } + } + out_acc[item_id] = acc; + }); + }); + + myQueue.wait(); +} +``` ## Design objectives From 9797c17f59398d0c4f8f6d3222b4a35ee19cc820 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Thu, 8 Apr 2021 11:34:31 +0300 Subject: [PATCH 17/24] Update sycl/doc/SYCL2020-SpecializationConstants.md Co-authored-by: Dmitry Vodopyanov --- sycl/doc/SYCL2020-SpecializationConstants.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index 597dcde616344..b2ca3bb0c9700 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -450,7 +450,7 @@ specialization constants and have to emulate them somehow. As stated above, it is done by converting specialization constants into kernel arguments: they all bundled together and put into a single buffer. -`SpecConstatnsPass` should generate proper accesses to that buffer when +`SpecConstantsPass` should generate proper accesses to that buffer when specialization constants are used: this is done by replacing special `__sycl_getScalar2020SpecConstantValue` and `__sycl_getComposite2020SpecConstantValue` functions with accesses to their From 558f8375bbfdba188635813949fec2c3ff619388 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Fri, 9 Apr 2021 10:15:21 +0300 Subject: [PATCH 18/24] Apply suggestions from code review Co-authored-by: Romanov Vlad <17316488+romanovvlad@users.noreply.github.com> --- sycl/doc/SYCL2020-SpecializationConstants.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index b2ca3bb0c9700..4c287568024be 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -170,7 +170,7 @@ Based on those limitations, the following mapping design is proposed: injected at the end of a translation unit: ``` namespace detail { - // assuming user defined and used the following specialization_id: + // assuming user defined the following specialization_id: // constexpr specialiation_id int_const; // class Wrapper { // public: @@ -639,7 +639,7 @@ struct Nested { float a, b; }; struct A { - constexpr A(int x, float a, b) : x(x), n(a, b) {} + constexpr A(int x, float a, float b) : x(x), n(a, b) {} int x; Nested n; }; @@ -664,7 +664,7 @@ property_set { Name: "all", ValAddr: points to byte array [ 42, // id_int - 1, 2.0, 3.0, // id_A + 1, 3.0, 4.0, // id_A 5.0, 6.0 // id_Nested ], Type: PI_PROPERTY_TYPE_BYTE_ARRAY, @@ -874,7 +874,7 @@ If native specialization constants are not supported by the target device, then the runtime calculates the location (offset) of each specialization constant in corresponding runtime buffer and copied user-provided value into that location. -That buffer should be allocated for each `kernel_bundler` or `queue::submit` and +That buffer should be allocated for each `device_image` and it should be set as a kernel argument, if corresponding `kernel_signature` contains `kernel_param_kind_t::kind_specialization_constants_buffer`. From 86853e22eeab87634da7f0bc7e99d8840466e7d3 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Tue, 13 Apr 2021 11:46:26 +0300 Subject: [PATCH 19/24] Apply comments --- sycl/doc/SYCL2020-SpecializationConstants.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index 597dcde616344..b051d02dc8aa9 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -118,15 +118,16 @@ composite specialization constants do not have separate ID and can only be set by setting value to each member of a composite, which means that we have `1:n` mapping between SYCL identifiers and numeric IDs of specialization constants. -4. When AOT compilation is used or target is a CUDA device (where NVPTX -intermediate representation is used), we need to somehow emulate support for -specialization constants. +4. When AOT compilation is used or the target device does not use SPIR-V as the +device code format (for example, CUDA device, where NVPTX intermediate +representation is used), we need to somehow emulate support for specialization +constants. ## Design As stated above, native specialization constants support is based on -corresponding SPIR-V functionality, while emulation is supposed to be -implemented through transforming specialization constants into kernel arguments. +corresponding SPIR-V functionality, while emulation is implemented through +transforming specialization constants into kernel arguments. In DPC++ Headers/DPC++ RT we don't know a lot of necessary information about specialization constants, like: which numeric ID is used for particular From 71ece5916d29e0d3119607291c60a9cf1e7222fa Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Wed, 14 Apr 2021 15:30:21 +0300 Subject: [PATCH 20/24] Apply comments, update integration footer section --- sycl/doc/SYCL2020-SpecializationConstants.md | 177 +++++++++++++++---- 1 file changed, 143 insertions(+), 34 deletions(-) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index b051d02dc8aa9..adb4e316eaf3e 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -678,12 +678,19 @@ property_set { ### DPC++ Compiler: front-end DPC++ FE is responsible for several things related to specialization constants: -1. Handling of `kernel_handler` SYCL kernel function argument. -2. Communicating to DPC++ RT which kernel argument should be used for passing - buffer with specialization constants values when they are emulated. -3. Communicating to DPC++ RT mapping between `specialization_id`s and - corresponding symbolic IDs through integration footer. -4. It provides `__builtin_unique_ID` implementation. + +While transforming SYCL kernel function into an OpenCL kernel, DPC++ FE should +- Handle `kernel_handler` argument: it is not captured by lambda and therefore + should be separately handled in DPC++ FE +- Communicate to DPC++ RT which kernel argument should be used for passing + a buffer with specialization constant values when they are emulated. + +DPC++ FE provides implementation of `__builtin_unique_ID` built-in function and +it also populates special integration footer with the content required by DPC++ +RT for access to right device image properties describing specialization +constants. + +#### SYCL Kernel function transformations `kernel_handler` is defined by SYCL 2020 specification as interface for retrieving specialization constant values in SYCL kernel functions, but it @@ -743,21 +750,46 @@ integration header mechanism, i.e. it is added as new entry to `kernel_signatures` structure there with parameter kind set to a new enumeration value `kernel_param_kind_t::kind_specialization_constants_buffer`. -Those were descriptions of tasks (1) and (2) of DPC++ FE. Task (3) is to help -DPC++ RT to connect user-provided `specialization_id` variable with -corresponding symbolic ID of a specialization constant when -`handler::set_specialization_constant` is invoked. +#### `__builtin_unique_ID` -As noted above, we can't use regular integration header here, because in general -case, `specialization_id` variables can't be forward-declared. Therefore, we are -using integration footer approach, which for the following code snippet: +This built-in is used to generate unique identifiers for specialization +constants, which are used in communication between the compiler and the runtime. + +`__builtin_unique_ID` is defined as follows: it accepts a variable and returns +a C-string (`const char *`), which: +- if the input variable has external linkage, the string must be the same in all + translation units that pass this same variable to the built-in. +- if the input variable has internal linkage, the string must be unique across + all translation units. +- return string must be the same if the built-in was called twice for the same + variable within a single translation unit (regardless of its linkage type). + +#### Integration footer generation + +Note: we could have used `__builtin_unique_ID` directly in DPC++ Headers, but +this would break compilation of those with a third-party C++ 17-compatible +compiler, which is unaware of this built-in function. Therefore, the compiler +generates a header file, which includes _the result_ of calling +`__builtin_unique_ID` function and it is included into the user's program. By +doing so we can still use this non-standard built-in function and preserve +support for third-party host compilers. + +However, as noted above, we can't use regular integration header here, because +in general case, `specialization_id` variables can't be forward-declared. +Therefore, we are using _integration footer_ approach, i.e. we generate a header +file which must be included at the end of a translation unit. + +For the following code snippet: ``` struct A { float a, b; }; constexpr specialization_id id_int; -constexpr specialization_id id_A; +struct Wraper { +public: + static constexpr specialization_id id_A; +}; constexpr inline specialization_id id_double; constexpr inline specialization_id id_float; // ... @@ -767,20 +799,19 @@ constexpr inline specialization_id id_float; // ... [=](kernel_handler h) { h.get_specialization_constant(); - h.get_specialization_constant(); + h.get_specialization_constant(); } } ``` -Will look like: +The integration footer will look like: ``` namespace detail { -// generic declaration -template -inline const char *get_spec_constant_symbolic_ID(); +// Note: we do not declare `get_spec_constant_symbolic_ID` here and assume that +// it is declared in some other header which was already included. -// specializations for each specialization constant: +// specializations for each specialization constant (for each `specialization_id`): // we can refer to all those specialization_id variables, because integration // footer was _appended_ to the user-provided translation unit template<> @@ -789,8 +820,8 @@ inline const char *get_spec_constant_symbolic_ID() { } template<> -inline const char *get_spec_constant_symbolic_ID() { - return "result of __builtin_unique_ID(id_A) encoded here"; +inline const char *get_spec_constant_symbolic_ID() { + return "result of __builtin_unique_ID(Wrapper::id_A) encoded here"; } template<> @@ -814,19 +845,97 @@ definition of `specialization_id` object regardless of its uses within SYCL kernel functions: those IDs are used by DPC++ RT as well even for those spec constants, which are never accessed on device. -NOTE: By direct using `__builtin_unique_ID` in DPC++ Headers we could avoid -generating integration footer at all, but since the host part of the program can -be compiled with a third-party C++ 17-compatible compiler, which is unaware of -the clang-specific built-ins, it can result in build errors. -`__builtin_unique_ID` is defined as follows: it accepts a variable and returns -a C-string (`const char *`), which: -- if the variable has external linkage, the string must be consistent in all - translation units that reference this same variable. -- if the variable has internal linkage, the string must be unique across all - translation units. -- return string must be the same if the built-in was called twice for the same - variable within a single translation unit. +##### Ambiguous references to specialization_id + +There are valid C++ code examples, where references to `specialization_id` +variables could be ambiguous if they just referenced from a global namespace +like shown above. For example: + +``` +constexpr sycl::specialization_id same_name{1}; + +/* application code that references "::same_name" */ + +namespace { + constexpr sycl::specialization_id same_name{2}: + /* application code that referenes ::(unnamed)::same_name */ + namespace { + constexpr sycl::specialization_id same_name{3}: + /* application code that referenes ::(unnamed)::(unnamed)::same_name */ + } +} + +/* application code that references "::same_name" */ +``` + +In that case we can't use `same_name` for specializing +`get_spec_constant_symbolic_ID`, because it would be ambiguous reference. +However, we can do the following trick: + +``` +// Content of integration footer for the example above + +// For unambiguous references we can generate regular specialization +template<> +inline const char *get_spec_constant_symbolic_ID<::same_name>() { + return "result of __builtin_unique_ID(::same_name) encoded here"; +} + +// For ambiguous references we generate 'shim' functions, which allows us to +// get an address of a variable within a (possible nested) anonymous namespace +// without spelling it. +namespace { + namespace __sycl_detail { + // This helper is need to get addresses of variables defined within + // anonymous namespace. + // It is generated for each specialization_id within an anonymous namespace + // if there is the same specialization_id defined in global namespace + static constexpr decltype(spec_name) __spec_id_shim_0() { + // address of ::(unnamed)::same_name; + return spec_name; + } + } +} +namespace sycl { + namespace detail { + // By using 'shim' function were are able to unambiguously refer to a + // variable within an anonymous namespace + template<> + inline const char *get_spec_constant_symbolic_ID<::__sycl_detail::__spec_id_shim_0()>() { + return "unique id for ::(unnamed)::same_name"; + } + } +} +namespace { + namespace { + namespace __sycl_detail { + static constexpr decltype(same_name) &spec_id_shim_1() { + // address of ::(unnamed)::(unnamed)::same_name; + return same_name; + } + } + } + + namespace __sycl_detail { + // Sometimes we need a 'shim', which points to another 'shim' in order to + // "extract" a variable from an anonymous namespace unambiguosly + static constexpr decltype(__sycl_detail::__spec_id_shim_1()) &__spec_id_shim_2() { + // still address of ::(unnamed)::(unnamed)::same_name; + return __sycl_detail::__spec_id_shim_1(); + } + } +} +namespace sycl { + namespace detail { + template<> + inline const char *get_spec_constant_symbolic_ID<::__sycl_detail::__spec_id_shim_2()>() { + return "unique id for ::(unnamed)::(unnamed)::same_name"; + } + } +} + +``` ### DPC++ runtime From a0ff0e8fc73f7f8a3500f1a5eea1cffd9f0f4483 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Wed, 23 Jun 2021 23:31:04 +0300 Subject: [PATCH 21/24] __builtin_unique_ID -> __builtin_sycl_unique_id --- sycl/doc/SYCL2020-SpecializationConstants.md | 32 +++++++++++--------- 1 file changed, 18 insertions(+), 14 deletions(-) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index adb4e316eaf3e..718b89017d875 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -265,14 +265,14 @@ private: template::type> // enable_if T is a scalar type T get_on_device() { - auto ID = __builtin_unqiue_ID(SpecName); + auto ID = __builtin_sycl_unique_id(SpecName); return __sycl_getScalar2020SpecConstantValue(ID, &S, Ptr); } template::type> // enable_if T is a composite type T get_on_device() { - auto ID = __builtin_unqiue_ID(SpecName); + auto ID = __builtin_sycl_unique_id(SpecName); return __sycl_getComposite2020SpecConstantValue(ID, &S, Ptr); } #endif // __SYCL_DEVICE_ONLY__ @@ -283,7 +283,7 @@ private: } // namespace sycl ``` -Here `__builtin_unique_ID` is a new compiler built-in which is supposed to +Here `__builtin_sycl_unique_id` is a new compiler built-in which is supposed to generate unique symbolic IDs for specialization constants. `__sycl_getScalar2020SpecConstantValue` and @@ -685,7 +685,7 @@ While transforming SYCL kernel function into an OpenCL kernel, DPC++ FE should - Communicate to DPC++ RT which kernel argument should be used for passing a buffer with specialization constant values when they are emulated. -DPC++ FE provides implementation of `__builtin_unique_ID` built-in function and +DPC++ FE provides implementation of `__builtin_sycl_unique_id` built-in function and it also populates special integration footer with the content required by DPC++ RT for access to right device image properties describing specialization constants. @@ -750,12 +750,12 @@ integration header mechanism, i.e. it is added as new entry to `kernel_signatures` structure there with parameter kind set to a new enumeration value `kernel_param_kind_t::kind_specialization_constants_buffer`. -#### `__builtin_unique_ID` +#### `__builtin_sycl_unique_id` This built-in is used to generate unique identifiers for specialization constants, which are used in communication between the compiler and the runtime. -`__builtin_unique_ID` is defined as follows: it accepts a variable and returns +`__builtin_sycl_unique_id` is defined as follows: it accepts a variable and returns a C-string (`const char *`), which: - if the input variable has external linkage, the string must be the same in all translation units that pass this same variable to the built-in. @@ -766,11 +766,11 @@ a C-string (`const char *`), which: #### Integration footer generation -Note: we could have used `__builtin_unique_ID` directly in DPC++ Headers, but +Note: we could have used `__builtin_sycl_unique_id` directly in DPC++ Headers, but this would break compilation of those with a third-party C++ 17-compatible compiler, which is unaware of this built-in function. Therefore, the compiler generates a header file, which includes _the result_ of calling -`__builtin_unique_ID` function and it is included into the user's program. By +`__builtin_sycl_unique_id` function and it is included into the user's program. By doing so we can still use this non-standard built-in function and preserve support for third-party host compilers. @@ -807,6 +807,8 @@ constexpr inline specialization_id id_float; The integration footer will look like: ``` +__SYCL_INLINE_NAMESPACE(cl) { +namespace sycl { namespace detail { // Note: we do not declare `get_spec_constant_symbolic_ID` here and assume that // it is declared in some other header which was already included. @@ -816,27 +818,29 @@ namespace detail { // footer was _appended_ to the user-provided translation unit template<> inline const char *get_spec_constant_symbolic_ID() { - return "result of __builtin_unique_ID(id_int) encoded here"; + return "result of __builtin_sycl_unique_id(id_int) encoded here"; } template<> inline const char *get_spec_constant_symbolic_ID() { - return "result of __builtin_unique_ID(Wrapper::id_A) encoded here"; + return "result of __builtin_sycl_unique_id(Wrapper::id_A) encoded here"; } template<> inline const char *get_spec_constant_symbolic_ID() { - return "result of __builtin_unique_ID(id_double) encoded here"; + return "result of __builtin_sycl_unique_id(id_double) encoded here"; } template<> inline const char *get_spec_constant_symbolic_ID() { - return "result of __builtin_unique_ID(id_float) encoded here"; + return "result of __builtin_sycl_unique_id(id_float) encoded here"; } } // namespace detail +} //namespace sycl +} // __SYCL_INLINE_NAMESPACE(cl) -// TODO: elaborate why we have to include handler implementation here +// get_spec_constant_symbolic_ID #include ``` @@ -879,7 +883,7 @@ However, we can do the following trick: // For unambiguous references we can generate regular specialization template<> inline const char *get_spec_constant_symbolic_ID<::same_name>() { - return "result of __builtin_unique_ID(::same_name) encoded here"; + return "result of __builtin_sycl_unique_id(::same_name) encoded here"; } // For ambiguous references we generate 'shim' functions, which allows us to From e88f4027cb8a6385d7db3d2104ca8aea600cdfa9 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Fri, 25 Jun 2021 11:55:04 +0300 Subject: [PATCH 22/24] Fix a typo --- sycl/doc/SYCL2020-SpecializationConstants.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index 718b89017d875..3d6108251ebab 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -399,7 +399,7 @@ T __spirv_SpecConstantComposite(Elements... elements); Particularly, `SpecConstantsPass` translates calls to the `T __sycl_getScalar2020SpecConstantValue(const char *SymbolicID, const void *DefaultValue, const char *RTBuffer)` intrinsic into calls to `T __spirv_SpecConstant(int ID, T default_val)`. -And for `T __sycl_getComposite2020SpecConstantValue(const chat *SybmolicID, const void *DefaultValue, const char *RTBuffer)` +And for `T __sycl_getComposite2020SpecConstantValue(const char *SybmolicID, const void *DefaultValue, const char *RTBuffer)` it generates number of `T __spirv_SpecConstant(int ID, T default_val)` calls for each leaf of the composite type, plus number of `T __spirv_SpecConstantComposite(Elements... elements)` for each composite type From fc2faf303762674a426f00e65bbfb1b67a9ad170 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Fri, 25 Jun 2021 12:31:00 +0300 Subject: [PATCH 23/24] Add description for spec_const_integration --- sycl/doc/SYCL2020-SpecializationConstants.md | 25 +++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index 3d6108251ebab..d397e0082ad0a 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -840,7 +840,7 @@ inline const char *get_spec_constant_symbolic_ID() { } //namespace sycl } // __SYCL_INLINE_NAMESPACE(cl) -// get_spec_constant_symbolic_ID +// Detailed description of this header is provided below in corresponding RT section #include ``` @@ -939,6 +939,7 @@ namespace sycl { } } +#include ``` ### DPC++ runtime @@ -1002,6 +1003,28 @@ In order to properly set default values of specialization constants, is used to either fully or partially initialize the buffer with specialization constant values. +#### sycl/detail/spec_const_integration.hpp header file + +DPC++ RT needs to have access to a mapping between `specialization_id` variables +and corresponding unique symbolic IDs used by the compiler. As already stated +above, we use integration footer for that by providing template specializations +of `get_spec_constant_symbolic_ID` function template. + +The tricky thing here, is that C++ specification states the following: + +> Specialization must be declared before the first use that would cause implicit +> instantiation, in every translation unit where such use occurs. +> +> [cppreference][cppreference-template-specialization] + +[cppreference-template-specialization]: https://en.cppreference.com/w/cpp/language/template_specialization + +That means that all users of `get_spec_constant_symbolic_ID` has to appear +*after* we defined all `get_spec_constant_symbolic_ID` template specializations. + +`sycl/detail/spec_const/integration.hpp` header file is intended to be a +location for such methods/classes/functions. + ### SPIRV-LLVM-Translator Given the `__spirv_SpecConstant` intrinsic calls produced by the From 8d809e3dfb5df39e5cb261fc854106751498fd34 Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Fri, 25 Jun 2021 12:33:20 +0300 Subject: [PATCH 24/24] Resolve TODO about symbolic ID argument --- sycl/doc/SYCL2020-SpecializationConstants.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/sycl/doc/SYCL2020-SpecializationConstants.md b/sycl/doc/SYCL2020-SpecializationConstants.md index d397e0082ad0a..57bd63b606d0c 100644 --- a/sycl/doc/SYCL2020-SpecializationConstants.md +++ b/sycl/doc/SYCL2020-SpecializationConstants.md @@ -293,8 +293,10 @@ a special LLVM pass later and this is aforementioned special markup required for the compiler. Those functions accept three parameters: -1. Symbolic ID of specialization constant. - TODO: do we need more details here? +1. Symbolic ID of specialization constant. This must be a constant string, which + will be used by the compiler to uniquely identify the specialization + constant. Device image properties generated by the compiler will use that + string as well to attach additional data to the constant. 2. Default value of the specialization constant. It is expected that at LLVM IR level the argument will contain a pointer to