-
Notifications
You must be signed in to change notification settings - Fork 768
Adding compile time and runtime properties for usm allocations #5656
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sycl/doc/extensions/proposed/sycl_ext_oneapi_usm_properties.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_usm_properties.asciidoc
Outdated
Show resolved
Hide resolved
I think this spec should solely cover the new malloc overloads that accept a properties argument and all references to buffer_location should be removed. Buffer_location is an INTEL extension that is intended only for FPGAs while malloc accepting properties is a oneAPI extension that is more generally useful across targets. We don't want to merge the two as some implementations might want to support properties on malloc without supporting buffer_location. I don't even think we need a new extension for buffer_location; the existing SYCL_INTEL_buffer_location extension could be extended to mention that buffer_location can be applied to usm malloc calls if sycl_ext_oneapi_usm_properties is supported. |
sycl/doc/extensions/proposed/sycl_ext_oneapi_usm_properties.asciidoc
Outdated
Show resolved
Hide resolved
|
||
This extension introduces an alternative way to pass both runtime and compile time properties into USM malloc APIs. | ||
|
||
`malloc_device`, `malloc_host`, `malloc_shared` take the properties, pass runtime properties onto runtime libraries and create annotated_ptr base on compile-time properties. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't mention that run time properties are passed to the runtime; these are literally runtime APIs so that's self evident. Returning an annotated_ptr is a change in API though relative to the other malloc overloads and is the vehicle through which this information makes it to the compiler so it makes sense to mention it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestions! now phrased as:
malloc_device
, malloc_host
, malloc_shared
take the properties, and return an annotated_ptr that carries the passed in compile-time properties.
|
||
`malloc_device`, `malloc_host`, `malloc_shared` take the properties, pass runtime properties onto runtime libraries and create annotated_ptr base on compile-time properties. | ||
|
||
The goal of these changes is to enable information about properties to propagate to the device compiler and thereby enable additional optimization of kernel code. Further, this will also be used by the runtime for allocation in the correct target memory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would drop the second sentence because (1) run time properties can already be passed to the runtime using the existing malloc overloads so something like this can already be accomplished with the existing API and (2) per my earlier comment, I don't think this extension should mention buffer_location.
|
||
This extension is written against the SYCL 2020 specification, Revision 4 and the following extensions: | ||
|
||
- link:sycl_ext_oneapi_properties.asciidoc[sycl_ext_oneapi_properties] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The link is broken. The spec was moved from "proposed" to "experimental" recently so that's probably why.
72ac523
to
fc9c779
Compare
Thanks! Yes the change is up in another PR: #5661 |
IMPORTANT: This specification is a draft. | ||
|
||
NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are | ||
trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. | ||
used by permission by Khronos. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This IMPORTANT note and the Khronos attribution are repeated lower down in the spec, so remove these.
This "Introduction" section does not follow the extension spec template. I think it would be better to move it to the "Overview" section later in the spec as shown in the template.
namespace sycl::ext::oneapi::experimental { | ||
|
||
// Available only when is_property_list<PropertyListT>::value is true | ||
template <typename T = void*, typename PropertyListT = property_list_t<>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
template <typename T = void*, typename PropertyListT = property_list_t<>> | |
template <typename T = void, typename PropertyListT> |
I think you probably intended the default value of T
to be void
not void *
. As it is now, this will create an annotated pointer to a void *
value (i.e. void **
).
I don't think you want a default value for PropertyListT
, do you? Don't you always want this to be deduced from the PropList
parameter?
} // namespace sycl::ext::oneapi::experimental | ||
---- | ||
|
||
The same setup is applied to other overloads of malloc APIs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't say this. Instead, list every API this extension introduces.
|
||
The same setup is applied to other overloads of malloc APIs. | ||
|
||
Compile time properties can be pass into runtime properties within the allocation function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed this sentence given it is more of an implementation detail specific to subset of the properties.
Lower level runtime's usm allocation API now supports pass in of buffer location property [1] defined in OpenCL spec [2] For this feature to be accessible to users, sycl runtime now passes in usm::buffer_location property defined in sycl spec [3] into the opencl runtime calls, the property is only passed if the extension name occurs in the platform extension string. If the lower level runtime does not support such property, then the property will yield no effect. This is a temporary solution to allow user to specify which memory location the device usm allocation should be in. The full solution will require retuning an annotated_ptr that carries compile time properties for further optimization. The full solution spec is in [4] [1] https://github.com/intel/fpga-runtime-for-opencl/pull/46/files [2] https://github.com/KhronosGroup/OpenCL-Docs/blob/master/extensions/cl_intel_mem_alloc_buffer_location.asciidoc [3] #5665 [4] #5656
@tiwaria1 and @sherry-yuan, do we want to prevent properties that aren't related to the memory allocation from being accepted by the USM malloc calls? I think doing so will require users to write longer code but that code will be more readable. For example, let's say we have two properties: align and restrict with the typical meanings. Align is a property of the allocation, but restrict is a property of how the pointer is used. So, conceptually, it makes sense to pass align to the allocation but not restrict. If we allow all properties to be passed to the malloc, a user who wants to apply both properties can do the following:
On the other hand, if we only allow properties that affect the allocation to be passed to malloc, the user has to write this longer code which might be more clear:
What are your thoughts? |
|
||
The same setup is applied to other overloads of malloc APIs. | ||
|
||
The returned `annotated_ptr` contains the USM pointers and the passed in compile-time properties to enable additional compiler optimizations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to imply that the annotated_ptr can only contain those properties passed in to the properties argument of the malloc call. I can think of one situation where we don't want to be that restrictive: if we add a property to encode if an allocation is device vs. host vs. shared we'd like to be able to add that to the annotated_ptr based on which malloc call was made even if the user didn't put the associated property in the properties argument.
@GarveyJoe Just to clarify: The usm malloc in the following image is the normal usm allocation (not the one returning annotated_ptr)? If it is the original malloc, then how do we want the new usm allocation to behave (new malloc return annotated_ptr)? Edit: I see what you mean, in the longer version, the new malloc returns normal pointer, not the annotated pointer. But the first simpler version returns the annotated_ptr. The difference will be the return type of new malloc function. If this is the case, then would the allocation property ever be compile time properties? If so, I think the simpler (first) version need to be chosen. For example, "buffer_location" property is both compile time and allocation time. We don't want user to pass buffer location after calling malloc. Therefore, I would choose not to separate these properties. Open to other suggestions. |
In my example all malloc calls are the new version that returns an annotated_ptr, that's why they take a properties argument and not a property_list.
I think it should be. I'm not sure if the latest version of the spec allows it yet, but Abhishek and I have chatted about it and if not, I'll comment on the PR of the annotated_ptr spec asking for this to be allowed.
My intention is that when you construct an annotated_ptr from an annotated_ptr the result contains the union of the properties of the original annotated_ptr and the new properties you provide to the constructor. |
Thanks for the clarification @GarveyJoe. If this is supported by annotated pointer, then I think it is overall a good idea. This setup prevents the user from unexpected no-ops. I think it also allows the same pointer to be used in different ways without allocating multiple one of them (eg. maybe like the following)
P.S I know the read/write property is not existing in USM world, but it is just an illustrative example. Open to other suggestions/opinions |
I agree, I think the latter is cleaner design. I have provided a ctor in annotated_ptr p{malloc_device<int>(N, q, properties{alignment<1024>}), properties{kernel_arg_restrict});
annotated_ptr q{malloc_device<int>(N, q, properties{alignment<1024>}), properties{kernel_arg_restrict});
q.single_task([=] {
// use p and q
}); All the braces and angle brackets may look unpleasant however this seems conceptually very clear to me. |
Closing this PR because it has been replaced by #6346 |
We need buffer location property to be passed as both a compile time and runtime property to the malloc apis.
The malloc api returns annotated_ptr that contains compile-time constant information, while the runtime property passed down to runtime library's alloc functions.
This is more of a full solution for passing compile time & runtime property to usm malloc.
#5634 is a stopgap version of this.
A related change in buffer location property doc in order to pass it into malloc API: #5661