Skip to content

Adding compile time and runtime properties for usm allocations #5656

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

sherry-yuan
Copy link
Contributor

@sherry-yuan sherry-yuan commented Feb 24, 2022

We need buffer location property to be passed as both a compile time and runtime property to the malloc apis.

The malloc api returns annotated_ptr that contains compile-time constant information, while the runtime property passed down to runtime library's alloc functions.

This is more of a full solution for passing compile time & runtime property to usm malloc.

#5634 is a stopgap version of this.

A related change in buffer location property doc in order to pass it into malloc API: #5661

@sherry-yuan
Copy link
Contributor Author

sherry-yuan commented Feb 24, 2022

@GarveyJoe
Copy link
Contributor

I think this spec should solely cover the new malloc overloads that accept a properties argument and all references to buffer_location should be removed. Buffer_location is an INTEL extension that is intended only for FPGAs while malloc accepting properties is a oneAPI extension that is more generally useful across targets. We don't want to merge the two as some implementations might want to support properties on malloc without supporting buffer_location. I don't even think we need a new extension for buffer_location; the existing SYCL_INTEL_buffer_location extension could be extended to mention that buffer_location can be applied to usm malloc calls if sycl_ext_oneapi_usm_properties is supported.


This extension introduces an alternative way to pass both runtime and compile time properties into USM malloc APIs.

`malloc_device`, `malloc_host`, `malloc_shared` take the properties, pass runtime properties onto runtime libraries and create annotated_ptr base on compile-time properties.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't mention that run time properties are passed to the runtime; these are literally runtime APIs so that's self evident. Returning an annotated_ptr is a change in API though relative to the other malloc overloads and is the vehicle through which this information makes it to the compiler so it makes sense to mention it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestions! now phrased as:

malloc_device, malloc_host, malloc_shared take the properties, and return an annotated_ptr that carries the passed in compile-time properties.


`malloc_device`, `malloc_host`, `malloc_shared` take the properties, pass runtime properties onto runtime libraries and create annotated_ptr base on compile-time properties.

The goal of these changes is to enable information about properties to propagate to the device compiler and thereby enable additional optimization of kernel code. Further, this will also be used by the runtime for allocation in the correct target memory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would drop the second sentence because (1) run time properties can already be passed to the runtime using the existing malloc overloads so something like this can already be accomplished with the existing API and (2) per my earlier comment, I don't think this extension should mention buffer_location.


This extension is written against the SYCL 2020 specification, Revision 4 and the following extensions:

- link:sycl_ext_oneapi_properties.asciidoc[sycl_ext_oneapi_properties]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The link is broken. The spec was moved from "proposed" to "experimental" recently so that's probably why.

@sherry-yuan
Copy link
Contributor Author

the existing SYCL_INTEL_buffer_location extension could be extended

Thanks! Yes the change is up in another PR: #5661

Comment on lines 21 to 25
IMPORTANT: This specification is a draft.

NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are
trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc.
used by permission by Khronos.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This IMPORTANT note and the Khronos attribution are repeated lower down in the spec, so remove these.

This "Introduction" section does not follow the extension spec template. I think it would be better to move it to the "Overview" section later in the spec as shown in the template.

namespace sycl::ext::oneapi::experimental {

// Available only when is_property_list<PropertyListT>::value is true
template <typename T = void*, typename PropertyListT = property_list_t<>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
template <typename T = void*, typename PropertyListT = property_list_t<>>
template <typename T = void, typename PropertyListT>

I think you probably intended the default value of T to be void not void *. As it is now, this will create an annotated pointer to a void * value (i.e. void **).

I don't think you want a default value for PropertyListT, do you? Don't you always want this to be deduced from the PropList parameter?

} // namespace sycl::ext::oneapi::experimental
----

The same setup is applied to other overloads of malloc APIs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't say this. Instead, list every API this extension introduces.


The same setup is applied to other overloads of malloc APIs.

Compile time properties can be pass into runtime properties within the allocation function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed this sentence given it is more of an implementation detail specific to subset of the properties.

bader pushed a commit that referenced this pull request Mar 9, 2022
Lower level runtime's usm allocation API now supports pass in of buffer location property [1] defined in OpenCL spec [2]

For this feature to be accessible to users, sycl runtime now passes in usm::buffer_location property defined in sycl spec [3] into the opencl runtime calls, the property is only passed if the extension name occurs in the platform extension string. If the lower level runtime does not support such property, then the property will yield no effect.

This is a temporary solution to allow user to specify which memory location the device usm allocation should be in. The full solution will require retuning an annotated_ptr that carries compile time properties for further optimization. The full solution spec is in [4]

[1] https://github.com/intel/fpga-runtime-for-opencl/pull/46/files
[2] https://github.com/KhronosGroup/OpenCL-Docs/blob/master/extensions/cl_intel_mem_alloc_buffer_location.asciidoc
[3] #5665
[4] #5656
@GarveyJoe
Copy link
Contributor

GarveyJoe commented Apr 6, 2022

@tiwaria1 and @sherry-yuan, do we want to prevent properties that aren't related to the memory allocation from being accepted by the USM malloc calls? I think doing so will require users to write longer code but that code will be more readable. For example, let's say we have two properties: align and restrict with the typical meanings. Align is a property of the allocation, but restrict is a property of how the pointer is used. So, conceptually, it makes sense to pass align to the allocation but not restrict. If we allow all properties to be passed to the malloc, a user who wants to apply both properties can do the following:

/* using various namespaces */
auto p = malloc_device<int>(N, q, properties{align<1024>, restrict});
auto q = malloc_device<int>(N, q, properties{align<1024>, restrict});
q.single_task([=] {
  // use p and q
});

On the other hand, if we only allow properties that affect the allocation to be passed to malloc, the user has to write this longer code which might be more clear:

auto p_alloc = malloc_device<int>(N, q, properties{align<1024>});
auto q_alloc = malloc_device<int>(N, q, properties{align<1024>});
annotated_ptr p{p_alloc, properties{restrict});
annotated_ptr q{q_alloc, properties{restrict});
q.single_task([=] {
  // use p and q
});

What are your thoughts?


The same setup is applied to other overloads of malloc APIs.

The returned `annotated_ptr` contains the USM pointers and the passed in compile-time properties to enable additional compiler optimizations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to imply that the annotated_ptr can only contain those properties passed in to the properties argument of the malloc call. I can think of one situation where we don't want to be that restrictive: if we add a property to encode if an allocation is device vs. host vs. shared we'd like to be able to add that to the annotated_ptr based on which malloc call was made even if the user didn't put the associated property in the properties argument.

@sherry-yuan
Copy link
Contributor Author

sherry-yuan commented Apr 6, 2022

do we want to prevent properties that aren't related to the memory allocation from being accepted by the USM malloc calls?

@GarveyJoe Just to clarify: The usm malloc in the following image is the normal usm allocation (not the one returning annotated_ptr)?

If it is the original malloc, then how do we want the new usm allocation to behave (new malloc return annotated_ptr)?
If it is the new malloc (that return annotated_ptr), is annotated_ptr wrapping annotated_ptr allowed? Wouldn't the inner level of properties be lost? @tiwaria1 might know more.

image

Edit: I see what you mean, in the longer version, the new malloc returns normal pointer, not the annotated pointer. But the first simpler version returns the annotated_ptr. The difference will be the return type of new malloc function. If this is the case, then would the allocation property ever be compile time properties? If so, I think the simpler (first) version need to be chosen. For example, "buffer_location" property is both compile time and allocation time. We don't want user to pass buffer location after calling malloc.

Therefore, I would choose not to separate these properties. Open to other suggestions.

@GarveyJoe
Copy link
Contributor

In my example all malloc calls are the new version that returns an annotated_ptr, that's why they take a properties argument and not a property_list.

If it is the new malloc (that return annotated_ptr), is annotated_ptr wrapping annotated_ptr allowed?

I think it should be. I'm not sure if the latest version of the spec allows it yet, but Abhishek and I have chatted about it and if not, I'll comment on the PR of the annotated_ptr spec asking for this to be allowed.

Wouldn't the inner level of properties be lost?

My intention is that when you construct an annotated_ptr from an annotated_ptr the result contains the union of the properties of the original annotated_ptr and the new properties you provide to the constructor.

@sherry-yuan
Copy link
Contributor Author

sherry-yuan commented Apr 6, 2022

If it is the new malloc (that return annotated_ptr), is annotated_ptr wrapping annotated_ptr allowed?

I think it should be.

Thanks for the clarification @GarveyJoe. If this is supported by annotated pointer, then I think it is overall a good idea. This setup prevents the user from unexpected no-ops. I think it also allows the same pointer to be used in different ways without allocating multiple one of them (eg. maybe like the following)

auto p_alloc = malloc_device<int>(N, q, properties{align<1024>});
annotated_ptr p_read{p_alloc, properties{host_read});
annotated_ptr p_write{q_alloc, properties{host_write});
q.single_task([=] {
  // use p and q
});

P.S I know the read/write property is not existing in USM world, but it is just an illustrative example. Open to other suggestions/opinions

@tiwaria1
Copy link
Contributor

@tiwaria1 and @sherry-yuan, do we want to prevent properties that aren't related to the memory allocation from being accepted by the USM malloc calls? I think doing so will require users to write longer code but that code will be more readable. For example, let's say we have two properties: align and restrict with the typical meanings. Align is a property of the allocation, but restrict is a property of how the pointer is used. So, conceptually, it makes sense to pass align to the allocation but not restrict. If we allow all properties to be passed to the malloc, a user who wants to apply both properties can do the following:

/* using various namespaces */
auto p = malloc_device<int>(N, q, properties{align<1024>, restrict});
auto q = malloc_device<int>(N, q, properties{align<1024>, restrict});
q.single_task([=] {
  // use p and q
});

On the other hand, if we only allow properties that affect the allocation to be passed to malloc, the user has to write this longer code which might be more clear:

auto p_alloc = malloc_device<int>(N, q, properties{align<1024>});
auto q_alloc = malloc_device<int>(N, q, properties{align<1024>});
annotated_ptr p{p_alloc, properties{restrict});
annotated_ptr q{q_alloc, properties{restrict});
q.single_task([=] {
  // use p and q
});

What are your thoughts?

I agree, I think the latter is cleaner design. I have provided a ctor in annotated_ptr to support such use cases.
We can coach users to write the following or they may even naturally start writing the following:

annotated_ptr p{malloc_device<int>(N, q, properties{alignment<1024>}), properties{kernel_arg_restrict});
annotated_ptr q{malloc_device<int>(N, q, properties{alignment<1024>}), properties{kernel_arg_restrict});
q.single_task([=] {
  // use p and q
});

All the braces and angle brackets may look unpleasant however this seems conceptually very clear to me.

@jessicadavies-intel
Copy link
Contributor

Closing this PR because it has been replaced by #6346

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants