Support buffer location on CUDA #5827
Comments
Eventually, this will be supported through the full solution to the new USM API (spec here: #5656). Once that is done, the properties of the new USM allocation functions can be used to express this request.
Thanks to @GarveyJoe for the suggested solution above!
In the code snippet, the compiler derives the read-only property only after it analyzes the kernel. Is that right?
I believe it is the user who provides the property to the malloc API (I updated the example code a little so that this is clear). The user-provided information (readonly & noalias) can be used by the compiler when deciding whether an optimization (e.g. allocating the data in constant memory) is possible.
It is clear. Thanks.
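A hypothetical sketch of the property-annotated allocation discussed above; the helper, the `props` namespace, and the property names are illustrative assumptions modeled on the #5656 proposal, not a shipped DPC++ API:

```cpp
#include <sycl/sycl.hpp>

// Hypothetical sketch only: a property-annotated USM allocation in the
// spirit of #5656. The "props" namespace and property names below are
// illustrative assumptions, not a real DPC++ API.
namespace props {
struct read_only {};  // user promises kernels never write this allocation
struct no_alias {};   // user promises no other pointer aliases it
}

template <typename T, typename... Properties>
T* malloc_device_with_properties(std::size_t count, sycl::queue& q,
                                 Properties...) {
  // A real implementation could use the properties to pick a placement,
  // e.g. constant memory or read-only cached loads on the CUDA backend.
  return sycl::malloc_device<T>(count, q);
}

void example(sycl::queue& q) {
  float* lut = malloc_device_with_properties<float>(
      1024, q, props::read_only{}, props::no_alias{});
  // ... launch kernels that only read lut ...
  sycl::free(lut, q);
}
```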
Let's be careful not to mix up statically allocated constant memory and non-statically allocated read-only caching (the texture cache in CUDA). A read-only cached load via `__ldg()` in the CUDA runtime API maps to the experimental `ldg` extension in SYCL, and this will be enabled with PR #7946.

Now, at the moment statically allocating CUDA constant memory is not possible in DPC++, and I do not know that it maps to anything in the SYCL spec. I.e. there is no SYCL analogue to a file-scope `__constant__` declaration. Using such a statically allocated symbol in a kernel would require, in the CUDA backend, that we implicitly call `cudaMemcpyToSymbol`. This brings up all kinds of questions.

However, my understanding of the first message in this issue is that this was not the request. The request appears to be the ability to malloc memory that is guaranteed to use the texture memory cache in the CUDA case. This is largely overlapping with what this extension is for: #7397. I am now concerned that #7397 and the "restrict" properties usage in https://github.com/tiwaria1/llvm/blob/36d521e0edef3fab4444e2964c24aa5f10879f63/sycl/doc/extensions/proposed/sycl_ext_oneapi_kernel_arg_properties.asciidoc are close to being duplicates.
cc @gmlueck
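For reference, a minimal CUDA sketch of the two mechanisms contrasted in the previous comment (array names and sizes are illustrative):

```cpp
#include <cuda_runtime.h>

// Statically allocated constant memory: size fixed at compile time,
// populated from the host with cudaMemcpyToSymbol, served by the
// dedicated constant cache.
__constant__ float table[64];

// Read-only cached load from ordinary, dynamically allocated global
// memory: __ldg routes the load through the read-only (texture) data
// cache; no static allocation is involved.
__global__ void combine(float* out, const float* __restrict__ in, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = __ldg(&in[i]) + table[i % 64];
}
```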
Isn't this what an existing extension is for? Regarding the use of …
Thanks, I didn't know about that extension. cc @jchlanda
OK, that might be a good idea. It would be good to learn a bit more about caching on Intel devices. Is there some good documentation on this anywhere?
When this issue/feature request was opened, the ask was for a "temporary" feature to enable a developer to allocate memory in constant memory for the CUDA backend. According to the first comments on this issue, this feature would be supported via the new USM API extension. The extension mentioned by @sherry-yuan is now closed and has been handed over to @jessicadavies-intel. Is there any ongoing implementation work for these extensions? Moreover, as @JackAKirk and @gmlueck mentioned, ongoing work is closely in line with this issue but does not solve it "directly". The question here is: where do we stand with this issue? Is the feature ask still relevant only for the CUDA backend, without the extensions (which is what I believe the issue was opened for)? Is this being handled by the work mentioned in the latest comments?
When a feature that enables a developer to allocate memory in constant memory for the CUDA backend is available, please let us know. Thank you. In CUDA, users often allocate constant memory like this:
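The original snippet follows the usual pattern; a minimal sketch (the symbol name and size are illustrative):

```cpp
#include <cuda_runtime.h>

// Constant memory is declared statically at file scope ...
__constant__ float coeffs[256];

__global__ void scale(float* out, const float* in, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[i] * coeffs[i % 256];
}

int main() {
  float host_coeffs[256];
  for (int i = 0; i < 256; ++i) host_coeffs[i] = 0.5f;
  // ... and filled from the host with cudaMemcpyToSymbol.
  cudaMemcpyToSymbol(coeffs, host_coeffs, sizeof(host_coeffs));
  // ... allocate in/out buffers and launch scale<<<...>>>(...) ...
  return 0;
}
```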
I list the results on a V100 GPU for the example https://github.com/zjin-lcf/HeCBench/blob/master/cmembench-cuda/main.cu: [results table comparing the cuda and dpct versions omitted]
CUDA supports allocation in global memory and in cache; the ask is to let CUDA USM allocation reserve cache memory when a buffer location property with a value of 4 is passed in.

The change will need to be made here:

llvm/sycl/plugins/cuda/pi_cuda.cpp, lines 4573 to 4593 (at d47dda3)
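As a rough sketch of the requested behavior (the function below and the way the property reaches the allocator are simplified assumptions, not the actual pi_cuda.cpp code):

```cpp
#include <cuda.h>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: a CUDA USM device allocation that branches on a
// buffer_location property. The constant and the plumbing are illustrative.
constexpr std::uint64_t kCacheLocation = 4; // value requested in this issue

CUresult usm_device_alloc(CUdeviceptr* out, std::size_t size,
                          std::uint64_t buffer_location) {
  if (buffer_location == kCacheLocation) {
    // Desired behavior: back the allocation with cache-resident memory.
    // CUDA exposes no malloc for the texture/read-only cache, so this
    // path would have to map onto constant memory or onto ordinary
    // global memory whose kernel loads go through __ldg.
    // ... backend-specific handling would go here ...
  }
  // Default path: ordinary device global memory.
  return cuMemAlloc(out, size);
}
```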