-
Notifications
You must be signed in to change notification settings - Fork 768
Add prefetch for HIP USM allocations #10430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just minor stylish stuff
c0bd637
to
3187744
Compare
This change is necessary to workaround a delightful bug in either HIP runtime, or the HIP spec. It's discussed at length in github.com/intel/issues/7252 but for the purposes of this patch, it suffices to say that a call to `hipMemPrefetchAsync` is *required* for correctness in the face of global atomic operations on (*at least*) shared USM allocations. The architecture of this change is slightly strange on first sight in that we reduntantly track allocation information in several places. The context now keeps track of all USM mappings. We require a mapping of pointers to the allocated size, but these allocations aren't pinned to any particular queue or HIP stream. The `hipMemPrefetchAsync`, however, requires the associated HIP stream object, and the size of the allocation. The stream comes hot-off-the-queue *only* just before a kernel is launched, so we need to defer the prefetch until we have that information. Finally, the kernel itself keeps track of pointer arguments in a more accessible way so we can determine which of the kernel's pointer arguments do, in-fact, point to USM allocations.
@intel/llvm-reviewers-runtime ping |
What if I pass that pointer via memory and not through kernel argument/capture? |
Then it won't work. As I see it, there are basically two options for how to solve this problem:
For now, I think (1) is acceptable as most things already work - the atomics test cases linked in issue #7252 are a notable exception which motivated this patch |
friendly ping @jandres742 @aelovikov-intel @intel/llvm-reviewers-runtime |
@aelovikov-intel is this good enough for merge, please? |
This change is necessary to workaround a delightful bug in either HIP runtime, or the HIP spec. It's discussed at length in github.com/intel/issues/7252 but for the purposes of this patch, it suffices to say that a call to `hipMemPrefetchAsync` is *required* for correctness in the face of global atomic operations on (*at least*) shared USM allocations. The architecture of this change is slightly strange on first sight in that we reduntantly track allocation information in several places. The context now keeps track of all USM mappings. We require a mapping of pointers to the allocated size, but these allocations aren't pinned to any particular queue or HIP stream. The `hipMemPrefetchAsync`, however, requires the associated HIP stream object, and the size of the allocation. The stream comes hot-off-the-queue *only* just before a kernel is launched, so we need to defer the prefetch until we have that information. Finally, the kernel itself keeps track of pointer arguments in a more accessible way so we can determine which of the kernel's pointer arguments do, in-fact, point to USM allocations.
This change is necessary to workaround a delightful bug in either HIP runtime, or the HIP spec.
It's discussed at length in github.com//issues/7252 but for the purposes of this patch, it suffices to say that a call to
hipMemPrefetchAsync
is required for correctness in the face of global atomic operations on (at least) shared USM allocations.The architecture of this change is slightly strange on first sight in that we reduntantly track allocation information in several places. The context now keeps track of all USM mappings. We require a mapping of pointers to the allocated size, but these allocations aren't pinned to any particular queue or HIP stream.
The
hipMemPrefetchAsync
, however, requires the associated HIP stream object, and the size of the allocation. The stream comes hot-off-the-queue only just before a kernel is launched, so we need to defer the prefetch until we have that information.Finally, the kernel itself keeps track of pointer arguments in a more accessible way so we can determine which of the kernel's pointer arguments do, in-fact, point to USM allocations.