Low-Level XPU Local Atomic Enhancement for Add & CAS #2293
base: main
Pull request overview
This PR enhances atomic operations on XPU Shared Local Memory (SMEM) by implementing Compare-and-Swap (CAS) operations and completing local atomic add support for various data types. The changes enable high-performance caching logic in operators like index_add by providing foundation-level atomic primitives.
Key Changes:
- Introduced generic `AtomicCASInteger` and `AtomicCASFP` template structures supporting CAS operations on local memory for integer and floating-point types (including Half/BFloat16)
- Completed `atomicAddLocal` implementations for basic types (float, double, int variants) and half-precision types using CAS-based loops
- Fixed macro naming inconsistency by renaming `SYCL_ATOMIC_INTEGER_LOCAL` output from `atomic##NAME` to `atomic##NAME##Local`
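As a rough illustration of what a "CAS-based loop" for a floating-point add looks like, here is a minimal sketch in standard C++. It is not the PR's code: it uses `std::atomic<std::uint32_t>` in place of a SYCL local-memory atomic reference, and the names `float_to_bits`, `bits_to_float`, and `cas_add_float` are hypothetical.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Hypothetical helpers: copy the bit pattern instead of casting pointers.
inline std::uint32_t float_to_bits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  return u;
}

inline float bits_to_float(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof(f));
  return f;
}

// Hypothetical name. Adds `val` to the float stored in `word` and returns
// the value that was stored before the add.
inline float cas_add_float(std::atomic<std::uint32_t>& word, float val) {
  std::uint32_t assumed = word.load(std::memory_order_relaxed);
  std::uint32_t desired;
  do {
    // Recompute the sum from the most recently observed bits.
    desired = float_to_bits(bits_to_float(assumed) + val);
    // On failure, compare_exchange_weak writes the current contents of
    // `word` back into `assumed`, so the next iteration retries with them.
  } while (!word.compare_exchange_weak(
      assumed, desired, std::memory_order_relaxed));
  return bits_to_float(assumed);
}
```

The loop only commits the new value if no other thread changed the word between the read and the write, which is the property that lets a plain integer CAS stand in for a missing native floating-point atomic.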
    const sycl_local_ptr<int64_t>& address,
    int64_t val) {
    sycl_atomic_ref_rlx_wg_local_t<int64_t> target(*address);
    target.fetch_add(val);
Copilot
AI
Nov 26, 2025
This line calls fetch_add inside the atomicMax function, but it should call fetch_max instead. As written, the operation adds val to the stored value rather than keeping the maximum.
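To make the difference concrete, here is a small sketch in standard C++. SYCL 2020's `sycl::atomic_ref` provides `fetch_max` directly; `std::atomic` does not before C++26, so this sketch emulates it with a CAS loop. The helper name `fetch_max_emulated` is hypothetical and the code is an analog, not the PR's implementation.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: stores max(current, val) and returns the value that
// was stored before the call.
inline std::int64_t fetch_max_emulated(
    std::atomic<std::int64_t>& target, std::int64_t val) {
  std::int64_t observed = target.load(std::memory_order_relaxed);
  // Stop as soon as the stored value is already >= val. Otherwise try to
  // install val; a failed exchange refreshes `observed` and the condition
  // is checked again.
  while (observed < val &&
         !target.compare_exchange_weak(
             observed, val, std::memory_order_relaxed)) {
  }
  return observed;
}
```

Starting from a stored value of 5, `fetch_max_emulated(a, 3)` leaves 5 in place, whereas `a.fetch_add(3)` would leave 8, which is the behavior the comment above flags.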
    unsigned int expected_ui = *((unsigned int*)&expected);
    newval = *((unsigned int*)&desired);
Copilot
AI
Nov 26, 2025
C-style pointer casts for type punning are undefined behavior in C++. Use std::bit_cast (C++20) or memcpy for safe type reinterpretation between float and unsigned int.
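A minimal sketch of the memcpy-based alternative, usable before C++20, might look like the following. The helper name `bit_copy` is hypothetical (it mirrors what `std::bit_cast` does), and the variable names in the usage note are taken from the excerpt above only for illustration.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper, similar in spirit to C++20 std::bit_cast:
// copies the object representation of `from` into a new object of type To.
template <typename To, typename From>
inline To bit_copy(const From& from) {
  static_assert(sizeof(To) == sizeof(From),
                "source and destination must have the same size");
  static_assert(std::is_trivially_copyable<To>::value &&
                    std::is_trivially_copyable<From>::value,
                "both types must be trivially copyable");
  To to;
  std::memcpy(&to, &from, sizeof(To));
  return to;
}
```

With such a helper, a line like `unsigned int expected_ui = *((unsigned int*)&expected);` could become `unsigned int expected_ui = bit_copy<unsigned int>(expected);`, and the same template covers the double and unsigned long long case. Compilers typically lower the fixed-size memcpy to a plain register move.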
    unsigned long long expected_ull = *((unsigned long long*)&expected);
    newval = *((unsigned long long*)&desired);
Copilot
AI
Nov 26, 2025
C-style pointer casts for type punning are undefined behavior in C++. Use std::bit_cast (C++20) or memcpy for safe type reinterpretation between double and unsigned long long.
    if (assumed == expected_ui) {
    return expected;
    } else {
    return *((T*)&assumed);
Copilot
AI
Nov 26, 2025
C-style pointer cast for type punning is undefined behavior. Use std::bit_cast (C++20) or memcpy to safely reinterpret unsigned int bits as type T (float).
    if (assumed == expected_ull) {
    return expected;
    } else {
    return *((T*)&assumed);
Copilot
AI
Nov 26, 2025
C-style pointer cast for type punning is undefined behavior. Use std::bit_cast (C++20) or memcpy to safely reinterpret unsigned long long bits as type T (double).
    do {
    newval = assumed;
    at::Half hsum;
Can you explain the motivation behind declaring at::Half hsum in the loop body?
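For context, the excerpt does not show the rest of the loop, so the following is only a guess at the general shape. One common pattern for 16-bit atomics is to run the CAS on the enclosing 32-bit word and rebuild one 16-bit lane on each retry, with the per-iteration temporary recomputed from the latest observed word. The sketch uses a plain `std::uint16_t` integer lane instead of `at::Half` so that it stays standard C++; the name `cas_add_u16_lane` and the `high_lane` parameter are hypothetical.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: adds `val` to one 16-bit lane of a 32-bit word.
// `high_lane` selects bits 16..31, otherwise bits 0..15.
// Returns the lane's value from before the add.
inline std::uint16_t cas_add_u16_lane(
    std::atomic<std::uint32_t>& word, std::uint16_t val, bool high_lane) {
  const unsigned shift = high_lane ? 16u : 0u;
  const std::uint32_t mask = 0xFFFFu << shift;
  std::uint32_t assumed = word.load(std::memory_order_relaxed);
  std::uint32_t newval;
  do {
    // Temporary scoped to one attempt: it depends on `assumed`, which is
    // refreshed whenever the exchange below fails.
    const std::uint16_t sum =
        static_cast<std::uint16_t>(((assumed & mask) >> shift) + val);
    // Keep the other lane untouched and splice in the updated lane.
    newval = (assumed & ~mask) | (static_cast<std::uint32_t>(sum) << shift);
  } while (!word.compare_exchange_weak(
      assumed, newval, std::memory_order_relaxed));
  return static_cast<std::uint16_t>((assumed & mask) >> shift);
}
```

Whether the temporary is declared inside or outside the loop does not change the result in a sketch like this; declaring it inside simply keeps its scope limited to a single attempt.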
    unsigned int expected_ui = *((unsigned int*)&expected);
    newval = *((unsigned int*)&desired);
I also wonder if such casts will cause undefined behavior.
This PR focuses on the src/ATen/native/xpu/sycl/Atomics.h file. It aims to complete and enhance atomic operations on Shared Local Memory (SMEM) so that upper-layer kernels can build performance optimizations on top of them.
This foundational work is crucial for enabling high-performance caching logic in operators like index_add.