Skip to content

Conversation

@yucai-intel
Copy link
Contributor

@yucai-intel yucai-intel commented Nov 5, 2025

This PR focuses on the src/ATen/native/xpu/sycl/Atomics.h file, aiming to fully implement and enhance atomic operations on Shared Local Memory SMEM to support performance optimizations in upper-layer kernels.

  1. Implementation of Local AtomicCAS (Compare-and-Swap) Mechanism
  • The PR introduces generic AtomicCASInteger and AtomicCASFP structs.
  • It implements CAS operations on SMEM for all major data types, including int64, float, and Half.
  • The implementation uses either native SYCL CAS Soft RMW loops to ensure correctness across different data widths.
  1. Completeness and Correction of Local AtomicAdd Operations
  • The PR provides direct SYCL implementations for atomicAddLocal for basic types like float, double, and int.
  • It ensures the correct accumulation of half-precision floats like Half and BFloat16 on SMEM using a CAS loop mechanism.

This foundational work is crucial for enabling high-performance caching logic in operators like index_add.

@yucai-intel yucai-intel changed the title Add acomicCAS() and unify atomicadd() interface Low-Level XPU Local Atomic Enhancement for Add & CAS Nov 5, 2025
@CuiYifeng CuiYifeng requested a review from Copilot November 26, 2025 13:16
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances atomic operations on XPU Shared Local Memory (SMEM) by implementing Compare-and-Swap (CAS) operations and completing local atomic add support for various data types. The changes enable high-performance caching logic in operators like index_add by providing foundation-level atomic primitives.

Key Changes:

  • Introduced generic AtomicCASInteger and AtomicCASFP template structures supporting CAS operations on local memory for integer and floating-point types (including Half/BFloat16)
  • Completed atomicAddLocal implementations for basic types (float, double, int variants) and half-precision types using CAS-based loops
  • Fixed macro naming inconsistency by renaming SYCL_ATOMIC_INTEGER_LOCAL output from atomic##NAME to atomic##NAME##Local

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

const sycl_local_ptr<int64_t>& address,
int64_t val) {
sycl_atomic_ref_rlx_wg_local_t<int64_t> target(*address);
target.fetch_add(val);
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line calls fetch_add in atomicMax function, but should call fetch_max instead. The incorrect operation will not produce maximum values as intended.

Copilot uses AI. Check for mistakes.
Comment on lines +774 to +775
unsigned int expected_ui = *((unsigned int*)&expected);
newval = *((unsigned int*)&desired);
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C-style pointer casts for type punning are undefined behavior in C++. Use std::bit_cast (C++20) or memcpy for safe type reinterpretation between float and unsigned int.

Copilot uses AI. Check for mistakes.
Comment on lines +803 to +804
unsigned long long expected_ull = *((unsigned long long*)&expected);
newval = *((unsigned long long*)&desired);
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C-style pointer casts for type punning are undefined behavior in C++. Use std::bit_cast (C++20) or memcpy for safe type reinterpretation between double and unsigned long long.

Copilot uses AI. Check for mistakes.
if (assumed == expected_ui) {
return expected;
} else {
return *((T*)&assumed);
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C-style pointer cast for type punning is undefined behavior. Use std::bit_cast (C++20) or memcpy to safely reinterpret unsigned int bits as type T (float).

Copilot uses AI. Check for mistakes.
if (assumed == expected_ull) {
return expected;
} else {
return *((T*)&assumed);
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C-style pointer cast for type punning is undefined behavior. Use std::bit_cast (C++20) or memcpy to safely reinterpret unsigned long long bits as type T (double).

Copilot uses AI. Check for mistakes.

do {
newval = assumed;
at::Half hsum;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the motivation behind declaring at::Half hsum in the loop body?

Comment on lines +774 to +775
unsigned int expected_ui = *((unsigned int*)&expected);
newval = *((unsigned int*)&desired);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder if such casts will cause undefined behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants