Skip to content

[clang] [libc++] fix _Atomic c11 compare exchange does not update expected results #78707

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

huixie90
Copy link
Member

@huixie90 huixie90 commented Jan 19, 2024

fixes #30023

The issue is that for compare exchange builtin, if the type's size is not power of 2, it creates a temporary of size power of 2, then emit the compare exchange operation. And later, the results of the compare exchange operation has two components: 1. a boolean whether or not the exchange happens. 2. the old value
we are supposed to write the old value into user's "expected" value. However, in case the type is not power of 2, what we actually wrote to is the temporary that was created.

The fix is to pass the "expected" address all the way down so it can wrote to the correct address

Copy link

github-actions bot commented Jan 19, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions cpp -- clang/test/CodeGenCXX/builtin-atomic-compare_exchange.cpp libcxx/test/std/atomics/atomics.types.generic/30023.pass.cpp clang/lib/CodeGen/CGAtomic.cpp
View the diff from clang-format here.
diff --git a/clang/lib/CodeGen/CGAtomic.cpp b/clang/lib/CodeGen/CGAtomic.cpp
index 5569c6bc8..a60b01add 100644
--- a/clang/lib/CodeGen/CGAtomic.cpp
+++ b/clang/lib/CodeGen/CGAtomic.cpp
@@ -374,11 +374,9 @@ bool AtomicInfo::emitMemSetZeroIfNecessary() const {
 }
 
 static void emitAtomicCmpXchg(CodeGenFunction &CGF, AtomicExpr *E, bool IsWeak,
-                              Address Dest, Address Ptr,
-                              Address Val1, Address Val2,
-                              Address ExpectedResult,
-                              uint64_t Size,
-                              llvm::AtomicOrdering SuccessOrder,
+                              Address Dest, Address Ptr, Address Val1,
+                              Address Val2, Address ExpectedResult,
+                              uint64_t Size, llvm::AtomicOrdering SuccessOrder,
                               llvm::AtomicOrdering FailureOrder,
                               llvm::SyncScope::ID Scope) {
   // Note that cmpxchg doesn't support weak cmpxchg, at least at the moment.
@@ -412,33 +410,30 @@ static void emitAtomicCmpXchg(CodeGenFunction &CGF, AtomicExpr *E, bool IsWeak,
 
   CGF.Builder.SetInsertPoint(StoreExpectedBB);
   // Update the memory at Expected with Old's value.
-llvm::Type *ExpectedType = ExpectedResult.getElementType();
-const llvm::DataLayout &DL = CGF.CGM.getDataLayout();
-uint64_t ExpectedSizeInBytes = DL.getTypeStoreSize(ExpectedType);
-
-if (ExpectedSizeInBytes == Size) {
-  // Sizes match: store directly
-  CGF.Builder.CreateStore(Old, ExpectedResult);
-
-} else {
-  // store only the first ExpectedSizeInBytes bytes of Old
-  llvm::Type *OldType = Old->getType();
-
-  // Allocate temporary storage for Old value
-  Address OldTmp = CGF.CreateTempAlloca(OldType, Ptr.getAlignment(), "old.tmp");
-
-  // Store Old into this temporary
-  CGF.Builder.CreateStore(Old, OldTmp);
-
-  // Perform memcpy for first ExpectedSizeInBytes bytes
-  CGF.Builder.CreateMemCpy(
-    ExpectedResult,
-    OldTmp,
-    ExpectedSizeInBytes,
-    /*isVolatile=*/false);
-}
+  llvm::Type *ExpectedType = ExpectedResult.getElementType();
+  const llvm::DataLayout &DL = CGF.CGM.getDataLayout();
+  uint64_t ExpectedSizeInBytes = DL.getTypeStoreSize(ExpectedType);
+
+  if (ExpectedSizeInBytes == Size) {
+    // Sizes match: store directly
+    CGF.Builder.CreateStore(Old, ExpectedResult);
+
+  } else {
+    // store only the first ExpectedSizeInBytes bytes of Old
+    llvm::Type *OldType = Old->getType();
+
+    // Allocate temporary storage for Old value
+    Address OldTmp =
+        CGF.CreateTempAlloca(OldType, Ptr.getAlignment(), "old.tmp");
+
+    // Store Old into this temporary
+    CGF.Builder.CreateStore(Old, OldTmp);
+
+    // Perform memcpy for first ExpectedSizeInBytes bytes
+    CGF.Builder.CreateMemCpy(ExpectedResult, OldTmp, ExpectedSizeInBytes,
+                             /*isVolatile=*/false);
+  }
 
-  
   // Finally, branch to the exit point.
   CGF.Builder.CreateBr(ContinueBB);
 
@@ -450,14 +445,11 @@ if (ExpectedSizeInBytes == Size) {
 /// Given an ordering required on success, emit all possible cmpxchg
 /// instructions to cope with the provided (but possibly only dynamically known)
 /// FailureOrder.
-static void emitAtomicCmpXchgFailureSet(CodeGenFunction &CGF, AtomicExpr *E,
-                                        bool IsWeak, Address Dest, Address Ptr,
-                                        Address Val1, Address Val2,
-                                        Address ExpectedResult,
-                                        llvm::Value *FailureOrderVal,
-                                        uint64_t Size,
-                                        llvm::AtomicOrdering SuccessOrder,
-                                        llvm::SyncScope::ID Scope) {
+static void emitAtomicCmpXchgFailureSet(
+    CodeGenFunction &CGF, AtomicExpr *E, bool IsWeak, Address Dest, Address Ptr,
+    Address Val1, Address Val2, Address ExpectedResult,
+    llvm::Value *FailureOrderVal, uint64_t Size,
+    llvm::AtomicOrdering SuccessOrder, llvm::SyncScope::ID Scope) {
   llvm::AtomicOrdering FailureOrder;
   if (llvm::ConstantInt *FO = dyn_cast<llvm::ConstantInt>(FailureOrderVal)) {
     auto FOS = FO->getSExtValue();
@@ -484,8 +476,8 @@ static void emitAtomicCmpXchgFailureSet(CodeGenFunction &CGF, AtomicExpr *E,
     // success argument". This condition has been lifted and the only
     // precondition is 31.7.2.18. Effectively treat this as a DR and skip
     // language version checks.
-    emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, ExpectedResult, Size, SuccessOrder,
-                      FailureOrder, Scope);
+    emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, ExpectedResult,
+                      Size, SuccessOrder, FailureOrder, Scope);
     return;
   }
 
@@ -509,18 +501,19 @@ static void emitAtomicCmpXchgFailureSet(CodeGenFunction &CGF, AtomicExpr *E,
 
   // Emit all the different atomics
   CGF.Builder.SetInsertPoint(MonotonicBB);
-  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, ExpectedResult,
-                    Size, SuccessOrder, llvm::AtomicOrdering::Monotonic, Scope);
+  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, ExpectedResult, Size,
+                    SuccessOrder, llvm::AtomicOrdering::Monotonic, Scope);
   CGF.Builder.CreateBr(ContBB);
 
   CGF.Builder.SetInsertPoint(AcquireBB);
-  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, ExpectedResult, Size, SuccessOrder,
-                    llvm::AtomicOrdering::Acquire, Scope);
+  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, ExpectedResult, Size,
+                    SuccessOrder, llvm::AtomicOrdering::Acquire, Scope);
   CGF.Builder.CreateBr(ContBB);
 
   CGF.Builder.SetInsertPoint(SeqCstBB);
-  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, ExpectedResult, Size, SuccessOrder,
-                    llvm::AtomicOrdering::SequentiallyConsistent, Scope);
+  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, ExpectedResult, Size,
+                    SuccessOrder, llvm::AtomicOrdering::SequentiallyConsistent,
+                    Scope);
   CGF.Builder.CreateBr(ContBB);
 
   CGF.Builder.SetInsertPoint(ContBB);
@@ -552,9 +545,9 @@ static llvm::Value *EmitPostAtomicMinMax(CGBuilderTy &Builder,
 
 static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, Address Dest,
                          Address Ptr, Address Val1, Address Val2,
-                         Address ExpectedResult,
-                         llvm::Value *IsWeak, llvm::Value *FailureOrder,
-                         uint64_t Size, llvm::AtomicOrdering Order,
+                         Address ExpectedResult, llvm::Value *IsWeak,
+                         llvm::Value *FailureOrder, uint64_t Size,
+                         llvm::AtomicOrdering Order,
                          llvm::SyncScope::ID Scope) {
   llvm::AtomicRMWInst::BinOp Op = llvm::AtomicRMWInst::Add;
   bool PostOpMinMax = false;
@@ -569,13 +562,15 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, Address Dest,
   case AtomicExpr::AO__hip_atomic_compare_exchange_strong:
   case AtomicExpr::AO__opencl_atomic_compare_exchange_strong:
     emitAtomicCmpXchgFailureSet(CGF, E, false, Dest, Ptr, Val1, Val2,
-                                ExpectedResult, FailureOrder, Size, Order, Scope);
+                                ExpectedResult, FailureOrder, Size, Order,
+                                Scope);
     return;
   case AtomicExpr::AO__c11_atomic_compare_exchange_weak:
   case AtomicExpr::AO__opencl_atomic_compare_exchange_weak:
   case AtomicExpr::AO__hip_atomic_compare_exchange_weak:
     emitAtomicCmpXchgFailureSet(CGF, E, true, Dest, Ptr, Val1, Val2,
-                                ExpectedResult, FailureOrder, Size, Order, Scope);
+                                ExpectedResult, FailureOrder, Size, Order,
+                                Scope);
     return;
   case AtomicExpr::AO__atomic_compare_exchange:
   case AtomicExpr::AO__atomic_compare_exchange_n:
@@ -583,7 +578,8 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, Address Dest,
   case AtomicExpr::AO__scoped_atomic_compare_exchange_n: {
     if (llvm::ConstantInt *IsWeakC = dyn_cast<llvm::ConstantInt>(IsWeak)) {
       emitAtomicCmpXchgFailureSet(CGF, E, IsWeakC->getZExtValue(), Dest, Ptr,
-                                  Val1, Val2, ExpectedResult, FailureOrder, Size, Order, Scope);
+                                  Val1, Val2, ExpectedResult, FailureOrder,
+                                  Size, Order, Scope);
     } else {
       // Create all the relevant BB's
       llvm::BasicBlock *StrongBB =
@@ -597,12 +593,14 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, Address Dest,
 
       CGF.Builder.SetInsertPoint(StrongBB);
       emitAtomicCmpXchgFailureSet(CGF, E, false, Dest, Ptr, Val1, Val2,
-                                  ExpectedResult, FailureOrder, Size, Order, Scope);
+                                  ExpectedResult, FailureOrder, Size, Order,
+                                  Scope);
       CGF.Builder.CreateBr(ContBB);
 
       CGF.Builder.SetInsertPoint(WeakBB);
       emitAtomicCmpXchgFailureSet(CGF, E, true, Dest, Ptr, Val1, Val2,
-                                  ExpectedResult, FailureOrder, Size, Order, Scope);
+                                  ExpectedResult, FailureOrder, Size, Order,
+                                  Scope);
       CGF.Builder.CreateBr(ContBB);
 
       CGF.Builder.SetInsertPoint(ContBB);
@@ -806,10 +804,9 @@ EmitValToTemp(CodeGenFunction &CGF, Expr *E) {
 
 static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
                          Address Ptr, Address Val1, Address Val2,
-                         Address OriginalVal1,
-                         llvm::Value *IsWeak, llvm::Value *FailureOrder,
-                         uint64_t Size, llvm::AtomicOrdering Order,
-                         llvm::Value *Scope) {
+                         Address OriginalVal1, llvm::Value *IsWeak,
+                         llvm::Value *FailureOrder, uint64_t Size,
+                         llvm::AtomicOrdering Order, llvm::Value *Scope) {
   auto ScopeModel = Expr->getScopeModel();
 
   // LLVM atomic instructions always have sync scope. If clang atomic
@@ -826,8 +823,8 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
                                                    Order, CGF.getLLVMContext());
     else
       SS = llvm::SyncScope::System;
-    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, FailureOrder, Size,
-                 Order, SS);
+    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak,
+                 FailureOrder, Size, Order, SS);
     return;
   }
 
@@ -836,8 +833,8 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
     auto SCID = CGF.getTargetHooks().getLLVMSyncScopeID(
         CGF.CGM.getLangOpts(), ScopeModel->map(SC->getZExtValue()),
         Order, CGF.CGM.getLLVMContext());
-    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, FailureOrder, Size,
-                 Order, SCID);
+    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak,
+                 FailureOrder, Size, Order, SCID);
     return;
   }
 
@@ -862,12 +859,11 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
       SI->addCase(Builder.getInt32(S), B);
 
     Builder.SetInsertPoint(B);
-    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, FailureOrder, Size,
-                 Order,
-                 CGF.getTargetHooks().getLLVMSyncScopeID(CGF.CGM.getLangOpts(),
-                                                         ScopeModel->map(S),
-                                                         Order,
-                                                         CGF.getLLVMContext()));
+    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak,
+                 FailureOrder, Size, Order,
+                 CGF.getTargetHooks().getLLVMSyncScopeID(
+                     CGF.CGM.getLangOpts(), ScopeModel->map(S), Order,
+                     CGF.getLLVMContext()));
     Builder.CreateBr(ContBB);
   }
 
@@ -1310,30 +1306,32 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
     if (llvm::isValidAtomicOrderingCABI(ord))
       switch ((llvm::AtomicOrderingCABI)ord) {
       case llvm::AtomicOrderingCABI::relaxed:
-        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
-                     llvm::AtomicOrdering::Monotonic, Scope);
+        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak,
+                     OrderFail, Size, llvm::AtomicOrdering::Monotonic, Scope);
         break;
       case llvm::AtomicOrderingCABI::consume:
       case llvm::AtomicOrderingCABI::acquire:
         if (IsStore)
           break; // Avoid crashing on code with undefined behavior
-        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
-                     llvm::AtomicOrdering::Acquire, Scope);
+        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak,
+                     OrderFail, Size, llvm::AtomicOrdering::Acquire, Scope);
         break;
       case llvm::AtomicOrderingCABI::release:
         if (IsLoad)
           break; // Avoid crashing on code with undefined behavior
-        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
-                     llvm::AtomicOrdering::Release, Scope);
+        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak,
+                     OrderFail, Size, llvm::AtomicOrdering::Release, Scope);
         break;
       case llvm::AtomicOrderingCABI::acq_rel:
         if (IsLoad || IsStore)
           break; // Avoid crashing on code with undefined behavior
-        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
-                     llvm::AtomicOrdering::AcquireRelease, Scope);
+        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak,
+                     OrderFail, Size, llvm::AtomicOrdering::AcquireRelease,
+                     Scope);
         break;
       case llvm::AtomicOrderingCABI::seq_cst:
-        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
+        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak,
+                     OrderFail, Size,
                      llvm::AtomicOrdering::SequentiallyConsistent, Scope);
         break;
       }
@@ -1369,13 +1367,13 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
 
   // Emit all the different atomics
   Builder.SetInsertPoint(MonotonicBB);
-  EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
-               llvm::AtomicOrdering::Monotonic, Scope);
+  EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail,
+               Size, llvm::AtomicOrdering::Monotonic, Scope);
   Builder.CreateBr(ContBB);
   if (!IsStore) {
     Builder.SetInsertPoint(AcquireBB);
-    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
-                 llvm::AtomicOrdering::Acquire, Scope);
+    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak,
+                 OrderFail, Size, llvm::AtomicOrdering::Acquire, Scope);
     Builder.CreateBr(ContBB);
     SI->addCase(Builder.getInt32((int)llvm::AtomicOrderingCABI::consume),
                 AcquireBB);
@@ -1384,23 +1382,23 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
   }
   if (!IsLoad) {
     Builder.SetInsertPoint(ReleaseBB);
-    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
-                 llvm::AtomicOrdering::Release, Scope);
+    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak,
+                 OrderFail, Size, llvm::AtomicOrdering::Release, Scope);
     Builder.CreateBr(ContBB);
     SI->addCase(Builder.getInt32((int)llvm::AtomicOrderingCABI::release),
                 ReleaseBB);
   }
   if (!IsLoad && !IsStore) {
     Builder.SetInsertPoint(AcqRelBB);
-    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
-                 llvm::AtomicOrdering::AcquireRelease, Scope);
+    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak,
+                 OrderFail, Size, llvm::AtomicOrdering::AcquireRelease, Scope);
     Builder.CreateBr(ContBB);
     SI->addCase(Builder.getInt32((int)llvm::AtomicOrderingCABI::acq_rel),
                 AcqRelBB);
   }
   Builder.SetInsertPoint(SeqCstBB);
-  EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
-               llvm::AtomicOrdering::SequentiallyConsistent, Scope);
+  EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail,
+               Size, llvm::AtomicOrdering::SequentiallyConsistent, Scope);
   Builder.CreateBr(ContBB);
   SI->addCase(Builder.getInt32((int)llvm::AtomicOrderingCABI::seq_cst),
               SeqCstBB);

@huixie90 huixie90 changed the title [libc++] fix _Atomic c11 compare exchange does not update expected results [clang] [libc++] fix _Atomic c11 compare exchange does not update expected results Jun 7, 2025
@huixie90 huixie90 marked this pull request as ready for review June 7, 2025 07:32
@huixie90 huixie90 requested a review from a team as a code owner June 7, 2025 07:32
@llvmbot llvmbot added clang Clang issues not falling into any other category libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. clang:codegen IR generation bugs: mangling, exceptions, etc. labels Jun 7, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 7, 2025

@llvm/pr-subscribers-clang-codegen

Author: Hui (huixie90)

Changes

fixes #30023


Patch is 24.34 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/78707.diff

3 Files Affected:

  • (modified) clang/lib/CodeGen/CGAtomic.cpp (+69-23)
  • (added) clang/test/CodeGenCXX/builtin-atomic-compare_exchange.cpp (+126)
  • (added) libcxx/test/std/atomics/atomics.types.generic/30023.pass.cpp (+65)
diff --git a/clang/lib/CodeGen/CGAtomic.cpp b/clang/lib/CodeGen/CGAtomic.cpp
index 51f0799a792fd..9eee834a520e0 100644
--- a/clang/lib/CodeGen/CGAtomic.cpp
+++ b/clang/lib/CodeGen/CGAtomic.cpp
@@ -376,6 +376,7 @@ bool AtomicInfo::emitMemSetZeroIfNecessary() const {
 static void emitAtomicCmpXchg(CodeGenFunction &CGF, AtomicExpr *E, bool IsWeak,
                               Address Dest, Address Ptr,
                               Address Val1, Address Val2,
+                              Address ExpectedResult,
                               uint64_t Size,
                               llvm::AtomicOrdering SuccessOrder,
                               llvm::AtomicOrdering FailureOrder,
@@ -411,7 +412,48 @@ static void emitAtomicCmpXchg(CodeGenFunction &CGF, AtomicExpr *E, bool IsWeak,
 
   CGF.Builder.SetInsertPoint(StoreExpectedBB);
   // Update the memory at Expected with Old's value.
-  CGF.Builder.CreateStore(Old, Val1);
+llvm::Type *ExpectedType = ExpectedResult.getElementType();
+const llvm::DataLayout &DL = CGF.CGM.getDataLayout();
+uint64_t ExpectedSizeInBytes = DL.getTypeStoreSize(ExpectedType);
+
+if (ExpectedSizeInBytes == Size) {
+  // Sizes match: store directly
+  CGF.Builder.CreateStore(Old, ExpectedResult);
+
+} else {
+  // store only the first ExpectedSizeInBytes bytes of Old
+  llvm::Type *OldType = Old->getType();
+
+  llvm::Align SrcAlignLLVM = DL.getABITypeAlign(OldType);
+  llvm::Align DstAlignLLVM = DL.getABITypeAlign(ExpectedType);
+
+  clang::CharUnits SrcAlign = clang::CharUnits::fromQuantity(SrcAlignLLVM.value());
+  clang::CharUnits DstAlign = clang::CharUnits::fromQuantity(DstAlignLLVM.value());
+
+  // Allocate temporary storage for Old value
+  llvm::AllocaInst *Alloca = CGF.CreateTempAlloca(OldType, "old.tmp");
+
+  // Wrap into clang::CodeGen::Address with proper type and alignment
+  Address OldStorage(Alloca, OldType, SrcAlign);
+
+  // Store Old into this temporary
+  CGF.Builder.CreateStore(Old, OldStorage);
+
+  // Bitcast pointers to i8*
+  llvm::Type *I8PtrTy = llvm::PointerType::getUnqual(CGF.getLLVMContext());
+
+  llvm::Value *SrcPtr = CGF.Builder.CreateBitCast(OldStorage.getBasePointer(), I8PtrTy);
+  llvm::Value *DstPtr = CGF.Builder.CreateBitCast(ExpectedResult.getBasePointer(), I8PtrTy);
+
+  // Perform memcpy for first ExpectedSizeInBytes bytes
+  CGF.Builder.CreateMemCpy(
+    DstPtr, DstAlignLLVM,
+    SrcPtr, SrcAlignLLVM,
+    llvm::ConstantInt::get(CGF.IntPtrTy, ExpectedSizeInBytes),
+    /*isVolatile=*/false);
+}
+
+  
   // Finally, branch to the exit point.
   CGF.Builder.CreateBr(ContinueBB);
 
@@ -426,6 +468,7 @@ static void emitAtomicCmpXchg(CodeGenFunction &CGF, AtomicExpr *E, bool IsWeak,
 static void emitAtomicCmpXchgFailureSet(CodeGenFunction &CGF, AtomicExpr *E,
                                         bool IsWeak, Address Dest, Address Ptr,
                                         Address Val1, Address Val2,
+                                        Address ExpectedResult,
                                         llvm::Value *FailureOrderVal,
                                         uint64_t Size,
                                         llvm::AtomicOrdering SuccessOrder,
@@ -456,7 +499,7 @@ static void emitAtomicCmpXchgFailureSet(CodeGenFunction &CGF, AtomicExpr *E,
     // success argument". This condition has been lifted and the only
     // precondition is 31.7.2.18. Effectively treat this as a DR and skip
     // language version checks.
-    emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, Size, SuccessOrder,
+    emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, ExpectedResult, Size, SuccessOrder,
                       FailureOrder, Scope);
     return;
   }
@@ -481,17 +524,17 @@ static void emitAtomicCmpXchgFailureSet(CodeGenFunction &CGF, AtomicExpr *E,
 
   // Emit all the different atomics
   CGF.Builder.SetInsertPoint(MonotonicBB);
-  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2,
+  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, ExpectedResult,
                     Size, SuccessOrder, llvm::AtomicOrdering::Monotonic, Scope);
   CGF.Builder.CreateBr(ContBB);
 
   CGF.Builder.SetInsertPoint(AcquireBB);
-  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, Size, SuccessOrder,
+  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, ExpectedResult, Size, SuccessOrder,
                     llvm::AtomicOrdering::Acquire, Scope);
   CGF.Builder.CreateBr(ContBB);
 
   CGF.Builder.SetInsertPoint(SeqCstBB);
-  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, Size, SuccessOrder,
+  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, ExpectedResult, Size, SuccessOrder,
                     llvm::AtomicOrdering::SequentiallyConsistent, Scope);
   CGF.Builder.CreateBr(ContBB);
 
@@ -524,6 +567,7 @@ static llvm::Value *EmitPostAtomicMinMax(CGBuilderTy &Builder,
 
 static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, Address Dest,
                          Address Ptr, Address Val1, Address Val2,
+                         Address ExpectedResult,
                          llvm::Value *IsWeak, llvm::Value *FailureOrder,
                          uint64_t Size, llvm::AtomicOrdering Order,
                          llvm::SyncScope::ID Scope) {
@@ -540,13 +584,13 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, Address Dest,
   case AtomicExpr::AO__hip_atomic_compare_exchange_strong:
   case AtomicExpr::AO__opencl_atomic_compare_exchange_strong:
     emitAtomicCmpXchgFailureSet(CGF, E, false, Dest, Ptr, Val1, Val2,
-                                FailureOrder, Size, Order, Scope);
+                                ExpectedResult, FailureOrder, Size, Order, Scope);
     return;
   case AtomicExpr::AO__c11_atomic_compare_exchange_weak:
   case AtomicExpr::AO__opencl_atomic_compare_exchange_weak:
   case AtomicExpr::AO__hip_atomic_compare_exchange_weak:
     emitAtomicCmpXchgFailureSet(CGF, E, true, Dest, Ptr, Val1, Val2,
-                                FailureOrder, Size, Order, Scope);
+                                ExpectedResult, FailureOrder, Size, Order, Scope);
     return;
   case AtomicExpr::AO__atomic_compare_exchange:
   case AtomicExpr::AO__atomic_compare_exchange_n:
@@ -554,7 +598,7 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, Address Dest,
   case AtomicExpr::AO__scoped_atomic_compare_exchange_n: {
     if (llvm::ConstantInt *IsWeakC = dyn_cast<llvm::ConstantInt>(IsWeak)) {
       emitAtomicCmpXchgFailureSet(CGF, E, IsWeakC->getZExtValue(), Dest, Ptr,
-                                  Val1, Val2, FailureOrder, Size, Order, Scope);
+                                  Val1, Val2, ExpectedResult, FailureOrder, Size, Order, Scope);
     } else {
       // Create all the relevant BB's
       llvm::BasicBlock *StrongBB =
@@ -568,12 +612,12 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, Address Dest,
 
       CGF.Builder.SetInsertPoint(StrongBB);
       emitAtomicCmpXchgFailureSet(CGF, E, false, Dest, Ptr, Val1, Val2,
-                                  FailureOrder, Size, Order, Scope);
+                                  ExpectedResult, FailureOrder, Size, Order, Scope);
       CGF.Builder.CreateBr(ContBB);
 
       CGF.Builder.SetInsertPoint(WeakBB);
       emitAtomicCmpXchgFailureSet(CGF, E, true, Dest, Ptr, Val1, Val2,
-                                  FailureOrder, Size, Order, Scope);
+                                  ExpectedResult, FailureOrder, Size, Order, Scope);
       CGF.Builder.CreateBr(ContBB);
 
       CGF.Builder.SetInsertPoint(ContBB);
@@ -777,6 +821,7 @@ EmitValToTemp(CodeGenFunction &CGF, Expr *E) {
 
 static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
                          Address Ptr, Address Val1, Address Val2,
+                         Address OriginalVal1,
                          llvm::Value *IsWeak, llvm::Value *FailureOrder,
                          uint64_t Size, llvm::AtomicOrdering Order,
                          llvm::Value *Scope) {
@@ -796,7 +841,7 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
                                                    Order, CGF.getLLVMContext());
     else
       SS = llvm::SyncScope::System;
-    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, IsWeak, FailureOrder, Size,
+    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, FailureOrder, Size,
                  Order, SS);
     return;
   }
@@ -806,7 +851,7 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
     auto SCID = CGF.getTargetHooks().getLLVMSyncScopeID(
         CGF.CGM.getLangOpts(), ScopeModel->map(SC->getZExtValue()),
         Order, CGF.CGM.getLLVMContext());
-    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, IsWeak, FailureOrder, Size,
+    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, FailureOrder, Size,
                  Order, SCID);
     return;
   }
@@ -832,7 +877,7 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
       SI->addCase(Builder.getInt32(S), B);
 
     Builder.SetInsertPoint(B);
-    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, IsWeak, FailureOrder, Size,
+    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, FailureOrder, Size,
                  Order,
                  CGF.getTargetHooks().getLLVMSyncScopeID(CGF.CGM.getLangOpts(),
                                                          ScopeModel->map(S),
@@ -1036,6 +1081,7 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
   LValue AtomicVal = MakeAddrLValue(Ptr, AtomicTy);
   AtomicInfo Atomics(*this, AtomicVal);
 
+  Address OriginalVal1 = Val1;
   if (ShouldCastToIntPtrTy) {
     Ptr = Atomics.castToAtomicIntPointer(Ptr);
     if (Val1.isValid())
@@ -1279,30 +1325,30 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
     if (llvm::isValidAtomicOrderingCABI(ord))
       switch ((llvm::AtomicOrderingCABI)ord) {
       case llvm::AtomicOrderingCABI::relaxed:
-        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                      llvm::AtomicOrdering::Monotonic, Scope);
         break;
       case llvm::AtomicOrderingCABI::consume:
       case llvm::AtomicOrderingCABI::acquire:
         if (IsStore)
           break; // Avoid crashing on code with undefined behavior
-        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                      llvm::AtomicOrdering::Acquire, Scope);
         break;
       case llvm::AtomicOrderingCABI::release:
         if (IsLoad)
           break; // Avoid crashing on code with undefined behavior
-        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                      llvm::AtomicOrdering::Release, Scope);
         break;
       case llvm::AtomicOrderingCABI::acq_rel:
         if (IsLoad || IsStore)
           break; // Avoid crashing on code with undefined behavior
-        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                      llvm::AtomicOrdering::AcquireRelease, Scope);
         break;
       case llvm::AtomicOrderingCABI::seq_cst:
-        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                      llvm::AtomicOrdering::SequentiallyConsistent, Scope);
         break;
       }
@@ -1338,12 +1384,12 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
 
   // Emit all the different atomics
   Builder.SetInsertPoint(MonotonicBB);
-  EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+  EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                llvm::AtomicOrdering::Monotonic, Scope);
   Builder.CreateBr(ContBB);
   if (!IsStore) {
     Builder.SetInsertPoint(AcquireBB);
-    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                  llvm::AtomicOrdering::Acquire, Scope);
     Builder.CreateBr(ContBB);
     SI->addCase(Builder.getInt32((int)llvm::AtomicOrderingCABI::consume),
@@ -1353,7 +1399,7 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
   }
   if (!IsLoad) {
     Builder.SetInsertPoint(ReleaseBB);
-    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                  llvm::AtomicOrdering::Release, Scope);
     Builder.CreateBr(ContBB);
     SI->addCase(Builder.getInt32((int)llvm::AtomicOrderingCABI::release),
@@ -1361,14 +1407,14 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
   }
   if (!IsLoad && !IsStore) {
     Builder.SetInsertPoint(AcqRelBB);
-    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                  llvm::AtomicOrdering::AcquireRelease, Scope);
     Builder.CreateBr(ContBB);
     SI->addCase(Builder.getInt32((int)llvm::AtomicOrderingCABI::acq_rel),
                 AcqRelBB);
   }
   Builder.SetInsertPoint(SeqCstBB);
-  EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+  EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                llvm::AtomicOrdering::SequentiallyConsistent, Scope);
   Builder.CreateBr(ContBB);
   SI->addCase(Builder.getInt32((int)llvm::AtomicOrderingCABI::seq_cst),
diff --git a/clang/test/CodeGenCXX/builtin-atomic-compare_exchange.cpp b/clang/test/CodeGenCXX/builtin-atomic-compare_exchange.cpp
new file mode 100644
index 0000000000000..4e2830e7a66cd
--- /dev/null
+++ b/clang/test/CodeGenCXX/builtin-atomic-compare_exchange.cpp
@@ -0,0 +1,126 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 -std=c++20 -triple=x86_64-linux-gnu -emit-llvm -o - %s | FileCheck %s
+
+
+template <unsigned Size>
+struct S {
+    char data[Size];
+};
+
+// CHECK-LABEL: define dso_local noundef zeroext i1 @_Z21test_compare_exchangePU7_Atomic1SILj3EEPS0_S0_(
+// CHECK-SAME: ptr noundef [[A:%.*]], ptr noundef [[EXPECTED:%.*]], i24 [[DESIRED_COERCE:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[DESIRED:%.*]] = alloca [[STRUCT_S:%.*]], align 1
+// CHECK-NEXT:    [[A_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[EXPECTED_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[DOTATOMICTMP:%.*]] = alloca [[STRUCT_S]], align 1
+// CHECK-NEXT:    [[ATOMIC_TEMP:%.*]] = alloca { [[STRUCT_S]], [1 x i8] }, align 4
+// CHECK-NEXT:    [[ATOMIC_TEMP1:%.*]] = alloca { [[STRUCT_S]], [1 x i8] }, align 4
+// CHECK-NEXT:    [[CMPXCHG_BOOL:%.*]] = alloca i8, align 1
+// CHECK-NEXT:    [[OLD_TMP:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[COERCE_DIVE:%.*]] = getelementptr inbounds nuw [[STRUCT_S]], ptr [[DESIRED]], i32 0, i32 0
+// CHECK-NEXT:    store i24 [[DESIRED_COERCE]], ptr [[COERCE_DIVE]], align 1
+// CHECK-NEXT:    store ptr [[A]], ptr [[A_ADDR]], align 8
+// CHECK-NEXT:    store ptr [[EXPECTED]], ptr [[EXPECTED_ADDR]], align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[A_ADDR]], align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[EXPECTED_ADDR]], align 8
+// CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 1 [[DOTATOMICTMP]], ptr align 1 [[DESIRED]], i64 3, i1 false)
+// CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[ATOMIC_TEMP]], ptr align 1 [[TMP1]], i64 3, i1 false)
+// CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[ATOMIC_TEMP1]], ptr align 1 [[DOTATOMICTMP]], i64 3, i1 false)
+// CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr [[ATOMIC_TEMP]], align 4
+// CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[ATOMIC_TEMP1]], align 4
+// CHECK-NEXT:    [[TMP4:%.*]] = cmpxchg ptr [[TMP0]], i32 [[TMP2]], i32 [[TMP3]] monotonic monotonic, align 4
+// CHECK-NEXT:    [[TMP5:%.*]] = extractvalue { i32, i1 } [[TMP4]], 0
+// CHECK-NEXT:    [[TMP6:%.*]] = extractvalue { i32, i1 } [[TMP4]], 1
+// CHECK-NEXT:    br i1 [[TMP6]], label %[[CMPXCHG_CONTINUE:.*]], label %[[CMPXCHG_STORE_EXPECTED:.*]]
+// CHECK:       [[CMPXCHG_STORE_EXPECTED]]:
+// CHECK-NEXT:    store i32 [[TMP5]], ptr [[OLD_TMP]], align 4
+// CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 1 [[TMP1]], ptr align 4 [[OLD_TMP]], i64 3, i1 false)
+// CHECK-NEXT:    br label %[[CMPXCHG_CONTINUE]]
+// CHECK:       [[CMPXCHG_CONTINUE]]:
+// CHECK-NEXT:    [[STOREDV:%.*]] = zext i1 [[TMP6]] to i8
+// CHECK-NEXT:    store i8 [[STOREDV]], ptr [[CMPXCHG_BOOL]], align 1
+// CHECK-NEXT:    [[TMP7:%.*]] = load i8, ptr [[CMPXCHG_BOOL]], align 1
+// CHECK-NEXT:    [[LOADEDV:%.*]] = trunc i8 [[TMP7]] to i1
+// CHECK-NEXT:    ret i1 [[LOADEDV]]
+//
+bool test_compare_exchange(_Atomic(S<3>)* a, S<3>* expected, S<3> desired) {
+    return __c11_atomic_compare_exchange_strong(a, expected, desired, 0, 0);
+}
+
+
+// CHECK-LABEL: define dso_local noundef zeroext i1 @_Z21test_compare_exchangePU7_Atomic1SILj4EEPS0_S0_(
+// CHECK-SAME: ptr noundef [[A:%.*]], ptr noundef [[EXPECTED:%.*]], i32 [[DESIRED_COERCE:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[DESIRED:%.*]] = alloca [[STRUCT_S_0:%.*]], align 1
+// CHECK-NEXT:    [[A_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[EXPECTED_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[DOTATOMICTMP:%.*]] = alloca [[STRUCT_S_0]], align 1
+// CHECK-NEXT:    [[CMPXCHG_BOOL:%.*]] = alloca i8, align 1
+// CHECK-NEXT:    [[COERCE_DIVE:%.*]] = getelementptr inbounds nuw [[STRUCT_S_0]], ptr [[DESIRED]], i32 0, i32 0
+// CHECK-NEXT:    store i32 [[DESIRED_COERCE]], ptr [[COERCE_DIVE]], align 1
+// CHECK-NEXT:    store ptr [[A]], ptr [[A_ADDR]], align 8
+// CHECK-NEXT:    store ptr [[EXPECTED]], ptr [[EXPECTED_ADDR]], align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[A_ADDR]], align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[EXPECTED_ADDR]], align 8
+// CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 1 [[DOTATOMICTMP]], ptr align 1 [[DESIRED]], i64 4, i1 false)
+// CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr [[TMP1]], align 1
+// CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[DOTATOMICTMP]], align 1
+// CHECK-NEXT:    [[TMP4:%.*]] = cmpxchg ptr [[TMP0]], i32 [[TMP2]], i32 [[TMP3]] monotonic monotonic, align 4
+// CHECK-NEXT:    [[TMP5:%.*]] = extractvalue { i32, i1 } [[TMP4]], 0
+// CHECK-NEXT:    [[TMP6:%.*]] = extractvalue { i32, i1 } [[TMP4]], 1
+// CHECK-NEXT:    br i1 [[TMP6]], label %[[CMPXCHG_CONTINUE:.*]], label %[[CMPXCHG_STORE_EXPECTED:.*]]
+// CHECK:       [[CMPXCHG_STORE_EXPECTED]]:
+// CHECK-NEXT:    store i32 [[TMP5]], ptr [[TMP1]], align 1
+// CHECK-NEXT:    br label %[[CMPXCHG_CONTINUE]]
+// CHECK:       [[CMPXCHG_CONTINUE]]:
+// CHECK-NEXT:    [[STOREDV:%.*]] = zext i1 [[TMP6]] to i8
+// CHECK-NEXT:    store i8 [[STOREDV]], ptr [[CMPXCHG_BOOL]], align 1
+// CHECK-NEXT:    [[TMP7:%.*]] = load i8, ptr [[CMPXCHG_BOOL]], align 1
+// CHECK-NEXT:    [[LOADEDV:%.*]] = trunc i8 [[TMP7]] to i1
+// CHECK-NEXT:    ret i1 [[LOADEDV]]
+//
+bool test_compare_exchange(_Atomic(S<4>)* a, S<4>* expected, S<4> desired) {
+    return __c11_atomic_compare_exchange_strong(a, expected, desired, 0, 0);
+}
+
+// CHECK-LABEL: define dso_local noundef zeroext i1 @_Z21test_compare_exchangePU7_Atomic1SILj6EEPS0_S0_(
+// CHECK-SAME: ptr noundef [[A:%.*]], ptr noundef [[EXPECTED:%.*]], i48 [[DESIRED_COERCE:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[DESIRED:%.*]] = alloca [[STRUCT_S_1:%.*]], align 1
+// CHECK-NEXT:    [[A_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[EXPECTED_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[DOTATOMICTMP:%.*]] = alloca [[STRUCT_S_1]], align 1
+// CHECK-NEXT:    [[ATOMIC_TEMP:%.*]] = alloca { [[STRUCT_S_1]], [2 x i8] }, align 8
+// CHECK-NEXT:    [[ATOMIC_TEMP1:%.*]] = alloca { [[STRUCT_S_1]], [2 x i8...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jun 7, 2025

@llvm/pr-subscribers-libcxx

Author: Hui (huixie90)

Changes

fixes #30023


Patch is 24.34 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/78707.diff

3 Files Affected:

  • (modified) clang/lib/CodeGen/CGAtomic.cpp (+69-23)
  • (added) clang/test/CodeGenCXX/builtin-atomic-compare_exchange.cpp (+126)
  • (added) libcxx/test/std/atomics/atomics.types.generic/30023.pass.cpp (+65)
diff --git a/clang/lib/CodeGen/CGAtomic.cpp b/clang/lib/CodeGen/CGAtomic.cpp
index 51f0799a792fd..9eee834a520e0 100644
--- a/clang/lib/CodeGen/CGAtomic.cpp
+++ b/clang/lib/CodeGen/CGAtomic.cpp
@@ -376,6 +376,7 @@ bool AtomicInfo::emitMemSetZeroIfNecessary() const {
 static void emitAtomicCmpXchg(CodeGenFunction &CGF, AtomicExpr *E, bool IsWeak,
                               Address Dest, Address Ptr,
                               Address Val1, Address Val2,
+                              Address ExpectedResult,
                               uint64_t Size,
                               llvm::AtomicOrdering SuccessOrder,
                               llvm::AtomicOrdering FailureOrder,
@@ -411,7 +412,48 @@ static void emitAtomicCmpXchg(CodeGenFunction &CGF, AtomicExpr *E, bool IsWeak,
 
   CGF.Builder.SetInsertPoint(StoreExpectedBB);
   // Update the memory at Expected with Old's value.
-  CGF.Builder.CreateStore(Old, Val1);
+llvm::Type *ExpectedType = ExpectedResult.getElementType();
+const llvm::DataLayout &DL = CGF.CGM.getDataLayout();
+uint64_t ExpectedSizeInBytes = DL.getTypeStoreSize(ExpectedType);
+
+if (ExpectedSizeInBytes == Size) {
+  // Sizes match: store directly
+  CGF.Builder.CreateStore(Old, ExpectedResult);
+
+} else {
+  // store only the first ExpectedSizeInBytes bytes of Old
+  llvm::Type *OldType = Old->getType();
+
+  llvm::Align SrcAlignLLVM = DL.getABITypeAlign(OldType);
+  llvm::Align DstAlignLLVM = DL.getABITypeAlign(ExpectedType);
+
+  clang::CharUnits SrcAlign = clang::CharUnits::fromQuantity(SrcAlignLLVM.value());
+  clang::CharUnits DstAlign = clang::CharUnits::fromQuantity(DstAlignLLVM.value());
+
+  // Allocate temporary storage for Old value
+  llvm::AllocaInst *Alloca = CGF.CreateTempAlloca(OldType, "old.tmp");
+
+  // Wrap into clang::CodeGen::Address with proper type and alignment
+  Address OldStorage(Alloca, OldType, SrcAlign);
+
+  // Store Old into this temporary
+  CGF.Builder.CreateStore(Old, OldStorage);
+
+  // Bitcast pointers to i8*
+  llvm::Type *I8PtrTy = llvm::PointerType::getUnqual(CGF.getLLVMContext());
+
+  llvm::Value *SrcPtr = CGF.Builder.CreateBitCast(OldStorage.getBasePointer(), I8PtrTy);
+  llvm::Value *DstPtr = CGF.Builder.CreateBitCast(ExpectedResult.getBasePointer(), I8PtrTy);
+
+  // Perform memcpy for first ExpectedSizeInBytes bytes
+  CGF.Builder.CreateMemCpy(
+    DstPtr, DstAlignLLVM,
+    SrcPtr, SrcAlignLLVM,
+    llvm::ConstantInt::get(CGF.IntPtrTy, ExpectedSizeInBytes),
+    /*isVolatile=*/false);
+}
+
+  
   // Finally, branch to the exit point.
   CGF.Builder.CreateBr(ContinueBB);
 
@@ -426,6 +468,7 @@ static void emitAtomicCmpXchg(CodeGenFunction &CGF, AtomicExpr *E, bool IsWeak,
 static void emitAtomicCmpXchgFailureSet(CodeGenFunction &CGF, AtomicExpr *E,
                                         bool IsWeak, Address Dest, Address Ptr,
                                         Address Val1, Address Val2,
+                                        Address ExpectedResult,
                                         llvm::Value *FailureOrderVal,
                                         uint64_t Size,
                                         llvm::AtomicOrdering SuccessOrder,
@@ -456,7 +499,7 @@ static void emitAtomicCmpXchgFailureSet(CodeGenFunction &CGF, AtomicExpr *E,
     // success argument". This condition has been lifted and the only
     // precondition is 31.7.2.18. Effectively treat this as a DR and skip
     // language version checks.
-    emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, Size, SuccessOrder,
+    emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, ExpectedResult, Size, SuccessOrder,
                       FailureOrder, Scope);
     return;
   }
@@ -481,17 +524,17 @@ static void emitAtomicCmpXchgFailureSet(CodeGenFunction &CGF, AtomicExpr *E,
 
   // Emit all the different atomics
   CGF.Builder.SetInsertPoint(MonotonicBB);
-  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2,
+  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, ExpectedResult,
                     Size, SuccessOrder, llvm::AtomicOrdering::Monotonic, Scope);
   CGF.Builder.CreateBr(ContBB);
 
   CGF.Builder.SetInsertPoint(AcquireBB);
-  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, Size, SuccessOrder,
+  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, ExpectedResult, Size, SuccessOrder,
                     llvm::AtomicOrdering::Acquire, Scope);
   CGF.Builder.CreateBr(ContBB);
 
   CGF.Builder.SetInsertPoint(SeqCstBB);
-  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, Size, SuccessOrder,
+  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, ExpectedResult, Size, SuccessOrder,
                     llvm::AtomicOrdering::SequentiallyConsistent, Scope);
   CGF.Builder.CreateBr(ContBB);
 
@@ -524,6 +567,7 @@ static llvm::Value *EmitPostAtomicMinMax(CGBuilderTy &Builder,
 
 static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, Address Dest,
                          Address Ptr, Address Val1, Address Val2,
+                         Address ExpectedResult,
                          llvm::Value *IsWeak, llvm::Value *FailureOrder,
                          uint64_t Size, llvm::AtomicOrdering Order,
                          llvm::SyncScope::ID Scope) {
@@ -540,13 +584,13 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, Address Dest,
   case AtomicExpr::AO__hip_atomic_compare_exchange_strong:
   case AtomicExpr::AO__opencl_atomic_compare_exchange_strong:
     emitAtomicCmpXchgFailureSet(CGF, E, false, Dest, Ptr, Val1, Val2,
-                                FailureOrder, Size, Order, Scope);
+                                ExpectedResult, FailureOrder, Size, Order, Scope);
     return;
   case AtomicExpr::AO__c11_atomic_compare_exchange_weak:
   case AtomicExpr::AO__opencl_atomic_compare_exchange_weak:
   case AtomicExpr::AO__hip_atomic_compare_exchange_weak:
     emitAtomicCmpXchgFailureSet(CGF, E, true, Dest, Ptr, Val1, Val2,
-                                FailureOrder, Size, Order, Scope);
+                                ExpectedResult, FailureOrder, Size, Order, Scope);
     return;
   case AtomicExpr::AO__atomic_compare_exchange:
   case AtomicExpr::AO__atomic_compare_exchange_n:
@@ -554,7 +598,7 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, Address Dest,
   case AtomicExpr::AO__scoped_atomic_compare_exchange_n: {
     if (llvm::ConstantInt *IsWeakC = dyn_cast<llvm::ConstantInt>(IsWeak)) {
       emitAtomicCmpXchgFailureSet(CGF, E, IsWeakC->getZExtValue(), Dest, Ptr,
-                                  Val1, Val2, FailureOrder, Size, Order, Scope);
+                                  Val1, Val2, ExpectedResult, FailureOrder, Size, Order, Scope);
     } else {
       // Create all the relevant BB's
       llvm::BasicBlock *StrongBB =
@@ -568,12 +612,12 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, Address Dest,
 
       CGF.Builder.SetInsertPoint(StrongBB);
       emitAtomicCmpXchgFailureSet(CGF, E, false, Dest, Ptr, Val1, Val2,
-                                  FailureOrder, Size, Order, Scope);
+                                  ExpectedResult, FailureOrder, Size, Order, Scope);
       CGF.Builder.CreateBr(ContBB);
 
       CGF.Builder.SetInsertPoint(WeakBB);
       emitAtomicCmpXchgFailureSet(CGF, E, true, Dest, Ptr, Val1, Val2,
-                                  FailureOrder, Size, Order, Scope);
+                                  ExpectedResult, FailureOrder, Size, Order, Scope);
       CGF.Builder.CreateBr(ContBB);
 
       CGF.Builder.SetInsertPoint(ContBB);
@@ -777,6 +821,7 @@ EmitValToTemp(CodeGenFunction &CGF, Expr *E) {
 
 static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
                          Address Ptr, Address Val1, Address Val2,
+                         Address OriginalVal1,
                          llvm::Value *IsWeak, llvm::Value *FailureOrder,
                          uint64_t Size, llvm::AtomicOrdering Order,
                          llvm::Value *Scope) {
@@ -796,7 +841,7 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
                                                    Order, CGF.getLLVMContext());
     else
       SS = llvm::SyncScope::System;
-    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, IsWeak, FailureOrder, Size,
+    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, FailureOrder, Size,
                  Order, SS);
     return;
   }
@@ -806,7 +851,7 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
     auto SCID = CGF.getTargetHooks().getLLVMSyncScopeID(
         CGF.CGM.getLangOpts(), ScopeModel->map(SC->getZExtValue()),
         Order, CGF.CGM.getLLVMContext());
-    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, IsWeak, FailureOrder, Size,
+    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, FailureOrder, Size,
                  Order, SCID);
     return;
   }
@@ -832,7 +877,7 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
       SI->addCase(Builder.getInt32(S), B);
 
     Builder.SetInsertPoint(B);
-    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, IsWeak, FailureOrder, Size,
+    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, FailureOrder, Size,
                  Order,
                  CGF.getTargetHooks().getLLVMSyncScopeID(CGF.CGM.getLangOpts(),
                                                          ScopeModel->map(S),
@@ -1036,6 +1081,7 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
   LValue AtomicVal = MakeAddrLValue(Ptr, AtomicTy);
   AtomicInfo Atomics(*this, AtomicVal);
 
+  Address OriginalVal1 = Val1;
   if (ShouldCastToIntPtrTy) {
     Ptr = Atomics.castToAtomicIntPointer(Ptr);
     if (Val1.isValid())
@@ -1279,30 +1325,30 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
     if (llvm::isValidAtomicOrderingCABI(ord))
       switch ((llvm::AtomicOrderingCABI)ord) {
       case llvm::AtomicOrderingCABI::relaxed:
-        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                      llvm::AtomicOrdering::Monotonic, Scope);
         break;
       case llvm::AtomicOrderingCABI::consume:
       case llvm::AtomicOrderingCABI::acquire:
         if (IsStore)
           break; // Avoid crashing on code with undefined behavior
-        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                      llvm::AtomicOrdering::Acquire, Scope);
         break;
       case llvm::AtomicOrderingCABI::release:
         if (IsLoad)
           break; // Avoid crashing on code with undefined behavior
-        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                      llvm::AtomicOrdering::Release, Scope);
         break;
       case llvm::AtomicOrderingCABI::acq_rel:
         if (IsLoad || IsStore)
           break; // Avoid crashing on code with undefined behavior
-        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                      llvm::AtomicOrdering::AcquireRelease, Scope);
         break;
       case llvm::AtomicOrderingCABI::seq_cst:
-        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                      llvm::AtomicOrdering::SequentiallyConsistent, Scope);
         break;
       }
@@ -1338,12 +1384,12 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
 
   // Emit all the different atomics
   Builder.SetInsertPoint(MonotonicBB);
-  EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+  EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                llvm::AtomicOrdering::Monotonic, Scope);
   Builder.CreateBr(ContBB);
   if (!IsStore) {
     Builder.SetInsertPoint(AcquireBB);
-    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                  llvm::AtomicOrdering::Acquire, Scope);
     Builder.CreateBr(ContBB);
     SI->addCase(Builder.getInt32((int)llvm::AtomicOrderingCABI::consume),
@@ -1353,7 +1399,7 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
   }
   if (!IsLoad) {
     Builder.SetInsertPoint(ReleaseBB);
-    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                  llvm::AtomicOrdering::Release, Scope);
     Builder.CreateBr(ContBB);
     SI->addCase(Builder.getInt32((int)llvm::AtomicOrderingCABI::release),
@@ -1361,14 +1407,14 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
   }
   if (!IsLoad && !IsStore) {
     Builder.SetInsertPoint(AcqRelBB);
-    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                  llvm::AtomicOrdering::AcquireRelease, Scope);
     Builder.CreateBr(ContBB);
     SI->addCase(Builder.getInt32((int)llvm::AtomicOrderingCABI::acq_rel),
                 AcqRelBB);
   }
   Builder.SetInsertPoint(SeqCstBB);
-  EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+  EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                llvm::AtomicOrdering::SequentiallyConsistent, Scope);
   Builder.CreateBr(ContBB);
   SI->addCase(Builder.getInt32((int)llvm::AtomicOrderingCABI::seq_cst),
diff --git a/clang/test/CodeGenCXX/builtin-atomic-compare_exchange.cpp b/clang/test/CodeGenCXX/builtin-atomic-compare_exchange.cpp
new file mode 100644
index 0000000000000..4e2830e7a66cd
--- /dev/null
+++ b/clang/test/CodeGenCXX/builtin-atomic-compare_exchange.cpp
@@ -0,0 +1,126 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 -std=c++20 -triple=x86_64-linux-gnu -emit-llvm -o - %s | FileCheck %s
+
+
+template <unsigned Size>
+struct S {
+    char data[Size];
+};
+
+// CHECK-LABEL: define dso_local noundef zeroext i1 @_Z21test_compare_exchangePU7_Atomic1SILj3EEPS0_S0_(
+// CHECK-SAME: ptr noundef [[A:%.*]], ptr noundef [[EXPECTED:%.*]], i24 [[DESIRED_COERCE:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[DESIRED:%.*]] = alloca [[STRUCT_S:%.*]], align 1
+// CHECK-NEXT:    [[A_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[EXPECTED_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[DOTATOMICTMP:%.*]] = alloca [[STRUCT_S]], align 1
+// CHECK-NEXT:    [[ATOMIC_TEMP:%.*]] = alloca { [[STRUCT_S]], [1 x i8] }, align 4
+// CHECK-NEXT:    [[ATOMIC_TEMP1:%.*]] = alloca { [[STRUCT_S]], [1 x i8] }, align 4
+// CHECK-NEXT:    [[CMPXCHG_BOOL:%.*]] = alloca i8, align 1
+// CHECK-NEXT:    [[OLD_TMP:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[COERCE_DIVE:%.*]] = getelementptr inbounds nuw [[STRUCT_S]], ptr [[DESIRED]], i32 0, i32 0
+// CHECK-NEXT:    store i24 [[DESIRED_COERCE]], ptr [[COERCE_DIVE]], align 1
+// CHECK-NEXT:    store ptr [[A]], ptr [[A_ADDR]], align 8
+// CHECK-NEXT:    store ptr [[EXPECTED]], ptr [[EXPECTED_ADDR]], align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[A_ADDR]], align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[EXPECTED_ADDR]], align 8
+// CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 1 [[DOTATOMICTMP]], ptr align 1 [[DESIRED]], i64 3, i1 false)
+// CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[ATOMIC_TEMP]], ptr align 1 [[TMP1]], i64 3, i1 false)
+// CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[ATOMIC_TEMP1]], ptr align 1 [[DOTATOMICTMP]], i64 3, i1 false)
+// CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr [[ATOMIC_TEMP]], align 4
+// CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[ATOMIC_TEMP1]], align 4
+// CHECK-NEXT:    [[TMP4:%.*]] = cmpxchg ptr [[TMP0]], i32 [[TMP2]], i32 [[TMP3]] monotonic monotonic, align 4
+// CHECK-NEXT:    [[TMP5:%.*]] = extractvalue { i32, i1 } [[TMP4]], 0
+// CHECK-NEXT:    [[TMP6:%.*]] = extractvalue { i32, i1 } [[TMP4]], 1
+// CHECK-NEXT:    br i1 [[TMP6]], label %[[CMPXCHG_CONTINUE:.*]], label %[[CMPXCHG_STORE_EXPECTED:.*]]
+// CHECK:       [[CMPXCHG_STORE_EXPECTED]]:
+// CHECK-NEXT:    store i32 [[TMP5]], ptr [[OLD_TMP]], align 4
+// CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 1 [[TMP1]], ptr align 4 [[OLD_TMP]], i64 3, i1 false)
+// CHECK-NEXT:    br label %[[CMPXCHG_CONTINUE]]
+// CHECK:       [[CMPXCHG_CONTINUE]]:
+// CHECK-NEXT:    [[STOREDV:%.*]] = zext i1 [[TMP6]] to i8
+// CHECK-NEXT:    store i8 [[STOREDV]], ptr [[CMPXCHG_BOOL]], align 1
+// CHECK-NEXT:    [[TMP7:%.*]] = load i8, ptr [[CMPXCHG_BOOL]], align 1
+// CHECK-NEXT:    [[LOADEDV:%.*]] = trunc i8 [[TMP7]] to i1
+// CHECK-NEXT:    ret i1 [[LOADEDV]]
+//
+bool test_compare_exchange(_Atomic(S<3>)* a, S<3>* expected, S<3> desired) {
+    return __c11_atomic_compare_exchange_strong(a, expected, desired, 0, 0);
+}
+
+
+// CHECK-LABEL: define dso_local noundef zeroext i1 @_Z21test_compare_exchangePU7_Atomic1SILj4EEPS0_S0_(
+// CHECK-SAME: ptr noundef [[A:%.*]], ptr noundef [[EXPECTED:%.*]], i32 [[DESIRED_COERCE:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[DESIRED:%.*]] = alloca [[STRUCT_S_0:%.*]], align 1
+// CHECK-NEXT:    [[A_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[EXPECTED_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[DOTATOMICTMP:%.*]] = alloca [[STRUCT_S_0]], align 1
+// CHECK-NEXT:    [[CMPXCHG_BOOL:%.*]] = alloca i8, align 1
+// CHECK-NEXT:    [[COERCE_DIVE:%.*]] = getelementptr inbounds nuw [[STRUCT_S_0]], ptr [[DESIRED]], i32 0, i32 0
+// CHECK-NEXT:    store i32 [[DESIRED_COERCE]], ptr [[COERCE_DIVE]], align 1
+// CHECK-NEXT:    store ptr [[A]], ptr [[A_ADDR]], align 8
+// CHECK-NEXT:    store ptr [[EXPECTED]], ptr [[EXPECTED_ADDR]], align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[A_ADDR]], align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[EXPECTED_ADDR]], align 8
+// CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 1 [[DOTATOMICTMP]], ptr align 1 [[DESIRED]], i64 4, i1 false)
+// CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr [[TMP1]], align 1
+// CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[DOTATOMICTMP]], align 1
+// CHECK-NEXT:    [[TMP4:%.*]] = cmpxchg ptr [[TMP0]], i32 [[TMP2]], i32 [[TMP3]] monotonic monotonic, align 4
+// CHECK-NEXT:    [[TMP5:%.*]] = extractvalue { i32, i1 } [[TMP4]], 0
+// CHECK-NEXT:    [[TMP6:%.*]] = extractvalue { i32, i1 } [[TMP4]], 1
+// CHECK-NEXT:    br i1 [[TMP6]], label %[[CMPXCHG_CONTINUE:.*]], label %[[CMPXCHG_STORE_EXPECTED:.*]]
+// CHECK:       [[CMPXCHG_STORE_EXPECTED]]:
+// CHECK-NEXT:    store i32 [[TMP5]], ptr [[TMP1]], align 1
+// CHECK-NEXT:    br label %[[CMPXCHG_CONTINUE]]
+// CHECK:       [[CMPXCHG_CONTINUE]]:
+// CHECK-NEXT:    [[STOREDV:%.*]] = zext i1 [[TMP6]] to i8
+// CHECK-NEXT:    store i8 [[STOREDV]], ptr [[CMPXCHG_BOOL]], align 1
+// CHECK-NEXT:    [[TMP7:%.*]] = load i8, ptr [[CMPXCHG_BOOL]], align 1
+// CHECK-NEXT:    [[LOADEDV:%.*]] = trunc i8 [[TMP7]] to i1
+// CHECK-NEXT:    ret i1 [[LOADEDV]]
+//
+bool test_compare_exchange(_Atomic(S<4>)* a, S<4>* expected, S<4> desired) {
+    return __c11_atomic_compare_exchange_strong(a, expected, desired, 0, 0);
+}
+
+// CHECK-LABEL: define dso_local noundef zeroext i1 @_Z21test_compare_exchangePU7_Atomic1SILj6EEPS0_S0_(
+// CHECK-SAME: ptr noundef [[A:%.*]], ptr noundef [[EXPECTED:%.*]], i48 [[DESIRED_COERCE:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[DESIRED:%.*]] = alloca [[STRUCT_S_1:%.*]], align 1
+// CHECK-NEXT:    [[A_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[EXPECTED_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[DOTATOMICTMP:%.*]] = alloca [[STRUCT_S_1]], align 1
+// CHECK-NEXT:    [[ATOMIC_TEMP:%.*]] = alloca { [[STRUCT_S_1]], [2 x i8] }, align 8
+// CHECK-NEXT:    [[ATOMIC_TEMP1:%.*]] = alloca { [[STRUCT_S_1]], [2 x i8...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jun 7, 2025

@llvm/pr-subscribers-clang

Author: Hui (huixie90)

Changes

fixes #30023


Patch is 24.34 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/78707.diff

3 Files Affected:

  • (modified) clang/lib/CodeGen/CGAtomic.cpp (+69-23)
  • (added) clang/test/CodeGenCXX/builtin-atomic-compare_exchange.cpp (+126)
  • (added) libcxx/test/std/atomics/atomics.types.generic/30023.pass.cpp (+65)
diff --git a/clang/lib/CodeGen/CGAtomic.cpp b/clang/lib/CodeGen/CGAtomic.cpp
index 51f0799a792fd..9eee834a520e0 100644
--- a/clang/lib/CodeGen/CGAtomic.cpp
+++ b/clang/lib/CodeGen/CGAtomic.cpp
@@ -376,6 +376,7 @@ bool AtomicInfo::emitMemSetZeroIfNecessary() const {
 static void emitAtomicCmpXchg(CodeGenFunction &CGF, AtomicExpr *E, bool IsWeak,
                               Address Dest, Address Ptr,
                               Address Val1, Address Val2,
+                              Address ExpectedResult,
                               uint64_t Size,
                               llvm::AtomicOrdering SuccessOrder,
                               llvm::AtomicOrdering FailureOrder,
@@ -411,7 +412,48 @@ static void emitAtomicCmpXchg(CodeGenFunction &CGF, AtomicExpr *E, bool IsWeak,
 
   CGF.Builder.SetInsertPoint(StoreExpectedBB);
   // Update the memory at Expected with Old's value.
-  CGF.Builder.CreateStore(Old, Val1);
+llvm::Type *ExpectedType = ExpectedResult.getElementType();
+const llvm::DataLayout &DL = CGF.CGM.getDataLayout();
+uint64_t ExpectedSizeInBytes = DL.getTypeStoreSize(ExpectedType);
+
+if (ExpectedSizeInBytes == Size) {
+  // Sizes match: store directly
+  CGF.Builder.CreateStore(Old, ExpectedResult);
+
+} else {
+  // store only the first ExpectedSizeInBytes bytes of Old
+  llvm::Type *OldType = Old->getType();
+
+  llvm::Align SrcAlignLLVM = DL.getABITypeAlign(OldType);
+  llvm::Align DstAlignLLVM = DL.getABITypeAlign(ExpectedType);
+
+  clang::CharUnits SrcAlign = clang::CharUnits::fromQuantity(SrcAlignLLVM.value());
+  clang::CharUnits DstAlign = clang::CharUnits::fromQuantity(DstAlignLLVM.value());
+
+  // Allocate temporary storage for Old value
+  llvm::AllocaInst *Alloca = CGF.CreateTempAlloca(OldType, "old.tmp");
+
+  // Wrap into clang::CodeGen::Address with proper type and alignment
+  Address OldStorage(Alloca, OldType, SrcAlign);
+
+  // Store Old into this temporary
+  CGF.Builder.CreateStore(Old, OldStorage);
+
+  // Bitcast pointers to i8*
+  llvm::Type *I8PtrTy = llvm::PointerType::getUnqual(CGF.getLLVMContext());
+
+  llvm::Value *SrcPtr = CGF.Builder.CreateBitCast(OldStorage.getBasePointer(), I8PtrTy);
+  llvm::Value *DstPtr = CGF.Builder.CreateBitCast(ExpectedResult.getBasePointer(), I8PtrTy);
+
+  // Perform memcpy for first ExpectedSizeInBytes bytes
+  CGF.Builder.CreateMemCpy(
+    DstPtr, DstAlignLLVM,
+    SrcPtr, SrcAlignLLVM,
+    llvm::ConstantInt::get(CGF.IntPtrTy, ExpectedSizeInBytes),
+    /*isVolatile=*/false);
+}
+
+  
   // Finally, branch to the exit point.
   CGF.Builder.CreateBr(ContinueBB);
 
@@ -426,6 +468,7 @@ static void emitAtomicCmpXchg(CodeGenFunction &CGF, AtomicExpr *E, bool IsWeak,
 static void emitAtomicCmpXchgFailureSet(CodeGenFunction &CGF, AtomicExpr *E,
                                         bool IsWeak, Address Dest, Address Ptr,
                                         Address Val1, Address Val2,
+                                        Address ExpectedResult,
                                         llvm::Value *FailureOrderVal,
                                         uint64_t Size,
                                         llvm::AtomicOrdering SuccessOrder,
@@ -456,7 +499,7 @@ static void emitAtomicCmpXchgFailureSet(CodeGenFunction &CGF, AtomicExpr *E,
     // success argument". This condition has been lifted and the only
     // precondition is 31.7.2.18. Effectively treat this as a DR and skip
     // language version checks.
-    emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, Size, SuccessOrder,
+    emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, ExpectedResult, Size, SuccessOrder,
                       FailureOrder, Scope);
     return;
   }
@@ -481,17 +524,17 @@ static void emitAtomicCmpXchgFailureSet(CodeGenFunction &CGF, AtomicExpr *E,
 
   // Emit all the different atomics
   CGF.Builder.SetInsertPoint(MonotonicBB);
-  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2,
+  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, ExpectedResult,
                     Size, SuccessOrder, llvm::AtomicOrdering::Monotonic, Scope);
   CGF.Builder.CreateBr(ContBB);
 
   CGF.Builder.SetInsertPoint(AcquireBB);
-  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, Size, SuccessOrder,
+  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, ExpectedResult, Size, SuccessOrder,
                     llvm::AtomicOrdering::Acquire, Scope);
   CGF.Builder.CreateBr(ContBB);
 
   CGF.Builder.SetInsertPoint(SeqCstBB);
-  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, Size, SuccessOrder,
+  emitAtomicCmpXchg(CGF, E, IsWeak, Dest, Ptr, Val1, Val2, ExpectedResult, Size, SuccessOrder,
                     llvm::AtomicOrdering::SequentiallyConsistent, Scope);
   CGF.Builder.CreateBr(ContBB);
 
@@ -524,6 +567,7 @@ static llvm::Value *EmitPostAtomicMinMax(CGBuilderTy &Builder,
 
 static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, Address Dest,
                          Address Ptr, Address Val1, Address Val2,
+                         Address ExpectedResult,
                          llvm::Value *IsWeak, llvm::Value *FailureOrder,
                          uint64_t Size, llvm::AtomicOrdering Order,
                          llvm::SyncScope::ID Scope) {
@@ -540,13 +584,13 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, Address Dest,
   case AtomicExpr::AO__hip_atomic_compare_exchange_strong:
   case AtomicExpr::AO__opencl_atomic_compare_exchange_strong:
     emitAtomicCmpXchgFailureSet(CGF, E, false, Dest, Ptr, Val1, Val2,
-                                FailureOrder, Size, Order, Scope);
+                                ExpectedResult, FailureOrder, Size, Order, Scope);
     return;
   case AtomicExpr::AO__c11_atomic_compare_exchange_weak:
   case AtomicExpr::AO__opencl_atomic_compare_exchange_weak:
   case AtomicExpr::AO__hip_atomic_compare_exchange_weak:
     emitAtomicCmpXchgFailureSet(CGF, E, true, Dest, Ptr, Val1, Val2,
-                                FailureOrder, Size, Order, Scope);
+                                ExpectedResult, FailureOrder, Size, Order, Scope);
     return;
   case AtomicExpr::AO__atomic_compare_exchange:
   case AtomicExpr::AO__atomic_compare_exchange_n:
@@ -554,7 +598,7 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, Address Dest,
   case AtomicExpr::AO__scoped_atomic_compare_exchange_n: {
     if (llvm::ConstantInt *IsWeakC = dyn_cast<llvm::ConstantInt>(IsWeak)) {
       emitAtomicCmpXchgFailureSet(CGF, E, IsWeakC->getZExtValue(), Dest, Ptr,
-                                  Val1, Val2, FailureOrder, Size, Order, Scope);
+                                  Val1, Val2, ExpectedResult, FailureOrder, Size, Order, Scope);
     } else {
       // Create all the relevant BB's
       llvm::BasicBlock *StrongBB =
@@ -568,12 +612,12 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, Address Dest,
 
       CGF.Builder.SetInsertPoint(StrongBB);
       emitAtomicCmpXchgFailureSet(CGF, E, false, Dest, Ptr, Val1, Val2,
-                                  FailureOrder, Size, Order, Scope);
+                                  ExpectedResult, FailureOrder, Size, Order, Scope);
       CGF.Builder.CreateBr(ContBB);
 
       CGF.Builder.SetInsertPoint(WeakBB);
       emitAtomicCmpXchgFailureSet(CGF, E, true, Dest, Ptr, Val1, Val2,
-                                  FailureOrder, Size, Order, Scope);
+                                  ExpectedResult, FailureOrder, Size, Order, Scope);
       CGF.Builder.CreateBr(ContBB);
 
       CGF.Builder.SetInsertPoint(ContBB);
@@ -777,6 +821,7 @@ EmitValToTemp(CodeGenFunction &CGF, Expr *E) {
 
 static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
                          Address Ptr, Address Val1, Address Val2,
+                         Address OriginalVal1,
                          llvm::Value *IsWeak, llvm::Value *FailureOrder,
                          uint64_t Size, llvm::AtomicOrdering Order,
                          llvm::Value *Scope) {
@@ -796,7 +841,7 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
                                                    Order, CGF.getLLVMContext());
     else
       SS = llvm::SyncScope::System;
-    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, IsWeak, FailureOrder, Size,
+    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, FailureOrder, Size,
                  Order, SS);
     return;
   }
@@ -806,7 +851,7 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
     auto SCID = CGF.getTargetHooks().getLLVMSyncScopeID(
         CGF.CGM.getLangOpts(), ScopeModel->map(SC->getZExtValue()),
         Order, CGF.CGM.getLLVMContext());
-    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, IsWeak, FailureOrder, Size,
+    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, FailureOrder, Size,
                  Order, SCID);
     return;
   }
@@ -832,7 +877,7 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
       SI->addCase(Builder.getInt32(S), B);
 
     Builder.SetInsertPoint(B);
-    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, IsWeak, FailureOrder, Size,
+    EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, FailureOrder, Size,
                  Order,
                  CGF.getTargetHooks().getLLVMSyncScopeID(CGF.CGM.getLangOpts(),
                                                          ScopeModel->map(S),
@@ -1036,6 +1081,7 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
   LValue AtomicVal = MakeAddrLValue(Ptr, AtomicTy);
   AtomicInfo Atomics(*this, AtomicVal);
 
+  Address OriginalVal1 = Val1;
   if (ShouldCastToIntPtrTy) {
     Ptr = Atomics.castToAtomicIntPointer(Ptr);
     if (Val1.isValid())
@@ -1279,30 +1325,30 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
     if (llvm::isValidAtomicOrderingCABI(ord))
       switch ((llvm::AtomicOrderingCABI)ord) {
       case llvm::AtomicOrderingCABI::relaxed:
-        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                      llvm::AtomicOrdering::Monotonic, Scope);
         break;
       case llvm::AtomicOrderingCABI::consume:
       case llvm::AtomicOrderingCABI::acquire:
         if (IsStore)
           break; // Avoid crashing on code with undefined behavior
-        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                      llvm::AtomicOrdering::Acquire, Scope);
         break;
       case llvm::AtomicOrderingCABI::release:
         if (IsLoad)
           break; // Avoid crashing on code with undefined behavior
-        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                      llvm::AtomicOrdering::Release, Scope);
         break;
       case llvm::AtomicOrderingCABI::acq_rel:
         if (IsLoad || IsStore)
           break; // Avoid crashing on code with undefined behavior
-        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                      llvm::AtomicOrdering::AcquireRelease, Scope);
         break;
       case llvm::AtomicOrderingCABI::seq_cst:
-        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+        EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                      llvm::AtomicOrdering::SequentiallyConsistent, Scope);
         break;
       }
@@ -1338,12 +1384,12 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
 
   // Emit all the different atomics
   Builder.SetInsertPoint(MonotonicBB);
-  EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+  EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                llvm::AtomicOrdering::Monotonic, Scope);
   Builder.CreateBr(ContBB);
   if (!IsStore) {
     Builder.SetInsertPoint(AcquireBB);
-    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                  llvm::AtomicOrdering::Acquire, Scope);
     Builder.CreateBr(ContBB);
     SI->addCase(Builder.getInt32((int)llvm::AtomicOrderingCABI::consume),
@@ -1353,7 +1399,7 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
   }
   if (!IsLoad) {
     Builder.SetInsertPoint(ReleaseBB);
-    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                  llvm::AtomicOrdering::Release, Scope);
     Builder.CreateBr(ContBB);
     SI->addCase(Builder.getInt32((int)llvm::AtomicOrderingCABI::release),
@@ -1361,14 +1407,14 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
   }
   if (!IsLoad && !IsStore) {
     Builder.SetInsertPoint(AcqRelBB);
-    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+    EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                  llvm::AtomicOrdering::AcquireRelease, Scope);
     Builder.CreateBr(ContBB);
     SI->addCase(Builder.getInt32((int)llvm::AtomicOrderingCABI::acq_rel),
                 AcqRelBB);
   }
   Builder.SetInsertPoint(SeqCstBB);
-  EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, IsWeak, OrderFail, Size,
+  EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, OriginalVal1, IsWeak, OrderFail, Size,
                llvm::AtomicOrdering::SequentiallyConsistent, Scope);
   Builder.CreateBr(ContBB);
   SI->addCase(Builder.getInt32((int)llvm::AtomicOrderingCABI::seq_cst),
diff --git a/clang/test/CodeGenCXX/builtin-atomic-compare_exchange.cpp b/clang/test/CodeGenCXX/builtin-atomic-compare_exchange.cpp
new file mode 100644
index 0000000000000..4e2830e7a66cd
--- /dev/null
+++ b/clang/test/CodeGenCXX/builtin-atomic-compare_exchange.cpp
@@ -0,0 +1,126 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 -std=c++20 -triple=x86_64-linux-gnu -emit-llvm -o - %s | FileCheck %s
+
+
+template <unsigned Size>
+struct S {
+    char data[Size];
+};
+
+// CHECK-LABEL: define dso_local noundef zeroext i1 @_Z21test_compare_exchangePU7_Atomic1SILj3EEPS0_S0_(
+// CHECK-SAME: ptr noundef [[A:%.*]], ptr noundef [[EXPECTED:%.*]], i24 [[DESIRED_COERCE:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[DESIRED:%.*]] = alloca [[STRUCT_S:%.*]], align 1
+// CHECK-NEXT:    [[A_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[EXPECTED_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[DOTATOMICTMP:%.*]] = alloca [[STRUCT_S]], align 1
+// CHECK-NEXT:    [[ATOMIC_TEMP:%.*]] = alloca { [[STRUCT_S]], [1 x i8] }, align 4
+// CHECK-NEXT:    [[ATOMIC_TEMP1:%.*]] = alloca { [[STRUCT_S]], [1 x i8] }, align 4
+// CHECK-NEXT:    [[CMPXCHG_BOOL:%.*]] = alloca i8, align 1
+// CHECK-NEXT:    [[OLD_TMP:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[COERCE_DIVE:%.*]] = getelementptr inbounds nuw [[STRUCT_S]], ptr [[DESIRED]], i32 0, i32 0
+// CHECK-NEXT:    store i24 [[DESIRED_COERCE]], ptr [[COERCE_DIVE]], align 1
+// CHECK-NEXT:    store ptr [[A]], ptr [[A_ADDR]], align 8
+// CHECK-NEXT:    store ptr [[EXPECTED]], ptr [[EXPECTED_ADDR]], align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[A_ADDR]], align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[EXPECTED_ADDR]], align 8
+// CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 1 [[DOTATOMICTMP]], ptr align 1 [[DESIRED]], i64 3, i1 false)
+// CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[ATOMIC_TEMP]], ptr align 1 [[TMP1]], i64 3, i1 false)
+// CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[ATOMIC_TEMP1]], ptr align 1 [[DOTATOMICTMP]], i64 3, i1 false)
+// CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr [[ATOMIC_TEMP]], align 4
+// CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[ATOMIC_TEMP1]], align 4
+// CHECK-NEXT:    [[TMP4:%.*]] = cmpxchg ptr [[TMP0]], i32 [[TMP2]], i32 [[TMP3]] monotonic monotonic, align 4
+// CHECK-NEXT:    [[TMP5:%.*]] = extractvalue { i32, i1 } [[TMP4]], 0
+// CHECK-NEXT:    [[TMP6:%.*]] = extractvalue { i32, i1 } [[TMP4]], 1
+// CHECK-NEXT:    br i1 [[TMP6]], label %[[CMPXCHG_CONTINUE:.*]], label %[[CMPXCHG_STORE_EXPECTED:.*]]
+// CHECK:       [[CMPXCHG_STORE_EXPECTED]]:
+// CHECK-NEXT:    store i32 [[TMP5]], ptr [[OLD_TMP]], align 4
+// CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 1 [[TMP1]], ptr align 4 [[OLD_TMP]], i64 3, i1 false)
+// CHECK-NEXT:    br label %[[CMPXCHG_CONTINUE]]
+// CHECK:       [[CMPXCHG_CONTINUE]]:
+// CHECK-NEXT:    [[STOREDV:%.*]] = zext i1 [[TMP6]] to i8
+// CHECK-NEXT:    store i8 [[STOREDV]], ptr [[CMPXCHG_BOOL]], align 1
+// CHECK-NEXT:    [[TMP7:%.*]] = load i8, ptr [[CMPXCHG_BOOL]], align 1
+// CHECK-NEXT:    [[LOADEDV:%.*]] = trunc i8 [[TMP7]] to i1
+// CHECK-NEXT:    ret i1 [[LOADEDV]]
+//
+bool test_compare_exchange(_Atomic(S<3>)* a, S<3>* expected, S<3> desired) {
+    return __c11_atomic_compare_exchange_strong(a, expected, desired, 0, 0);
+}
+
+
+// CHECK-LABEL: define dso_local noundef zeroext i1 @_Z21test_compare_exchangePU7_Atomic1SILj4EEPS0_S0_(
+// CHECK-SAME: ptr noundef [[A:%.*]], ptr noundef [[EXPECTED:%.*]], i32 [[DESIRED_COERCE:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[DESIRED:%.*]] = alloca [[STRUCT_S_0:%.*]], align 1
+// CHECK-NEXT:    [[A_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[EXPECTED_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[DOTATOMICTMP:%.*]] = alloca [[STRUCT_S_0]], align 1
+// CHECK-NEXT:    [[CMPXCHG_BOOL:%.*]] = alloca i8, align 1
+// CHECK-NEXT:    [[COERCE_DIVE:%.*]] = getelementptr inbounds nuw [[STRUCT_S_0]], ptr [[DESIRED]], i32 0, i32 0
+// CHECK-NEXT:    store i32 [[DESIRED_COERCE]], ptr [[COERCE_DIVE]], align 1
+// CHECK-NEXT:    store ptr [[A]], ptr [[A_ADDR]], align 8
+// CHECK-NEXT:    store ptr [[EXPECTED]], ptr [[EXPECTED_ADDR]], align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[A_ADDR]], align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[EXPECTED_ADDR]], align 8
+// CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 1 [[DOTATOMICTMP]], ptr align 1 [[DESIRED]], i64 4, i1 false)
+// CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr [[TMP1]], align 1
+// CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[DOTATOMICTMP]], align 1
+// CHECK-NEXT:    [[TMP4:%.*]] = cmpxchg ptr [[TMP0]], i32 [[TMP2]], i32 [[TMP3]] monotonic monotonic, align 4
+// CHECK-NEXT:    [[TMP5:%.*]] = extractvalue { i32, i1 } [[TMP4]], 0
+// CHECK-NEXT:    [[TMP6:%.*]] = extractvalue { i32, i1 } [[TMP4]], 1
+// CHECK-NEXT:    br i1 [[TMP6]], label %[[CMPXCHG_CONTINUE:.*]], label %[[CMPXCHG_STORE_EXPECTED:.*]]
+// CHECK:       [[CMPXCHG_STORE_EXPECTED]]:
+// CHECK-NEXT:    store i32 [[TMP5]], ptr [[TMP1]], align 1
+// CHECK-NEXT:    br label %[[CMPXCHG_CONTINUE]]
+// CHECK:       [[CMPXCHG_CONTINUE]]:
+// CHECK-NEXT:    [[STOREDV:%.*]] = zext i1 [[TMP6]] to i8
+// CHECK-NEXT:    store i8 [[STOREDV]], ptr [[CMPXCHG_BOOL]], align 1
+// CHECK-NEXT:    [[TMP7:%.*]] = load i8, ptr [[CMPXCHG_BOOL]], align 1
+// CHECK-NEXT:    [[LOADEDV:%.*]] = trunc i8 [[TMP7]] to i1
+// CHECK-NEXT:    ret i1 [[LOADEDV]]
+//
+bool test_compare_exchange(_Atomic(S<4>)* a, S<4>* expected, S<4> desired) {
+    return __c11_atomic_compare_exchange_strong(a, expected, desired, 0, 0);
+}
+
+// CHECK-LABEL: define dso_local noundef zeroext i1 @_Z21test_compare_exchangePU7_Atomic1SILj6EEPS0_S0_(
+// CHECK-SAME: ptr noundef [[A:%.*]], ptr noundef [[EXPECTED:%.*]], i48 [[DESIRED_COERCE:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[DESIRED:%.*]] = alloca [[STRUCT_S_1:%.*]], align 1
+// CHECK-NEXT:    [[A_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[EXPECTED_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[DOTATOMICTMP:%.*]] = alloca [[STRUCT_S_1]], align 1
+// CHECK-NEXT:    [[ATOMIC_TEMP:%.*]] = alloca { [[STRUCT_S_1]], [2 x i8] }, align 8
+// CHECK-NEXT:    [[ATOMIC_TEMP1:%.*]] = alloca { [[STRUCT_S_1]], [2 x i8...
[truncated]

llvm::Type *I8PtrTy = llvm::PointerType::getUnqual(CGF.getLLVMContext());

llvm::Value *SrcPtr = CGF.Builder.CreateBitCast(OldStorage.getBasePointer(), I8PtrTy);
llvm::Value *DstPtr = CGF.Builder.CreateBitCast(ExpectedResult.getBasePointer(), I8PtrTy);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bitcasts don't do anything with opaque pointers.

Copy link
Member Author

@huixie90 huixie90 Jun 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efriedma-quic Hi thanks for the review. Could you please point me the right way of doing this? The entire code in the else branch is just doing one thing. Copy the first N bytes from the Old to ExpectedResults.

The rational is as follows. when we do compare exchange , if the input type is not of size power of two, say, it is 36 bytes. we first allocate 48 bytes temporary, do the compare exchange with the temporary. Then we need to also update the "expected" when the exchange does Not happen. but we cannot copy the full 48 bytes back to "expected". because user only passed in an address with 36 bytes.

What is the right way of copying first N bytes from Old to ExpectedResults?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code here is doing the right thing, roughly: doing an alloca+store+memcpy is fine. (SROA will usually end up optimizing it, but the exact sequence of instructions you need isn't consistent, so it's fine to rely on that.)

My suggestions are basically code cleanup: don't do redundant operations, compute alignment consistently, use appropriate helper methods.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool thanks. I had another go. @efriedma-quic could you please see if it is something look more reasonable to you? thanks

llvm::Align DstAlignLLVM = DL.getABITypeAlign(ExpectedType);

clang::CharUnits SrcAlign = clang::CharUnits::fromQuantity(SrcAlignLLVM.value());
clang::CharUnits DstAlign = clang::CharUnits::fromQuantity(DstAlignLLVM.value());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the correct way to query alignment in general.

An Address should already have alignment encoded in it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. I replaced the DstAlignLLVM thing with ExpectedResult.getAlignment(). However, for Old, I don't have an Address, could you please advice the right way to get the alignment from an llvm::Value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Old is the result of the compare exchange. I assume it has the same alignment of the atomic address itself. so I used the Ptr.getAlignment() here. would that be fine?

Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks quite reasonable to me, and it seems we're fixing a pretty bad bug. However, someone else with more Clang/Codegen experience needs to look at this. Thanks for working on this!

CC @efriedma-quic

@@ -411,7 +412,33 @@ static void emitAtomicCmpXchg(CodeGenFunction &CGF, AtomicExpr *E, bool IsWeak,

CGF.Builder.SetInsertPoint(StoreExpectedBB);
// Update the memory at Expected with Old's value.
CGF.Builder.CreateStore(Old, Val1);
llvm::Type *ExpectedType = ExpectedResult.getElementType();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: this looks mis-formatted.

assert(expected.b);
}

int main() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int main() {
int main(int, char**) {

Comment on lines +64 to +65
test<6>();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
test<6>();
}
test<6>();
return 0;
}

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to require something like

// XFAIL: clang-20, apple-clang-15, apple-clang-16

@ldionne
Copy link
Member

ldionne commented Jul 4, 2025

CC @philnik777 in case you want to take a look as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

atomic::compare_exchange returns wrong value
4 participants