[X86] Prefer lock or over mfence #106555

Merged
vchuravy merged 3 commits into llvm:main from vc/avoid_mfence on Mar 11, 2025

Conversation

vchuravy
Contributor

@vchuravy vchuravy commented Aug 29, 2024

Originally opened as https://reviews.llvm.org/D129947

LLVM currently emits `mfence` for `__atomic_thread_fence(seq_cst)`. On
modern CPUs, `lock or` is more efficient and provides the same sequential
consistency. GCC 11 made this switch as well (see https://gcc.gnu.org/pipermail/gcc-cvs/2020-July/314418.html),
and https://reviews.llvm.org/D61863 and https://reviews.llvm.org/D58632
moved in this direction as well, but did not touch `fence seq_cst`.

Amusingly this came up elsewhere: https://www.reddit.com/r/cpp_questions/comments/16uer2g/how_do_i_stop_clang_generating_mfence/

After another two years it doesn't look like anyone has complained about the
GCC switch. And there is still `__builtin_ia32_mfence` for folks who
want this precise instruction.

Fixes #91731
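
For context, a minimal user-level example of what changes (the assembly in the comments is illustrative; the exact stack offset in the `lock or` form is chosen by the compiler):

```cpp
// Illustrative only: actual codegen depends on the target CPU and flags.
#include <atomic>

void publish(int &data, std::atomic<bool> &ready) {
  data = 42;                                            // plain store
  std::atomic_thread_fence(std::memory_order_seq_cst);  // full barrier
  // x86-64 before this patch:  mfence
  // x86-64 after this patch:   lock orl $0, -N(%rsp)   (dummy locked RMW below the stack pointer)
  ready.store(true, std::memory_order_relaxed);
}
```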

@llvmbot
Member

llvmbot commented Aug 29, 2024

@llvm/pr-subscribers-backend-x86

Author: Valentin Churavy (vchuravy)

Full diff: https://github.com/llvm/llvm-project/pull/106555.diff

3 Files Affected:

  • (modified) llvm/lib/Target/X86/X86.td (+28-12)
  • (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+1-1)
  • (modified) llvm/test/CodeGen/X86/atomic-unordered.ll (+5-5)
diff --git a/llvm/lib/Target/X86/X86.td b/llvm/lib/Target/X86/X86.td
index 988966fa6a6c46..dfa534a69e7024 100644
--- a/llvm/lib/Target/X86/X86.td
+++ b/llvm/lib/Target/X86/X86.td
@@ -754,6 +754,10 @@ def TuningUseGLMDivSqrtCosts
 def TuningBranchHint: SubtargetFeature<"branch-hint", "HasBranchHint", "true",
                                         "Target has branch hint feature">;
 
+def TuningAvoidMFENCE
+   : SubtargetFeature<"avoid-mfence", "AvoidMFence", "true",
+        "Avoid MFENCE for fence seq_cst, and instead use lock or">;
+
 //===----------------------------------------------------------------------===//
 // X86 CPU Families
 // TODO: Remove these - use general tuning features to determine codegen.
@@ -882,7 +886,8 @@ def ProcessorFeatures {
   list<SubtargetFeature> NHMTuning = [TuningMacroFusion,
                                       TuningSlowDivide64,
                                       TuningInsertVZEROUPPER,
-                                      TuningNoDomainDelayMov];
+                                      TuningNoDomainDelayMov,
+                                      TuningAvoidMFENCE];
 
   // Westmere
   list<SubtargetFeature> WSMAdditionalFeatures = [FeaturePCLMUL];
@@ -903,7 +908,8 @@ def ProcessorFeatures {
                                       TuningFast15ByteNOP,
                                       TuningPOPCNTFalseDeps,
                                       TuningInsertVZEROUPPER,
-                                      TuningNoDomainDelayMov];
+                                      TuningNoDomainDelayMov,
+                                      TuningAvoidMFENCE];
   list<SubtargetFeature> SNBFeatures =
     !listconcat(WSMFeatures, SNBAdditionalFeatures);
 
@@ -969,7 +975,8 @@ def ProcessorFeatures {
                                       TuningAllowLight256Bit,
                                       TuningNoDomainDelayMov,
                                       TuningNoDomainDelayShuffle,
-                                      TuningNoDomainDelayBlend];
+                                      TuningNoDomainDelayBlend,
+                                      TuningAvoidMFENCE];
   list<SubtargetFeature> SKLFeatures =
     !listconcat(BDWFeatures, SKLAdditionalFeatures);
 
@@ -1004,7 +1011,8 @@ def ProcessorFeatures {
                                       TuningNoDomainDelayMov,
                                       TuningNoDomainDelayShuffle,
                                       TuningNoDomainDelayBlend,
-                                      TuningFastImmVectorShift];
+                                      TuningFastImmVectorShift,
+                                      TuningAvoidMFENCE];
   list<SubtargetFeature> SKXFeatures =
     !listconcat(BDWFeatures, SKXAdditionalFeatures);
 
@@ -1047,7 +1055,8 @@ def ProcessorFeatures {
                                       TuningNoDomainDelayMov,
                                       TuningNoDomainDelayShuffle,
                                       TuningNoDomainDelayBlend,
-                                      TuningFastImmVectorShift];
+                                      TuningFastImmVectorShift,
+                                      TuningAvoidMFENCE];
   list<SubtargetFeature> CNLFeatures =
     !listconcat(SKLFeatures, CNLAdditionalFeatures);
 
@@ -1076,7 +1085,8 @@ def ProcessorFeatures {
                                       TuningNoDomainDelayMov,
                                       TuningNoDomainDelayShuffle,
                                       TuningNoDomainDelayBlend,
-                                      TuningFastImmVectorShift];
+                                      TuningFastImmVectorShift,
+                                      TuningAvoidMFENCE];
   list<SubtargetFeature> ICLFeatures =
     !listconcat(CNLFeatures, ICLAdditionalFeatures);
 
@@ -1222,7 +1232,8 @@ def ProcessorFeatures {
   // Tremont
   list<SubtargetFeature> TRMAdditionalFeatures = [FeatureCLWB,
                                                   FeatureGFNI];
-  list<SubtargetFeature> TRMTuning = GLPTuning;
+  list<SubtargetFeature> TRMAdditionalTuning = [TuningAvoidMFENCE];
+  list<SubtargetFeature> TRMTuning = !listconcat(GLPTuning, TRMAdditionalTuning);
   list<SubtargetFeature> TRMFeatures =
     !listconcat(GLPFeatures, TRMAdditionalFeatures);
 
@@ -1429,7 +1440,8 @@ def ProcessorFeatures {
                                          TuningFastScalarShiftMasks,
                                          TuningBranchFusion,
                                          TuningSBBDepBreaking,
-                                         TuningInsertVZEROUPPER];
+                                         TuningInsertVZEROUPPER,
+                                         TuningAvoidMFENCE];
 
   // PileDriver
   list<SubtargetFeature> BdVer2AdditionalFeatures = [FeatureF16C,
@@ -1509,7 +1521,8 @@ def ProcessorFeatures {
                                      TuningSlowSHLD,
                                      TuningSBBDepBreaking,
                                      TuningInsertVZEROUPPER,
-                                     TuningAllowLight256Bit];
+                                     TuningAllowLight256Bit,
+                                     TuningAvoidMFENCE];
   list<SubtargetFeature> ZN2AdditionalFeatures = [FeatureCLWB,
                                                   FeatureRDPID,
                                                   FeatureRDPRU,
@@ -1664,7 +1677,8 @@ def : ProcModel<"nocona", GenericPostRAModel, [
 ],
 [
   TuningSlowUAMem16,
-  TuningInsertVZEROUPPER
+  TuningInsertVZEROUPPER,
+  TuningAvoidMFENCE
 ]>;
 
 // Intel Core 2 Solo/Duo.
@@ -1684,7 +1698,8 @@ def : ProcModel<P, SandyBridgeModel, [
 [
   TuningMacroFusion,
   TuningSlowUAMem16,
-  TuningInsertVZEROUPPER
+  TuningInsertVZEROUPPER,
+  TuningAvoidMFENCE
 ]>;
 }
 foreach P = ["penryn", "core_2_duo_sse4_1"] in {
@@ -1703,7 +1718,8 @@ def : ProcModel<P, SandyBridgeModel, [
 [
   TuningMacroFusion,
   TuningSlowUAMem16,
-  TuningInsertVZEROUPPER
+  TuningInsertVZEROUPPER,
+  TuningAvoidMFENCE
 ]>;
 }
 
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index f011249d295040..aade718c1efe80 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -31103,7 +31103,7 @@ static SDValue LowerATOMIC_FENCE(SDValue Op, const X86Subtarget &Subtarget,
   // cross-thread fence.
   if (FenceOrdering == AtomicOrdering::SequentiallyConsistent &&
       FenceSSID == SyncScope::System) {
-    if (Subtarget.hasMFence())
+    if (!Subtarget.avoidMFence() && Subtarget.hasMFence())
       return DAG.getNode(X86ISD::MFENCE, dl, MVT::Other, Op.getOperand(0));
 
     SDValue Chain = Op.getOperand(0);
diff --git a/llvm/test/CodeGen/X86/atomic-unordered.ll b/llvm/test/CodeGen/X86/atomic-unordered.ll
index 3fb994cdb751a3..e8e0ee0b7ef492 100644
--- a/llvm/test/CodeGen/X86/atomic-unordered.ll
+++ b/llvm/test/CodeGen/X86/atomic-unordered.ll
@@ -2096,7 +2096,7 @@ define i64 @nofold_fence(ptr %p) {
 ; CHECK-LABEL: nofold_fence:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    movq (%rdi), %rax
-; CHECK-NEXT:    mfence
+; CHECK-NEXT:    lock orl $0, -{{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    addq $15, %rax
 ; CHECK-NEXT:    retq
   %v = load atomic i64, ptr %p unordered, align 8
@@ -2170,7 +2170,7 @@ define i64 @fold_constant_fence(i64 %arg) {
 ; CHECK-LABEL: fold_constant_fence:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    movq Constant(%rip), %rax
-; CHECK-NEXT:    mfence
+; CHECK-NEXT:    lock orl $0, -{{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    addq %rdi, %rax
 ; CHECK-NEXT:    retq
   %v = load atomic i64, ptr @Constant unordered, align 8
@@ -2197,7 +2197,7 @@ define i64 @fold_invariant_fence(ptr dereferenceable(8) %p, i64 %arg) {
 ; CHECK-LABEL: fold_invariant_fence:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    movq (%rdi), %rax
-; CHECK-NEXT:    mfence
+; CHECK-NEXT:    lock orl $0, -{{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    addq %rsi, %rax
 ; CHECK-NEXT:    retq
   %v = load atomic i64, ptr %p unordered, align 8, !invariant.load !{}
@@ -2321,7 +2321,7 @@ define i1 @fold_cmp_over_fence(ptr %p, i32 %v1) {
 ; CHECK-O0-LABEL: fold_cmp_over_fence:
 ; CHECK-O0:       # %bb.0:
 ; CHECK-O0-NEXT:    movl (%rdi), %eax
-; CHECK-O0-NEXT:    mfence
+; CHECK-O0-NEXT:    lock orl $0, -{{[0-9]+}}(%rsp)
 ; CHECK-O0-NEXT:    cmpl %eax, %esi
 ; CHECK-O0-NEXT:    jne .LBB116_2
 ; CHECK-O0-NEXT:  # %bb.1: # %taken
@@ -2335,7 +2335,7 @@ define i1 @fold_cmp_over_fence(ptr %p, i32 %v1) {
 ; CHECK-O3-LABEL: fold_cmp_over_fence:
 ; CHECK-O3:       # %bb.0:
 ; CHECK-O3-NEXT:    movl (%rdi), %eax
-; CHECK-O3-NEXT:    mfence
+; CHECK-O3-NEXT:    lock orl $0, -{{[0-9]+}}(%rsp)
 ; CHECK-O3-NEXT:    cmpl %eax, %esi
 ; CHECK-O3-NEXT:    jne .LBB116_2
 ; CHECK-O3-NEXT:  # %bb.1: # %taken
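
The reason the tests above can swap `mfence` for `lock orl $0, -{{[0-9]+}}(%rsp)`: on x86, any `lock`-prefixed read-modify-write to ordinary write-back memory acts as a full barrier, so a no-op locked OR into a dead stack slot gives seq_cst fence semantics at lower cost on most modern cores (non-temporal stores are the main caveat, discussed further down the thread). A hedged user-level illustration of the same trick, using a global scratch variable of our own rather than the stack slot the patch uses:

```cpp
#include <atomic>

std::atomic<int> scratch{0};  // hypothetical dummy location, for illustration only

void full_barrier_without_mfence() {
  // With the result unused and a constant operand, compilers can lower this to
  // "lock orl $0, scratch(%rip)": a locked RMW that the hardware treats as a
  // full barrier, typically cheaper than mfence on recent x86 cores.
  scratch.fetch_or(0, std::memory_order_seq_cst);
}
```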

@RKSimon
Collaborator

RKSimon commented Aug 29, 2024

Failed Tests (2):
  LLVM :: Transforms/Inline/X86/inline-target-cpu-i686.ll
  LLVM :: Transforms/Inline/X86/inline-target-cpu-x86_64.ll

@ConorWilliams

Thank you

@vchuravy
Contributor Author

On a Genoa machine (AMD EPYC 9384X), a benchmark of mine takes 14.13s to execute with seq_cst defaulting to mfence and 9.99s with lock or. This is single-threaded...
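
For anyone who wants to reproduce the effect locally, a rough single-threaded fence microbenchmark could look like the sketch below (our own sketch, not the benchmark referenced above; absolute numbers will vary with CPU and compiler flags):

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>

int main() {
  constexpr long kIters = 100'000'000;
  std::atomic<long> sink{0};
  const auto t0 = std::chrono::steady_clock::now();
  for (long i = 0; i < kIters; ++i) {
    sink.store(i, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // the fence under test
  }
  const auto t1 = std::chrono::steady_clock::now();
  std::printf("%.3f ns per fence\n",
              std::chrono::duration<double, std::nano>(t1 - t0).count() / kIters);
  // Read the sink so the stores in the loop are not trivially dead.
  return sink.load(std::memory_order_relaxed) == kIters - 1 ? 0 : 1;
}
```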

giordano pushed a commit to JuliaLang/llvm-project that referenced this pull request Nov 5, 2024
@giordano
Contributor

giordano commented Nov 5, 2024

Friendly bump! Can we get another round of review on this PR? Thanks!

giordano pushed a commit to JuliaLang/llvm-project that referenced this pull request Nov 21, 2024
yuyichao pushed a commit to yuyichao/llvm-project that referenced this pull request Jan 15, 2025
giordano pushed a commit to JuliaLang/llvm-project that referenced this pull request Feb 7, 2025
vchuravy added a commit that referenced this pull request Feb 10, 2025
This extends the optimization to scenarios where the subtarget
has `!hasMFence` or we have SyncScope SingleThread, by avoiding
direct usage of `llvm.x86.sse2.mfence`.

Originally part of #106555
@vchuravy vchuravy requested review from RKSimon and phoebewang March 7, 2025 07:37
RKSimon added a commit that referenced this pull request Mar 7, 2025
We try to only use X32 for gnux32 triple

Noticed while reviewing #106555
Collaborator

@RKSimon RKSimon left a comment

Instead of a tuning flag I'm very tempted to say that we create a general avoidMFence() method that always returns true for 64-bit targets - similar to what we do for hasCLFLUSH() etc.

@phoebewang any thoughts?
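
For concreteness, such a helper could look roughly like this in X86Subtarget.h (a hypothetical sketch of the suggestion, not necessarily the form that was merged):

```cpp
// Hypothetical helper, analogous to hasCLFLUSH(): prefer a dummy "lock or"
// over mfence for fence seq_cst on every 64-bit target, rather than opting
// individual CPUs in through a tuning flag.
bool avoidMFence() const { return is64Bit(); }
```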

@phoebewang
Contributor

Instead of a tuning flag I'm very tempted to say that we create a general avoidMFence() method that always returns true for 64-bit targets - similar to what we do for hasCLFLUSH() etc.

@phoebewang any thoughts?

Sounds good. I don't like the tedious tunings for all modern targets.

@vchuravy
Contributor Author

@phoebewang @RKSimon adjusted it to be that. The only thing "lost" is the ability to flip it back to mfence, but I assume that is fine?
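
While the codegen default can no longer be toggled back, individual call sites can still get the literal instruction through the builtin mentioned in the PR description; a minimal example:

```cpp
#include <immintrin.h>

void force_mfence() {
  __builtin_ia32_mfence();  // always emits mfence, regardless of the new default
  // Portable alternative across compilers: _mm_mfence();
}
```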

Collaborator

@RKSimon RKSimon left a comment

LGTM - cheers

@vchuravy vchuravy merged commit b334321 into llvm:main Mar 11, 2025
11 checks passed
@vchuravy vchuravy deleted the vc/avoid_mfence branch March 11, 2025 15:12
giordano pushed a commit to JuliaLang/llvm-project that referenced this pull request Mar 11, 2025
giordano pushed a commit to JuliaLang/llvm-project that referenced this pull request Mar 11, 2025
Originally discussed in https://reviews.llvm.org/D129947

LLVM currently emits `mfence` for `__atomic_thread_fence(seq_cst)`. On
modern CPUs, `lock or` is more efficient and provides the same sequential
consistency. GCC 11 made this switch as well (see
https://gcc.gnu.org/pipermail/gcc-cvs/2020-July/314418.html),
and https://reviews.llvm.org/D61863 and https://reviews.llvm.org/D58632
moved in this direction as well, but did not touch `fence seq_cst`.

This switches to `lock or` on all x64 systems, and leaves `__builtin_ia32_mfence` for folks who
want this precise instruction.

(cherry picked from commit b334321)
jph-13 pushed a commit to jph-13/llvm-project that referenced this pull request Mar 21, 2025
@alexfh
Contributor

alexfh commented Apr 2, 2025

We're seeing a ~40% regression in a macrobenchmark after this change. Is Skylake-X considered modern enough to benefit from `lock or`, or would `mfence` still be better for this microarchitecture?

@vchuravy
Contributor Author

vchuravy commented Apr 2, 2025

All Core architectures should benefit from this change. Is your macrobenchmark single-threaded or contention-heavy?

Skylake-X takes 33 cycles for `mfence` and 18 for `lock or`, according to uops.info.

@alexfh
Contributor

alexfh commented Apr 2, 2025

All Core architectures should benefit from this change. Is your macrobenchmark single-threaded or contention heavy?

Skylake-X has 33 cycles for mfence and 18 for lock or according to uops.info

Thanks for the information. I don't know much about the nature of the benchmark, except that it deals with I/O. It's being looked at now by someone else. I just wanted to give an early heads-up of a possible problem with this change.

@vchuravy
Contributor Author

vchuravy commented Apr 2, 2025

One thing to confirm is whether the benchmark uses non-temporal stores in some way, and additionally whether GCC shows a similar regression (they made this change a few years ahead of us).
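
Non-temporal stores matter here because a `lock`-prefixed RMW is a full barrier only for ordinary write-back memory, whereas streaming stores are documented to need sfence/mfence to be ordered. A hedged sketch of the kind of pattern to look for in the benchmark (function and variable names are ours, not from any real codebase):

```cpp
#include <atomic>
#include <emmintrin.h>

void publish_nt(int *buf, std::atomic<bool> &ready) {
  _mm_stream_si32(buf, 42);  // non-temporal store (write-combining, weakly ordered)
  _mm_sfence();              // NT stores need an explicit sfence/mfence to be ordered;
                             // a "lock or" based fence is not documented to flush the
                             // write-combining buffers in the same way.
  std::atomic_thread_fence(std::memory_order_seq_cst);
  ready.store(true, std::memory_order_relaxed);
}
```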

Zentrik pushed a commit to JuliaLang/llvm-project that referenced this pull request Apr 14, 2025
Zentrik pushed a commit to JuliaLang/llvm-project that referenced this pull request Apr 14, 2025
Successfully merging this pull request may close these issues.

Missed optimization for std::atomic_thread_fence(std::memory_order_seq_cst)