[AMDGPU] Add an option to disable unsafe uses of atomic xor #69229
Conversation
Hi @jansvoboda11, @arsenm, I haven't yet had a commit merged to LLVM and can't request a review through the UI. Could I get a review on this patch? Thanks in advance, and apologies if I've missed a step in the submission process.
The command-line changes look good to me. Adding @arsenm as a reviewer proper.
Thanks Jan!
@llvm/pr-subscribers-clang-driver @llvm/pr-subscribers-backend-amdgpu

Author: Pierre-Andre Saulais (pasaulais)

Changes

On some systems and when accessing certain memory areas it is not safe to use atomic xor with AMDGPU. This may be the case when using fine-grained memory allocations (e.g. through `hipAllocManaged`) and when the atomic operation needs to go through the PCIe bus, which does not support atomic xor (PCIe 3.0 does support other atomic operations like fetch_and_add and cmpxchg) [1].

The issue has been worked around in DPC++/HIP by prefetching memory before executing kernels [2], however this adds an overhead that can outweigh the performance benefit of using atomic xor over a cmpxchg loop. This is why a way to switch between 'native' atomic xor and a cmpxchg-loop solution is needed.

These changes add `-munsafe-int-atomics` and `-mno-unsafe-int-atomics` options to clang for switching between the two implementations. The default keeps the current behaviour of generating 'native' atomic xor instructions; with `-mno-unsafe-int-atomics`, functions are marked with the `"amdgpu-unsafe-int-atomics"="false"` attribute and the backend expands atomic xor in the global and flat address spaces to a cmpxchg loop.

Full diff: https://github.com/llvm/llvm-project/pull/69229.diff

10 Files Affected:
diff --git a/clang/include/clang/Basic/TargetInfo.h b/clang/include/clang/Basic/TargetInfo.h
index 9d56e97a3d4bb88..96416c4405115cd 100644
--- a/clang/include/clang/Basic/TargetInfo.h
+++ b/clang/include/clang/Basic/TargetInfo.h
@@ -253,6 +253,8 @@ class TargetInfo : public TransferrableTargetInfo,
unsigned AllowAMDGPUUnsafeFPAtomics : 1;
+ unsigned AllowAMDGPUUnsafeIntAtomics : 1;
+
unsigned ARMCDECoprocMask : 8;
unsigned MaxOpenCLWorkGroupSize;
@@ -995,6 +997,11 @@ class TargetInfo : public TransferrableTargetInfo,
/// allowed.
bool allowAMDGPUUnsafeFPAtomics() const { return AllowAMDGPUUnsafeFPAtomics; }
+ /// Returns whether or not the AMDGPU unsafe integer atomics are allowed.
+ bool allowAMDGPUUnsafeIntAtomics() const {
+ return AllowAMDGPUUnsafeIntAtomics;
+ }
+
/// For ARM targets returns a mask defining which coprocessors are configured
/// as Custom Datapath.
uint32_t getARMCDECoprocMask() const { return ARMCDECoprocMask; }
diff --git a/clang/include/clang/Basic/TargetOptions.h b/clang/include/clang/Basic/TargetOptions.h
index ba3acd029587160..4feb6f2dcd86b83 100644
--- a/clang/include/clang/Basic/TargetOptions.h
+++ b/clang/include/clang/Basic/TargetOptions.h
@@ -78,6 +78,9 @@ class TargetOptions {
/// \brief If enabled, allow AMDGPU unsafe floating point atomics.
bool AllowAMDGPUUnsafeFPAtomics = false;
+ /// \brief If enabled, allow AMDGPU unsafe integer atomics.
+ bool AllowAMDGPUUnsafeIntAtomics = true;
+
/// \brief Enumeration value for AMDGPU code object version, which is the
/// code object version times 100.
enum CodeObjectVersionKind {
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index a89d6b6579f1176..265f5c23e7858ee 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4705,6 +4705,14 @@ defm unsafe_fp_atomics : BoolOption<"m", "unsafe-fp-atomics",
"for certain memory destinations. (AMDGPU only)">,
NegFlag<SetFalse>>, Group<m_Group>;
+defm unsafe_int_atomics : BoolOption<"m", "unsafe-int-atomics",
+ TargetOpts<"AllowAMDGPUUnsafeIntAtomics">, DefaultTrue,
+ PosFlag<SetTrue, [], [ClangOption, CC1Option],
+ "Enable generation of unsafe integer atomic instructions. May "
+ "generate more efficient code, but may give incorrect results "
+ "for certain memory destinations. (AMDGPU only)">,
+ NegFlag<SetFalse, [], [ClangOption, CC1Option]>>, Group<m_Group>;
+
def faltivec : Flag<["-"], "faltivec">, Group<f_Group>, Flags<[NoXarchOption]>;
def fno_altivec : Flag<["-"], "fno-altivec">, Group<f_Group>,
Flags<[NoXarchOption]>;
diff --git a/clang/lib/Basic/TargetInfo.cpp b/clang/lib/Basic/TargetInfo.cpp
index 6cd5d618a4acaa5..06997cc54642623 100644
--- a/clang/lib/Basic/TargetInfo.cpp
+++ b/clang/lib/Basic/TargetInfo.cpp
@@ -157,6 +157,7 @@ TargetInfo::TargetInfo(const llvm::Triple &T) : Triple(T) {
HasAArch64SVETypes = false;
HasRISCVVTypes = false;
AllowAMDGPUUnsafeFPAtomics = false;
+ AllowAMDGPUUnsafeIntAtomics = true;
ARMCDECoprocMask = 0;
// Default to no types using fpret.
diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp
index 409ae32ab424215..20ad8e5c59c2b66 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -232,6 +232,7 @@ AMDGPUTargetInfo::AMDGPUTargetInfo(const llvm::Triple &Triple,
HasFloat16 = true;
WavefrontSize = GPUFeatures & llvm::AMDGPU::FEATURE_WAVE32 ? 32 : 64;
AllowAMDGPUUnsafeFPAtomics = Opts.AllowAMDGPUUnsafeFPAtomics;
+ AllowAMDGPUUnsafeIntAtomics = Opts.AllowAMDGPUUnsafeIntAtomics;
// Set pointer width and alignment for the generic address space.
PointerWidth = PointerAlign = getPointerWidthV(LangAS::Default);
diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index f6a614b3e4d54dd..9be15d89d3f2f46 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -409,6 +409,9 @@ void AMDGPUTargetCodeGenInfo::setTargetAttributes(
if (M.getContext().getTargetInfo().allowAMDGPUUnsafeFPAtomics())
F->addFnAttr("amdgpu-unsafe-fp-atomics", "true");
+ if (!M.getContext().getTargetInfo().allowAMDGPUUnsafeIntAtomics())
+ F->addFnAttr("amdgpu-unsafe-int-atomics", "false");
+
if (!getABIInfo().getCodeGenOpts().EmitIEEENaNCompliantInsts)
F->addFnAttr("amdgpu-ieee", "false");
}
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 94c184435ae14de..be3076cdbdc4a68 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7372,6 +7372,8 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
Args.addOptInFlag(CmdArgs, options::OPT_munsafe_fp_atomics,
options::OPT_mno_unsafe_fp_atomics);
+ Args.addOptOutFlag(CmdArgs, options::OPT_munsafe_int_atomics,
+ options::OPT_mno_unsafe_int_atomics);
Args.addOptOutFlag(CmdArgs, options::OPT_mamdgpu_ieee,
options::OPT_mno_amdgpu_ieee);
}
diff --git a/clang/test/CodeGenOpenCL/amdgpu-unsafe-int-atomics.cl b/clang/test/CodeGenOpenCL/amdgpu-unsafe-int-atomics.cl
new file mode 100644
index 000000000000000..02dfddd5791b405
--- /dev/null
+++ b/clang/test/CodeGenOpenCL/amdgpu-unsafe-int-atomics.cl
@@ -0,0 +1,55 @@
+// REQUIRES: amdgpu-registered-target
+//
+// RUN: %clang_cc1 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -O3 -S -o - %s \
+// RUN: -emit-llvm \
+// RUN: | FileCheck -check-prefixes=COMMON,UNSAFE-INT-DEFAULT %s
+// RUN: %clang_cc1 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -O3 -S -o - %s \
+// RUN: -emit-llvm -munsafe-int-atomics \
+// RUN: | FileCheck -check-prefixes=COMMON,UNSAFE-INT-ON %s
+// RUN: %clang_cc1 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -O3 -S -o - %s \
+// RUN: -emit-llvm -mno-unsafe-int-atomics \
+// RUN: | FileCheck -check-prefixes=COMMON,UNSAFE-INT-OFF %s
+
+// Check AMDGCN ISA generation.
+
+// RUN: %clang_cc1 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -O3 -S -o - %s \
+// RUN: -munsafe-int-atomics \
+// RUN: | FileCheck -check-prefixes=COMMON-ISA,ISA-ON %s
+// RUN: %clang_cc1 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -O3 -S -o - %s \
+// RUN: -mno-unsafe-int-atomics \
+// RUN: | FileCheck -check-prefixes=COMMON-ISA,ISA-OFF %s
+
+#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
+#pragma OPENCL EXTENSION cl_khr_int64_extended_atomics : enable
+
+typedef enum memory_order {
+ memory_order_relaxed = __ATOMIC_RELAXED,
+ memory_order_acquire = __ATOMIC_ACQUIRE,
+ memory_order_release = __ATOMIC_RELEASE,
+ memory_order_acq_rel = __ATOMIC_ACQ_REL,
+ memory_order_seq_cst = __ATOMIC_SEQ_CST
+} memory_order;
+
+typedef enum memory_scope {
+ memory_scope_work_item = __OPENCL_MEMORY_SCOPE_WORK_ITEM,
+ memory_scope_work_group = __OPENCL_MEMORY_SCOPE_WORK_GROUP,
+ memory_scope_device = __OPENCL_MEMORY_SCOPE_DEVICE,
+ memory_scope_all_svm_devices = __OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES,
+#if defined(cl_intel_subgroups) || defined(cl_khr_subgroups)
+ memory_scope_sub_group = __OPENCL_MEMORY_SCOPE_SUB_GROUP
+#endif
+} memory_scope;
+
+// COMMON-ISA: kern:
+// ISA-ON: flat_atomic_xor v{{[0-9]+}}, v[{{[0-9]+}}:{{[0-9]+}}], v{{[0-9]+}} glc
+// ISA-OFF: flat_atomic_cmpswap v{{[0-9]+}}, v[{{[0-9]+}}:{{[0-9]+}}], v[{{[0-9]+}}:{{[0-9]+}}] glc
+kernel void kern(global atomic_int *x, int y, global int *z) {
+ *z = __opencl_atomic_fetch_xor(x, y, memory_order_seq_cst, memory_scope_work_group);
+}
+
+// COMMON: define{{.*}} amdgpu_kernel void @kern{{.*}} [[ATTRS1:#[0-9]+]]
+// COMMON: atomicrmw xor ptr addrspace(1) %x, i32 %y syncscope("workgroup") seq_cst, align 4
+//
+// UNSAFE-INT-DEFAULT-NOT: attributes [[ATTRS1]] = {{.*}} "amdgpu-unsafe-int-atomics"
+// UNSAFE-INT-ON-NOT: attributes [[ATTRS1]] = {{.*}} "amdgpu-unsafe-int-atomics"
+// UNSAFE-INT-OFF: attributes [[ATTRS1]] = {{.*}} "amdgpu-unsafe-int-atomics"="false"
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 9c5b166c9652238..c379cf4dd64642f 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -14975,6 +14975,14 @@ bool unsafeFPAtomicsDisabled(Function *F) {
"true";
}
+// The amdgpu-unsafe-int-atomics attribute disables generation of unsafe
+// integer atomic instructions. May generate more efficient code, but may
+// give incorrect results for certain memory destinations.
+bool unsafeIntAtomicsDisabled(Function *F) {
+ return F->getFnAttribute("amdgpu-unsafe-int-atomics").getValueAsString() ==
+ "false";
+}
+
TargetLowering::AtomicExpansionKind
SITargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *RMW) const {
unsigned AS = RMW->getPointerAddressSpace();
@@ -15097,6 +15105,13 @@ SITargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *RMW) const {
}
break;
}
+ case AtomicRMWInst::Xor: {
+ if (AMDGPU::isFlatGlobalAddrSpace(AS) &&
+ unsafeIntAtomicsDisabled(RMW->getFunction())) {
+ return AtomicExpansionKind::CmpXChg;
+ }
+ break;
+ }
default:
break;
}
diff --git a/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll b/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
index 92b70ad797b88ae..6cc97019d3ff5ce 100644
--- a/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
+++ b/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
@@ -1,8 +1,8 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -march=amdgcn -mcpu=gfx908 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX908 %s
-; RUN: llc -march=amdgcn -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefix=GFX90A %s
-; RUN: llc -march=amdgcn -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX940 %s
-; RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX1100 %s
+; RUN: llc -march=amdgcn -mcpu=gfx908 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX908 %s
+; RUN: llc -march=amdgcn -mcpu=gfx90a -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX90A %s
+; RUN: llc -march=amdgcn -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX940 %s
+; RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX1100 %s
define float @syncscope_system(ptr %addr, float %val) #0 {
; GFX908-LABEL: syncscope_system:
@@ -391,4 +391,19 @@ define float @no_unsafe(ptr %addr, float %val) {
ret float %res
}
+define i32 @default_xor(ptr %addr, i32 %val) {
+ ; GCN-LABEL: default_xor:
+ ; GCN: flat_atomic_xor
+ %res = atomicrmw xor ptr %addr, i32 %val seq_cst
+ ret i32 %res
+}
+
+define i32 @no_unsafe_xor(ptr %addr, i32 %val) #1 {
+ ; GCN-LABEL: no_unsafe_xor:
+ ; GCN: flat_atomic_cmpswap
+ %res = atomicrmw xor ptr %addr, i32 %val seq_cst
+ ret i32 %res
+}
+
attributes #0 = { "amdgpu-unsafe-fp-atomics"="true" }
+attributes #1 = { "amdgpu-unsafe-int-atomics"="false" }
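
For reference, returning `AtomicExpansionKind::CmpXChg` above hands the `atomicrmw xor` to the generic AtomicExpand pass, which rewrites it as a compare-exchange loop. A hand-written sketch of roughly what that expansion produces (simplified; the real output comes from AtomicExpand, not this exact IR):

```llvm
; Expanded form of `atomicrmw xor ptr %addr, i32 %val seq_cst`: retry the
; cmpxchg until no other thread modified the location since our last read.
define i32 @xor_expanded(ptr %addr, i32 %val) {
entry:
  %init = load i32, ptr %addr, align 4
  br label %loop

loop:
  %old = phi i32 [ %init, %entry ], [ %loaded, %loop ]
  %new = xor i32 %old, %val
  %pair = cmpxchg ptr %addr, i32 %old, i32 %new seq_cst seq_cst, align 4
  %loaded = extractvalue { i32, i1 } %pair, 0
  %success = extractvalue { i32, i1 } %pair, 1
  br i1 %success, label %done, label %loop

done:
  ret i32 %old ; atomicrmw yields the value observed before the xor
}
```

The cmpxchg is then the only atomic primitive that has to reach the bus, which is why this lowering stays correct on links such as PCIe 3.0 that lack a native atomic xor.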
Ping @hdelan
I am opposed to adding any additional attributes for this. I very much want to remove the current unsafe-fp-atomics attribute. It's broken in the presence of inlining as it is, and at some point it acquired too many meanings. There should also be no reason to have separate FP and integer controls; there's nothing fundamental about the type here.
What we really need to do is have instruction-level metadata annotations, separating the FP restrictions and the communication restrictions (which may or may not be isomorphic to a different scope). I have an incomplete patch where I was working towards this.
Thanks for the review. I agree that function-level attributes are not ideal for solving this issue and that instruction-level metadata would work better with things like inlining. Is the incomplete patch you mentioned something I could take on and complete? Regarding int vs floating-point, I'm afraid there is a need to toggle one independently of the other (or at least to special-case operations like XOR that are not supported by PCIe 3.0). As the link I posted in the description mentions (see this comment: ROCm/ROCm#2481 (comment)), there are configurations where using FP atomics like add would work whereas XOR doesn't, due to missing support in the PCIe 3.0 spec. I have reproduced this on a system with an RX 6700 XT GPU, where …
@arsenm, could you share this unfinished patch you were working on? I could start from scratch but I don't want to duplicate the work you've already done.
…f atomic builtin (#11876)
1. Fix a race condition in cmpxchg.
2. Change `atomic_xor` to use a CAS loop instead of the atomic builtin.
This is needed to merge oneapi-src/unified-runtime#936 in UR so that a perf regression can be fixed. The long-term fix is to use a compiler flag to choose between the builtin and the safe CAS implementation, but the talks upstream may take some time to figure out the details. See llvm/llvm-project#69229
Force-pushed from 508e577 to 24b0e52, then from 24b0e52 to cd33e84.
We're gradually converging on something that looks like this, subject to bikeshedding the name.
I did not know about this PR. It is interesting that our other discussions led to a similar solution. I agree that per-instruction metadata is needed, and that the metadata should convey separate controls for FP atomics and for integer atomics unsupported across PCIe. The reason to use per-instruction metadata instead of per-function metadata or an attribute is that a per-function attribute does not work well for inlined functions. As for controlling the frontend's emission of this metadata, a more fine-grained control, e.g. a pragma, is desirable; however, for this patch, a clang option is probably sufficient. We could consider pragma control later. We need to coin some good, concise names for the metadata: for unsafe FP atomics, `unsafe_fp`? For integer atomics unsupported across PCIe, `unsafe_pcie`? `fine_grained` may not be suitable, since fine-grained memory accessed across XGMI does support integer atomic AND/OR/XOR ops, etc.
We're aware of the concerns regarding atomics handling. We're trying to find the approach which is least distasteful to all parties. I expect we will address the concern raised here, but not necessarily in the same way. |
In this case, MMRAs would only help in the sense that you won't need any new attributes and can just add an MMRA such as …
It is semantics. You get undefined behavior if you access fine-grained memory with an atomic where the instruction doesn't handle it. I don't think it quite fits into the MMRA box, since it isn't about the synchronization properties of the memory access but about where the underlying memory is.
@arsenm That makes sense, I don't think MMRA fits the fine-grained use case either. Does that mean we can stick with the approach from this PR? @b-sumner mentioned there was another similar approach being worked on.
Something like this, but the naming and direction of this PR are backwards. The default should be to assume fine-grained memory is possible. We also have another orthogonal bit we need to track in addition to fine-grained memory. I was envisioning this as a single integer interpreted as bitfields of the two, but this uses different key:value metadata pairs. It should be named "amdgpu.no.something" to assert the operation doesn't access fine-grained memory.
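
A minimal sketch of what this inverted, assert-the-negative scheme could look like in IR. The metadata name below follows the `amdgpu.no.*` pattern suggested here; the exact spelling was still being bikeshedded at this point, so treat it as illustrative:

```llvm
; Unannotated atomicrmw: conservatively assumed to possibly touch
; fine-grained memory, so the backend must pick a safe lowering.
define i32 @xor_default(ptr addrspace(1) %p, i32 %v) {
  %old = atomicrmw xor ptr addrspace(1) %p, i32 %v seq_cst, align 4
  ret i32 %old
}

; Annotated atomicrmw: the frontend asserts this operation never sees
; fine-grained memory, so the native atomic instruction can be selected.
define i32 @xor_no_fine_grained(ptr addrspace(1) %p, i32 %v) {
  %old = atomicrmw xor ptr addrspace(1) %p, i32 %v seq_cst, align 4, !amdgpu.no.fine.grained.memory !0
  ret i32 %old
}

!0 = !{}
```

Absence of the marker is the conservative default, which is exactly the correct-by-default direction being asked for.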
Add a spec for yet-to-be-implemented metadata to allow the backend to fully handle atomicrmw lowering. This is the base of an alternative to llvm#69229, which inverts the direction to be correct by default, and extends to cover the peer device case. Could use a better name
@arsenm I agree that the default should be to assume fine-grained memory is possible. My thinking behind the original naming and direction was not wanting to introduce an unexpected performance regression by default. I'm happy for both to be changed, and for this patch to be rebased on top of #85052 once it is merged. One outstanding question for me, although outside the scope of this PR, is how the original 'no-unsafe-fp' option will fit into the new metadata node in terms of constraints. Would it imply a new constraint or be covered by …
Also, this will need to be extended to other operations that can be impacted by fine-grained and remote access, like …
Opting for fast-and-maybe-broken by default needs to be a frontend decision (i.e. the language can choose to add the metadata to all atomics). I believe @yxsamliu is going to be working on the frontend half, which will preserve the HIP behavior.
I believe we need an additional piece of metadata (which I have another draft for) to express the relaxable floating point. We can express the unsafe-fp-math option with the two combined, and then we can drop the old IR attribute.
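
A rough sketch of how two such annotations might combine on an FP atomic. The second metadata name here is a guess at the draft being described, modelled on the same naming pattern, and is not something this thread pins down:

```llvm
; With both markers the backend may select the native atomic fadd: the
; access is asserted not to touch fine-grained memory, and denormal-mode
; differences in the result are asserted not to matter.
define float @fadd_relaxed(ptr addrspace(1) %p, float %v) {
  %old = atomicrmw fadd ptr addrspace(1) %p, float %v seq_cst, align 4, !amdgpu.no.fine.grained.memory !0, !amdgpu.ignore.denormal.mode !0
  ret float %old
}

!0 = !{}
```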
Will the metadata for unsafe-fp-atomics also be controlled by the pragma that controls the no-fine-grained and no-remote metadata? E.g. something like …
Yes, I would expect something like that for consistency. The most problematic part is the denormal flushing in some address spaces, and secondarily the fixed rounding mode in other cases.
We might document that those are not supported for now. If users really need them, we can introduce more controls to support them.
Sure. I think it's easiest to make progress if we fix the integer cases as a first step.
… AMDGPU atomics (#11467) AMDGPU reflect pass is needed to choose between safe and unsafe atomics at the libclc level. In the long run we will delete this patch, as work is being done to ensure correct lowering of atomic instructions. See patches: llvm/llvm-project#85052 llvm/llvm-project#69229 This work is necessary as malloc shared atomics rely on PCIe atomics, which can have patchy and unreliable support. Therefore, we want to be able to choose at compile time whether we should use safe atomics using CAS (which PCIe should support), or whether we want to rely on the availability of the newest PCIe atomics, if malloc shared atomics are desired. Also changes the implementation of `atomic_or` and `atomic_and` so that they can choose between the safe or unsafe version based on the AMDGPU reflect value.
On some systems and when accessing certain memory areas it is not safe to use atomic xor with AMDGPU. This may be the case when using fine-grained memory allocations (e.g. through `hipAllocManaged`) and when the atomic operation needs to go through the PCIe bus, which does not support atomic xor (PCIe 3.0 does support other atomic operations like fetch_and_add and cmpxchg) [1].

The issue has been worked around in DPC++/HIP by prefetching memory before executing kernels [2], however this adds an overhead that can outweigh the performance benefit of using atomic xor over a cmpxchg loop. This is why a way to switch between 'native' atomic xor and a cmpxchg-loop solution is needed.

These changes add a `-mamdgpu-fine-grained-mem` option to clang that can be used to switch between the two implementations. The default is the current behaviour of generating 'native' atomic xor instructions. When `-mamdgpu-fine-grained-mem` is passed to clang, atomic xor instructions are marked with `!amdgpu.atomic` instruction-level metadata at the IR level, and this tells the backend to expand atomic xor to a cmpxchg loop. The option only affects the global and flat address spaces.

A `!amdgpu.atomic` metadata node is a set of key-value pairs that provide more information about the atomic operation. `fine_grained` is currently the only supported key. When the value is `1`, the atomic operation can potentially take fine-grained memory pointers. The default value when this pair is not present is `0`, i.e. fine-grained memory pointers are not passed to this instruction.
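
A minimal sketch of the annotated IR under this scheme. Only the key name and its values are fixed by the description above; the exact node shape shown is illustrative:

```llvm
; fine_grained = 1: this xor may see fine-grained memory pointers, so the
; backend expands it to a cmpxchg loop instead of a native atomic xor.
define i32 @xor_fine_grained(ptr addrspace(1) %p, i32 %v) {
  %old = atomicrmw xor ptr addrspace(1) %p, i32 %v seq_cst, align 4, !amdgpu.atomic !0
  ret i32 %old
}

!0 = !{!"fine_grained", i32 1}
```

Without the metadata (the default), the operation is treated as `fine_grained = 0` and the native atomic xor instruction is generated as before.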