
[RFC] IR: Define noalias.addrspace metadata #102461


Merged
merged 6 commits into from
Oct 7, 2024

Conversation

arsenm
Contributor

@arsenm arsenm commented Aug 8, 2024

This is intended to solve a problem with lowering atomics in
OpenMP and C++ common to AMDGPU and NVPTX.

In OpenCL and CUDA, it is undefined behavior for an atomic instruction
to modify an object in thread private memory. In OpenMP, it is defined.
Correspondingly, the hardware does not handle this correctly. For AMDGPU,
32-bit atomics work and 64-bit atomics are silently dropped. We therefore
need to codegen this by inserting a runtime address space check, performing
the private case without atomics, and fallback to issuing the real atomic
otherwise. This metadata allows us to avoid this extra check and branch.

Handle this by introducing metadata intended to be applied to atomicrmw,
indicating they cannot access the forbidden address space.
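For illustration, the expansion we otherwise have to emit looks roughly like the following sketch (the `llvm.amdgcn.is.private` check and the exact block structure are illustrative assumptions, not the precise codegen):

```llvm
declare i1 @llvm.amdgcn.is.private(ptr)

define i64 @atomic_add_expanded(ptr %ptr, i64 %val) {
  %is.private = call i1 @llvm.amdgcn.is.private(ptr %ptr)
  br i1 %is.private, label %private, label %shared

private:                         ; non-atomic path for thread-private memory
  %p5 = addrspacecast ptr %ptr to ptr addrspace(5)
  %old.p = load i64, ptr addrspace(5) %p5
  %new.p = add i64 %old.p, %val
  store i64 %new.p, ptr addrspace(5) %p5
  br label %done

shared:                          ; real atomic for every other address space
  %old.s = atomicrmw add ptr %ptr, i64 %val seq_cst
  br label %done

done:
  %old = phi i64 [ %old.p, %private ], [ %old.s, %shared ]
  ret i64 %old
}
```

With the metadata attached, the whole branch collapses back to the single `atomicrmw`.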


@llvmbot
Member

llvmbot commented Aug 8, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-ir

Author: Matt Arsenault (arsenm)


Full diff: https://github.com/llvm/llvm-project/pull/102461.diff

6 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+36)
  • (modified) llvm/docs/ReleaseNotes.rst (+2)
  • (modified) llvm/include/llvm/IR/FixedMetadataKinds.def (+1)
  • (modified) llvm/lib/IR/Verifier.cpp (+28-6)
  • (added) llvm/test/Assembler/noalias-addrspace-md.ll (+110)
  • (added) llvm/test/Verifier/noalias-addrspace.ll (+60)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index b17e3c828ed3d5..f03b6508c649ce 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -8015,6 +8015,42 @@ it will contain a list of ids, including the ids of the callsites in the
 full inline sequence, in order from the leaf-most call's id to the outermost
 inlined call.
 
+
+'``noalias.addrspace``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``noalias.addrspace`` metadata is used to identify memory
+operations which cannot access a range of address spaces. It is
+attached to memory instructions, including :ref:`atomicrmw
+<i_atomicrmw>`, :ref:`cmpxchg <i_cmpxchg>`, and :ref:`call <i_call>`
+instructions.
+
+This follows the same form as :ref:`range metadata <range-metadata>`,
+except the field entries must be of type ``i32``. The fields are
+interpreted as the same numeric address spaces used on IR values.
+
+Example:
+
+.. code-block:: llvm
+
+    ; %ptr cannot point to an object allocated in addrspace(5)
+    %rmw.valid = atomicrmw and ptr %ptr, i64 %value seq_cst, !noalias.addrspace !0
+
+    ; Undefined behavior. The underlying object is allocated in one of the listed
+    ; address spaces.
+    %alloca = alloca i64, addrspace(5)
+    %alloca.cast = addrspacecast ptr addrspace(5) %alloca to ptr
+    %rmw.ub = atomicrmw and ptr %alloca.cast, i64 %value seq_cst, !noalias.addrspace !0
+
+    !0 = !{i32 5, i32 6}
+
+
+This is intended for use on targets with a notion of generic address
+spaces, which at runtime resolve to different physical memory
+spaces. The interpretation of the address space values is target
+specific. The behavior is undefined if the runtime memory address
+resolves to an object defined in one of the indicated address spaces.
+
+
 Module Flags Metadata
 =====================
 
diff --git a/llvm/docs/ReleaseNotes.rst b/llvm/docs/ReleaseNotes.rst
index 1ed860de6b9dce..41460c4a75c148 100644
--- a/llvm/docs/ReleaseNotes.rst
+++ b/llvm/docs/ReleaseNotes.rst
@@ -53,6 +53,8 @@ Changes to the LLVM IR
 * The ``x86_mmx`` IR type has been removed. It will be translated to
   the standard vector type ``<1 x i64>`` in bitcode upgrade.
 
+* Introduced ``noalias.addrspace`` metadata.
+
 Changes to LLVM infrastructure
 ------------------------------
 
diff --git a/llvm/include/llvm/IR/FixedMetadataKinds.def b/llvm/include/llvm/IR/FixedMetadataKinds.def
index 5f4cc230a0f5ff..df572e8791e13b 100644
--- a/llvm/include/llvm/IR/FixedMetadataKinds.def
+++ b/llvm/include/llvm/IR/FixedMetadataKinds.def
@@ -52,3 +52,4 @@ LLVM_FIXED_MD_KIND(MD_pcsections, "pcsections", 37)
 LLVM_FIXED_MD_KIND(MD_DIAssignID, "DIAssignID", 38)
 LLVM_FIXED_MD_KIND(MD_coro_outside_frame, "coro.outside.frame", 39)
 LLVM_FIXED_MD_KIND(MD_mmra, "mmra", 40)
+LLVM_FIXED_MD_KIND(MD_noalias_addrspace, "noalias.addrspace", 41)
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index 402af376651eb6..2b05ec358533c6 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -515,8 +515,9 @@ class Verifier : public InstVisitor<Verifier>, VerifierSupport {
   void visitFunction(const Function &F);
   void visitBasicBlock(BasicBlock &BB);
   void verifyRangeMetadata(const Value &V, const MDNode *Range, Type *Ty,
-                           bool IsAbsoluteSymbol);
+                           bool IsAbsoluteSymbol, bool IsAddrSpaceRange);
   void visitRangeMetadata(Instruction &I, MDNode *Range, Type *Ty);
+  void visitNoaliasAddrspaceMetadata(Instruction &I, MDNode *Range, Type *Ty);
   void visitDereferenceableMetadata(Instruction &I, MDNode *MD);
   void visitProfMetadata(Instruction &I, MDNode *MD);
   void visitCallStackMetadata(MDNode *MD);
@@ -760,7 +761,7 @@ void Verifier::visitGlobalValue(const GlobalValue &GV) {
     if (const MDNode *AbsoluteSymbol =
             GO->getMetadata(LLVMContext::MD_absolute_symbol)) {
       verifyRangeMetadata(*GO, AbsoluteSymbol, DL.getIntPtrType(GO->getType()),
-                          true);
+                          true, false);
     }
   }
 
@@ -4128,7 +4129,8 @@ static bool isContiguous(const ConstantRange &A, const ConstantRange &B) {
 /// Verify !range and !absolute_symbol metadata. These have the same
 /// restrictions, except !absolute_symbol allows the full set.
 void Verifier::verifyRangeMetadata(const Value &I, const MDNode *Range,
-                                   Type *Ty, bool IsAbsoluteSymbol) {
+                                   Type *Ty, bool IsAbsoluteSymbol,
+                                   bool IsAddrSpaceRange) {
   unsigned NumOperands = Range->getNumOperands();
   Check(NumOperands % 2 == 0, "Unfinished range!", Range);
   unsigned NumRanges = NumOperands / 2;
@@ -4145,8 +4147,14 @@ void Verifier::verifyRangeMetadata(const Value &I, const MDNode *Range,
 
     Check(High->getType() == Low->getType(), "Range pair types must match!",
           &I);
-    Check(High->getType() == Ty->getScalarType(),
-          "Range types must match instruction type!", &I);
+
+    if (IsAddrSpaceRange) {
+      Check(High->getType()->isIntegerTy(32),
+            "noalias.addrspace type must be i32!", &I);
+    } else {
+      Check(High->getType() == Ty->getScalarType(),
+            "Range types must match instruction type!", &I);
+    }
 
     APInt HighV = High->getValue();
     APInt LowV = Low->getValue();
@@ -4185,7 +4193,14 @@ void Verifier::verifyRangeMetadata(const Value &I, const MDNode *Range,
 void Verifier::visitRangeMetadata(Instruction &I, MDNode *Range, Type *Ty) {
   assert(Range && Range == I.getMetadata(LLVMContext::MD_range) &&
          "precondition violation");
-  verifyRangeMetadata(I, Range, Ty, false);
+  verifyRangeMetadata(I, Range, Ty, false, false);
+}
+
+void Verifier::visitNoaliasAddrspaceMetadata(Instruction &I, MDNode *Range,
+                                             Type *Ty) {
+  assert(Range && Range == I.getMetadata(LLVMContext::MD_noalias_addrspace) &&
+         "precondition violation");
+  verifyRangeMetadata(I, Range, Ty, false, true);
 }
 
 void Verifier::checkAtomicMemAccessSize(Type *Ty, const Instruction *I) {
@@ -5177,6 +5192,13 @@ void Verifier::visitInstruction(Instruction &I) {
     visitRangeMetadata(I, Range, I.getType());
   }
 
+  if (MDNode *Range = I.getMetadata(LLVMContext::MD_noalias_addrspace)) {
+    Check(isa<LoadInst>(I) || isa<StoreInst>(I) || isa<AtomicRMWInst>(I) ||
+              isa<AtomicCmpXchgInst>(I) || isa<CallInst>(I),
+          "noalias.addrspace are only for memory operations!", &I);
+    visitNoaliasAddrspaceMetadata(I, Range, I.getType());
+  }
+
   if (I.hasMetadata(LLVMContext::MD_invariant_group)) {
     Check(isa<LoadInst>(I) || isa<StoreInst>(I),
           "invariant.group metadata is only for loads and stores", &I);
diff --git a/llvm/test/Assembler/noalias-addrspace-md.ll b/llvm/test/Assembler/noalias-addrspace-md.ll
new file mode 100644
index 00000000000000..62fabad86f683a
--- /dev/null
+++ b/llvm/test/Assembler/noalias-addrspace-md.ll
@@ -0,0 +1,110 @@
+; RUN: llvm-as < %s | llvm-dis | FileCheck %s
+
+define i64 @atomicrmw_noalias_addrspace__0_1(ptr %ptr, i64 %val) {
+; CHECK-LABEL: define i64 @atomicrmw_noalias_addrspace__0_1(
+; CHECK-SAME: ptr [[PTR:%.*]], i64 [[VAL:%.*]]) {
+; CHECK-NEXT:    [[RET:%.*]] = atomicrmw add ptr [[PTR]], i64 [[VAL]] seq_cst, align 8, !noalias.addrspace [[META0:![0-9]+]]
+; CHECK-NEXT:    ret i64 [[RET]]
+;
+  %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, align 8, !noalias.addrspace !0
+  ret i64 %ret
+}
+
+define i64 @atomicrmw_noalias_addrspace__0_2(ptr %ptr, i64 %val) {
+; CHECK-LABEL: define i64 @atomicrmw_noalias_addrspace__0_2(
+; CHECK-SAME: ptr [[PTR:%.*]], i64 [[VAL:%.*]]) {
+; CHECK-NEXT:    [[RET:%.*]] = atomicrmw add ptr [[PTR]], i64 [[VAL]] seq_cst, align 8, !noalias.addrspace [[META1:![0-9]+]]
+; CHECK-NEXT:    ret i64 [[RET]]
+;
+  %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, align 8, !noalias.addrspace !1
+  ret i64 %ret
+}
+
+define i64 @atomicrmw_noalias_addrspace__1_3(ptr %ptr, i64 %val) {
+; CHECK-LABEL: define i64 @atomicrmw_noalias_addrspace__1_3(
+; CHECK-SAME: ptr [[PTR:%.*]], i64 [[VAL:%.*]]) {
+; CHECK-NEXT:    [[RET:%.*]] = atomicrmw add ptr [[PTR]], i64 [[VAL]] seq_cst, align 8, !noalias.addrspace [[META2:![0-9]+]]
+; CHECK-NEXT:    ret i64 [[RET]]
+;
+  %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, align 8, !noalias.addrspace !2
+  ret i64 %ret
+}
+
+define i64 @atomicrmw_noalias_addrspace__multiple_ranges(ptr %ptr, i64 %val) {
+; CHECK-LABEL: define i64 @atomicrmw_noalias_addrspace__multiple_ranges(
+; CHECK-SAME: ptr [[PTR:%.*]], i64 [[VAL:%.*]]) {
+; CHECK-NEXT:    [[RET:%.*]] = atomicrmw add ptr [[PTR]], i64 [[VAL]] seq_cst, align 8, !noalias.addrspace [[META3:![0-9]+]]
+; CHECK-NEXT:    ret i64 [[RET]]
+;
+  %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, align 8, !noalias.addrspace !3
+  ret i64 %ret
+}
+
+define i64 @load_noalias_addrspace__5_6(ptr %ptr) {
+; CHECK-LABEL: define i64 @load_noalias_addrspace__5_6(
+; CHECK-SAME: ptr [[PTR:%.*]]) {
+; CHECK-NEXT:    [[RET:%.*]] = load i64, ptr [[PTR]], align 4, !noalias.addrspace [[META4:![0-9]+]]
+; CHECK-NEXT:    ret i64 [[RET]]
+;
+  %ret = load i64, ptr %ptr, align 4, !noalias.addrspace !4
+  ret i64 %ret
+}
+
+define void @store_noalias_addrspace__5_6(ptr %ptr, i64 %val) {
+; CHECK-LABEL: define void @store_noalias_addrspace__5_6(
+; CHECK-SAME: ptr [[PTR:%.*]], i64 [[VAL:%.*]]) {
+; CHECK-NEXT:    store i64 [[VAL]], ptr [[PTR]], align 4, !noalias.addrspace [[META4]]
+; CHECK-NEXT:    ret void
+;
+  store i64 %val, ptr %ptr, align 4, !noalias.addrspace !4
+  ret void
+}
+
+define { i64, i1 } @cmpxchg_noalias_addrspace__5_6(ptr %ptr, i64 %val0, i64 %val1) {
+; CHECK-LABEL: define { i64, i1 } @cmpxchg_noalias_addrspace__5_6(
+; CHECK-SAME: ptr [[PTR:%.*]], i64 [[VAL0:%.*]], i64 [[VAL1:%.*]]) {
+; CHECK-NEXT:    [[RET:%.*]] = cmpxchg ptr [[PTR]], i64 [[VAL0]], i64 [[VAL1]] monotonic monotonic, align 8, !noalias.addrspace [[META4]]
+; CHECK-NEXT:    ret { i64, i1 } [[RET]]
+;
+  %ret = cmpxchg ptr %ptr, i64 %val0, i64 %val1 monotonic monotonic, align 8, !noalias.addrspace !4
+  ret { i64, i1 } %ret
+}
+
+declare void @foo()
+
+define void @call_noalias_addrspace__5_6(ptr %ptr) {
+; CHECK-LABEL: define void @call_noalias_addrspace__5_6(
+; CHECK-SAME: ptr [[PTR:%.*]]) {
+; CHECK-NEXT:    call void @foo(), !noalias.addrspace [[META4]]
+; CHECK-NEXT:    ret void
+;
+  call void @foo(), !noalias.addrspace !4
+  ret void
+}
+
+define void @call_memcpy_intrinsic_addrspace__5_6(ptr %dst, ptr %src, i64 %size) {
+; CHECK-LABEL: define void @call_memcpy_intrinsic_addrspace__5_6(
+; CHECK-SAME: ptr [[DST:%.*]], ptr [[SRC:%.*]], i64 [[SIZE:%.*]]) {
+; CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr [[DST]], ptr [[SRC]], i64 [[SIZE]], i1 false), !noalias.addrspace [[META4]]
+; CHECK-NEXT:    ret void
+;
+  call void @llvm.memcpy.p0.p0.i64(ptr %dst, ptr %src, i64 %size, i1 false), !noalias.addrspace !4
+  ret void
+}
+
+declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg) #0
+
+attributes #0 = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
+
+!0 = !{i32 0, i32 1}
+!1 = !{i32 0, i32 2}
+!2 = !{i32 1, i32 3}
+!3 = !{i32 4, i32 6, i32 10, i32 55}
+!4 = !{i32 5, i32 6}
+;.
+; CHECK: [[META0]] = !{i32 0, i32 1}
+; CHECK: [[META1]] = !{i32 0, i32 2}
+; CHECK: [[META2]] = !{i32 1, i32 3}
+; CHECK: [[META3]] = !{i32 4, i32 6, i32 10, i32 55}
+; CHECK: [[META4]] = !{i32 5, i32 6}
+;.
diff --git a/llvm/test/Verifier/noalias-addrspace.ll b/llvm/test/Verifier/noalias-addrspace.ll
new file mode 100644
index 00000000000000..67a7293d2561cc
--- /dev/null
+++ b/llvm/test/Verifier/noalias-addrspace.ll
@@ -0,0 +1,60 @@
+; RUN: not llvm-as < %s -o /dev/null 2>&1 | FileCheck %s
+
+; CHECK: It should have at least one range!
+; CHECK-NEXT: !0 = !{}
+define i64 @noalias_addrspace__empty(ptr %ptr, i64 %val) {
+  %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !0
+  ret i64 %ret
+}
+
+; CHECK: Unfinished range!
+; CHECK-NEXT: !1 = !{i32 0}
+define i64 @noalias_addrspace__single_field(ptr %ptr, i64 %val) {
+  %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !1
+  ret i64 %ret
+}
+
+; CHECK: Range must not be empty!
+; CHECK-NEXT: !2 = !{i32 0, i32 0}
+define i64 @noalias_addrspace__0_0(ptr %ptr, i64 %val) {
+  %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !2
+  ret i64 %ret
+}
+
+; CHECK: noalias.addrspace type must be i32!
+; CHECK-NEXT: %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, align 8, !noalias.addrspace !3
+define i64 @noalias_addrspace__i64(ptr %ptr, i64 %val) {
+  %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !3
+  ret i64 %ret
+}
+
+; CHECK: The lower limit must be an integer!
+define i64 @noalias_addrspace__fp(ptr %ptr, i64 %val) {
+  %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !4
+  ret i64 %ret
+}
+
+; CHECK: The lower limit must be an integer!
+define i64 @noalias_addrspace__ptr(ptr %ptr, i64 %val) {
+  %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !5
+  ret i64 %ret
+}
+
+; CHECK: The lower limit must be an integer!
+define i64 @noalias_addrspace__nonconstant(ptr %ptr, i64 %val) {
+  %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !6
+  ret i64 %ret
+}
+
+@gv0 = global i32 0
+@gv1 = global i32 1
+
+!0 = !{}
+!1 = !{i32 0}
+!2 = !{i32 0, i32 0}
+!3 = !{i64 1, i64 5}
+!4 = !{float 0.0, float 2.0}
+!5 = !{ptr null, ptr addrspace(1) null}
+!6 = !{i32 ptrtoint (ptr @gv0 to i32), i32 ptrtoint (ptr @gv1 to i32) }
+
+

@arsenm arsenm marked this pull request as ready for review August 8, 2024 13:23
Contributor

@shiltian shiltian left a comment


Does it have to be a range? What if the address spaces of interest are not consecutive?

@arsenm
Contributor Author

arsenm commented Aug 8, 2024

Does it have to be a range? What if the address spaces of interest are not consecutive?

You can specify as many ranges as you want
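For example (illustrative address-space numbers), two half-open ranges in a single node exclude the non-consecutive spaces 3 and 5:

```llvm
%rmw = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !0

; Each pair is a half-open range [lo, hi), so this excludes only
; addrspace(3) and addrspace(5).
!0 = !{i32 3, i32 4, i32 5, i32 6}
```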

@yxsamliu
Collaborator

yxsamliu commented Aug 8, 2024

Is the MD a promise to the backend that the attached memory operation is not in a specific address space?

Who will attach the MD to the IR? FE or some LLVM passes? FE may not be able to figure out the real address space of some memory operations.

@arsenm
Contributor Author

arsenm commented Aug 9, 2024

Is the MD a promise to the backend that the attached memory operation is not in a specific address space?

Yes, for the underlying object definition.

Who will attach the MD to the IR? FE or some LLVM passes? FE may not be able to figure out the real address space of some memory operations.

The main user is the frontend since OpenCL/CUDA give the guarantee. We can also infer this in many cases in backend passes.
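As a sketch (assuming AMDGPU's numbering, where addrspace(5) is the private stack), a frontend lowering an OpenCL atomic on a generic pointer could emit:

```llvm
; OpenCL makes atomics on private objects undefined, so the frontend can
; promise that the underlying object is never in addrspace(5).
define i64 @ocl_fetch_add(ptr %p, i64 %v) {
  %old = atomicrmw add ptr %p, i64 %v seq_cst, !noalias.addrspace !0
  ret i64 %old
}

!0 = !{i32 5, i32 6}
```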

spaces, which at runtime resolve to different physical memory
spaces. The interpretation of the address space values is target
specific. The behavior is undefined if the runtime memory address does
resolve to an object defined in one of the indicated address spaces.
Contributor

It is not clear to me from this what the intended behaviour is when this metadata is used to specify an object is not in the generic address space. Does that effectively mean this code must be unreachable, or does that say nothing because the concrete address space any object is in will not be the generic address space?

Contributor Author

I'm not exactly sure how to parse this question.

when this metadata is used to specify an object is not in the generic address space

Do you mean, in the amdgpu case, !{i32 0, i32 1}?

because the concrete address space any object is in will not be the generic address space?

We avoid creating objects in the generic address space (except functions), but it shouldn't really matter?

Contributor

Do you mean, in the amdgpu case, !{i32 0, i32 1}?

Suppose you just have !noalias.addrspace !0 with !0 = !{i32 1}.

We avoid creating objects in the generic address space (except functions), but it shouldn't really matter?

If you have an object in the global address space 0, which is also addressable using the generic address space 1 (please correct me if I am getting the AMDGPU address spaces wrong), would the above !noalias.addrspace specify that it can or that it cannot alias that object? I cannot see a good reason why a frontend would generate that directly, but I can imagine future additions where a frontend would generate that based on user-specified address spaces in source code, so I think it would be good to have that answered upfront.

Contributor Author

If you have an object in the global address space 0, which is also addressable using the generic address space 1

These are backwards, 0 is generic and 1 is global.

To simplify your question with an example, I would interpret

@globalvar.in.0 = addrspace(0) constant i64

@globalvar.in.1 = addrspace(1) constant i64

; instant undefined behavior, can fold to trap
%rmw.ub = atomicrmw and ptr @globalvar.in.0, i64 %value seq_cst, !noalias.addrspace !0

; OK, underlying object is in addrspace(1)
%rmw.ok0 = atomicrmw and ptr addrspace(1) @globalvar.in.1, i64 %value seq_cst, !noalias.addrspace !0

; OK, underlying object is in addrspace(1) even though access is through
; forbidden addrspace(0); the object definition address space is what matters
%cast.gv.in.1 = addrspacecast ptr addrspace(1) @globalvar.in.1 to ptr
%rmw.ok1 = atomicrmw and ptr %cast.gv.in.1, i64 %value seq_cst, !noalias.addrspace !0

!0 = !{i32 0, i32 1} ; cannot alias 0

Collaborator

To simplify your question with an example, I would interpret

@globalvar.in.0 = addrspace(0) constant i64

@globalvar.in.1 = addrspace(1) constant i64

; instant undefined behavior, can fold to trap
%rmw.ub = atomicrmw and ptr @globalvar.in.0, i64 %value seq_cst, !noalias.addrspace !0

; OK, underlying object is in addrspace(1)
%rmw.ok0 = atomicrmw and ptr addrspace(1) @globalvar.in.1, i64 %value seq_cst, !noalias.addrspace !0

; OK, underlying object is in addrspace(1) even though access is through
; forbidden addrspace(0); the object definition address space is what matters
%cast.gv.in.1 = addrspacecast ptr addrspace(1) @globalvar.in.1 to ptr
%rmw.ok1 = atomicrmw and ptr %cast.gv.in.1, i64 %value seq_cst, !noalias.addrspace !0

!0 = !{i32 0, i32 1} ; cannot alias 0

This example is confusing to me. Since addr space 0 is a union of all concrete addr spaces, including 1, 2, 3, 5, etc., an actual global pointer with MD promising not to alias 0 is self-contradictory.

If in this example the MD promised not to alias 5 instead, there would be no confusion.

Then comes the question: is it really meaningful to have MD promising not to alias the generic addr space 0? Should that be disallowed?

Contributor Author

The IR does not have a concept of generic address spaces, and this description is not written in terms of a generic address space. It merely states the address space of the underlying object allocation.

Collaborator

ok. got it

@gonzalobg
Contributor

In OpenCL and CUDA, it is undefined behavior for an atomic instruction
to modify an object in thread private memory.

This is not true for CUDA, an LLVM IR atomic instruction on thread private memory is fine.
(The LLVM NVPTX backend may have a bug that generates wrong PTX, but that's a backend bug).

@hvdijk
Contributor

hvdijk commented Aug 9, 2024

This is not true for CUDA, an LLVM IR atomic instruction on thread private memory is fine.

Are you sure? The CUDA C++ Programming Guide appears to agree that atomic functions cannot act on private memory. From 10.14. Atomic Functions:

An atomic function performs a read-modify-write atomic operation on one 32-bit, 64-bit, or 128-bit word residing in global or shared memory.

@Endilll Endilll removed their request for review August 9, 2024 11:26
@gonzalobg
Contributor

This extension looks good to me and it can be useful in a few places.
Looking forward to see how it ends up being used.

I hope we watch out for and resist the temptation to use it to reify backend bugs as programming model "features", i.e., instead of declaring that some backend not supporting some op to some statespace is a backend bug, we end up declaring that performing that op to that statespace is UB in the programming model, and end up using this feature to encode that.

@arsenm
Contributor Author

arsenm commented Sep 4, 2024

I hope we watch out for and resist the temptation to use it to reify backend bugs as programming model "features", i.e., instead of declaring that some backend not supporting some op to some statespace is a backend bug, we end up declaring that performing that op to that statespace is UB in the programming model, and end up using this feature to encode that.

But that's what this is doing? We're making the backend handle the IR operation with fully correct generality, and frontends have to opt-in to get the optimizations in the cases the language specifications say are not required

@jrtc27
Collaborator

jrtc27 commented Sep 6, 2024

From a CHERI perspective (one of the other big users of LLVM address spaces) this is entirely irrelevant (our address spaces are entirely coincident), so no objection there.

Contributor

@nikic nikic left a comment

Looks reasonable to me.

%alloca.cast = addrspacecast ptr addrspace(5) %alloca to ptr
%rmw.ub = atomicrmw and ptr %alloca.cast, i64 %value seq_cst, !noalias.addrspace !0

!0 = !{i32 5, i32 6}
Contributor

It would probably make sense to add a comment here.

@arsenm arsenm force-pushed the users/arsenm/noalias-addrspace-metadata branch 2 times, most recently from 16dd092 to c3dbcec on September 6, 2024 17:40
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``noalias.addrspace`` metadata is used to identify memory
operations which cannot access a range of address spaces. It is
Contributor

Will/can alias analysis be extended to look at this and infer NoAlias/NoModRef where appropriate? I was thinking something like AddressSpaceNoAliasAA similar to ScopeNoAliasAA can be added to the default AA pipeline.

Contributor

@jurahul jurahul Sep 9, 2024

A bit more concretely:

  • A new AddressSpaceNoAliasAA pass that infers NoAlias based on address spaces of pointers.
  • Its constructor will query a NoAlias map DenseMap<int, SmallSet<int>> from the Target, so targets can provide base info about which address spaces do not alias with which other. Default will be empty, and then this AA will conservatively infer that all address spaces overlap/interfere with all others.
  • The new metadata essentially specifies, per pointer an overriding entry in the NoAlias map for that instruction.
  • Two pointers in address spaces A, B do not alias if NoAlias[A][B] = true, or if A has the noalias.addrspace metadata that lists B, or B has a noalias.addrespace metadata that lists A.
  • The new metadata can also be attached to calls and intrinsics that may have memory side effects.
  • AAMDNodes will start carrying this info as an additional field.

Please let me know if something like this might be useful.

Contributor Author

I was not planning on touching alias analysis for this, but in principle it should be possible. We probably should have an address space AA. Currently we deal with this with target specific backend analyses, but this has the drawback that it's after the heavier AA passes in the pipeline when we can give cheaper addrspace checks in many cases.

TargetTransformInfo already has addrspacesMayAlias and isValidAddrSpaceCast
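As a sketch of the kind of query such an analysis could answer (assuming a hypothetical target that reports addrspace(1) and addrspace(3) as disjoint):

```llvm
define i32 @disjoint_spaces(ptr addrspace(1) %g, ptr addrspace(3) %l) {
  store i32 1, ptr addrspace(3) %l
  ; If the target says addrspace(3) never aliases addrspace(1), the store
  ; above is NoModRef with respect to this load, enabling reordering.
  %v = load i32, ptr addrspace(1) %g
  ret i32 %v
}
```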

Contributor

Thanks. I am working on a rough sketch for this based on above. Will try to post an RFC once I get some feedback internally at NVIDIA.

Check(isa<LoadInst>(I) || isa<StoreInst>(I) || isa<AtomicRMWInst>(I) ||
isa<AtomicCmpXchgInst>(I) || isa<CallInst>(I),
"noalias.addrspace are only for memory operations!", &I);
visitNoaliasAddrspaceMetadata(I, Range, I.getType());
Contributor

Should we also add a check to the effect that if there is an operand in address space 'n', 'n' cannot be listed in the noalias.addrspace metadata for that instruction?

Contributor Author

No, because the rule only is for the address space of the underlying object. You may be accessing it through an addrspacecasted pointer. If we wanted something like that, it would be more appropriate to put in the Lint analysis. Undefined behavior that transforms can happen to introduce generally should not be IR verifier errors

Contributor

Ok. I can see that for say things like function or intrinsics, where they accept a pointer in address space 'n' but also assert that its underlying address space is not 'n' and access it internally using another address space (so the pointer is pure bit container to pass some data into the function or intrinsic). But for native LLVM insts, like LoadInst when would it make sense to have its pointer operand in addspace(n) and also have the noalias.addrspace metadata listing n? I guess we are calling it undefined behavior but still valid IR?

Contributor

Address spaces may overlap, a pointer to one address space may point to an object defined in another address space. A load instruction that acts on a pointer to one address space, and asserts that the pointer does not point to an object defined in that address space, could presumably still have well-defined behaviour if the pointed-to object is defined in another address space but is accessible through multiple address spaces.

Contributor

@jurahul jurahul Sep 11, 2024

Just some thought:

The spec language is: "The noalias.addrspace metadata is used to identify memory operations which cannot access a range of address spaces."

In general, given a pointer in AS 'n', we can have a set of address spaces that potentially alias with 'n', and then any memory access using an AS 'n' pointer is a potential access to any of the aliasing address spaces. And then this metadata is a guarantee that some of those aliasing address spaces can be dropped from the set, right? If Aliases(P) is all address spaces that can potentially alias with P (including P itself), then a LoadInst for example would be invalid if Aliases(P) - noalias.addrspace is an empty set, i.e., no address space is left to define the access?

Assuming address spaces are organized as a tree (or a DAG), where each node is an address space, and its children/directed neighbors are all address spaces that are contained within the parent, Aliases(P) is all address spaces k such that k is on a path from some root to some leaf that contains node p. So, Aliases(p) captures all sub-address spaces within p as well as all address spaces that p is a sub-address space of. Note that we can have another definition where Aliases(P) is only its sub-address spaces looking down in this tree but not up (Descendents).

So, whenever you access a pointer in AS p, you can potentially access any of the address spaces in Aliases(P), and noalias.addrspace drops some address spaces from that set. If we have a LoadInst whose pointer AS is P, annotated with noalias.addrspace, the set of legal address spaces it is allowed to access is:

Aliases(P) - noalias.addrspace

If this set happens to be empty, then I'd think the IR is invalid (we have not left any address space for the load to do its memory access; or maybe it's undefined)?

And if Aliases(P) includes parent as well as descendant address spaces, then a load with pointer p and noalias.addrspace = {p} can be well defined, since Aliases(P) - {p} will still include, say, a parent address space, and the load is "required" to upcast the pointer from AS p to that parent address space (say, generic) to do the access.

If OTOH we define the set of legal address spaces to be:

Descendants(P) - union(Descendants(noalias.addrspace))

which I think is the more logical meaning, then a load with pointer p and noalias.addrspace = {p} will always be undefined, as the set of legal address spaces left will always be empty.

Essentially, I am defining legality by first defining the set of legal address spaces that a load can access, and then making a load/store legal only if that set is non-empty. Whether we use AncestorsAndDescendants(p) or just Descendants(p) I guess depends on the intended semantics.

@arsenm (Contributor, Author) replied:

The intent was to only restrict the underlying object's address space, which is most obviously definable in terms of generic IR rules we already have.

The IR has no generic rules for address space aliasing relationships. Any address space is assumed to potentially alias any other address space, and this is only relaxed by target AliasAnalysis extensions. A LoadInst with an explicit noalias.addrspace forbidding the exact address space of the load has to be assumed valid, as we know nothing about which address spaces can alias others.

If this set happens to be empty, then I'd think the IR is invalid (we have not left any address space for the load to do its memory access. Or maybe undefined)?

We can't make these situations invalid IR without requiring more target information in the IR, which I want to avoid. The assumption is the IR producer knows what it's doing for the target, and any address space value may have been produced through a valid chain of addrspacecasts.

I've tried to reword the LangRef to be more explicit that this only constrains the original object's address space, not the address space of the access.
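To illustrate that point, a hedged sketch in IR (the range-pair metadata encoding follows the format proposed in this PR, and the address space numbering assumes the AMDGPU convention where addrspace(5) is private; treat the exact numbers as illustrative):

```llvm
; !noalias.addrspace holds [start, end) pairs of excluded address spaces.
; Here the annotation promises the underlying object is not in AS 5.
define i64 @no_private_rmw(ptr %p, i64 %val) {
  %old = atomicrmw add ptr %p, i64 %val seq_cst, !noalias.addrspace !0
  ret i64 %old
}

; Even forbidding the pointer's own address space must be assumed valid:
; the pointer may have been produced by a chain of addrspacecasts, and
; generic IR has no aliasing rules between address spaces.
define i32 @still_assumed_valid(ptr addrspace(5) %q) {
  %v = load i32, ptr addrspace(5) %q, !noalias.addrspace !0
  ret i32 %v
}

!0 = !{i32 5, i32 6}
```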

@arsenm (Contributor, Author) commented Sep 30, 2024:

ping

arsenm added 6 commits October 7, 2024 13:01
This is intended to solve a problem with lowering atomics in
OpenMP and C++ common to AMDGPU and NVPTX.

In OpenCL and CUDA, it is undefined behavior for an atomic instruction
to modify an object in thread private memory. In OpenMP, it is defined.
Correspondingly, the hardware does not handle this correctly. For AMDGPU,
32-bit atomics work and 64-bit atomics are silently dropped. We therefore
need to codegen this by inserting a runtime address space check, performing
the private case without atomics, and falling back to issuing the real atomic
otherwise. This metadata allows us to avoid this extra check and branch.

Handle this by introducing metadata intended to be applied to atomicrmw,
indicating they cannot access the forbidden address space.
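The expansion the metadata avoids looks roughly like the following (a sketch only: the intrinsic name is taken from the AMDGPU backend, and the exact CFG the backend emits may differ):

```llvm
; Without !noalias.addrspace, a 64-bit atomicrmw on a generic pointer must
; guard against the private (scratch) case at runtime.
define i64 @expanded_rmw(ptr %p, i64 %val) {
entry:
  %is.priv = call i1 @llvm.amdgcn.is.private(ptr %p)
  br i1 %is.priv, label %private, label %shared

private:                                  ; non-atomic read-modify-write
  %l = load i64, ptr %p, align 8
  %sum.p = add i64 %l, %val
  store i64 %sum.p, ptr %p, align 8
  br label %done

shared:                                   ; real hardware atomic
  %old = atomicrmw add ptr %p, i64 %val seq_cst
  br label %done

done:
  %res = phi i64 [ %l, %private ], [ %old, %shared ]
  ret i64 %res
}

declare i1 @llvm.amdgcn.is.private(ptr)

; With the metadata, the atomic can be emitted directly:
;   %old = atomicrmw add ptr %p, i64 %val seq_cst, !noalias.addrspace !0
;   !0 = !{i32 5, i32 6}
```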
@arsenm force-pushed the users/arsenm/noalias-addrspace-metadata branch from d7b148d to 4a6df18 (October 7, 2024 09:15)
@frasercrmck (Contributor) left a comment:

This seems reasonable to me. I don't have any objections to the range format either.

Perhaps those with the original questions/objections should chime in.

@nikic (Contributor) left a comment:
LGTM

@@ -316,10 +316,80 @@ out:
ret void
}

define void @hoist_noalias_addrspace_both(i1 %c, ptr %p, i64 %val) {
A reviewer (Contributor) commented:

I assume the various tests you added here are just to show conservatively correct behavior, and you plan to implement proper merging support (via combineMetadataForCSE and friends) in a follow-up PR?

@arsenm (Contributor, Author) replied:

Yes

@arsenm arsenm merged commit a8e1311 into main Oct 7, 2024
8 of 10 checks passed
@arsenm arsenm deleted the users/arsenm/noalias-addrspace-metadata branch October 7, 2024 19:21