
[RFC] IR: Define noalias.addrspace metadata #102461


Merged
merged 6 commits into from
Oct 7, 2024

Conversation

arsenm
Contributor

@arsenm arsenm commented Aug 8, 2024

This is intended to solve a problem with lowering atomics in
OpenMP and C++ common to AMDGPU and NVPTX.

In OpenCL and CUDA, it is undefined behavior for an atomic instruction
to modify an object in thread private memory. In OpenMP, it is defined.
Correspondingly, the hardware does not handle this correctly. For AMDGPU,
32-bit atomics work and 64-bit atomics are silently dropped. We therefore
need to codegen this by inserting a runtime address space check, performing
the private case without atomics, and fallback to issuing the real atomic
otherwise. This metadata allows us to avoid this extra check and branch.

Handle this by introducing metadata intended to be applied to atomicrmw,
indicating they cannot access the forbidden address space.
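For illustration, the expansion we otherwise have to emit looks roughly like the following sketch (the `llvm.amdgcn.is.private` check and the exact block structure are illustrative assumptions, not the precise codegen):

```llvm
declare i1 @llvm.amdgcn.is.private(ptr)

define i64 @atomic_add_expanded(ptr %ptr, i64 %val) {
  %is.private = call i1 @llvm.amdgcn.is.private(ptr %ptr)
  br i1 %is.private, label %private, label %shared

private:                         ; non-atomic path for thread-private memory
  %p5 = addrspacecast ptr %ptr to ptr addrspace(5)
  %old.p = load i64, ptr addrspace(5) %p5
  %new.p = add i64 %old.p, %val
  store i64 %new.p, ptr addrspace(5) %p5
  br label %done

shared:                          ; real atomic for every other address space
  %old.s = atomicrmw add ptr %ptr, i64 %val seq_cst
  br label %done

done:
  %old = phi i64 [ %old.p, %private ], [ %old.s, %shared ]
  ret i64 %old
}
```

With the metadata attached, the whole branch collapses back to the single `atomicrmw`.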


@llvmbot
Member

llvmbot commented Aug 8, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-ir

Author: Matt Arsenault (arsenm)


Full diff: https://github.com/llvm/llvm-project/pull/102461.diff

6 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+36)
  • (modified) llvm/docs/ReleaseNotes.rst (+2)
  • (modified) llvm/include/llvm/IR/FixedMetadataKinds.def (+1)
  • (modified) llvm/lib/IR/Verifier.cpp (+28-6)
  • (added) llvm/test/Assembler/noalias-addrspace-md.ll (+110)
  • (added) llvm/test/Verifier/noalias-addrspace.ll (+60)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index b17e3c828ed3d5..f03b6508c649ce 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -8015,6 +8015,42 @@ it will contain a list of ids, including the ids of the callsites in the
 full inline sequence, in order from the leaf-most call's id to the outermost
 inlined call.
 
+
+'``noalias.addrspace``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``noalias.addrspace`` metadata is used to identify memory
+operations which cannot access a range of address spaces. It is
+attached to memory instructions, including :ref:`atomicrmw
+<i_atomicrmw>`, :ref:`cmpxchg <i_cmpxchg>`, and :ref:`call <i_call>`
+instructions.
+
+This follows the same form as :ref:`range metadata <range-metadata>`,
+except the field entries must be of type ``i32``. The fields are
+interpreted as the same numeric address spaces used on IR values.
+
+Example:
+
+.. code-block:: llvm
+
+    ; %ptr cannot point to an object allocated in addrspace(5)
+    %rmw.valid = atomicrmw and ptr %ptr, i64 %value seq_cst, !noalias.addrspace !0
+
+    ; Undefined behavior. The underlying object is allocated in one of the listed
+    ; address spaces.
+    %alloca = alloca i64, addrspace(5)
+    %alloca.cast = addrspacecast ptr addrspace(5) %alloca to ptr
+    %rmw.ub = atomicrmw and ptr %alloca.cast, i64 %value seq_cst, !noalias.addrspace !0
+
+    !0 = !{i32 5, i32 6}
+
+
+This is intended for use on targets with a notion of generic address
+spaces, which at runtime resolve to different physical memory
+spaces. The interpretation of the address space values is target
+specific. The behavior is undefined if the runtime memory address
+resolves to an object defined in one of the indicated address spaces.
+
+
 Module Flags Metadata
 =====================
 
diff --git a/llvm/docs/ReleaseNotes.rst b/llvm/docs/ReleaseNotes.rst
index 1ed860de6b9dce..41460c4a75c148 100644
--- a/llvm/docs/ReleaseNotes.rst
+++ b/llvm/docs/ReleaseNotes.rst
@@ -53,6 +53,8 @@ Changes to the LLVM IR
 * The ``x86_mmx`` IR type has been removed. It will be translated to
   the standard vector type ``<1 x i64>`` in bitcode upgrade.
 
+* Introduced ``noalias.addrspace`` metadata.
+
 Changes to LLVM infrastructure
 ------------------------------
 
diff --git a/llvm/include/llvm/IR/FixedMetadataKinds.def b/llvm/include/llvm/IR/FixedMetadataKinds.def
index 5f4cc230a0f5ff..df572e8791e13b 100644
--- a/llvm/include/llvm/IR/FixedMetadataKinds.def
+++ b/llvm/include/llvm/IR/FixedMetadataKinds.def
@@ -52,3 +52,4 @@ LLVM_FIXED_MD_KIND(MD_pcsections, "pcsections", 37)
 LLVM_FIXED_MD_KIND(MD_DIAssignID, "DIAssignID", 38)
 LLVM_FIXED_MD_KIND(MD_coro_outside_frame, "coro.outside.frame", 39)
 LLVM_FIXED_MD_KIND(MD_mmra, "mmra", 40)
+LLVM_FIXED_MD_KIND(MD_noalias_addrspace, "noalias.addrspace", 41)
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index 402af376651eb6..2b05ec358533c6 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -515,8 +515,9 @@ class Verifier : public InstVisitor<Verifier>, VerifierSupport {
   void visitFunction(const Function &F);
   void visitBasicBlock(BasicBlock &BB);
   void verifyRangeMetadata(const Value &V, const MDNode *Range, Type *Ty,
-                           bool IsAbsoluteSymbol);
+                           bool IsAbsoluteSymbol, bool IsAddrSpaceRange);
   void visitRangeMetadata(Instruction &I, MDNode *Range, Type *Ty);
+  void visitNoaliasAddrspaceMetadata(Instruction &I, MDNode *Range, Type *Ty);
   void visitDereferenceableMetadata(Instruction &I, MDNode *MD);
   void visitProfMetadata(Instruction &I, MDNode *MD);
   void visitCallStackMetadata(MDNode *MD);
@@ -760,7 +761,7 @@ void Verifier::visitGlobalValue(const GlobalValue &GV) {
     if (const MDNode *AbsoluteSymbol =
             GO->getMetadata(LLVMContext::MD_absolute_symbol)) {
       verifyRangeMetadata(*GO, AbsoluteSymbol, DL.getIntPtrType(GO->getType()),
-                          true);
+                          true, false);
     }
   }
 
@@ -4128,7 +4129,8 @@ static bool isContiguous(const ConstantRange &A, const ConstantRange &B) {
 /// Verify !range and !absolute_symbol metadata. These have the same
 /// restrictions, except !absolute_symbol allows the full set.
 void Verifier::verifyRangeMetadata(const Value &I, const MDNode *Range,
-                                   Type *Ty, bool IsAbsoluteSymbol) {
+                                   Type *Ty, bool IsAbsoluteSymbol,
+                                   bool IsAddrSpaceRange) {
   unsigned NumOperands = Range->getNumOperands();
   Check(NumOperands % 2 == 0, "Unfinished range!", Range);
   unsigned NumRanges = NumOperands / 2;
@@ -4145,8 +4147,14 @@ void Verifier::verifyRangeMetadata(const Value &I, const MDNode *Range,
 
     Check(High->getType() == Low->getType(), "Range pair types must match!",
           &I);
-    Check(High->getType() == Ty->getScalarType(),
-          "Range types must match instruction type!", &I);
+
+    if (IsAddrSpaceRange) {
+      Check(High->getType()->isIntegerTy(32),
+            "noalias.addrspace type must be i32!", &I);
+    } else {
+      Check(High->getType() == Ty->getScalarType(),
+            "Range types must match instruction type!", &I);
+    }
 
     APInt HighV = High->getValue();
     APInt LowV = Low->getValue();
@@ -4185,7 +4193,14 @@ void Verifier::verifyRangeMetadata(const Value &I, const MDNode *Range,
 void Verifier::visitRangeMetadata(Instruction &I, MDNode *Range, Type *Ty) {
   assert(Range && Range == I.getMetadata(LLVMContext::MD_range) &&
          "precondition violation");
-  verifyRangeMetadata(I, Range, Ty, false);
+  verifyRangeMetadata(I, Range, Ty, false, false);
+}
+
+void Verifier::visitNoaliasAddrspaceMetadata(Instruction &I, MDNode *Range,
+                                             Type *Ty) {
+  assert(Range && Range == I.getMetadata(LLVMContext::MD_noalias_addrspace) &&
+         "precondition violation");
+  verifyRangeMetadata(I, Range, Ty, false, true);
 }
 
 void Verifier::checkAtomicMemAccessSize(Type *Ty, const Instruction *I) {
@@ -5177,6 +5192,13 @@ void Verifier::visitInstruction(Instruction &I) {
     visitRangeMetadata(I, Range, I.getType());
   }
 
+  if (MDNode *Range = I.getMetadata(LLVMContext::MD_noalias_addrspace)) {
+    Check(isa<LoadInst>(I) || isa<StoreInst>(I) || isa<AtomicRMWInst>(I) ||
+              isa<AtomicCmpXchgInst>(I) || isa<CallInst>(I),
+          "noalias.addrspace are only for memory operations!", &I);
+    visitNoaliasAddrspaceMetadata(I, Range, I.getType());
+  }
+
   if (I.hasMetadata(LLVMContext::MD_invariant_group)) {
     Check(isa<LoadInst>(I) || isa<StoreInst>(I),
           "invariant.group metadata is only for loads and stores", &I);
diff --git a/llvm/test/Assembler/noalias-addrspace-md.ll b/llvm/test/Assembler/noalias-addrspace-md.ll
new file mode 100644
index 00000000000000..62fabad86f683a
--- /dev/null
+++ b/llvm/test/Assembler/noalias-addrspace-md.ll
@@ -0,0 +1,110 @@
+; RUN: llvm-as < %s | llvm-dis | FileCheck %s
+
+define i64 @atomicrmw_noalias_addrspace__0_1(ptr %ptr, i64 %val) {
+; CHECK-LABEL: define i64 @atomicrmw_noalias_addrspace__0_1(
+; CHECK-SAME: ptr [[PTR:%.*]], i64 [[VAL:%.*]]) {
+; CHECK-NEXT:    [[RET:%.*]] = atomicrmw add ptr [[PTR]], i64 [[VAL]] seq_cst, align 8, !noalias.addrspace [[META0:![0-9]+]]
+; CHECK-NEXT:    ret i64 [[RET]]
+;
+  %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, align 8, !noalias.addrspace !0
+  ret i64 %ret
+}
+
+define i64 @atomicrmw_noalias_addrspace__0_2(ptr %ptr, i64 %val) {
+; CHECK-LABEL: define i64 @atomicrmw_noalias_addrspace__0_2(
+; CHECK-SAME: ptr [[PTR:%.*]], i64 [[VAL:%.*]]) {
+; CHECK-NEXT:    [[RET:%.*]] = atomicrmw add ptr [[PTR]], i64 [[VAL]] seq_cst, align 8, !noalias.addrspace [[META1:![0-9]+]]
+; CHECK-NEXT:    ret i64 [[RET]]
+;
+  %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, align 8, !noalias.addrspace !1
+  ret i64 %ret
+}
+
+define i64 @atomicrmw_noalias_addrspace__1_3(ptr %ptr, i64 %val) {
+; CHECK-LABEL: define i64 @atomicrmw_noalias_addrspace__1_3(
+; CHECK-SAME: ptr [[PTR:%.*]], i64 [[VAL:%.*]]) {
+; CHECK-NEXT:    [[RET:%.*]] = atomicrmw add ptr [[PTR]], i64 [[VAL]] seq_cst, align 8, !noalias.addrspace [[META2:![0-9]+]]
+; CHECK-NEXT:    ret i64 [[RET]]
+;
+  %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, align 8, !noalias.addrspace !2
+  ret i64 %ret
+}
+
+define i64 @atomicrmw_noalias_addrspace__multiple_ranges(ptr %ptr, i64 %val) {
+; CHECK-LABEL: define i64 @atomicrmw_noalias_addrspace__multiple_ranges(
+; CHECK-SAME: ptr [[PTR:%.*]], i64 [[VAL:%.*]]) {
+; CHECK-NEXT:    [[RET:%.*]] = atomicrmw add ptr [[PTR]], i64 [[VAL]] seq_cst, align 8, !noalias.addrspace [[META3:![0-9]+]]
+; CHECK-NEXT:    ret i64 [[RET]]
+;
+  %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, align 8, !noalias.addrspace !3
+  ret i64 %ret
+}
+
+define i64 @load_noalias_addrspace__5_6(ptr %ptr) {
+; CHECK-LABEL: define i64 @load_noalias_addrspace__5_6(
+; CHECK-SAME: ptr [[PTR:%.*]]) {
+; CHECK-NEXT:    [[RET:%.*]] = load i64, ptr [[PTR]], align 4, !noalias.addrspace [[META4:![0-9]+]]
+; CHECK-NEXT:    ret i64 [[RET]]
+;
+  %ret = load i64, ptr %ptr, align 4, !noalias.addrspace !4
+  ret i64 %ret
+}
+
+define void @store_noalias_addrspace__5_6(ptr %ptr, i64 %val) {
+; CHECK-LABEL: define void @store_noalias_addrspace__5_6(
+; CHECK-SAME: ptr [[PTR:%.*]], i64 [[VAL:%.*]]) {
+; CHECK-NEXT:    store i64 [[VAL]], ptr [[PTR]], align 4, !noalias.addrspace [[META4]]
+; CHECK-NEXT:    ret void
+;
+  store i64 %val, ptr %ptr, align 4, !noalias.addrspace !4
+  ret void
+}
+
+define { i64, i1 } @cmpxchg_noalias_addrspace__5_6(ptr %ptr, i64 %val0, i64 %val1) {
+; CHECK-LABEL: define { i64, i1 } @cmpxchg_noalias_addrspace__5_6(
+; CHECK-SAME: ptr [[PTR:%.*]], i64 [[VAL0:%.*]], i64 [[VAL1:%.*]]) {
+; CHECK-NEXT:    [[RET:%.*]] = cmpxchg ptr [[PTR]], i64 [[VAL0]], i64 [[VAL1]] monotonic monotonic, align 8, !noalias.addrspace [[META4]]
+; CHECK-NEXT:    ret { i64, i1 } [[RET]]
+;
+  %ret = cmpxchg ptr %ptr, i64 %val0, i64 %val1 monotonic monotonic, align 8, !noalias.addrspace !4
+  ret { i64, i1 } %ret
+}
+
+declare void @foo()
+
+define void @call_noalias_addrspace__5_6(ptr %ptr) {
+; CHECK-LABEL: define void @call_noalias_addrspace__5_6(
+; CHECK-SAME: ptr [[PTR:%.*]]) {
+; CHECK-NEXT:    call void @foo(), !noalias.addrspace [[META4]]
+; CHECK-NEXT:    ret void
+;
+  call void @foo(), !noalias.addrspace !4
+  ret void
+}
+
+define void @call_memcpy_intrinsic_addrspace__5_6(ptr %dst, ptr %src, i64 %size) {
+; CHECK-LABEL: define void @call_memcpy_intrinsic_addrspace__5_6(
+; CHECK-SAME: ptr [[DST:%.*]], ptr [[SRC:%.*]], i64 [[SIZE:%.*]]) {
+; CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr [[DST]], ptr [[SRC]], i64 [[SIZE]], i1 false), !noalias.addrspace [[META4]]
+; CHECK-NEXT:    ret void
+;
+  call void @llvm.memcpy.p0.p0.i64(ptr %dst, ptr %src, i64 %size, i1 false), !noalias.addrspace !4
+  ret void
+}
+
+declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg) #0
+
+attributes #0 = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
+
+!0 = !{i32 0, i32 1}
+!1 = !{i32 0, i32 2}
+!2 = !{i32 1, i32 3}
+!3 = !{i32 4, i32 6, i32 10, i32 55}
+!4 = !{i32 5, i32 6}
+;.
+; CHECK: [[META0]] = !{i32 0, i32 1}
+; CHECK: [[META1]] = !{i32 0, i32 2}
+; CHECK: [[META2]] = !{i32 1, i32 3}
+; CHECK: [[META3]] = !{i32 4, i32 6, i32 10, i32 55}
+; CHECK: [[META4]] = !{i32 5, i32 6}
+;.
diff --git a/llvm/test/Verifier/noalias-addrspace.ll b/llvm/test/Verifier/noalias-addrspace.ll
new file mode 100644
index 00000000000000..67a7293d2561cc
--- /dev/null
+++ b/llvm/test/Verifier/noalias-addrspace.ll
@@ -0,0 +1,60 @@
+; RUN: not llvm-as < %s -o /dev/null 2>&1 | FileCheck %s
+
+; CHECK: It should have at least one range!
+; CHECK-NEXT: !0 = !{}
+define i64 @noalias_addrspace__empty(ptr %ptr, i64 %val) {
+  %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !0
+  ret i64 %ret
+}
+
+; CHECK: Unfinished range!
+; CHECK-NEXT: !1 = !{i32 0}
+define i64 @noalias_addrspace__single_field(ptr %ptr, i64 %val) {
+  %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !1
+  ret i64 %ret
+}
+
+; CHECK: Range must not be empty!
+; CHECK-NEXT: !2 = !{i32 0, i32 0}
+define i64 @noalias_addrspace__0_0(ptr %ptr, i64 %val) {
+  %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !2
+  ret i64 %ret
+}
+
+; CHECK: noalias.addrspace type must be i32!
+; CHECK-NEXT: %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, align 8, !noalias.addrspace !3
+define i64 @noalias_addrspace__i64(ptr %ptr, i64 %val) {
+  %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !3
+  ret i64 %ret
+}
+
+; CHECK: The lower limit must be an integer!
+define i64 @noalias_addrspace__fp(ptr %ptr, i64 %val) {
+  %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !4
+  ret i64 %ret
+}
+
+; CHECK: The lower limit must be an integer!
+define i64 @noalias_addrspace__ptr(ptr %ptr, i64 %val) {
+  %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !5
+  ret i64 %ret
+}
+
+; CHECK: The lower limit must be an integer!
+define i64 @noalias_addrspace__nonconstant(ptr %ptr, i64 %val) {
+  %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !6
+  ret i64 %ret
+}
+
+@gv0 = global i32 0
+@gv1 = global i32 1
+
+!0 = !{}
+!1 = !{i32 0}
+!2 = !{i32 0, i32 0}
+!3 = !{i64 1, i64 5}
+!4 = !{float 0.0, float 2.0}
+!5 = !{ptr null, ptr addrspace(1) null}
+!6 = !{i32 ptrtoint (ptr @gv0 to i32), i32 ptrtoint (ptr @gv1 to i32) }
+
+

@arsenm arsenm marked this pull request as ready for review August 8, 2024 13:23
Contributor

@shiltian shiltian left a comment


Does it have to be a range? What if the address spaces of interest are not consecutive?

@arsenm
Contributor Author

arsenm commented Aug 8, 2024

Does it have to be a range? What if the address spaces of interest are not consecutive?

You can specify as many ranges as you want
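For example (illustrative address-space numbers), two half-open ranges in a single node exclude the non-consecutive spaces 3 and 5:

```llvm
%rmw = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !0

; Each pair is a half-open range [lo, hi), so this excludes only
; addrspace(3) and addrspace(5).
!0 = !{i32 3, i32 4, i32 5, i32 6}
```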

@yxsamliu
Collaborator

yxsamliu commented Aug 8, 2024

Is the MD a promise to the backend that the attached memory operation is not in a specific address space?

Who will attach the MD to the IR? FE or some LLVM passes? FE may not be able to figure out the real address space of some memory operations.

@arsenm
Contributor Author

arsenm commented Aug 9, 2024

Is the MD a promise to the backend that the attached memory operation is not in a specific address space?

Yes, for the underlying object definition.

Who will attach the MD to the IR? FE or some LLVM passes? FE may not be able to figure out the real address space of some memory operations.

The main user is the frontend since OpenCL/CUDA give the guarantee. We can also infer this in many cases in backend passes.
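As a sketch (assuming AMDGPU's numbering, where addrspace(5) is the private stack), a frontend lowering an OpenCL atomic on a generic pointer could emit:

```llvm
; OpenCL makes atomics on private objects undefined, so the frontend can
; promise that the underlying object is never in addrspace(5).
define i64 @ocl_fetch_add(ptr %p, i64 %v) {
  %old = atomicrmw add ptr %p, i64 %v seq_cst, !noalias.addrspace !0
  ret i64 %old
}

!0 = !{i32 5, i32 6}
```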

spaces, which at runtime resolve to different physical memory
spaces. The interpretation of the address space values is target
specific. The behavior is undefined if the runtime memory address does
resolve to an object defined in one of the indicated address spaces.
Contributor

It is not clear to me from this what the intended behaviour is when this metadata is used to specify an object is not in the generic address space. Does that effectively mean this code must be unreachable, or does that say nothing because the concrete address space any object is in will not be the generic address space?

Contributor Author

I'm not exactly sure how to parse this question.

when this metadata is used to specify an object is not in the generic address space

Do you mean, in the amdgpu case, !{i32 0, i32 1}?

because the concrete address space any object is in will not be the generic address space?

We avoid creating objects in the generic address space (except functions), but it shouldn't really matter?

Contributor

Do you mean, in the amdgpu case, !{i32 0, i32 1}?

Suppose you just have !noalias.addrspace !0 with !0 = !{i32 1}.

We avoid creating objects in the generic address space (except functions), but it shouldn't really matter?

If you have an object in the global address space 0, which is also addressable using the generic address space 1 (please correct me if I am getting the AMDGPU address spaces wrong), would the above !noalias.addrspace specify that it can or that it cannot alias that object? I cannot see a good reason why a frontend would generate that directly, but I can imagine future additions where a frontend would generate that based on user-specified address spaces in source code, so I think it would be good to have that answered upfront.

Contributor Author

If you have an object in the global address space 0, which is also addressable using the generic address space 1

These are backwards, 0 is generic and 1 is global.

To simplify your question with an example, I would interpret

@globalvar.in.0 = addrspace(0) constant i64

@globalvar.in.1 = addrspace(1) constant i64

; instant undefined behavior, can fold to trap
%rmw.ub = atomicrmw and ptr @globalvar.in.0, i64 %value seq_cst, !noalias.addrspace !0

; OK, underlying object is in addrspace(1)
%rmw.ok0 = atomicrmw and ptr addrspace(1) @globalvar.in.1, i64 %value seq_cst, !noalias.addrspace !0

; OK, underlying object is in addrspace(1) even though access is through
; forbidden addrspace(0); the object definition address space is what matters
%cast.gv.in.1 = addrspacecast ptr addrspace(1) @globalvar.in.1 to ptr
%rmw.ok1 = atomicrmw and ptr %cast.gv.in.1, i64 %value seq_cst, !noalias.addrspace !0

!0 = !{i32 0, i32 1} ; cannot alias 0

Collaborator

To simplify your question with an example, I would interpret

@globalvar.in.0 = addrspace(0) constant i64

@globalvar.in.1 = addrspace(1) constant i64

; instant undefined behavior, can fold to trap
%rmw.ub = atomicrmw and ptr @globalvar.in.0, i64 %value seq_cst, !noalias.addrspace !0

; OK, underlying object is in addrspace(1)
%rmw.ok0 = atomicrmw and ptr addrspace(1) @globalvar.in.1, i64 %value seq_cst, !noalias.addrspace !0

; OK, underlying object is in addrspace(1) even though access is through
; forbidden addrspace(0); the object definition address space is what matters
%cast.gv.in.1 = addrspacecast ptr addrspace(1) @globalvar.in.1 to ptr
%rmw.ok1 = atomicrmw and ptr %cast.gv.in.1, i64 %value seq_cst, !noalias.addrspace !0

!0 = !{i32 0, i32 1} ; cannot alias 0

This example is confusing to me. Since addr space 0 is a union of all concrete addr spaces, including 1, 2, 3, 5, etc., an actual global pointer with MD promising not to alias 0 is self-contradictory.

If in this example the MD promised not to alias 5 instead, there would be no confusion.

Then comes the question: is it really meaningful to have MD promising not to alias the generic addr space 0? Should that be disallowed?

Contributor Author

The IR does not have a concept of generic address spaces, and this description is not written in terms of a generic address space. It merely states the address space of the underlying object allocation.

Collaborator

ok. got it

@gonzalobg
Contributor

In OpenCL and CUDA, it is undefined behavior for an atomic instruction
to modify an object in thread private memory.

This is not true for CUDA, an LLVM IR atomic instruction on thread private memory is fine.
(The LLVM NVPTX backend may have a bug that generates wrong PTX, but that's a backend bug).

@hvdijk
Contributor

hvdijk commented Aug 9, 2024

This is not true for CUDA, an LLVM IR atomic instruction on thread private memory is fine.

Are you sure? The CUDA C++ Programming Guide appears to agree that atomic functions cannot act on private memory. From 10.14. Atomic Functions:

An atomic function performs a read-modify-write atomic operation on one 32-bit, 64-bit, or 128-bit word residing in global or shared memory.

@Endilll Endilll removed their request for review August 9, 2024 11:26
@gonzalobg
Contributor

This extension looks good to me and it can be useful in a few places.
Looking forward to see how it ends up being used.

I hope we watch out for and resist the temptation to use it to reify backend bugs as programming model "features", i.e., instead of declaring that some backend not supporting some op to some statespace is a backend bug, we end up declaring that performing that op to that statespace is UB in the programming model, and end up using this feature to encode that.

@arsenm
Contributor Author

arsenm commented Sep 4, 2024

I hope we watch out for and resist the temptation to use it to reify backend bugs as programming model "features", i.e., instead of declaring that some backend not supporting some op to some statespace is a backend bug, we end up declaring that performing that op to that statespace is UB in the programming model, and end up using this feature to encode that.

But that's what this is doing? We're making the backend handle the IR operation with fully correct generality, and frontends have to opt-in to get the optimizations in the cases the language specifications say are not required

@jrtc27
Collaborator

jrtc27 commented Sep 6, 2024

From a CHERI perspective (one of the other big users of LLVM address spaces) this is entirely irrelevant (our address spaces are entirely coincident), so no objection there.

Contributor

@nikic nikic left a comment

Looks reasonable to me.

%alloca.cast = addrspacecast ptr addrspace(5) %alloca to ptr
%rmw.ub = atomicrmw and ptr %alloca.cast, i64 %value seq_cst, !noalias.addrspace !0

!0 = !{i32 5, i32 6}
Contributor

It would probably make sense to add a comment here.

@arsenm arsenm force-pushed the users/arsenm/noalias-addrspace-metadata branch 2 times, most recently from 16dd092 to c3dbcec on September 6, 2024 17:40
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``noalias.addrspace`` metadata is used to identify memory
operations which cannot access a range of address spaces. It is
Contributor

Will/can alias analysis be extended to look at this and infer NoAlias/NoModRef where appropriate? I was thinking something like AddressSpaceNoAliasAA similar to ScopeNoAliasAA can be added to the default AA pipeline.

Contributor

@jurahul jurahul Sep 9, 2024

A bit more concretely:

  • A new AddressSpaceNoAliasAA pass that infers NoAlias based on address spaces of pointers.
  • Its constructor will query a NoAlias map DenseMap<int, SmallSet<int>> from the Target, so targets can provide base info about which address spaces do not alias with which other. Default will be empty, and then this AA will conservatively infer that all address spaces overlap/interfere with all others.
  • The new metadata essentially specifies, per pointer an overriding entry in the NoAlias map for that instruction.
  • Two pointers in address spaces A, B do not alias if NoAlias[A][B] = true, or if A has the noalias.addrspace metadata that lists B, or B has a noalias.addrespace metadata that lists A.
  • The new metadata can also be attached to calls and intrinsics that may have memory side effects.
  • AAMDNodes will start carrying this info as an additional field.

Please let me know if something like this might be useful.

Contributor Author

I was not planning on touching alias analysis for this, but in principle it should be possible. We probably should have an address space AA. Currently we deal with this with target specific backend analyses, but this has the drawback that it's after the heavier AA passes in the pipeline when we can give cheaper addrspace checks in many cases.

TargetTransformInfo already has addrspacesMayAlias and isValidAddrSpaceCast
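As a sketch of the kind of query such an analysis could answer (assuming a hypothetical target that reports addrspace(1) and addrspace(3) as disjoint):

```llvm
define i32 @disjoint_spaces(ptr addrspace(1) %g, ptr addrspace(3) %l) {
  store i32 1, ptr addrspace(3) %l
  ; If the target says addrspace(3) never aliases addrspace(1), the store
  ; above is NoModRef with respect to this load, enabling reordering.
  %v = load i32, ptr addrspace(1) %g
  ret i32 %v
}
```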

Contributor

Thanks. I am working on a rough sketch for this based on above. Will try to post an RFC once I get some feedback internally at NVIDIA.

Check(isa<LoadInst>(I) || isa<StoreInst>(I) || isa<AtomicRMWInst>(I) ||
isa<AtomicCmpXchgInst>(I) || isa<CallInst>(I),
"noalias.addrspace are only for memory operations!", &I);
visitNoaliasAddrspaceMetadata(I, Range, I.getType());
Contributor

Should we also add a check to the effect that if there is an operand in address space 'n', 'n' cannot be listed in the noalias.addrspace metadata for that instruction?

Contributor Author

No, because the rule only is for the address space of the underlying object. You may be accessing it through an addrspacecasted pointer. If we wanted something like that, it would be more appropriate to put in the Lint analysis. Undefined behavior that transforms can happen to introduce generally should not be IR verifier errors

Contributor

Ok. I can see that for say things like function or intrinsics, where they accept a pointer in address space 'n' but also assert that its underlying address space is not 'n' and access it internally using another address space (so the pointer is pure bit container to pass some data into the function or intrinsic). But for native LLVM insts, like LoadInst when would it make sense to have its pointer operand in addspace(n) and also have the noalias.addrspace metadata listing n? I guess we are calling it undefined behavior but still valid IR?

Contributor

Address spaces may overlap, a pointer to one address space may point to an object defined in another address space. A load instruction that acts on a pointer to one address space, and asserts that the pointer does not point to an object defined in that address space, could presumably still have well-defined behaviour if the pointed-to object is defined in another address space but is accessible through multiple address spaces.

Contributor

@jurahul jurahul Sep 11, 2024

Just some thought:

The spec language is: "The noalias.addrspace metadata is used to identify memory operations which cannot access a range of address spaces."

In general, given a pointer in AS 'n', we can have a set of address spaces that potentially alias with 'n', and then any memory access using an AS 'n' pointer is a potential access to any of the aliasing address spaces. And then this metadata is a guarantee that some of those aliasing address spaces can be dropped from the set, right? If Aliases(P) is all address spaces that can potentially alias with P (including P itself), then a LoadInst for example would be invalid if Aliases(P) - noalias.addrspace is an empty set, i.e., no address space is left to define the access?

Assuming address spaces are organized as a tree (or a DAG), where each node is an address space, and its children/directed neighbors are all address spaces that are contained within the parent, Aliases(P) is all address spaces k such that k is on a path from some root to some leaf that contains node p. So, Aliases(p) captures all sub-address spaces within p as well as all address spaces that p is a sub-address space of. Note that we can have another definition where Aliases(P) is only its sub-address spaces looking down in this tree but not up (Descendents).

So, whenever you access a pointer in AS p, you can potentially access any of the address spaces in Aliases(P), and noalias.addrspace drops some address spaces from that set. If we have a LoadInst whose pointer AS is P, annotated with noalias.addrspace, the set of legal address spaces it is allowed to access is:

Aliases(P) - noalias.addrspace

If this set happens to be empty, then I'd think the IR is invalid (we have not left any address space for the load to do its memory access; or maybe it's undefined)?

And if Aliases(P) includes parent as well as descendant address spaces, then a load with pointer p and noalias.addrspace = {p} can be well defined, since Aliases(P) - {p} will still include, say, a parent address space, and the load is "required" to upcast the pointer from AS p to that parent address space (say, generic) to do the access.

If OTOH we define the set of legal address spaces to be:

Descendants(P) - union(Descendants(noalias.addrspace))

which I think is the more logical meaning, then a load with pointer p and noalias.addrspace = {p} will always be undefined, as the set of legal address spaces left will always be empty.

Essentially, I am defining legality by first defining the set of legal address spaces that a load can access, and then making a load/store legal only if that set is non-empty. Whether we use AncestorsAndDescendants(p) or just Descendants(p) I guess depends on the intended semantics.

@arsenm (Contributor, Author) replied:

The intent was to only restrict the underlying object's address space, which is most obviously definable in terms of generic IR rules we already have.

The IR has no generic rules for address space aliasing relationships. Any address space is assumed to potentially alias any other address space, and this is only relaxed by target AliasAnalysis extensions. A LoadInst with an explicit noalias.addrspace forbidding the exact address space of the load has to be assumed valid, as we know nothing about which address spaces can alias others.

If this set happens to be empty, then I'd think the IR is invalid (we have not left any address space for the load to do its memory access. Or maybe undefined)?

We can't make these situations invalid IR without requiring more target information in the IR, which I want to avoid. The assumption is the IR producer knows what it's doing for the target, and any address space value may have been produced through a valid chain of addrspacecasts.

I've tried to reword the LangRef to be more explicit that this only constrains the original object's address space, not the address space of the access.
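To illustrate that point, a hedged sketch in IR (the range-pair metadata encoding follows the format proposed in this PR, and the address space numbering assumes the AMDGPU convention where addrspace(5) is private; treat the exact numbers as illustrative):

```llvm
; !noalias.addrspace holds [start, end) pairs of excluded address spaces.
; Here the annotation promises the underlying object is not in AS 5.
define i64 @no_private_rmw(ptr %p, i64 %val) {
  %old = atomicrmw add ptr %p, i64 %val seq_cst, !noalias.addrspace !0
  ret i64 %old
}

; Even forbidding the pointer's own address space must be assumed valid:
; the pointer may have been produced by a chain of addrspacecasts, and
; generic IR has no aliasing rules between address spaces.
define i32 @still_assumed_valid(ptr addrspace(5) %q) {
  %v = load i32, ptr addrspace(5) %q, !noalias.addrspace !0
  ret i32 %v
}

!0 = !{i32 5, i32 6}
```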

@arsenm (Contributor, Author) commented Sep 30, 2024:

ping

arsenm added 6 commits October 7, 2024 13:01
This is intended to solve a problem with lowering atomics in
OpenMP and C++ common to AMDGPU and NVPTX.

In OpenCL and CUDA, it is undefined behavior for an atomic instruction
to modify an object in thread private memory. In OpenMP, it is defined.
Correspondingly, the hardware does not handle this correctly. For AMDGPU,
32-bit atomics work and 64-bit atomics are silently dropped. We therefore
need to codegen this by inserting a runtime address space check, performing
the private case without atomics, and falling back to issuing the real atomic
otherwise. This metadata allows us to avoid this extra check and branch.

Handle this by introducing metadata intended to be applied to atomicrmw,
indicating they cannot access the forbidden address space.
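The expansion the metadata avoids looks roughly like the following (a sketch only: the intrinsic name is taken from the AMDGPU backend, and the exact CFG the backend emits may differ):

```llvm
; Without !noalias.addrspace, a 64-bit atomicrmw on a generic pointer must
; guard against the private (scratch) case at runtime.
define i64 @expanded_rmw(ptr %p, i64 %val) {
entry:
  %is.priv = call i1 @llvm.amdgcn.is.private(ptr %p)
  br i1 %is.priv, label %private, label %shared

private:                                  ; non-atomic read-modify-write
  %l = load i64, ptr %p, align 8
  %sum.p = add i64 %l, %val
  store i64 %sum.p, ptr %p, align 8
  br label %done

shared:                                   ; real hardware atomic
  %old = atomicrmw add ptr %p, i64 %val seq_cst
  br label %done

done:
  %res = phi i64 [ %l, %private ], [ %old, %shared ]
  ret i64 %res
}

declare i1 @llvm.amdgcn.is.private(ptr)

; With the metadata, the atomic can be emitted directly:
;   %old = atomicrmw add ptr %p, i64 %val seq_cst, !noalias.addrspace !0
;   !0 = !{i32 5, i32 6}
```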
@arsenm force-pushed the users/arsenm/noalias-addrspace-metadata branch from d7b148d to 4a6df18 (October 7, 2024 09:15)
@frasercrmck (Contributor) left a comment:

This seems reasonable to me. I don't have any objections to the range format either.

Perhaps those with the original questions/objections should chime in.

@nikic (Contributor) left a comment:
LGTM

@@ -316,10 +316,80 @@ out:
ret void
}

define void @hoist_noalias_addrspace_both(i1 %c, ptr %p, i64 %val) {
A reviewer (Contributor) commented:

I assume the various tests you added here are just to show conservatively correct behavior, and you plan to implement proper merging support (via combineMetadataForCSE and friends) in a follow-up PR?

@arsenm (Contributor, Author) replied:

Yes

@arsenm arsenm merged commit a8e1311 into main Oct 7, 2024
8 of 10 checks passed
@arsenm arsenm deleted the users/arsenm/noalias-addrspace-metadata branch October 7, 2024 19:21