[RFC] IR: Define noalias.addrspace metadata #102461
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-ir

Author: Matt Arsenault (arsenm)

Changes

This is intended to solve a problem with lowering atomics in OpenMP and C++ common to AMDGPU and NVPTX.

In OpenCL and CUDA, it is undefined behavior for an atomic instruction to modify an object in thread private memory. In OpenMP, it is defined. Correspondingly, the hardware does not handle this correctly. For AMDGPU, 32-bit atomics work and 64-bit atomics are silently dropped. We therefore need to codegen this by inserting a runtime address space check, performing the private case without atomics, and falling back to issuing the real atomic otherwise. This metadata allows us to avoid this extra check and branch.

Handle this by introducing metadata intended to be applied to atomicrmw, indicating they cannot access the forbidden address space.

Full diff: https://github.com/llvm/llvm-project/pull/102461.diff

6 Files Affected:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index b17e3c828ed3d5..f03b6508c649ce 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -8015,6 +8015,42 @@ it will contain a list of ids, including the ids of the callsites in the
full inline sequence, in order from the leaf-most call's id to the outermost
inlined call.
+
+'``noalias.addrspace``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``noalias.addrspace`` metadata is used to identify memory
+operations which cannot access a range of address spaces. It is
+attached to memory instructions, including :ref:`atomicrmw
+<i_atomicrmw>`, :ref:`cmpxchg <i_cmpxchg>`, and :ref:`call <i_call>`
+instructions.
+
+This follows the same form as :ref:`range metadata <range-metadata>`,
+except the field entries must be of type ``i32``. The fields are
+interpreted as the same numeric address spaces used on IR values.
+
+Example:
+
+.. code-block:: llvm
+ ; %ptr cannot point to an object allocated in addrspace(5)
+ %rmw.valid = atomicrmw and ptr %ptr, i64 %value seq_cst, !noalias.addrspace !0
+
+ ; Undefined behavior. The underlying object is allocated in one of the listed
+ ; address spaces.
+ %alloca = alloca i64, addrspace(5)
+ %alloca.cast = addrspacecast ptr addrspace(5) %alloca to ptr
+ %rmw.ub = atomicrmw and ptr %alloca.cast, i64 %value seq_cst, !noalias.addrspace !0
+
+ !0 = !{i32 5, i32 6}
+
+
+This is intended for use on targets with a notion of generic address
+spaces, which at runtime resolve to different physical memory
+spaces. The interpretation of the address space values is target
+specific. The behavior is undefined if the runtime memory address
+resolves to an object defined in one of the indicated address spaces.
+
+
Module Flags Metadata
=====================
diff --git a/llvm/docs/ReleaseNotes.rst b/llvm/docs/ReleaseNotes.rst
index 1ed860de6b9dce..41460c4a75c148 100644
--- a/llvm/docs/ReleaseNotes.rst
+++ b/llvm/docs/ReleaseNotes.rst
@@ -53,6 +53,8 @@ Changes to the LLVM IR
* The ``x86_mmx`` IR type has been removed. It will be translated to
the standard vector type ``<1 x i64>`` in bitcode upgrade.
+* Introduced ``noalias.addrspace`` metadata.
+
Changes to LLVM infrastructure
------------------------------
diff --git a/llvm/include/llvm/IR/FixedMetadataKinds.def b/llvm/include/llvm/IR/FixedMetadataKinds.def
index 5f4cc230a0f5ff..df572e8791e13b 100644
--- a/llvm/include/llvm/IR/FixedMetadataKinds.def
+++ b/llvm/include/llvm/IR/FixedMetadataKinds.def
@@ -52,3 +52,4 @@ LLVM_FIXED_MD_KIND(MD_pcsections, "pcsections", 37)
LLVM_FIXED_MD_KIND(MD_DIAssignID, "DIAssignID", 38)
LLVM_FIXED_MD_KIND(MD_coro_outside_frame, "coro.outside.frame", 39)
LLVM_FIXED_MD_KIND(MD_mmra, "mmra", 40)
+LLVM_FIXED_MD_KIND(MD_noalias_addrspace, "noalias.addrspace", 41)
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index 402af376651eb6..2b05ec358533c6 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -515,8 +515,9 @@ class Verifier : public InstVisitor<Verifier>, VerifierSupport {
void visitFunction(const Function &F);
void visitBasicBlock(BasicBlock &BB);
void verifyRangeMetadata(const Value &V, const MDNode *Range, Type *Ty,
- bool IsAbsoluteSymbol);
+ bool IsAbsoluteSymbol, bool IsAddrSpaceRange);
void visitRangeMetadata(Instruction &I, MDNode *Range, Type *Ty);
+ void visitNoaliasAddrspaceMetadata(Instruction &I, MDNode *Range, Type *Ty);
void visitDereferenceableMetadata(Instruction &I, MDNode *MD);
void visitProfMetadata(Instruction &I, MDNode *MD);
void visitCallStackMetadata(MDNode *MD);
@@ -760,7 +761,7 @@ void Verifier::visitGlobalValue(const GlobalValue &GV) {
if (const MDNode *AbsoluteSymbol =
GO->getMetadata(LLVMContext::MD_absolute_symbol)) {
verifyRangeMetadata(*GO, AbsoluteSymbol, DL.getIntPtrType(GO->getType()),
- true);
+ true, false);
}
}
@@ -4128,7 +4129,8 @@ static bool isContiguous(const ConstantRange &A, const ConstantRange &B) {
/// Verify !range and !absolute_symbol metadata. These have the same
/// restrictions, except !absolute_symbol allows the full set.
void Verifier::verifyRangeMetadata(const Value &I, const MDNode *Range,
- Type *Ty, bool IsAbsoluteSymbol) {
+ Type *Ty, bool IsAbsoluteSymbol,
+ bool IsAddrSpaceRange) {
unsigned NumOperands = Range->getNumOperands();
Check(NumOperands % 2 == 0, "Unfinished range!", Range);
unsigned NumRanges = NumOperands / 2;
@@ -4145,8 +4147,14 @@ void Verifier::verifyRangeMetadata(const Value &I, const MDNode *Range,
Check(High->getType() == Low->getType(), "Range pair types must match!",
&I);
- Check(High->getType() == Ty->getScalarType(),
- "Range types must match instruction type!", &I);
+
+ if (IsAddrSpaceRange) {
+ Check(High->getType()->isIntegerTy(32),
+ "noalias.addrspace type must be i32!", &I);
+ } else {
+ Check(High->getType() == Ty->getScalarType(),
+ "Range types must match instruction type!", &I);
+ }
APInt HighV = High->getValue();
APInt LowV = Low->getValue();
@@ -4185,7 +4193,14 @@ void Verifier::verifyRangeMetadata(const Value &I, const MDNode *Range,
void Verifier::visitRangeMetadata(Instruction &I, MDNode *Range, Type *Ty) {
assert(Range && Range == I.getMetadata(LLVMContext::MD_range) &&
"precondition violation");
- verifyRangeMetadata(I, Range, Ty, false);
+ verifyRangeMetadata(I, Range, Ty, false, false);
+}
+
+void Verifier::visitNoaliasAddrspaceMetadata(Instruction &I, MDNode *Range,
+ Type *Ty) {
+ assert(Range && Range == I.getMetadata(LLVMContext::MD_noalias_addrspace) &&
+ "precondition violation");
+ verifyRangeMetadata(I, Range, Ty, false, true);
}
void Verifier::checkAtomicMemAccessSize(Type *Ty, const Instruction *I) {
@@ -5177,6 +5192,13 @@ void Verifier::visitInstruction(Instruction &I) {
visitRangeMetadata(I, Range, I.getType());
}
+ if (MDNode *Range = I.getMetadata(LLVMContext::MD_noalias_addrspace)) {
+ Check(isa<LoadInst>(I) || isa<StoreInst>(I) || isa<AtomicRMWInst>(I) ||
+ isa<AtomicCmpXchgInst>(I) || isa<CallInst>(I),
+ "noalias.addrspace are only for memory operations!", &I);
+ visitNoaliasAddrspaceMetadata(I, Range, I.getType());
+ }
+
if (I.hasMetadata(LLVMContext::MD_invariant_group)) {
Check(isa<LoadInst>(I) || isa<StoreInst>(I),
"invariant.group metadata is only for loads and stores", &I);
diff --git a/llvm/test/Assembler/noalias-addrspace-md.ll b/llvm/test/Assembler/noalias-addrspace-md.ll
new file mode 100644
index 00000000000000..62fabad86f683a
--- /dev/null
+++ b/llvm/test/Assembler/noalias-addrspace-md.ll
@@ -0,0 +1,110 @@
+; RUN: llvm-as < %s | llvm-dis | FileCheck %s
+
+define i64 @atomicrmw_noalias_addrspace__0_1(ptr %ptr, i64 %val) {
+; CHECK-LABEL: define i64 @atomicrmw_noalias_addrspace__0_1(
+; CHECK-SAME: ptr [[PTR:%.*]], i64 [[VAL:%.*]]) {
+; CHECK-NEXT: [[RET:%.*]] = atomicrmw add ptr [[PTR]], i64 [[VAL]] seq_cst, align 8, !noalias.addrspace [[META0:![0-9]+]]
+; CHECK-NEXT: ret i64 [[RET]]
+;
+ %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, align 8, !noalias.addrspace !0
+ ret i64 %ret
+}
+
+define i64 @atomicrmw_noalias_addrspace__0_2(ptr %ptr, i64 %val) {
+; CHECK-LABEL: define i64 @atomicrmw_noalias_addrspace__0_2(
+; CHECK-SAME: ptr [[PTR:%.*]], i64 [[VAL:%.*]]) {
+; CHECK-NEXT: [[RET:%.*]] = atomicrmw add ptr [[PTR]], i64 [[VAL]] seq_cst, align 8, !noalias.addrspace [[META1:![0-9]+]]
+; CHECK-NEXT: ret i64 [[RET]]
+;
+ %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, align 8, !noalias.addrspace !1
+ ret i64 %ret
+}
+
+define i64 @atomicrmw_noalias_addrspace__1_3(ptr %ptr, i64 %val) {
+; CHECK-LABEL: define i64 @atomicrmw_noalias_addrspace__1_3(
+; CHECK-SAME: ptr [[PTR:%.*]], i64 [[VAL:%.*]]) {
+; CHECK-NEXT: [[RET:%.*]] = atomicrmw add ptr [[PTR]], i64 [[VAL]] seq_cst, align 8, !noalias.addrspace [[META2:![0-9]+]]
+; CHECK-NEXT: ret i64 [[RET]]
+;
+ %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, align 8, !noalias.addrspace !2
+ ret i64 %ret
+}
+
+define i64 @atomicrmw_noalias_addrspace__multiple_ranges(ptr %ptr, i64 %val) {
+; CHECK-LABEL: define i64 @atomicrmw_noalias_addrspace__multiple_ranges(
+; CHECK-SAME: ptr [[PTR:%.*]], i64 [[VAL:%.*]]) {
+; CHECK-NEXT: [[RET:%.*]] = atomicrmw add ptr [[PTR]], i64 [[VAL]] seq_cst, align 8, !noalias.addrspace [[META3:![0-9]+]]
+; CHECK-NEXT: ret i64 [[RET]]
+;
+ %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, align 8, !noalias.addrspace !3
+ ret i64 %ret
+}
+
+define i64 @load_noalias_addrspace__5_6(ptr %ptr) {
+; CHECK-LABEL: define i64 @load_noalias_addrspace__5_6(
+; CHECK-SAME: ptr [[PTR:%.*]]) {
+; CHECK-NEXT: [[RET:%.*]] = load i64, ptr [[PTR]], align 4, !noalias.addrspace [[META4:![0-9]+]]
+; CHECK-NEXT: ret i64 [[RET]]
+;
+ %ret = load i64, ptr %ptr, align 4, !noalias.addrspace !4
+ ret i64 %ret
+}
+
+define void @store_noalias_addrspace__5_6(ptr %ptr, i64 %val) {
+; CHECK-LABEL: define void @store_noalias_addrspace__5_6(
+; CHECK-SAME: ptr [[PTR:%.*]], i64 [[VAL:%.*]]) {
+; CHECK-NEXT: store i64 [[VAL]], ptr [[PTR]], align 4, !noalias.addrspace [[META4]]
+; CHECK-NEXT: ret void
+;
+ store i64 %val, ptr %ptr, align 4, !noalias.addrspace !4
+ ret void
+}
+
+define { i64, i1 } @cmpxchg_noalias_addrspace__5_6(ptr %ptr, i64 %val0, i64 %val1) {
+; CHECK-LABEL: define { i64, i1 } @cmpxchg_noalias_addrspace__5_6(
+; CHECK-SAME: ptr [[PTR:%.*]], i64 [[VAL0:%.*]], i64 [[VAL1:%.*]]) {
+; CHECK-NEXT: [[RET:%.*]] = cmpxchg ptr [[PTR]], i64 [[VAL0]], i64 [[VAL1]] monotonic monotonic, align 8, !noalias.addrspace [[META4]]
+; CHECK-NEXT: ret { i64, i1 } [[RET]]
+;
+ %ret = cmpxchg ptr %ptr, i64 %val0, i64 %val1 monotonic monotonic, align 8, !noalias.addrspace !4
+ ret { i64, i1 } %ret
+}
+
+declare void @foo()
+
+define void @call_noalias_addrspace__5_6(ptr %ptr) {
+; CHECK-LABEL: define void @call_noalias_addrspace__5_6(
+; CHECK-SAME: ptr [[PTR:%.*]]) {
+; CHECK-NEXT: call void @foo(), !noalias.addrspace [[META4]]
+; CHECK-NEXT: ret void
+;
+ call void @foo(), !noalias.addrspace !4
+ ret void
+}
+
+define void @call_memcpy_intrinsic_addrspace__5_6(ptr %dst, ptr %src, i64 %size) {
+; CHECK-LABEL: define void @call_memcpy_intrinsic_addrspace__5_6(
+; CHECK-SAME: ptr [[DST:%.*]], ptr [[SRC:%.*]], i64 [[SIZE:%.*]]) {
+; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr [[DST]], ptr [[SRC]], i64 [[SIZE]], i1 false), !noalias.addrspace [[META4]]
+; CHECK-NEXT: ret void
+;
+ call void @llvm.memcpy.p0.p0.i64(ptr %dst, ptr %src, i64 %size, i1 false), !noalias.addrspace !4
+ ret void
+}
+
+declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg) #0
+
+attributes #0 = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
+
+!0 = !{i32 0, i32 1}
+!1 = !{i32 0, i32 2}
+!2 = !{i32 1, i32 3}
+!3 = !{i32 4, i32 6, i32 10, i32 55}
+!4 = !{i32 5, i32 6}
+;.
+; CHECK: [[META0]] = !{i32 0, i32 1}
+; CHECK: [[META1]] = !{i32 0, i32 2}
+; CHECK: [[META2]] = !{i32 1, i32 3}
+; CHECK: [[META3]] = !{i32 4, i32 6, i32 10, i32 55}
+; CHECK: [[META4]] = !{i32 5, i32 6}
+;.
diff --git a/llvm/test/Verifier/noalias-addrspace.ll b/llvm/test/Verifier/noalias-addrspace.ll
new file mode 100644
index 00000000000000..67a7293d2561cc
--- /dev/null
+++ b/llvm/test/Verifier/noalias-addrspace.ll
@@ -0,0 +1,60 @@
+; RUN: not llvm-as < %s -o /dev/null 2>&1 | FileCheck %s
+
+; CHECK: It should have at least one range!
+; CHECK-NEXT: !0 = !{}
+define i64 @noalias_addrspace__empty(ptr %ptr, i64 %val) {
+ %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !0
+ ret i64 %ret
+}
+
+; CHECK: Unfinished range!
+; CHECK-NEXT: !1 = !{i32 0}
+define i64 @noalias_addrspace__single_field(ptr %ptr, i64 %val) {
+ %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !1
+ ret i64 %ret
+}
+
+; CHECK: Range must not be empty!
+; CHECK-NEXT: !2 = !{i32 0, i32 0}
+define i64 @noalias_addrspace__0_0(ptr %ptr, i64 %val) {
+ %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !2
+ ret i64 %ret
+}
+
+; CHECK: noalias.addrspace type must be i32!
+; CHECK-NEXT: %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, align 8, !noalias.addrspace !3
+define i64 @noalias_addrspace__i64(ptr %ptr, i64 %val) {
+ %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !3
+ ret i64 %ret
+}
+
+; CHECK: The lower limit must be an integer!
+define i64 @noalias_addrspace__fp(ptr %ptr, i64 %val) {
+ %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !4
+ ret i64 %ret
+}
+
+; CHECK: The lower limit must be an integer!
+define i64 @noalias_addrspace__ptr(ptr %ptr, i64 %val) {
+ %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !5
+ ret i64 %ret
+}
+
+; CHECK: The lower limit must be an integer!
+define i64 @noalias_addrspace__nonconstant(ptr %ptr, i64 %val) {
+ %ret = atomicrmw add ptr %ptr, i64 %val seq_cst, !noalias.addrspace !6
+ ret i64 %ret
+}
+
+@gv0 = global i32 0
+@gv1 = global i32 1
+
+!0 = !{}
+!1 = !{i32 0}
+!2 = !{i32 0, i32 0}
+!3 = !{i64 1, i64 5}
+!4 = !{float 0.0, float 2.0}
+!5 = !{ptr null, ptr addrspace(1) null}
+!6 = !{i32 ptrtoint (ptr @gv0 to i32), i32 ptrtoint (ptr @gv1 to i32) }
+
+
Does it have to be a range? What if the address spaces of interest are not consecutive?
You can specify as many ranges as you want.
Is the MD a promise to the backend that the attached memory operation is not in a specific address space? Who will attach the MD to the IR? FE or some LLVM passes? FE may not be able to figure out the real address space of some memory operations.
Yes, for the underlying object definition.
The main user is the frontend since OpenCL/CUDA give the guarantee. We can also infer this in many cases in backend passes.
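For instance, here is a sketch of what an OpenCL frontend targeting AMDGPU could emit, reusing the !{i32 5, i32 6} form from the LangRef example above (addrspace(5) is AMDGPU's private space):

; OpenCL guarantees atomics never operate on objects in private memory, so
; the frontend can attach the metadata to every atomic on a generic pointer:
%old = atomicrmw and ptr %generic.ptr, i64 %value seq_cst, !noalias.addrspace !0

!0 = !{i32 5, i32 6} ; cannot alias addrspace(5)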
spaces, which at runtime resolve to different physical memory
spaces. The interpretation of the address space values is target
specific. The behavior is undefined if the runtime memory address does
resolve to an object defined in one of the indicated address spaces.
It is not clear to me from this what the intended behaviour is when this metadata is used to specify an object is not in the generic address space. Does that effectively mean this code must be unreachable, or does that say nothing because the concrete address space any object is in will not be the generic address space?
I'm not exactly sure how to parse this question.
when this metadata is used to specify an object is not in the generic address space
Do you mean, in the amdgpu case, !{i32 0, i32 1}?
because the concrete address space any object is in will not be the generic address space?
We avoid creating objects in the generic address space (except functions), but it shouldn't really matter?
Do you mean, in the amdgpu case, !{i32 0, i32 1}?
Suppose you just have !noalias.addrspace !0 with !0 = !{i32 1}.
We avoid creating objects in the generic address space (except functions), but it shouldn't really matter?
If you have an object in the global address space 0, which is also addressable using the generic address space 1 (please correct me if I am getting the AMDGPU address spaces wrong), would the above !noalias.addrspace specify that it can or that it cannot alias that object? I cannot see a good reason why a frontend would generate that directly, but I can imagine future additions where a frontend would generate that based on user-specified address spaces in source code, so I think it would be good to have that answered upfront.
If you have an object in the global address space 0, which is also addressable using the generic address space 1
These are backwards: 0 is generic and 1 is global.
To simplify your question with an example, I would interpret

@globalvar.in.0 = addrspace(0) global i64 0
@globalvar.in.1 = addrspace(1) global i64 0

; instant undefined behavior, can fold to trap
%rmw.ub = atomicrmw and ptr @globalvar.in.0, i64 %value seq_cst, !noalias.addrspace !0

; OK, underlying object is in addrspace(1)
%rmw.ok0 = atomicrmw and ptr addrspace(1) @globalvar.in.1, i64 %value seq_cst, !noalias.addrspace !0

; OK, underlying object is in addrspace(1) even though the access is through
; the forbidden addrspace(0); the object definition's address space is what matters
%cast.gv.in.1 = addrspacecast ptr addrspace(1) @globalvar.in.1 to ptr
%rmw.ok1 = atomicrmw and ptr %cast.gv.in.1, i64 %value seq_cst, !noalias.addrspace !0

!0 = !{i32 0, i32 1} ; cannot alias 0
This example is confusing for me. Since addr space 0 is a union of all concrete addr spaces, including 1, 2, 3, 5, etc., an actual global pointer with MD promising not to alias 0 is self-contradictory.
If in this example the MD promises not to alias 5, then there is no confusion.
Then comes the question: is it really meaningful to have an MD promising not to alias the generic addr space 0? Should that be disallowed?
The IR does not have a concept of generic address spaces, and this description is not written in terms of a generic address space. It merely states the address space of the underlying object allocation
ok. got it
This is not true for CUDA; an LLVM IR atomic instruction on thread-private memory is fine.
Are you sure? The CUDA C++ Programming Guide appears to agree that atomic functions cannot act on private memory. From 10.14. Atomic Functions:
This extension looks good to me and it can be useful in a few places. I hope we watch out for and resist the temptation to use it to reify backend bugs as programming model "features", i.e., instead of declaring that some backend not supporting some op to some statespace is a backend bug, we end up declaring that performing that op to that statespace is UB in the programming model, and end up using this feature to encode that.
But that's what this is doing? We're making the backend handle the IR operation with fully correct generality, and frontends have to opt in to get the optimizations in the cases the language specifications say are not required.
From a CHERI perspective (one of the other big users of LLVM address spaces) this is entirely irrelevant (our address spaces are entirely coincident), so no objection there.
Looks reasonable to me.
%alloca.cast = addrspacecast ptr addrspace(5) %alloca to ptr
%rmw.ub = atomicrmw and ptr %alloca.cast, i64 %value seq_cst, !noalias.addrspace !0

!0 = !{i32 5, i32 6}
It would probably make sense to add a comment here.
llvm/docs/LangRef.rst (Outdated)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``noalias.addrspace`` metadata is used to identify memory
operations which cannot access a range of address spaces. It is
Will/can alias analysis be extended to look at this and infer NoAlias/NoModRef where appropriate? I was thinking something like AddressSpaceNoAliasAA, similar to ScopeNoAliasAA, can be added to the default AA pipeline.
A bit more concretely:
- A new AddressSpaceNoAliasAA pass that infers NoAlias based on address spaces of pointers.
- Its constructor will query a NoAlias map DenseMap<int, SmallSet<int>> from the Target, so targets can provide base info about which address spaces do not alias with which others. The default will be empty, and then this AA will conservatively infer that all address spaces overlap/interfere with all others.
- The new metadata essentially specifies, per pointer, an overriding entry in the NoAlias map for that instruction.
- Two pointers in address spaces A, B do not alias if NoAlias[A][B] = true, or if A has noalias.addrspace metadata that lists B, or B has noalias.addrspace metadata that lists A.
- The new metadata can also be attached to calls and intrinsics that may have memory side effects. AAMDNodes will start carrying this info as an additional field.
Please let me know if something like this might be useful.
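A minimal sketch of the base query described above, under stated assumptions: AddressSpaceNoAliasAA and the Target hook are hypothetical names from this comment, not existing LLVM APIs; only the container types and their member functions are real. The proposed AA would additionally consult !noalias.addrspace metadata carried on either access.

#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallSet.h"

// Hypothetical target-provided base knowledge of which address-space pairs
// cannot alias. An absent entry conservatively means "may alias".
using AddrSpaceNoAliasMap =
    llvm::DenseMap<unsigned, llvm::SmallSet<unsigned, 4>>;

// Returns true only when the target map explicitly records that address
// spaces A and B cannot alias; otherwise stays conservative.
static bool addrSpacesNoAlias(const AddrSpaceNoAliasMap &TargetMap,
                              unsigned A, unsigned B) {
  auto It = TargetMap.find(A);
  return It != TargetMap.end() && It->second.contains(B);
}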
I was not planning on touching alias analysis for this, but in principle it should be possible. We probably should have an address space AA. Currently we deal with this with target-specific backend analyses, but this has the drawback that it runs after the heavier AA passes in the pipeline, when we could give cheaper addrspace checks in many cases.
TargetTransformInfo already has addrspacesMayAlias and isValidAddrSpaceCast.
Thanks. I am working on a rough sketch for this based on the above. Will try to post an RFC once I get some feedback internally at NVIDIA.
Check(isa<LoadInst>(I) || isa<StoreInst>(I) || isa<AtomicRMWInst>(I) ||
      isa<AtomicCmpXchgInst>(I) || isa<CallInst>(I),
      "noalias.addrspace are only for memory operations!", &I);
visitNoaliasAddrspaceMetadata(I, Range, I.getType());
Should we also add a check to the effect that if there is an operand in address space 'n', 'n' cannot be listed in the noalias.addrspace metadata for that instruction?
No, because the rule only is for the address space of the underlying object. You may be accessing it through an addrspacecasted pointer. If we wanted something like that, it would be more appropriate to put in the Lint analysis. Undefined behavior that transforms can happen to introduce generally should not be IR verifier errors
Ok. I can see that for, say, functions or intrinsics, where they accept a pointer in address space 'n' but also assert that its underlying address space is not 'n' and access it internally using another address space (so the pointer is a pure bit container to pass some data into the function or intrinsic). But for native LLVM insts, like LoadInst, when would it make sense to have its pointer operand in addrspace(n) and also have the noalias.addrspace metadata listing n? I guess we are calling it undefined behavior but still valid IR?
Address spaces may overlap; a pointer to one address space may point to an object defined in another address space. A load instruction that acts on a pointer to one address space, and asserts that the pointer does not point to an object defined in that address space, could presumably still have well-defined behaviour if the pointed-to object is defined in another address space but is accessible through multiple address spaces.
Just some thought:
The spec language is: "The noalias.addrspace metadata is used to identify memory operations which cannot access a range of address spaces."
In general, given a pointer in AS 'n', we have a set of address spaces that potentially alias with 'n', and any memory access through an AS n pointer is a potential access to any of the aliasing address spaces. This metadata is then a guarantee that some of those aliasing address spaces can be dropped from the set, right? If Aliases(P) is the set of all address spaces that can potentially alias with P (including P itself), then a LoadInst, for example, would be invalid if Aliases(P) - noalias.addrspace is an empty set, i.e., no address space is left to define the access?
Assuming address spaces are organized as a tree (or a DAG), where each node is an address space and its children/directed neighbors are the address spaces contained within the parent, Aliases(P) is all address spaces k such that k is on a path from some root to some leaf that contains node p. So Aliases(p) captures all sub-address spaces within p as well as all address spaces that p is a sub-address space of. Note that we can have another definition where Aliases(P) is only its sub-address spaces, looking down in this tree but not up (Descendents).
So, whenever you access a pointer in AS p, you can potentially access any of the address spaces in Aliases(P), and noalias.addrspace drops some address spaces from that set. If we have a native LoadInst whose pointer AS is P, with noalias.addrspace attached, the set of legal address spaces it is allowed to access is:
Aliases(P) - noalias.addrspace
If this set happens to be empty, then I'd think the IR is invalid (we have not left any address space for the load to do its memory access; or maybe undefined)?
And if Aliases(P) includes parent as well as descendent address spaces, then a load with pointer p and noalias.addrspace = p can be well defined, since Aliases(P) - {p} will include, say, a parent address space, and the load is "required" to upcast the pointer from AS p to that parent address space (say, generic) to do the access.
If OTOH we define the set of legal address spaces to be:
Descendents(P) - union(Descendents(noalias.addrspace))
which I think is the more logical meaning, then a load with pointer p and noalias.addrspace = p will always be undefined, as the set of legal address spaces left will always be empty.
Essentially, I am defining legality by first defining the set of legal address spaces that a load can access, and then making a load/store legal only if that set is non-empty. Whether we use AscendentsAndDescendents(p) or just Descendents(p) I guess depends on the intended semantics.
The intent was to only restrict the underlying object's address space, which is most obviously definable in terms of generic IR rules we already have.
The IR has no generic rules for address space aliasing relationships. Any address space is assumed to potentially alias any other address space, and this is only relaxed by target AliasAnalysis extensions. An accessing LoadInst with an explicit noalias.addrspace forbidding the exact address space of the load has to be assumed valid, as we know nothing about what address spaces can alias others.
If this set happens to be empty, then I'd think the IR is invalid (we have not left any address space for the load to do its memory access. Or maybe undefined)?
We can't make these situations invalid IR without requiring more target information in the IR, which I want to avoid. The assumption is the IR producer knows what it's doing for the target, and any address space value may have been produced through a valid chain of addrspacecasts.
I've tried to reword the LangRef to be more explicit that this only says anything about the original object's address space, not the access.
ping
This seems reasonable to me. I don't have any objections to the range format either.
Perhaps those with the original questions/objections should chime in.
LGTM
@@ -316,10 +316,80 @@ out:
ret void
}

define void @hoist_noalias_addrspace_both(i1 %c, ptr %p, i64 %val) {
I assume the various tests you added here are just to show conservatively correct behavior, and you plan to implement proper merging support (via combineMetadataForCSE and friends) in a followup PR?
Yes
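Concretely, the conservatively correct behavior those tests exercise is dropping the metadata when two operations merge; keeping the intersection of the forbidden ranges would also be valid. A hedged sketch (metadata values reused from the tests in this PR):

; Before hoisting, the two arms disagree:
; %a = atomicrmw add ptr %p, i64 %v seq_cst, !noalias.addrspace !0 ; !0 = !{i32 5, i32 6}, forbids 5
; %b = atomicrmw add ptr %p, i64 %v seq_cst, !noalias.addrspace !1 ; !1 = !{i32 4, i32 6}, forbids 4 and 5
; Dropping the metadata on the merged instruction is always correct:
%merged = atomicrmw add ptr %p, i64 %v seq_cst
; Keeping only what both forbid (here !{i32 5, i32 6}) would also be correct.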
This is intended to solve a problem with lowering atomics in
OpenMP and C++ common to AMDGPU and NVPTX.
In OpenCL and CUDA, it is undefined behavior for an atomic instruction
to modify an object in thread private memory. In OpenMP, it is defined.
Correspondingly, the hardware does not handle this correctly. For AMDGPU,
32-bit atomics work and 64-bit atomics are silently dropped. We therefore
need to codegen this by inserting a runtime address space check, performing
the private case without atomics, and falling back to issuing the real atomic
otherwise. This metadata allows us to avoid this extra check and branch.
Handle this by introducing metadata intended to be applied to atomicrmw,
indicating they cannot access the forbidden address space.
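
For context, the expansion this metadata lets targets avoid looks roughly like the following. This is a hedged sketch rather than actual AtomicExpand output; it assumes AMDGPU address-space numbering (addrspace(5) is private) and uses the existing llvm.amdgcn.is.private intrinsic for the dynamic check:

declare i1 @llvm.amdgcn.is.private(ptr nocapture)

define i64 @expanded_rmw(ptr %ptr, i64 %val) {
  %is.private = call i1 @llvm.amdgcn.is.private(ptr %ptr)
  br i1 %is.private, label %private, label %global

private:                          ; plain load/store is safe on private memory
  %ptr.private = addrspacecast ptr %ptr to ptr addrspace(5)
  %loaded = load i64, ptr addrspace(5) %ptr.private
  %and = and i64 %loaded, %val
  store i64 %and, ptr addrspace(5) %ptr.private
  br label %exit

global:                           ; the real atomic for every other address space
  %rmw = atomicrmw and ptr %ptr, i64 %val seq_cst
  br label %exit

exit:
  %old = phi i64 [ %loaded, %private ], [ %rmw, %global ]
  ret i64 %old
}

With !noalias.addrspace !{i32 5, i32 6} attached, the backend can emit the atomicrmw directly and skip the check and branch.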