AMDGPU: Reduce readfirstlane for single demanded vector element #128647

Merged

arsenm
Contributor

@arsenm arsenm commented Feb 25, 2025

If we are only extracting a single element, rewrite the intrinsic call
to use the element type. We should extend this to arbitrary extract
shuffles.

Contributor Author

arsenm commented Feb 25, 2025

@llvmbot
Member

llvmbot commented Feb 25, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

If we are only extracting a single element, rewrite the intrinsic call
to use the element type. We should extend this to arbitrary extract
shuffles.


Full diff: https://github.com/llvm/llvm-project/pull/128647.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp (+44-2)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h (+6)
  • (modified) llvm/test/Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll (+28-29)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
index 617974713d6f0..99016fdd0ff91 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
@@ -1538,6 +1538,49 @@ static Value *simplifyAMDGCNMemoryIntrinsicDemanded(InstCombiner &IC,
   return NewCall;
 }
 
+Value *GCNTTIImpl::simplifyAMDGCNLaneIntrinsicDemanded(
+    InstCombiner &IC, IntrinsicInst &II, const APInt &DemandedElts,
+    APInt &UndefElts) const {
+  auto *VT = dyn_cast<FixedVectorType>(II.getType());
+  if (!VT)
+    return nullptr;
+
+  const unsigned FirstElt = DemandedElts.countr_zero();
+  const unsigned LastElt = DemandedElts.getActiveBits() - 1;
+  const unsigned MaskLen = LastElt - FirstElt + 1;
+
+  // TODO: Handle general subvector extract.
+  if (MaskLen != 1)
+    return nullptr;
+
+  Type *EltTy = VT->getElementType();
+  if (!isTypeLegal(EltTy))
+    return nullptr;
+
+  Value *Src = II.getArgOperand(0);
+
+  assert(FirstElt == LastElt);
+  Value *Extract = IC.Builder.CreateExtractElement(Src, FirstElt);
+
+  // Make sure convergence tokens are preserved.
+  // TODO: CreateIntrinsic should allow directly copying bundles
+  SmallVector<OperandBundleDef, 2> OpBundles;
+  II.getOperandBundlesAsDefs(OpBundles);
+
+  Module *M = IC.Builder.GetInsertBlock()->getModule();
+  Function *Remangled = Intrinsic::getOrInsertDeclaration(
+      M, II.getIntrinsicID(), {Extract->getType()});
+
+  // TODO: Preserve callsite attributes?
+  CallInst *NewCall = IC.Builder.CreateCall(Remangled, {Extract}, OpBundles);
+
+  Value *Result = IC.Builder.CreateInsertElement(PoisonValue::get(II.getType()),
+                                                 NewCall, FirstElt);
+  IC.replaceInstUsesWith(II, Result);
+  IC.eraseInstFromFunction(II);
+  return Result;
+}
+
 std::optional<Value *> GCNTTIImpl::simplifyDemandedVectorEltsIntrinsic(
     InstCombiner &IC, IntrinsicInst &II, APInt DemandedElts, APInt &UndefElts,
     APInt &UndefElts2, APInt &UndefElts3,
@@ -1545,9 +1588,8 @@ std::optional<Value *> GCNTTIImpl::simplifyDemandedVectorEltsIntrinsic(
         SimplifyAndSetOp) const {
   switch (II.getIntrinsicID()) {
   case Intrinsic::amdgcn_readfirstlane:
-    // TODO: For a vector extract, should reduce the intrinsic call type.
     SimplifyAndSetOp(&II, 0, DemandedElts, UndefElts);
-    return std::nullopt;
+    return simplifyAMDGCNLaneIntrinsicDemanded(IC, II, DemandedElts, UndefElts);
   case Intrinsic::amdgcn_raw_buffer_load:
   case Intrinsic::amdgcn_raw_ptr_buffer_load:
   case Intrinsic::amdgcn_raw_buffer_load_format:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
index a0d62008d9ddc..f5062070ac6f4 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
@@ -226,6 +226,12 @@ class GCNTTIImpl final : public BasicTTIImplBase<GCNTTIImpl> {
 
   std::optional<Instruction *> instCombineIntrinsic(InstCombiner &IC,
                                                     IntrinsicInst &II) const;
+
+  Value *simplifyAMDGCNLaneIntrinsicDemanded(InstCombiner &IC,
+                                             IntrinsicInst &II,
+                                             const APInt &DemandedElts,
+                                             APInt &UndefElts) const;
+
   std::optional<Value *> simplifyDemandedVectorEltsIntrinsic(
       InstCombiner &IC, IntrinsicInst &II, APInt DemandedElts, APInt &UndefElts,
       APInt &UndefElts2, APInt &UndefElts3,
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll b/llvm/test/Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll
index 836c739048411..e9d3b5e963b35 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll
@@ -4,8 +4,8 @@
 define i16 @extract_elt0_v2i16_readfirstlane(<2 x i16> %src) {
 ; CHECK-LABEL: define i16 @extract_elt0_v2i16_readfirstlane(
 ; CHECK-SAME: <2 x i16> [[SRC:%.*]]) #[[ATTR0:[0-9]+]] {
-; CHECK-NEXT:    [[VEC:%.*]] = call <2 x i16> @llvm.amdgcn.readfirstlane.v2i16(<2 x i16> [[SRC]])
-; CHECK-NEXT:    [[ELT:%.*]] = extractelement <2 x i16> [[VEC]], i64 0
+; CHECK-NEXT:    [[TMP1:%.*]] = extractelement <2 x i16> [[SRC]], i64 0
+; CHECK-NEXT:    [[ELT:%.*]] = call i16 @llvm.amdgcn.readfirstlane.i16(i16 [[TMP1]])
 ; CHECK-NEXT:    ret i16 [[ELT]]
 ;
   %vec = call <2 x i16> @llvm.amdgcn.readfirstlane.v2i16(<2 x i16> %src)
@@ -16,8 +16,8 @@ define i16 @extract_elt0_v2i16_readfirstlane(<2 x i16> %src) {
 define i16 @extract_elt0_v1i16_readfirstlane(<1 x i16> %src) {
 ; CHECK-LABEL: define i16 @extract_elt0_v1i16_readfirstlane(
 ; CHECK-SAME: <1 x i16> [[SRC:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[VEC:%.*]] = call <1 x i16> @llvm.amdgcn.readfirstlane.v1i16(<1 x i16> [[SRC]])
-; CHECK-NEXT:    [[ELT:%.*]] = extractelement <1 x i16> [[VEC]], i64 0
+; CHECK-NEXT:    [[TMP1:%.*]] = extractelement <1 x i16> [[SRC]], i64 0
+; CHECK-NEXT:    [[ELT:%.*]] = call i16 @llvm.amdgcn.readfirstlane.i16(i16 [[TMP1]])
 ; CHECK-NEXT:    ret i16 [[ELT]]
 ;
   %vec = call <1 x i16> @llvm.amdgcn.readfirstlane.v1i16(<1 x i16> %src)
@@ -28,8 +28,8 @@ define i16 @extract_elt0_v1i16_readfirstlane(<1 x i16> %src) {
 define i16 @extract_elt1_v2i16_readfirstlane(<2 x i16> %src) {
 ; CHECK-LABEL: define i16 @extract_elt1_v2i16_readfirstlane(
 ; CHECK-SAME: <2 x i16> [[SRC:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[VEC:%.*]] = call <2 x i16> @llvm.amdgcn.readfirstlane.v2i16(<2 x i16> [[SRC]])
-; CHECK-NEXT:    [[ELT:%.*]] = extractelement <2 x i16> [[VEC]], i64 1
+; CHECK-NEXT:    [[TMP1:%.*]] = extractelement <2 x i16> [[SRC]], i64 1
+; CHECK-NEXT:    [[ELT:%.*]] = call i16 @llvm.amdgcn.readfirstlane.i16(i16 [[TMP1]])
 ; CHECK-NEXT:    ret i16 [[ELT]]
 ;
   %vec = call <2 x i16> @llvm.amdgcn.readfirstlane.v2i16(<2 x i16> %src)
@@ -40,8 +40,8 @@ define i16 @extract_elt1_v2i16_readfirstlane(<2 x i16> %src) {
 define i16 @extract_elt0_v4i16_readfirstlane(<4 x i16> %src) {
 ; CHECK-LABEL: define i16 @extract_elt0_v4i16_readfirstlane(
 ; CHECK-SAME: <4 x i16> [[SRC:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[VEC:%.*]] = call <4 x i16> @llvm.amdgcn.readfirstlane.v4i16(<4 x i16> [[SRC]])
-; CHECK-NEXT:    [[ELT:%.*]] = extractelement <4 x i16> [[VEC]], i64 0
+; CHECK-NEXT:    [[TMP1:%.*]] = extractelement <4 x i16> [[SRC]], i64 0
+; CHECK-NEXT:    [[ELT:%.*]] = call i16 @llvm.amdgcn.readfirstlane.i16(i16 [[TMP1]])
 ; CHECK-NEXT:    ret i16 [[ELT]]
 ;
   %vec = call <4 x i16> @llvm.amdgcn.readfirstlane.v4i16(<4 x i16> %src)
@@ -52,8 +52,8 @@ define i16 @extract_elt0_v4i16_readfirstlane(<4 x i16> %src) {
 define i16 @extract_elt2_v4i16_readfirstlane(<4 x i16> %src) {
 ; CHECK-LABEL: define i16 @extract_elt2_v4i16_readfirstlane(
 ; CHECK-SAME: <4 x i16> [[SRC:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[VEC:%.*]] = call <4 x i16> @llvm.amdgcn.readfirstlane.v4i16(<4 x i16> [[SRC]])
-; CHECK-NEXT:    [[ELT:%.*]] = extractelement <4 x i16> [[VEC]], i64 2
+; CHECK-NEXT:    [[TMP1:%.*]] = extractelement <4 x i16> [[SRC]], i64 2
+; CHECK-NEXT:    [[ELT:%.*]] = call i16 @llvm.amdgcn.readfirstlane.i16(i16 [[TMP1]])
 ; CHECK-NEXT:    ret i16 [[ELT]]
 ;
   %vec = call <4 x i16> @llvm.amdgcn.readfirstlane.v4i16(<4 x i16> %src)
@@ -136,8 +136,8 @@ define <2 x i16> @extract_elt30_v4i16_readfirstlane(<4 x i16> %src) {
 define half @extract_elt0_v2f16_readfirstlane(<2 x half> %src) {
 ; CHECK-LABEL: define half @extract_elt0_v2f16_readfirstlane(
 ; CHECK-SAME: <2 x half> [[SRC:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[VEC:%.*]] = call <2 x half> @llvm.amdgcn.readfirstlane.v2f16(<2 x half> [[SRC]])
-; CHECK-NEXT:    [[ELT:%.*]] = extractelement <2 x half> [[VEC]], i64 0
+; CHECK-NEXT:    [[TMP1:%.*]] = extractelement <2 x half> [[SRC]], i64 0
+; CHECK-NEXT:    [[ELT:%.*]] = call half @llvm.amdgcn.readfirstlane.f16(half [[TMP1]])
 ; CHECK-NEXT:    ret half [[ELT]]
 ;
   %vec = call <2 x half> @llvm.amdgcn.readfirstlane.v2i16(<2 x half> %src)
@@ -148,8 +148,8 @@ define half @extract_elt0_v2f16_readfirstlane(<2 x half> %src) {
 define half @extract_elt1_v2f16_readfirstlane(<2 x half> %src) {
 ; CHECK-LABEL: define half @extract_elt1_v2f16_readfirstlane(
 ; CHECK-SAME: <2 x half> [[SRC:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[VEC:%.*]] = call <2 x half> @llvm.amdgcn.readfirstlane.v2f16(<2 x half> [[SRC]])
-; CHECK-NEXT:    [[ELT:%.*]] = extractelement <2 x half> [[VEC]], i64 1
+; CHECK-NEXT:    [[TMP1:%.*]] = extractelement <2 x half> [[SRC]], i64 1
+; CHECK-NEXT:    [[ELT:%.*]] = call half @llvm.amdgcn.readfirstlane.f16(half [[TMP1]])
 ; CHECK-NEXT:    ret half [[ELT]]
 ;
   %vec = call <2 x half> @llvm.amdgcn.readfirstlane.v2i16(<2 x half> %src)
@@ -186,8 +186,8 @@ define i32 @extract_elt0_nxv4i32_readfirstlane(<vscale x 2 x i32> %src) {
 define i32 @extract_elt0_v2i32_readfirstlane(<2 x i32> %src) {
 ; CHECK-LABEL: define i32 @extract_elt0_v2i32_readfirstlane(
 ; CHECK-SAME: <2 x i32> [[SRC:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[VEC:%.*]] = call <2 x i32> @llvm.amdgcn.readfirstlane.v2i32(<2 x i32> [[SRC]])
-; CHECK-NEXT:    [[ELT:%.*]] = extractelement <2 x i32> [[VEC]], i64 0
+; CHECK-NEXT:    [[TMP1:%.*]] = extractelement <2 x i32> [[SRC]], i64 0
+; CHECK-NEXT:    [[ELT:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP1]])
 ; CHECK-NEXT:    ret i32 [[ELT]]
 ;
   %vec = call <2 x i32> @llvm.amdgcn.readfirstlane.v2i32(<2 x i32> %src)
@@ -198,8 +198,8 @@ define i32 @extract_elt0_v2i32_readfirstlane(<2 x i32> %src) {
 define ptr addrspace(3) @extract_elt0_v2p3_readfirstlane(<2 x ptr addrspace(3)> %src) {
 ; CHECK-LABEL: define ptr addrspace(3) @extract_elt0_v2p3_readfirstlane(
 ; CHECK-SAME: <2 x ptr addrspace(3)> [[SRC:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[VEC:%.*]] = call <2 x ptr addrspace(3)> @llvm.amdgcn.readfirstlane.v2p3(<2 x ptr addrspace(3)> [[SRC]])
-; CHECK-NEXT:    [[ELT:%.*]] = extractelement <2 x ptr addrspace(3)> [[VEC]], i64 0
+; CHECK-NEXT:    [[TMP1:%.*]] = extractelement <2 x ptr addrspace(3)> [[SRC]], i64 0
+; CHECK-NEXT:    [[ELT:%.*]] = call ptr addrspace(3) @llvm.amdgcn.readfirstlane.p3(ptr addrspace(3) [[TMP1]])
 ; CHECK-NEXT:    ret ptr addrspace(3) [[ELT]]
 ;
   %vec = call <2 x ptr addrspace(3)> @llvm.amdgcn.readfirstlane.v2p3(<2 x ptr addrspace(3)> %src)
@@ -210,8 +210,8 @@ define ptr addrspace(3) @extract_elt0_v2p3_readfirstlane(<2 x ptr addrspace(3)>
 define i64 @extract_elt0_v2i64_readfirstlane(<2 x i64> %src) {
 ; CHECK-LABEL: define i64 @extract_elt0_v2i64_readfirstlane(
 ; CHECK-SAME: <2 x i64> [[SRC:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[VEC:%.*]] = call <2 x i64> @llvm.amdgcn.readfirstlane.v2i64(<2 x i64> [[SRC]])
-; CHECK-NEXT:    [[ELT:%.*]] = extractelement <2 x i64> [[VEC]], i64 0
+; CHECK-NEXT:    [[TMP1:%.*]] = extractelement <2 x i64> [[SRC]], i64 0
+; CHECK-NEXT:    [[ELT:%.*]] = call i64 @llvm.amdgcn.readfirstlane.i64(i64 [[TMP1]])
 ; CHECK-NEXT:    ret i64 [[ELT]]
 ;
   %vec = call <2 x i64> @llvm.amdgcn.readfirstlane.v2i64(<2 x i64> %src)
@@ -222,8 +222,8 @@ define i64 @extract_elt0_v2i64_readfirstlane(<2 x i64> %src) {
 define i64 @extract_elt1_v2i64_readfirstlane(<2 x i64> %src) {
 ; CHECK-LABEL: define i64 @extract_elt1_v2i64_readfirstlane(
 ; CHECK-SAME: <2 x i64> [[SRC:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[VEC:%.*]] = call <2 x i64> @llvm.amdgcn.readfirstlane.v2i64(<2 x i64> [[SRC]])
-; CHECK-NEXT:    [[ELT:%.*]] = extractelement <2 x i64> [[VEC]], i64 1
+; CHECK-NEXT:    [[TMP1:%.*]] = extractelement <2 x i64> [[SRC]], i64 1
+; CHECK-NEXT:    [[ELT:%.*]] = call i64 @llvm.amdgcn.readfirstlane.i64(i64 [[TMP1]])
 ; CHECK-NEXT:    ret i64 [[ELT]]
 ;
   %vec = call <2 x i64> @llvm.amdgcn.readfirstlane.v2i64(<2 x i64> %src)
@@ -306,9 +306,8 @@ define <2 x i16> @extract_elt13_v4i16readfirstlane(<4 x i16> %src) {
 define <2 x i32> @extract_elt13_v4i32_readfirstlane_source_simplify0(i32 %src0, i32 %src2) {
 ; CHECK-LABEL: define <2 x i32> @extract_elt13_v4i32_readfirstlane_source_simplify0(
 ; CHECK-SAME: i32 [[SRC0:%.*]], i32 [[SRC2:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[INS_1:%.*]] = insertelement <4 x i32> poison, i32 [[SRC0]], i64 1
-; CHECK-NEXT:    [[VEC:%.*]] = call <4 x i32> @llvm.amdgcn.readfirstlane.v4i32(<4 x i32> [[INS_1]])
-; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[VEC]], <4 x i32> poison, <2 x i32> <i32 1, i32 poison>
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[SRC0]])
+; CHECK-NEXT:    [[SHUFFLE:%.*]] = insertelement <2 x i32> poison, i32 [[TMP1]], i64 0
 ; CHECK-NEXT:    ret <2 x i32> [[SHUFFLE]]
 ;
   %ins.0 = insertelement <4 x i32> poison, i32 %src0, i32 0
@@ -350,8 +349,8 @@ define i32 @extract_elt0_v2i32_readfirstlane_convergencetoken(<2 x i32> %src) co
 ; CHECK-LABEL: define i32 @extract_elt0_v2i32_readfirstlane_convergencetoken(
 ; CHECK-SAME: <2 x i32> [[SRC:%.*]]) #[[ATTR1:[0-9]+]] {
 ; CHECK-NEXT:    [[T:%.*]] = call token @llvm.experimental.convergence.entry()
-; CHECK-NEXT:    [[VEC:%.*]] = call <2 x i32> @llvm.amdgcn.readfirstlane.v2i32(<2 x i32> [[SRC]]) [ "convergencectrl"(token [[T]]) ]
-; CHECK-NEXT:    [[ELT:%.*]] = extractelement <2 x i32> [[VEC]], i64 0
+; CHECK-NEXT:    [[TMP1:%.*]] = extractelement <2 x i32> [[SRC]], i64 0
+; CHECK-NEXT:    [[ELT:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP1]]) [ "convergencectrl"(token [[T]]) ]
 ; CHECK-NEXT:    ret i32 [[ELT]]
 ;
   %t = call token @llvm.experimental.convergence.entry()
@@ -381,8 +380,8 @@ define < 2 x i32> @extract_elt13_v4i32_readfirstlane_source_simplify1_convergenc
 define i1 @extract_elt0_v2i1_readfirstlane(<2 x i1> %src) {
 ; CHECK-LABEL: define i1 @extract_elt0_v2i1_readfirstlane(
 ; CHECK-SAME: <2 x i1> [[SRC:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[VEC:%.*]] = call <2 x i1> @llvm.amdgcn.readfirstlane.v2i1(<2 x i1> [[SRC]])
-; CHECK-NEXT:    [[ELT:%.*]] = extractelement <2 x i1> [[VEC]], i64 0
+; CHECK-NEXT:    [[TMP1:%.*]] = extractelement <2 x i1> [[SRC]], i64 0
+; CHECK-NEXT:    [[ELT:%.*]] = call i1 @llvm.amdgcn.readfirstlane.i1(i1 [[TMP1]])
 ; CHECK-NEXT:    ret i1 [[ELT]]
 ;
   %vec = call <2 x i1> @llvm.amdgcn.readfirstlane.v2i1(<2 x i1> %src)

Contributor

@jayfoad jayfoad left a comment

Looks OK, but this handling applies way more generally than just readfirstlane (or just amdgcn lane intrinsics). It applies to any intrinsic that operates elementwise on vectors. Isn't there a generic helper for that somewhere?
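The reviewer's point, that the rewrite is valid for any intrinsic acting elementwise on vectors, can be checked on a toy model. This sketch is not LLVM code. readFirstLane, extractElt, and scalarizationIsExact are hypothetical names. A wave is modeled as a std::vector of per-lane values, and every lane is assumed active, so the first active lane is lane 0. Because the broadcast treats each vector element independently, extracting element I after a vector readfirstlane matches readfirstlane applied to element I alone.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy readfirstlane: with all lanes assumed active, broadcast lane 0's value
// (scalar or whole vector) to every lane of the wave.
template <typename T>
std::vector<T> readFirstLane(const std::vector<T> &Wave) {
  assert(!Wave.empty() && "a wave has at least one lane");
  return std::vector<T>(Wave.size(), Wave.front());
}

// Toy extractelement: take element Idx from each lane's vector value.
template <typename T, std::size_t N>
std::vector<T> extractElt(const std::vector<std::array<T, N>> &Wave,
                          std::size_t Idx) {
  std::vector<T> Out;
  Out.reserve(Wave.size());
  for (const auto &Lane : Wave)
    Out.push_back(Lane[Idx]);
  return Out;
}

// For every element index, compare "extract after vector readfirstlane"
// with "readfirstlane on the extracted scalar".
template <typename T, std::size_t N>
bool scalarizationIsExact(const std::vector<std::array<T, N>> &Wave) {
  for (std::size_t I = 0; I < N; ++I)
    if (extractElt(readFirstLane(Wave), I) != readFirstLane(extractElt(Wave, I)))
      return false;
  return true;
}
```

The same argument applies unchanged to any other per-element lane operation, which is why a generic, intrinsic-agnostic helper is the natural home for it.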

@arsenm
Contributor Author

arsenm commented Feb 25, 2025

Looks OK, but this handling applies way more generally than just readfirstlane (or just amdgcn lane intrinsics). It applies to any intrinsic that operates elementwise on vectors. Isn't there a generic helper for that somewhere?

Yes, this should be expanded to cover all of those cases (we should do the same for demanded bits). I was going to leave this as the first sample case and open an issue to expand to the other cases.

@arsenm arsenm force-pushed the users/arsenm/amdgpu/simplify-demanded-vector-elts-readfirstlane-src branch from ca0b3a4 to e32caff on February 28, 2025 05:57
Base automatically changed from users/arsenm/amdgpu/simplify-demanded-vector-elts-readfirstlane-src to main February 28, 2025 06:01
If we are only extracting a single element, rewrite the intrinsic call
to use the element type. We should extend this to arbitrary extract
shuffles.
@arsenm arsenm force-pushed the users/arsenm/amdgpu/simplify-demanded-vector-elts-readfirstlane branch from bfe67bc to c80695c on February 28, 2025 06:03
Contributor Author

@arsenm arsenm left a comment

ping

Collaborator

@rampitec rampitec left a comment

Can you also add a test with <2 x i8> and see if it runs through to the actual codegen?

@rampitec
Collaborator

rampitec commented Mar 4, 2025

Can you also add a test with <2 x i8> and see if it runs through to the actual codegen?

Actually I think it does not work either way, <2 x i8> or just i8. So LGTM, but I think it would be better to also apply #114887. It will make i8 work, although <2 x i8> still does not.

Contributor Author

arsenm commented Mar 5, 2025

Merge activity

  • Mar 4, 8:35 PM EST: A user started a stack merge that includes this pull request via Graphite.
  • Mar 4, 8:35 PM EST: A user merged this pull request with Graphite.

@arsenm arsenm merged commit 95c64b7 into main Mar 5, 2025
11 checks passed
@arsenm arsenm deleted the users/arsenm/amdgpu/simplify-demanded-vector-elts-readfirstlane branch March 5, 2025 01:35
@llvm-ci
Collaborator

llvm-ci commented Mar 5, 2025

LLVM Buildbot has detected a new failure on builder sanitizer-aarch64-linux-bootstrap-hwasan running on sanitizer-buildbot11 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/55/builds/7930

Here is the relevant piece of the build log for reference:
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:512: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:512: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:512: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:512: note: using ld.lld: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/ld.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:512: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:512: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:512: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 86882 tests, 72 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.
FAIL: LLVM :: Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll (66456 of 86882)
******************** TEST 'LLVM :: Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 2: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -passes=instcombine < /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/test/Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll | /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/test/Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll
+ /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -passes=instcombine
+ /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/test/Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -passes=instcombine
1.	Running pass "function(instcombine<max-iterations=1;verify-fixpoint>)" on module "<stdin>"
2.	Running pass "instcombine<max-iterations=1;verify-fixpoint>" on function "extract_elt0_v1i16_readfirstlane"
 #0 0x0000bd8d6fa52b74 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:804:13
 #1 0x0000bd8d6fa4d30c llvm::sys::RunSignalHandlers() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Signals.cpp:106:18
 #2 0x0000bd8d6fa54180 SignalHandler(int, siginfo_t*, void*) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:0:3
 #3 0x0000f425cb3568f8 (linux-vdso.so.1+0x8f8)
 #4 0x0000f425cac87608 (/lib/aarch64-linux-gnu/libc.so.6+0x87608)
 #5 0x0000f425cac3cb3c raise (/lib/aarch64-linux-gnu/libc.so.6+0x3cb3c)
 #6 0x0000f425cac27e00 abort (/lib/aarch64-linux-gnu/libc.so.6+0x27e00)
 #7 0x0000bd8d6f98c670 __sanitizer::Atexit(void (*)()) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_posix_libcdep.cpp:168:10
 #8 0x0000bd8d6f98a494 __sanitizer::Die() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_termination.cpp:52:5
 #9 0x0000bd8d6f9753b8 Unlock /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_mutex.h:250:16
#10 0x0000bd8d6f9753b8 ~GenericScopedLock /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_mutex.h:386:51
#11 0x0000bd8d6f9753b8 __hwasan::ScopedReport::~ScopedReport() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan_report.cpp:54:5
#12 0x0000bd8d6f974b48 __hwasan::(anonymous namespace)::BaseReport::~BaseReport() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan_report.cpp:476:7
#13 0x0000bd8d6f9728dc __hwasan::ReportTagMismatch(__sanitizer::StackTrace*, unsigned long, unsigned long, bool, bool, unsigned long*) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan_report.cpp:1091:1
#14 0x0000bd8d6f95ecc4 Destroy /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_common.h:532:31
#15 0x0000bd8d6f95ecc4 ~InternalMmapVector /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_common.h:642:56
#16 0x0000bd8d6f95ecc4 __hwasan::HandleTagMismatch(__hwasan::AccessInfo, unsigned long, unsigned long, void*, unsigned long*) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan.cpp:245:1
#17 0x0000bd8d6f961078 __hwasan_tag_mismatch4 /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan.cpp:764:1
#18 0x0000bd8d6f9760bc __interception::InterceptFunction(char const*, unsigned long*, unsigned long, unsigned long) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/interception/interception_linux.cpp:60:0
#19 0x0000bd8d6fcbc398 getValueID /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/include/llvm/IR/Value.h:533:12
#20 0x0000bd8d6fcbc398 doit /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/include/llvm/IR/Value.h:1031:16
#21 0x0000bd8d6fcbc398 doit /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/include/llvm/Support/Casting.h:81:12
#22 0x0000bd8d6fcbc398 doit /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/include/llvm/Support/Casting.h:137:12
#23 0x0000bd8d6fcbc398 isPossible /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/include/llvm/Support/Casting.h:255:12
#24 0x0000bd8d6fcbc398 isa<llvm::GlobalVariable, llvm::Value> /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/include/llvm/Support/Casting.h:549:10
Step 11 (stage2/hwasan check) failure: stage2/hwasan check (failure)

@arsenm
Contributor Author

arsenm commented Mar 5, 2025

LLVM Buildbot has detected a new failure on builder sanitizer-aarch64-linux-bootstrap-hwasan running on sanitizer-buildbot11 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/55/builds/7930

Here is the relevant piece of the build log for reference:

Fixed by 5c375c3283fcd2bf4f98fe8627658e056e25dc44

@jayfoad
Contributor

jayfoad commented Mar 7, 2025

Looks OK, but this handling applies way more generally than just readfirstlane (or just amdgcn lane intrinsics). It applies to any intrinsic that operates elementwise on vectors. Isn't there a generic helper for that somewhere?

Yes, this should be expanded to cover all of those cases (we should do the same for demanded bits). I was going to leave this as the first sample case and open an issue to expand to the other cases.

Should be able to do it in generic code for any isTriviallyScalarizable intrinsic. Then AMDGPU would just need to implement isTargetIntrinsicTriviallyScalarizable.

jph-13 pushed a commit to jph-13/llvm-project that referenced this pull request Mar 21, 2025
…#128647)

If we are only extracting a single element, rewrite the intrinsic call
to use the element type. We should extend this to arbitrary extract
shuffles.
5 participants