AMDGPU: Simplify demanded vector elts of readfirstlane sources #128646

arsenm (Contributor) commented Feb 25, 2025

Stub implementation of simplifyDemandedVectorEltsIntrinsic for
readfirstlane.
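
For readers unfamiliar with demanded-elements analysis, a minimal sketch of the idea is given here. It is a standalone toy, not LLVM code: `demandedFromShuffleMask` is a hypothetical helper, and a plain 32-bit mask stands in for LLVM's `APInt`. It computes which elements of a vector a shuffle actually reads; those are the only elements the producer (here, a `readfirstlane` call) needs to provide.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an LLVM API): return a bitmask of the source
// elements that a single-source shuffle reads. A negative mask entry models
// a poison lane, which demands nothing from the source.
std::uint32_t demandedFromShuffleMask(const std::vector<int> &mask,
                                      unsigned numSrcElts) {
  assert(numSrcElts <= 32 && "toy model only supports up to 32 elements");
  std::uint32_t demanded = 0;
  for (int idx : mask) {
    if (idx >= 0 && static_cast<unsigned>(idx) < numSrcElts)
      demanded |= (1u << idx);
  }
  return demanded;
}
```

With the mask `<1, 3>` that the tests in this PR use on a 4-element vector, only elements 1 and 3 are demanded (bitmask `0b1010`).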

llvmbot (Member) commented Feb 25, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

Stub implementation of simplifyDemandedVectorEltsIntrinsic for
readfirstlane.


Full diff: https://github.com/llvm/llvm-project/pull/128646.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp (+4)
  • (modified) llvm/test/Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll (+3-8)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
index ebc00e59584ac..617974713d6f0 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
@@ -1544,6 +1544,10 @@ std::optional<Value *> GCNTTIImpl::simplifyDemandedVectorEltsIntrinsic(
     std::function<void(Instruction *, unsigned, APInt, APInt &)>
         SimplifyAndSetOp) const {
   switch (II.getIntrinsicID()) {
+  case Intrinsic::amdgcn_readfirstlane:
+    // TODO: For a vector extract, should reduce the intrinsic call type.
+    SimplifyAndSetOp(&II, 0, DemandedElts, UndefElts);
+    return std::nullopt;
   case Intrinsic::amdgcn_raw_buffer_load:
   case Intrinsic::amdgcn_raw_ptr_buffer_load:
   case Intrinsic::amdgcn_raw_buffer_load_format:
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll b/llvm/test/Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll
index 83d9d0d032ed1..836c739048411 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/simplify-demanded-vector-elts-lane-intrinsics.ll
@@ -306,10 +306,9 @@ define <2 x i16> @extract_elt13_v4i16readfirstlane(<4 x i16> %src) {
 define <2 x i32> @extract_elt13_v4i32_readfirstlane_source_simplify0(i32 %src0, i32 %src2) {
 ; CHECK-LABEL: define <2 x i32> @extract_elt13_v4i32_readfirstlane_source_simplify0(
 ; CHECK-SAME: i32 [[SRC0:%.*]], i32 [[SRC2:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[INS_0:%.*]] = insertelement <4 x i32> poison, i32 [[SRC0]], i64 0
-; CHECK-NEXT:    [[INS_1:%.*]] = shufflevector <4 x i32> [[INS_0]], <4 x i32> poison, <4 x i32> <i32 0, i32 0, i32 poison, i32 poison>
+; CHECK-NEXT:    [[INS_1:%.*]] = insertelement <4 x i32> poison, i32 [[SRC0]], i64 1
 ; CHECK-NEXT:    [[VEC:%.*]] = call <4 x i32> @llvm.amdgcn.readfirstlane.v4i32(<4 x i32> [[INS_1]])
-; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[VEC]], <4 x i32> poison, <2 x i32> <i32 1, i32 3>
+; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[VEC]], <4 x i32> poison, <2 x i32> <i32 1, i32 poison>
 ; CHECK-NEXT:    ret <2 x i32> [[SHUFFLE]]
 ;
   %ins.0 = insertelement <4 x i32> poison, i32 %src0, i32 0
@@ -338,11 +337,7 @@ define < 2 x i32> @extract_elt13_v4i32_readfirstlane_source_simplify1(i32 %src0,
 define < 2 x i32> @extract_elt13_v4i32_readfirstlane_source_simplify2(i32 %src0, i32 %src2) {
 ; CHECK-LABEL: define <2 x i32> @extract_elt13_v4i32_readfirstlane_source_simplify2(
 ; CHECK-SAME: i32 [[SRC0:%.*]], i32 [[SRC2:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[INS_0:%.*]] = insertelement <4 x i32> poison, i32 [[SRC0]], i64 0
-; CHECK-NEXT:    [[INS_1:%.*]] = shufflevector <4 x i32> [[INS_0]], <4 x i32> poison, <4 x i32> <i32 0, i32 poison, i32 0, i32 poison>
-; CHECK-NEXT:    [[VEC:%.*]] = call <4 x i32> @llvm.amdgcn.readfirstlane.v4i32(<4 x i32> [[INS_1]])
-; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[VEC]], <4 x i32> poison, <2 x i32> <i32 1, i32 3>
-; CHECK-NEXT:    ret <2 x i32> [[SHUFFLE]]
+; CHECK-NEXT:    ret <2 x i32> poison
 ;
   %ins.0 = insertelement <4 x i32> poison, i32 %src0, i32 0
   %ins.1 = insertelement <4 x i32> %ins.0, i32 %src0, i32 2
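
The two test updates above can be reproduced with a small model of what forwarding `DemandedElts` to operand 0 achieves. This is a hedged sketch outside LLVM, assuming C++17: `ToyVector`, `simplifyDemandedSource`, and `isAllPoison` are hypothetical names, `std::nullopt` stands for a poison element, and `readfirstlane` is treated as element-wise (result element i depends only on source element i). The source layout for the first test is inferred from the old CHECK lines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy vector value: std::nullopt models a poison element.
using ToyVector = std::vector<std::optional<int>>;

// Hypothetical model of simplifying the intrinsic's source operand: any
// element the user does not demand may be replaced with poison.
ToyVector simplifyDemandedSource(ToyVector src, std::uint32_t demanded) {
  assert(src.size() <= 32 && "toy model only supports up to 32 elements");
  for (std::size_t i = 0; i < src.size(); ++i) {
    if (!(demanded & (1u << i)))
      src[i] = std::nullopt;
  }
  return src;
}

// True when no element carries a defined value, i.e. the whole vector
// (and anything computed element-wise from it) folds to poison.
bool isAllPoison(const ToyVector &v) {
  for (const auto &elt : v) {
    if (elt.has_value())
      return false;
  }
  return true;
}
```

With demanded mask `0b1010` (elements 1 and 3), a source of `<src0, src0, poison, poison>` becomes `<poison, src0, poison, poison>`, matching the single `insertelement` at index 1 in the first test. A source of `<src0, poison, src0, poison>` becomes all poison, which is why the second test now returns `poison` directly.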

pravinjagtap (Contributor) left a comment

LGTM

arsenm (Contributor, Author) commented Feb 28, 2025

Merge activity

  • Feb 28, 12:52 AM EST: A user started a stack merge that includes this pull request via Graphite.
  • Feb 28, 12:58 AM EST: Graphite rebased this pull request as part of a merge.
  • Feb 28, 1:01 AM EST: A user merged this pull request with Graphite.

arsenm force-pushed the users/arsenm/amdgpu/add-baseline-tests-simplify-demanded-vector-elts-readfirstlane branch from fe33057 to 859a8cb on February 28, 2025 05:54
Base automatically changed from users/arsenm/amdgpu/add-baseline-tests-simplify-demanded-vector-elts-readfirstlane to main on February 28, 2025 05:57
arsenm force-pushed the users/arsenm/amdgpu/simplify-demanded-vector-elts-readfirstlane-src branch from ca0b3a4 to e32caff on February 28, 2025 05:57
arsenm merged commit d410f09 into main on Feb 28, 2025
6 of 10 checks passed
arsenm deleted the users/arsenm/amdgpu/simplify-demanded-vector-elts-readfirstlane-src branch on February 28, 2025 06:01
cheezeburglar pushed a commit to cheezeburglar/llvm-project that referenced this pull request Feb 28, 2025
…128646)

Stub implementation of simplifyDemandedVectorEltsIntrinsic for
readfirstlane.