[X86] Combine bitcast(v1Ty insert_vector_elt(X, Y, 0)) to Y #130475

phoebewang · 2025-03-09T08:31:28Z

In practice this pattern only occurs with v1i1, when we generate llvm.masked.load/store intrinsics for APX cload/cstore.

https://godbolt.org/z/vjsrofsqx

llvmbot · 2025-03-09T08:32:01Z

@llvm/pr-subscribers-backend-x86

Author: Phoebe Wang (phoebewang)

Changes

In practice this pattern only occurs with v1i1, when we generate llvm.masked.load/store intrinsics for APX cload/cstore.

https://godbolt.org/z/vjsrofsqx

Full diff: https://github.com/llvm/llvm-project/pull/130475.diff

2 Files Affected:

(modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+5)
(modified) llvm/test/CodeGen/X86/apx/cf.ll (+15)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 1992ef67164d8..be45a678bbcfe 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -45486,6 +45486,11 @@ static SDValue combineBitcast(SDNode *N, SelectionDAG &DAG,
   if (SDValue V = combineCastedMaskArithmetic(N, DAG, DCI, Subtarget))
     return V;
 
+  // bitcast(v1Ty insert_vector_elt(X, Y, 0)) --> Y
+  if (N0.getOpcode() == ISD::INSERT_VECTOR_ELT && SrcVT.getScalarType() == VT &&
+      SrcVT.getVectorNumElements() == 1)
+    return N0.getOperand(1);
+
   // Convert a bitcasted integer logic operation that has one bitcasted
   // floating-point operand into a floating-point logic operation. This may
   // create a load of a constant, but that is cheaper than materializing the
diff --git a/llvm/test/CodeGen/X86/apx/cf.ll b/llvm/test/CodeGen/X86/apx/cf.ll
index c71d7768834f3..216f187d986d6 100644
--- a/llvm/test/CodeGen/X86/apx/cf.ll
+++ b/llvm/test/CodeGen/X86/apx/cf.ll
@@ -124,3 +124,18 @@ entry:
   call void @llvm.masked.store.v4i64.p0(<4 x i64> %0, ptr %p, i32 8, <4 x i1> %cond2)
   ret void
 }
+
+define void @no_xor(i32 %a, i32 %b, ptr %c, ptr %d) #2 {
+; CHECK-LABEL: no_xor:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    cmpl %esi, %edi
+; CHECK-NEXT:    cfcmovnew (%rdx), %ax
+; CHECK-NEXT:    cfcmovnew %ax, (%rcx)
+; CHECK-NEXT:    retq
+entry:
+  %0 = icmp ne i32 %a, %b
+  %1 = insertelement <1 x i1> poison, i1 %0, i64 0
+  %2 = tail call <1 x i16> @llvm.masked.load.v1i16.p0(ptr %c, i32 2, <1 x i1> %1, <1 x i16> poison)
+  tail call void @llvm.masked.store.v1i16.p0(<1 x i16> %2, ptr %d, i32 2, <1 x i1> %1)
+  ret void
+}
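
For readers without an LLVM checkout at hand, the three conditions in the new combine can be modeled outside of SelectionDAG. The following is only a toy sketch: `Opcode`, `ValueType`, `Node`, and `foldBitcastOfV1Insert` are hypothetical names invented for illustration and are not LLVM APIs. It assumes that a bitcast from a single-element vector to its own scalar type carries exactly the inserted value.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for SelectionDAG concepts. None of these are LLVM types.
enum class Opcode { Scalar, InsertVectorElt, Bitcast };

struct ValueType {
  std::string scalar; // element (or scalar) type name, e.g. "i1", "i16"
  unsigned numElts;   // 0 means "not a vector"

  bool isVector() const { return numElts != 0; }
  ValueType scalarType() const { return ValueType{scalar, 0}; }
  bool operator==(const ValueType &other) const {
    return scalar == other.scalar && numElts == other.numElts;
  }
};

struct Node {
  Opcode opcode;
  ValueType type;
  std::vector<const Node *> operands; // InsertVectorElt: {vector, value, index}
};

// Models: bitcast(v1Ty insert_vector_elt(X, Y, 0)) --> Y
// Returns the replacement node, or nullptr when the fold does not apply.
const Node *foldBitcastOfV1Insert(const Node &bitcast) {
  if (bitcast.opcode != Opcode::Bitcast || bitcast.operands.size() != 1)
    return nullptr;
  const Node *src = bitcast.operands[0];
  if (src->opcode != Opcode::InsertVectorElt || src->operands.size() != 3)
    return nullptr;
  // The source must be a single-element vector...
  if (!src->type.isVector() || src->type.numElts != 1)
    return nullptr;
  // ...whose element type is exactly the bitcast's result type.
  if (!(src->type.scalarType() == bitcast.type))
    return nullptr;
  // Like the patch, this does not inspect the index operand: a
  // single-element vector has only one in-range index.
  return src->operands[1];
}
```

Building a `Bitcast` node of type `i1` over an `InsertVectorElt` node of type `<1 x i1>` and passing it to `foldBitcastOfV1Insert` returns the inserted scalar node. A `<4 x i1>` source, or a result type that differs from the element type, returns `nullptr`.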

RKSimon

Does this work as a generic DAGCombiner fold?

phoebewang · 2025-03-09T11:53:08Z

> Does this work as a generic DAGCombiner fold?

It does, but it is not necessary for now, because v1i1 is the only legal v1 type on X86.

RKSimon

LGTM

llvm-ci · 2025-03-09T12:13:27Z

LLVM Buildbot has detected a new failure on builder amdgpu-offload-rhel-9-cmake-build-only running on rocm-docker-rhel-9 while building llvm at step 2 "update-annotated-scripts".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/205/builds/2838

Here is the relevant piece of the build log for the reference

Step 2 (update-annotated-scripts) failure: update (failure)
git version 2.43.5
fatal: unable to access 'https://github.com/llvm/llvm-zorg.git/': Could not resolve host: github.com
fatal: unable to access 'https://github.com/llvm/llvm-zorg.git/': Could not resolve host: github.com

llvm-ci · 2025-03-09T12:13:28Z

LLVM Buildbot has detected a new failure on builder amdgpu-offload-ubuntu-22-cmake-build-only running on rocm-docker-ubu-22 while building llvm at step 2 "update-annotated-scripts".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/203/builds/4047

Here is the relevant piece of the build log for the reference

Step 2 (update-annotated-scripts) failure: update (failure)
git version 2.34.1
fatal: unable to access 'https://github.com/llvm/llvm-zorg.git/': Could not resolve host: github.com
fatal: unable to access 'https://github.com/llvm/llvm-zorg.git/': Could not resolve host: github.com

llvm-ci · 2025-03-09T12:14:04Z

LLVM Buildbot has detected a new failure on builder amdgpu-offload-rhel-8-cmake-build-only running on rocm-docker-rhel-8 while building llvm at step 2 "update-annotated-scripts".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/204/builds/2859

Here is the relevant piece of the build log for the reference

Step 2 (update-annotated-scripts) failure: update (failure)
git version 2.43.5
fatal: unable to access 'https://github.com/llvm/llvm-zorg.git/': Could not resolve host: github.com
fatal: unable to access 'https://github.com/llvm/llvm-zorg.git/': Could not resolve host: github.com

llvm-ci · 2025-03-09T12:15:16Z

LLVM Buildbot has detected a new failure on builder clang-hip-vega20 running on hip-vega20-0 while building llvm at step 2 "update-annotated-scripts".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/123/builds/15111

Here is the relevant piece of the build log for the reference

Step 2 (update-annotated-scripts) failure: update (failure)
git version 2.34.1
fatal: unable to access 'https://github.com/llvm/llvm-zorg.git/': Could not resolve host: github.com
fatal: unable to access 'https://github.com/llvm/llvm-zorg.git/': Could not resolve host: github.com

phoebewang requested review from RKSimon and KanRobert March 9, 2025 08:31

llvmbot added the backend:X86 label Mar 9, 2025

phoebewang force-pushed the APX branch from ffb0483 to b1b371a Compare March 9, 2025 08:34

[X86] Combine bitcast(v1Ty insert_vector_elt(X, Y, 0)) to Y

e842c6e

In practice this pattern only occurs with v1i1, when we generate llvm.masked.load/store intrinsics for APX cload/cstore. https://godbolt.org/z/vjsrofsqx

phoebewang force-pushed the APX branch from b1b371a to e842c6e Compare March 9, 2025 08:35

RKSimon reviewed Mar 9, 2025

RKSimon approved these changes Mar 9, 2025

phoebewang merged commit 107aa6a into llvm:main Mar 9, 2025
11 checks passed

phoebewang deleted the APX branch March 9, 2025 12:10

shiltian mentioned this pull request Mar 10, 2025

[AMDGPU] Fix test failures when expensive checks are enabled #130644

Merged
