[RISCV] Unprofitable select vectorization/lowering #109466

preames · 2024-09-20T20:10:00Z

This was brought up in discussion on #108419. This is the root cause of the reported regression on leela from spec2017 in the LTO configuration.

We are failing to recognize shifts disguised as selects in at least two contexts:

During vector lowering, as shown in test_vec4. In this case, the vector select is a disguised vector shift of the mask vector extended to the working type. Note that the shift amount differs from lane to lane (19, 18, 17, 16), so this is not a shift by a single uniform constant.
During SLP vectorization, as shown in test_scalarized. When this function is passed to SLP, we produce the form in test_vec4.
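
To make the equivalence concrete, here is a minimal scalar sketch in C++ (not LLVM code) of the two forms. The names `Lanes`, `reduce_select` and `reduce_shift` are made up for illustration. The constants are the ones from the tests below: 524288, 262144, 131072 and 65536 are 1 << 19, 1 << 18, 1 << 17 and 1 << 16.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: four i16 lanes, as in test_vec4.
using Lanes = std::array<std::uint16_t, 4>;

// Select form: each lane picks a power-of-two constant or zero, then OR-reduce.
constexpr std::uint32_t reduce_select(const Lanes &v) {
  constexpr std::array<std::uint32_t, 4> arms = {524288u, 262144u, 131072u, 65536u};
  std::uint32_t acc = 0;
  for (std::size_t i = 0; i < v.size(); ++i)
    acc |= (v[i] == 1) ? arms[i] : 0u;
  return acc;
}

// Shift form: zero-extend the compare result, shift by a per-lane amount.
constexpr std::uint32_t reduce_shift(const Lanes &v) {
  constexpr std::array<std::uint32_t, 4> shamt = {19u, 18u, 17u, 16u};
  std::uint32_t acc = 0;
  for (std::size_t i = 0; i < v.size(); ++i)
    acc |= static_cast<std::uint32_t>(v[i] == 1) << shamt[i];
  return acc;
}

// Both forms agree, checked at compile time for two sample inputs.
static_assert(reduce_select(Lanes{1, 1, 1, 1}) == reduce_shift(Lanes{1, 1, 1, 1}),
              "select and shift forms agree when every lane matches");
static_assert(reduce_select(Lanes{1, 0, 7, 1}) == reduce_shift(Lanes{1, 0, 7, 1}),
              "select and shift forms agree on a mixed input");
```

The shift form needs no constant-pool load for the true arm, which is roughly what the scalar codegen for test_scalarized already exploits with seqz followed by slli.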

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
; RUN: llc -mtriple=riscv64 -mattr=+v,+zba,+zbb < %s | FileCheck %s

define i32 @test_vec4(i16 zeroext %a, i16 zeroext %b, i16 zeroext %c, i16 zeroext %d) {
; CHECK-LABEL: test_vec4:
; CHECK:       # %bb.0:
; CHECK-NEXT:    slli a2, a2, 32
; CHECK-NEXT:    slli a3, a3, 48
; CHECK-NEXT:    or a2, a3, a2
; CHECK-NEXT:    slli a1, a1, 16
; CHECK-NEXT:    or a0, a0, a1
; CHECK-NEXT:    or a0, a0, a2
; CHECK-NEXT:    vsetivli zero, 1, e64, m1, ta, ma
; CHECK-NEXT:    vmv.s.x v8, a0
; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
; CHECK-NEXT:    vmseq.vi v0, v8, 1
; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, mu
; CHECK-NEXT:    vmv.v.i v8, 0
; CHECK-NEXT:    lui a0, %hi(.LCPI0_0)
; CHECK-NEXT:    addi a0, a0, %lo(.LCPI0_0)
; CHECK-NEXT:    vle32.v v8, (a0), v0.t
; CHECK-NEXT:    vredor.vs v8, v8, v8
; CHECK-NEXT:    vmv.x.s a0, v8
; CHECK-NEXT:    ret
  %t35 = insertelement <4 x i16> poison, i16 %a, i64 0
  %t36 = insertelement <4 x i16> %t35, i16 %b, i64 1
  %t37 = insertelement <4 x i16> %t36, i16 %c, i64 2
  %t38 = insertelement <4 x i16> %t37, i16 %d, i64 3
  %t39 = icmp eq <4 x i16> %t38, <i16 1, i16 1, i16 1, i16 1>
  %t40 = select <4 x i1> %t39, <4 x i32> <i32 524288, i32 262144, i32 131072, i32 65536>, <4 x i32> zeroinitializer
  %t41 = tail call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> %t40)
  ret i32 %t41
}

define i32 @test_scalarized(i16 zeroext %a, i16 zeroext %b, i16 zeroext %c, i16 zeroext %d) {
; CHECK-LABEL: test_scalarized:
; CHECK:       # %bb.0:
; CHECK-NEXT:    addi a0, a0, -1
; CHECK-NEXT:    seqz a0, a0
; CHECK-NEXT:    addi a1, a1, -1
; CHECK-NEXT:    seqz a1, a1
; CHECK-NEXT:    addi a2, a2, -1
; CHECK-NEXT:    seqz a2, a2
; CHECK-NEXT:    addi a3, a3, -1
; CHECK-NEXT:    seqz a3, a3
; CHECK-NEXT:    slli a0, a0, 19
; CHECK-NEXT:    slli a1, a1, 18
; CHECK-NEXT:    slli a2, a2, 17
; CHECK-NEXT:    slli a3, a3, 16
; CHECK-NEXT:    or a0, a0, a1
; CHECK-NEXT:    or a2, a2, a3
; CHECK-NEXT:    or a0, a0, a2
; CHECK-NEXT:    ret
  %t39.i0 = icmp eq i16 %a, 1
  %t39.i1 = icmp eq i16 %b, 1
  %t39.i2 = icmp eq i16 %c, 1
  %t39.i3 = icmp eq i16 %d, 1
  %t40.i0 = select i1 %t39.i0, i32 524288, i32 0
  %t40.i1 = select i1 %t39.i1, i32 262144, i32 0
  %t40.i2 = select i1 %t39.i2, i32 131072, i32 0
  %t40.i3 = select i1 %t39.i3, i32 65536, i32 0
  %or.rdx0 = or i32 %t40.i0, %t40.i1
  %or.rdx1 = or i32 %t40.i2, %t40.i3
  %or.rdx2 = or i32 %or.rdx0, %or.rdx1
  ret i32 %or.rdx2
}

./opt -S example.ll -passes=slp-vectorizer -mtriple=riscv64 -mattr=+v


llvmbot · 2024-09-20T21:24:43Z

@llvm/issue-subscribers-backend-risc-v

Author: Philip Reames (preames)


…ares This follows in the spirit of 7d82c99, and extends the costing API for compares and selects to provide information about the operands passed in an analogous manner. This allows us to model the cost of materializing the vector constant, as some select-of-constants are significantly more expensive than others when you account for the cost of materializing the constants involved. Fixes llvm#109466

…ares (#109824) This follows in the spirit of 7d82c99, and extends the costing API for compares and selects to provide information about the operands passed in an analogous manner. This allows us to model the cost of materializing the vector constant, as some select-of-constants are significantly more expensive than others when you account for the cost of materializing the constants involved. This is a stepping stone towards fixing #109466. A separate SLP patch will be required to utilize the new API.

Depending on the constant, selects with constant arms can have highly varying cost. This adjusts SLP to use the new API introduced in d288574. Fixes llvm#109466.
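
As a rough illustration of why operand information matters for costing, here is a toy model in C++. It is not the LLVM TargetTransformInfo API. `OperandKind`, `materializationCost` and `selectCost` are hypothetical names, and the numbers are assumptions loosely based on counting instructions in the test_vec4 output above (one vmv.v.i for the zero splat, lui + addi + vle32.v for the non-uniform constant).

```cpp
// Toy cost model. All names and costs are hypothetical, for illustration only.
enum class OperandKind { AnyValue, UniformConstant, NonUniformConstant };

// Assumed cost of getting an operand into a vector register.
constexpr int materializationCost(OperandKind kind) {
  switch (kind) {
  case OperandKind::UniformConstant:
    return 1; // assumed: a single splat instruction
  case OperandKind::NonUniformConstant:
    return 3; // assumed: address computation plus a constant-pool load
  case OperandKind::AnyValue:
    return 0; // assumed: value is already in a register
  }
  return 0;
}

// Cost of a select once the kind of each arm is known to the cost model.
constexpr int selectCost(OperandKind trueArm, OperandKind falseArm) {
  constexpr int baseSelect = 1; // assumed cost of the select itself
  return baseSelect + materializationCost(trueArm) + materializationCost(falseArm);
}

// A select like the one in test_vec4 (non-uniform true arm, zero false arm)
// costs more under this model than a select of two register values.
static_assert(selectCost(OperandKind::NonUniformConstant, OperandKind::UniformConstant) >
                  selectCost(OperandKind::AnyValue, OperandKind::AnyValue),
              "constant arms make the select more expensive in this toy model");
```

Without the operand kinds, a cost model of this shape could only return the base cost for every select, which is the gap the commits above describe.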

github-actions bot added the new issue label Sep 20, 2024

preames mentioned this issue Sep 20, 2024

[RISCV][TTI] Reduce cost of a build_vector pattern #108419

Merged

EugeneZelenko added backend:RISC-V and removed new issue labels Sep 20, 2024

preames mentioned this issue Sep 24, 2024

[TTI][RISCV] Model cost of loading constants arms of selects and compares #109824

Merged

preames mentioned this issue Sep 25, 2024

[SLP] Pass operand info to getCmpSelInstrInfo #109998

Merged

preames closed this as completed in #109998 Sep 25, 2024

preames closed this as completed in 556ec4a Sep 25, 2024

EugeneZelenko added llvm:SLPVectorizer and removed backend:RISC-V labels Sep 25, 2024
