Skip to content

[RISCV] llvm::is_contained is suboptimal compared to X86/AArch64 #134272

@wangpc-pp

Description

@wangpc-pp

I found this in #134057.
The code is https://godbolt.org/z/6Tbqac9jW and I paste it here:

#include <initializer_list>

/// Returns true iff \p Element exists in \p Set. This overload takes \p Set as
/// an initializer list and is `constexpr`-friendly.
template <typename T, typename E>
constexpr bool is_contained(std::initializer_list<T> Set, const E &Element) {
  // TODO: Use std::find when we switch to C++20.
  for (const T &V : Set)
    if (V == Element)
      return true;
  return false;
}

bool bar2(unsigned v){
    return is_contained({
        1, 3}, v);
}

bool bar4(unsigned v){
    return is_contained({
        1, 3, 5, 6}, v);
}

bool bar8(unsigned v){
    return is_contained({
        1, 3, 5, 6, 7, 8, 9, 10}, v);
}

bool bar16(unsigned v){
    return is_contained({
        1, 3, 5, 6, 7, 8, 9, 10,
        5, 4, 8, 1, 3, 1, 2, 4}, v);
}

For RISC-V, it generates suboptimal instruction sequences, especially when the data size is small.
For example, when the data size is 2:
X86

bar2(unsigned int):
        dec     edi
        test    edi, -3
        sete    al
        ret

AArch64

bar2(unsigned int):
        cmp     w0, #1
        ccmp    w0, #3, #4, ne
        cset    w0, eq
        ret

The results are really simple.
But on RISC-V, it generates:

bar2(unsigned int):
        addi    sp, sp, -16
        li      a1, 1
        li      a3, 3
        li      a2, 4
        sw      a1, 8(sp)
        sw      a3, 12(sp)
        addi    a1, sp, 8
.LBB0_1:
        lw      a3, 0(a1)
        beq     a3, a0, .LBB0_3
        mv      a4, a2
        addi    a2, a2, -4
        addi    a1, a1, 4
        bnez    a4, .LBB0_1
.LBB0_3:
        xor     a0, a0, a3
        seqz    a0, a0
        addi    sp, sp, 16
        ret

The count of instructions is enormous!
I compared the compiler log, there are two points that may cause this divergence:

  1. The first point is when doing SLP. X86/AArch64 convert the array to vectors while RISC-V doesn't.
  2. The second point is SROA. The SROA can't see the offset because of PHI instruction and then these lifetime intrinsics can't be removed:
; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable vscale_range(2,1024)
define dso_local noundef zeroext i1 @_Z3barj(i32 noundef signext %v) local_unnamed_addr #0 {
entry:
  %ref.tmp = alloca [2 x i32], align 4
  call void @llvm.lifetime.start.p0(i64 8, ptr nonnull %ref.tmp) #2
  store i32 1, ptr %ref.tmp, align 4, !tbaa !9
  %arrayinit.element = getelementptr inbounds nuw i8, ptr %ref.tmp, i64 4
  store i32 3, ptr %arrayinit.element, align 4, !tbaa !9
  br label %for.body.i

for.body.i:                                       ; preds = %for.body.i, %entry
  %__begin0.013.i.idx = phi i64 [ 0, %entry ], [ %__begin0.013.i.add, %for.body.i ]
  %__begin0.013.i.ptr = getelementptr inbounds nuw i8, ptr %ref.tmp, i64 %__begin0.013.i.idx
  %0 = load i32, ptr %__begin0.013.i.ptr, align 4, !tbaa !9
  %cmp2.not.i = icmp eq i32 %0, %v
  %__begin0.013.i.add = add nuw nsw i64 %__begin0.013.i.idx, 4
  %cmp.not.not.i = icmp eq i64 %__begin0.013.i.add, 8
  %or.cond = select i1 %cmp2.not.i, i1 true, i1 %cmp.not.not.i
  br i1 %or.cond, label %_Z12is_containedIijEbSt16initializer_listIT_ERKT0_.exit, label %for.body.i

_Z12is_containedIijEbSt16initializer_listIT_ERKT0_.exit: ; preds = %for.body.i
  call void @llvm.lifetime.end.p0(i64 8, ptr nonnull %ref.tmp) #2
  ret i1 %cmp2.not.i
}
SROA function: _Z3barj
SROA alloca:   %ref.tmp = alloca [2 x i32], align 4
  Rewriting FCA loads and stores...
Can't analyze slices for alloca:   %ref.tmp = alloca [2 x i32], align 4
  A pointer to this alloca escaped by:
    %0 = load i32, ptr %__begin0.013.i.ptr, align 4, !tbaa !9

I think this code pattern is really common in C++ code, but currently the RISC-V compiler can't compile it to the best binary code. I don't know which part I should focus on:

  1. Let SLP kick in earlier?
  2. Fix the SROA via SCEV? (I don't even know if it is by-design...)

I will appreciate it if someone can give me some suggestions, or, fix this issue!

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions