
[X86] Suboptimal lowering of short vectors equality check: could use scalar types instead #53419

Closed
@xortator


Motivating case: https://godbolt.org/z/rbE3TzqdP

The original test

define i1 @vector_version(i8* align 1 %arg, i8* align 1 %arg1, i32 %arg2) {
bb:
  %ptr1 = bitcast i8* %arg1 to <4 x i8>*
  %ptr2 = bitcast i8* %arg to <4 x i8>*
  %lhs = load <4 x i8>, <4 x i8>* %ptr1, align 1
  %rhs = load <4 x i8>, <4 x i8>* %ptr2, align 1
  %any_ne = icmp ne <4 x i8> %lhs, %rhs
  %any_ne_scalar = bitcast <4 x i1> %any_ne to i4
  %all_eq = icmp eq i4 %any_ne_scalar, 0
  ret i1 %all_eq
}

reads two short vector values and effectively checks that they are equal. Codegen generates vector code from it:

vector_version:                         # @vector_version
        vpmovzxbd       (%rsi), %xmm0           # xmm0 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero
        vpmovzxbd       (%rdi), %xmm1           # xmm1 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero
        vpsubd  %xmm1, %xmm0, %xmm0
        vptest  %xmm0, %xmm0
        sete    %al
        retq

This code is semantically equivalent to its scalar counterpart:

define i1 @scalar_version(i8* align 1 %arg, i8* align 1 %arg1, i32 %arg2) {
bb:
  %ptr1 = bitcast i8* %arg1 to i32*
  %ptr2 = bitcast i8* %arg to i32*
  %lhs = load i32, i32* %ptr1, align 1
  %rhs = load i32, i32* %ptr2, align 1
  %all_eq = icmp eq i32 %lhs, %rhs
  ret i1 %all_eq
}

which produces neater asm:

scalar_version:                         # @scalar_version
        movl    (%rsi), %eax
        cmpl    (%rdi), %eax
        sete    %al
        retq
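The claimed equivalence can be spot-checked outside of LLVM with a small C sketch. The function names below (`vector_style_eq`, `scalar_style_eq`, `self_check`) are hypothetical and only mirror the shape of the two IR functions above: a per-lane "any not equal" mask compared against zero, versus a single 32-bit compare. This is an illustration under those assumptions, not the transform itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Lane-by-lane check, mirroring the <4 x i8> icmp ne followed by "mask == 0". */
static int vector_style_eq(const uint8_t *a, const uint8_t *b) {
    unsigned any_ne = 0;
    for (int i = 0; i < 4; ++i)
        any_ne |= (unsigned)(a[i] != b[i]) << i;  /* one mask bit per lane */
    return any_ne == 0;
}

/* Single 32-bit compare, mirroring the i32 load followed by icmp eq. */
static int scalar_style_eq(const uint8_t *a, const uint8_t *b) {
    uint32_t x, y;
    memcpy(&x, a, sizeof x);  /* memcpy models the align-1 (unaligned) load */
    memcpy(&y, b, sizeof y);
    return x == y;
}

/* Checks that both forms agree on equal inputs and on single-lane differences. */
static void self_check(void) {
    uint8_t a[4] = {0x11, 0x22, 0x33, 0x44};
    uint8_t b[4];
    memcpy(b, a, sizeof b);
    assert(vector_style_eq(a, b) && scalar_style_eq(a, b));
    for (int lane = 0; lane < 4; ++lane) {
        b[lane] ^= 0x80;  /* perturb exactly one lane */
        assert(vector_style_eq(a, b) == scalar_style_eq(a, b));
        assert(!scalar_style_eq(a, b));
        b[lane] ^= 0x80;  /* restore the lane */
    }
}
```

Calling `self_check()` from a `main` exercises both forms. The two agree because all four lanes are equal exactly when the 32 bits they occupy are equal, regardless of byte order.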

Unfortunately, we cannot use the RM form of the vector sub here, as stated in #53416, but it looks like we could avoid using vector registers altogether.

I'm not sure where the proper place for this is: codegen or InstCombine.
