Closed
Description
Motivating case: https://godbolt.org/z/rbE3TzqdP
The original test
define i1 @vector_version(i8* align 1 %arg, i8* align 1 %arg1, i32 %arg2) {
bb:
%ptr1 = bitcast i8* %arg1 to <4 x i8>*
%ptr2 = bitcast i8* %arg to <4 x i8>*
%lhs = load <4 x i8>, <4 x i8>* %ptr1, align 1
%rhs = load <4 x i8>, <4 x i8>* %ptr2, align 1
%any_ne = icmp ne <4 x i8> %lhs, %rhs
%any_ne_scalar = bitcast <4 x i1> %any_ne to i4
%all_eq = icmp eq i4 %any_ne_scalar, 0
ret i1 %all_eq
}
reads two 4-byte vectors and effectively checks that they are equal. Codegen produces vector code for it:
vector_version: # @vector_version
vpmovzxbd (%rsi), %xmm0 # xmm0 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero
vpmovzxbd (%rdi), %xmm1 # xmm1 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero
vpsubd %xmm1, %xmm0, %xmm0
vptest %xmm0, %xmm0
sete %al
retq
This code is semantically equivalent to its scalar counterpart
define i1 @scalar_version(i8* align 1 %arg, i8* align 1 %arg1, i32 %arg2) {
bb:
%ptr1 = bitcast i8* %arg1 to i32*
%ptr2 = bitcast i8* %arg to i32*
%lhs = load i32, i32* %ptr1, align 1
%rhs = load i32, i32* %ptr2, align 1
%all_eq = icmp eq i32 %lhs, %rhs
ret i1 %all_eq
}
which produces neater asm:
scalar_version: # @scalar_version
movl (%rsi), %eax
cmpl (%rdi), %eax
sete %al
retq
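As a sanity check on the equivalence claimed above, here is a minimal C++ sketch of the two forms. It is only an illustration, not part of the original test: the function names mirror the IR functions, `self_check` is a hypothetical helper, and `memcpy` stands in for the `align 1` loads.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Mirrors @vector_version: compare four bytes lane by lane,
// pack the "not equal" bits into a mask, and test the mask against zero.
bool vector_version(const unsigned char *a, const unsigned char *b) {
    unsigned any_ne = 0;
    for (int i = 0; i < 4; ++i) {
        // Plays the role of the <4 x i1> to i4 bitcast.
        any_ne |= static_cast<unsigned>(a[i] != b[i]) << i;
    }
    return any_ne == 0;
}

// Mirrors @scalar_version: one unaligned 32-bit load per side, one compare.
bool scalar_version(const unsigned char *a, const unsigned char *b) {
    std::uint32_t lhs, rhs;
    std::memcpy(&lhs, a, sizeof lhs);  // unaligned-safe load
    std::memcpy(&rhs, b, sizeof rhs);
    return lhs == rhs;
}

// Hypothetical helper: checks that both forms agree on a few inputs.
void self_check() {
    const unsigned char x[4] = {1, 2, 3, 4};
    const unsigned char y[4] = {1, 2, 3, 4};
    const unsigned char z[4] = {1, 2, 9, 4};
    assert(vector_version(x, y) == scalar_version(x, y));
    assert(vector_version(x, z) == scalar_version(x, z));
}
```

All four lanes are equal exactly when the 32 bits are equal, so the lane order inside the mask does not matter once the mask is compared against zero.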
Unfortunately, as stated in #53416, we cannot use the RM (register-memory) form of the vector sub here, but it looks like we could avoid vector registers altogether.
I'm not sure where the proper place for this is: codegen or InstCombine.