
[X86] Duplicate XMM/YMM constant data #70947


Closed
RKSimon opened this issue Nov 1, 2023 · 2 comments

RKSimon (Collaborator) commented Nov 1, 2023

When we are working with different vector widths but the same constant data, we often see cases where the constants are repeated at each vector width:

void fabs_cvt(const double *src, int *dst) {
    for(int i = 0; i != 6; ++i) {
        *dst++ = __builtin_fabs(*src++);
    }
}
define void @fabs_cvt(ptr nocapture noundef readonly %src, ptr nocapture noundef writeonly %dst) {
entry:
  %incdec.ptr.3 = getelementptr inbounds double, ptr %src, i64 4
  %incdec.ptr1.3 = getelementptr inbounds i32, ptr %dst, i64 4
  %0 = load <4 x double>, ptr %src, align 8
  %1 = tail call <4 x double> @llvm.fabs.v4f64(<4 x double> %0)
  %2 = fptosi <4 x double> %1 to <4 x i32>
  store <4 x i32> %2, ptr %dst, align 4
  %3 = load <2 x double>, ptr %incdec.ptr.3, align 8
  %4 = tail call <2 x double> @llvm.fabs.v2f64(<2 x double> %3)
  %5 = fptosi <2 x double> %4 to <2 x i32>
  store <2 x i32> %5, ptr %incdec.ptr1.3, align 4
  ret void
}
declare <4 x double> @llvm.fabs.v4f64(<4 x double>)
declare <2 x double> @llvm.fabs.v2f64(<2 x double>)

AVX1:

.LCPI0_0:
  .quad 0x7fffffffffffffff # double NaN
  .quad 0x7fffffffffffffff # double NaN
  .quad 0x7fffffffffffffff # double NaN
  .quad 0x7fffffffffffffff # double NaN
.LCPI0_1:
  .quad 0x7fffffffffffffff # double NaN
  .quad 0x7fffffffffffffff # double NaN
fabs_cvt(double const*, int*): # @fabs_cvt(double const*, int*)
  vmovupd (%rdi), %ymm0
  vandpd .LCPI0_0(%rip), %ymm0, %ymm0
  vcvttpd2dq %ymm0, %xmm0
  vmovupd %xmm0, (%rsi)
  vmovupd 32(%rdi), %xmm0
  vandpd .LCPI0_1(%rip), %xmm0, %xmm0
  vcvttpd2dq %xmm0, %xmm0
  vmovlpd %xmm0, 16(%rsi)
  retq

AVX2:

.LCPI0_0:
  .quad 0x7fffffffffffffff # double NaN
.LCPI0_1:
  .quad 0x7fffffffffffffff # double NaN
  .quad 0x7fffffffffffffff # double NaN
fabs_cvt(double const*, int*): # @fabs_cvt(double const*, int*)
  vbroadcastsd .LCPI0_0(%rip), %ymm0 # ymm0 = [NaN,NaN,NaN,NaN]
  vandpd (%rdi), %ymm0, %ymm0
  vcvttpd2dq %ymm0, %xmm0
  vmovupd %xmm0, (%rsi)
  vmovupd 32(%rdi), %xmm0
  vandpd .LCPI0_1(%rip), %xmm0, %xmm0
  vcvttpd2dq %xmm0, %xmm0
  vmovlpd %xmm0, 16(%rsi)
  vzeroupper
  retq
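
In both listings, .LCPI0_1 simply duplicates data already present in (or derivable from) .LCPI0_0. As a rough illustration of the desired output (a hand-written sketch, not actual compiler output), the AVX2 version could keep a single 8-byte constant and rebroadcast it at each width, e.g. using vmovddup for the 128-bit half:

.LCPI0_0:
  .quad 0x7fffffffffffffff # double NaN
fabs_cvt(double const*, int*): # @fabs_cvt(double const*, int*)
  vbroadcastsd .LCPI0_0(%rip), %ymm0 # ymm0 = [NaN,NaN,NaN,NaN]
  vandpd (%rdi), %ymm0, %ymm0
  vcvttpd2dq %ymm0, %xmm0
  vmovupd %xmm0, (%rsi)
  vmovddup .LCPI0_0(%rip), %xmm0 # xmm0 = [NaN,NaN], reusing the same entry
  vandpd 32(%rdi), %xmm0, %xmm0
  vcvttpd2dq %xmm0, %xmm0
  vmovlpd %xmm0, 16(%rsi)
  vzeroupper
  retq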

RKSimon added a commit that referenced this issue Nov 13, 2023
…smaller vector load of the same constant

Extends the existing code that performed something similar for SUBV_BROADCAST_LOAD, but this is just for cases where an AVX2 target loads a full-width 128-bit constant vector but broadcasts the equivalent 256-bit constant vector.

Fixes AVX2 case for Issue #70947
RKSimon added a commit that referenced this issue Nov 17, 2023
…maller vector constant data

If we already have a YMM/ZMM constant whose lower bits match a smaller XMM/YMM constant, then ensure we reuse the same constant pool entry.

Extends the similar combines we already have to reuse VBROADCAST_LOAD/SUBV_BROADCAST_LOAD constant loads.

This is mainly a canonicalization, but it should make it easier for us to merge constant loads in a future commit (related to both #70947 and better X86FixupVectorConstantsPass usage for #71078).
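
Applied to the AVX1 listing above, this means the 128-bit AND can read the low 16 bytes of the existing 32-byte entry rather than a separate .LCPI0_1 (again a hand-written sketch, not actual compiler output):

.LCPI0_0: # single shared entry
  .quad 0x7fffffffffffffff # double NaN
  .quad 0x7fffffffffffffff # double NaN
  .quad 0x7fffffffffffffff # double NaN
  .quad 0x7fffffffffffffff # double NaN
fabs_cvt(double const*, int*): # @fabs_cvt(double const*, int*)
  vmovupd (%rdi), %ymm0
  vandpd .LCPI0_0(%rip), %ymm0, %ymm0 # full 32-byte entry
  vcvttpd2dq %ymm0, %xmm0
  vmovupd %xmm0, (%rsi)
  vmovupd 32(%rdi), %xmm0
  vandpd .LCPI0_0(%rip), %xmm0, %xmm0 # low 16 bytes of the same entry
  vcvttpd2dq %xmm0, %xmm0
  vmovlpd %xmm0, 16(%rsi)
  retq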
RKSimon added a commit that referenced this issue Nov 20, 2023
…maller vector constant data (REAPPLIED)

If we already have a YMM/ZMM constant whose lower bits match a smaller XMM/YMM constant, then ensure we reuse the same constant pool entry.

Extends the similar combines we already have to reuse VBROADCAST_LOAD/SUBV_BROADCAST_LOAD constant loads.

This is mainly a canonicalization, but it should make it easier for us to merge constant loads in a future commit (related to both #70947 and better X86FixupVectorConstantsPass usage for #71078).

Reapplied with a fix to ensure we don't 'flip-flop' between multiple matching constants: only perform the fold if the new constant pool entry is larger than the current entry, so the rewrite can't cycle.
RKSimon (Collaborator, Author) commented Dec 1, 2023

Resolving - combineLoad now handles this

RKSimon closed this as completed Dec 1, 2023