[X86] Duplicate XMM/YMM constant data #70947
@llvm/issue-subscribers-backend-x86 Author: Simon Pilgrim (RKSimon)
When we are working with different vector widths but the same constant data, we often see cases where the constants are repeated at each vector width:
```c
void fabs_cvt(const double *src, int *dst) {
  for (int i = 0; i != 6; ++i) {
    *dst++ = __builtin_fabs(*src++);
  }
}
```
```ll
define void @fabs_cvt(ptr nocapture noundef readonly %src, ptr nocapture noundef writeonly %dst) {
entry:
%incdec.ptr.3 = getelementptr inbounds double, ptr %src, i64 4
%incdec.ptr1.3 = getelementptr inbounds i32, ptr %dst, i64 4
%0 = load <4 x double>, ptr %src, align 8
%1 = tail call <4 x double> @llvm.fabs.v4f64(<4 x double> %0)
%2 = fptosi <4 x double> %1 to <4 x i32>
store <4 x i32> %2, ptr %dst, align 4
%3 = load <2 x double>, ptr %incdec.ptr.3, align 8
%4 = tail call <2 x double> @llvm.fabs.v2f64(<2 x double> %3)
%5 = fptosi <2 x double> %4 to <2 x i32>
store <2 x i32> %5, ptr %incdec.ptr1.3, align 4
ret void
}
declare <4 x double> @llvm.fabs.v4f64(<4 x double>)
declare <2 x double> @llvm.fabs.v2f64(<2 x double>)
```
AVX1:
```s
.LCPI0_0:
.quad 0x7fffffffffffffff # double NaN
.quad 0x7fffffffffffffff # double NaN
.quad 0x7fffffffffffffff # double NaN
.quad 0x7fffffffffffffff # double NaN
.LCPI0_1:
.quad 0x7fffffffffffffff # double NaN
.quad 0x7fffffffffffffff # double NaN
fabs_cvt(double const*, int*): # @fabs_cvt(double const*, int*)
vmovupd (%rdi), %ymm0
vandpd .LCPI0_0(%rip), %ymm0, %ymm0
vcvttpd2dq %ymm0, %xmm0
vmovupd %xmm0, (%rsi)
vmovupd 32(%rdi), %xmm0
vandpd .LCPI0_1(%rip), %xmm0, %xmm0
vcvttpd2dq %xmm0, %xmm0
vmovlpd %xmm0, 16(%rsi)
retq
```
AVX2:
```s
.LCPI0_0:
.quad 0x7fffffffffffffff # double NaN
.LCPI0_1:
.quad 0x7fffffffffffffff # double NaN
.quad 0x7fffffffffffffff # double NaN
fabs_cvt(double const*, int*): # @fabs_cvt(double const*, int*)
vbroadcastsd .LCPI0_0(%rip), %ymm0 # ymm0 = [NaN,NaN,NaN,NaN]
vandpd (%rdi), %ymm0, %ymm0
vcvttpd2dq %ymm0, %xmm0
vmovupd %xmm0, (%rsi)
vmovupd 32(%rdi), %xmm0
vandpd .LCPI0_1(%rip), %xmm0, %xmm0
vcvttpd2dq %xmm0, %xmm0
vmovlpd %xmm0, 16(%rsi)
vzeroupper
retq
```
RKSimon added a commit that referenced this issue on Nov 8, 2023
RKSimon added a commit that referenced this issue on Nov 13, 2023
…smaller vector load of the same constant. Extends the existing code that performed something similar for SUBV_BROADCAST_LOAD, but this is just for cases where AVX2 targets load full-width 128-bit constant vectors but broadcast the equivalent 256-bit constant vector. Fixes the AVX2 case for Issue #70947.
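To illustrate the AVX2 case in the commit above: the 256-bit mask is built by broadcasting one 64-bit element, while the 128-bit mask is loaded in full from its own pool entry. The narrow load is redundant whenever the narrow constant is a splat of that same element. The following is a minimal sketch in plain C of that check, not LLVM's implementation; the helper names `is_splat_u64` and `covered_by_broadcast` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if every 64-bit element equals the first one. */
bool is_splat_u64(const uint64_t *elts, size_t count) {
    assert(elts != NULL && count > 0);
    for (size_t i = 1; i < count; ++i) {
        if (elts[i] != elts[0])
            return false;
    }
    return true;
}

/* Hypothetical helper: true if the narrow constant could be read from the
 * low lanes of a register that already holds a broadcast of `scalar`. */
bool covered_by_broadcast(uint64_t scalar, const uint64_t *narrow, size_t count) {
    return is_splat_u64(narrow, count) && narrow[0] == scalar;
}
```

For the `fabs` mask in this issue, both 64-bit lanes of the XMM constant are `0x7fffffffffffffff`, so the check succeeds against the scalar used by `vbroadcastsd`.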
RKSimon added a commit that referenced this issue on Nov 17, 2023
…maller vector constant data. If we already have a YMM/ZMM constant whose lower bits match a smaller XMM/YMM constant, then ensure we reuse the same constant pool entry. Extends the similar combines we already have to reuse VBROADCAST_LOAD/SUBV_BROADCAST_LOAD constant loads. This is mainly a canonicalization, but should make it easier for us to merge constant loads in a future commit (related to both #70947 and better X86FixupVectorConstantsPass usage for #71078).
sr-tream pushed a commit to sr-tream/llvm-project that referenced this issue on Nov 20, 2023
…maller vector constant data. If we already have a YMM/ZMM constant whose lower bits match a smaller XMM/YMM constant, then ensure we reuse the same constant pool entry. Extends the similar combines we already have to reuse VBROADCAST_LOAD/SUBV_BROADCAST_LOAD constant loads. This is mainly a canonicalization, but should make it easier for us to merge constant loads in a future commit (related to both llvm#70947 and better X86FixupVectorConstantsPass usage for llvm#71078).
zahiraam pushed a commit to zahiraam/llvm-project that referenced this issue on Nov 20, 2023
…smaller vector load of the same constant. Extends the existing code that performed something similar for SUBV_BROADCAST_LOAD, but this is just for cases where AVX2 targets load full-width 128-bit constant vectors but broadcast the equivalent 256-bit constant vector. Fixes the AVX2 case for Issue llvm#70947.
zahiraam pushed a commit to zahiraam/llvm-project that referenced this issue on Nov 20, 2023
…maller vector constant data. If we already have a YMM/ZMM constant whose lower bits match a smaller XMM/YMM constant, then ensure we reuse the same constant pool entry. Extends the similar combines we already have to reuse VBROADCAST_LOAD/SUBV_BROADCAST_LOAD constant loads. This is mainly a canonicalization, but should make it easier for us to merge constant loads in a future commit (related to both llvm#70947 and better X86FixupVectorConstantsPass usage for llvm#71078).
RKSimon added a commit that referenced this issue on Nov 20, 2023
…maller vector constant data (REAPPLIED). If we already have a YMM/ZMM constant whose lower bits match a smaller XMM/YMM constant, then ensure we reuse the same constant pool entry. Extends the similar combines we already have to reuse VBROADCAST_LOAD/SUBV_BROADCAST_LOAD constant loads. This is mainly a canonicalization, but should make it easier for us to merge constant loads in a future commit (related to both #70947 and better X86FixupVectorConstantsPass usage for #71078). Reapplied with a fix to ensure we don't 'flip-flop' between multiple matching constants: only perform the fold if the new constant pool entry is larger than the current entry.
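The rule described in the reapplied commit (reuse a wider entry whose low bytes match, and only ever move to a strictly larger entry) can be sketched in plain C. This is an illustration under assumptions, not the code in `combineLoad`: the `PoolEntry` type and the `pick_shared_entry` function are hypothetical, and a constant pool is modelled as a simple array of byte buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one constant pool entry. */
typedef struct {
    const unsigned char *bytes; /* raw constant data */
    size_t size;                /* size in bytes, e.g. 16 for XMM, 32 for YMM */
} PoolEntry;

/* Returns the index of the entry that a load of pool[current] should use.
 * A candidate is accepted only if it is strictly larger than the best entry
 * found so far and its low bytes equal the current constant. Because the
 * chosen entry can only grow, two equally sized matching entries can never
 * be swapped back and forth (the 'flip-flop' the commit guards against). */
size_t pick_shared_entry(const PoolEntry *pool, size_t count, size_t current) {
    assert(pool != NULL && current < count);
    size_t best = current;
    for (size_t i = 0; i < count; ++i) {
        if (pool[i].size > pool[best].size &&
            memcmp(pool[i].bytes, pool[current].bytes, pool[current].size) == 0) {
            best = i;
        }
    }
    return best;
}
```

In the AVX1 output above, the 16-byte `.LCPI0_1` data equals the first 16 bytes of the 32-byte `.LCPI0_0`, so under this model the XMM load would be redirected to the YMM entry, while the YMM load keeps its own entry.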
Resolving - combineLoad now handles this