16 byte Aligned load generated to load 32 byte wide AVX register from stack memory #98044

Closed
Nirhar opened this issue Jul 8, 2024 · 3 comments · Fixed by #98176
Labels
llvm:SelectionDAG SelectionDAGISel as well

Comments

@Nirhar
Contributor

Nirhar commented Jul 8, 2024

Problematic IR:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128-ni:1-p2:32:8:8:32-ni:2"
target triple = "x86_64-unknown-linux-gnu"

define i32 @foo(i32 %arg1) #0 {
bci_0:
  %a = extractelement <32 x i16> zeroinitializer, i32 %arg1
  %b = zext i16 %a to i32
  ret i32 %b
}

attributes #0 = { "no-realign-stack" "target-cpu"="znver2" }

generates:

foo:                                    # @foo
        vxorps  %xmm0, %xmm0, %xmm0
        andl    $31, %edi
        vmovaps %ymm0, -40(%rsp)
        vmovaps %ymm0, -72(%rsp)
        movzwl  -72(%rsp,%rdi,2), %eax
        vzeroupper
        retq

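when run with llc. This leads to the generation of 32-byte aligned vmovaps stores of %ymm0 to the stack, which can lead to a General Protection Fault: the function is marked "no-realign-stack", so the stack slots are only guaranteed to be 16-byte aligned. Here is the link to the godbolt demo: https://godbolt.org/z/33h7YGc5K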

The problem seems to be in instruction selection again, similar to #77730
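
To make the failure mode concrete, here is a minimal sketch of the address arithmetic behind the two `vmovaps %ymm0` stores in the listing above. It assumes the usual x86-64 System V convention that `rsp + 8` is a multiple of 16 on function entry (the `call` has just pushed the return address). The helper name `misaligned_stores` and the example addresses are hypothetical, chosen only for illustration.

```python
# Sketch: check whether the stack slots at -40(%rsp) and -72(%rsp) meet the
# 32-byte alignment that an aligned 256-bit store requires.
# Assumption: at function entry, rsp + 8 is a multiple of 16 (SysV x86-64).

BASE = 0x7FFF_0000_0000  # hypothetical 32-byte-aligned address near the stack


def misaligned_stores(rsp, offsets=(-40, -72), align=32):
    """Return the offsets whose effective address is not `align`-byte aligned."""
    return [off for off in offsets if (rsp + off) % align != 0]


if __name__ == "__main__":
    # Four consecutive legal entry values of rsp: 16-byte aligned, minus 8.
    for k in range(4):
        rsp = BASE + 16 * k - 8
        bad = misaligned_stores(rsp)
        print(hex(rsp), "misaligned 32-byte slots:", bad)
```

Under that assumption both slots are always 16-byte aligned, but they are 32-byte aligned only for every other legal value of `rsp`, which is why the fault depends on the caller's stack depth.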

@llvmbot
Member

llvmbot commented Jul 8, 2024

@llvm/issue-subscribers-backend-x86

Author: Manish Kausik H (Nirhar)

Problematic IR:

```
define i8 @foo(i64 %elemIdx, <32 x i1> %arr) {
entry:
  br label %loop

loop: ; preds = %loop, %entry
  %arr.i32 = bitcast <32 x i1> %arr to i32
  %cmp = icmp ugt i32 %arr.i32, 0
  br i1 %cmp, label %exit, label %loop

exit: ; preds = %loop
  %elem = extractelement <32 x i8> zeroinitializer, i64 %elemIdx
  ret i8 %elem
}
```

generates:

```
.LCPI0_0:
        .quad   72340172838076673 # 0x101010101010101
foo:                                    # @foo
        vpbroadcastq .LCPI0_0(%rip), %ymm1 # ymm1 = [72340172838076673,72340172838076673,72340172838076673,72340172838076673]
.LBB0_1:                                # %loop
        vptest  %ymm1, %ymm0
        je      .LBB0_1
        pushq   %rbp
        movq    %rsp, %rbp
        andq    $-32, %rsp
        subq    $64, %rsp
        vpxor   %xmm0, %xmm0, %xmm0
        vmovdqa %ymm0, (%rsp)
        andl    $31, %edi
        movzbl  (%rsp,%rdi), %eax
        movq    %rbp, %rsp
        popq    %rbp
        vzeroupper
        retq
```

when run with `llc -mattr=avx2`. This leads to the generation of a `vmovdqa` instruction, which can lead to a General Protection Fault. Here is the link to the godbolt demo: https://godbolt.org/z/jaeE4vKr1

The problem seems to be in instruction selection again, similar to #77730

@phoebewang
Contributor

The andq $-32, %rsp has realigned rsp to 32 bytes, so there should be no GP fault for vmovdqa.
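
As a small illustration of this point, the sketch below models what `andq $-32, %rsp` followed by `subq $64, %rsp` does to the stack pointer. The function name `realign_down` and the sample address are hypothetical; the arithmetic is ordinary 64-bit two's-complement masking.

```python
# Sketch: model `andq $-32, %rsp` on a 64-bit register.
# -32 in two's complement has its low five bits clear, so the AND rounds
# the stack pointer down to the nearest multiple of 32.

MASK64 = (1 << 64) - 1


def realign_down(rsp, align=32):
    """Round rsp down to a multiple of `align` (a power of two)."""
    return rsp & (-align & MASK64)


if __name__ == "__main__":
    rsp = 0x7FFF_0000_FFE8            # hypothetical, only 8-byte aligned
    aligned = realign_down(rsp)       # effect of andq $-32, %rsp
    slot = aligned - 64               # effect of subq $64, %rsp
    print(hex(aligned), hex(slot), slot % 32)
```

Because 64 is itself a multiple of 32, the slot at `(%rsp)` stays 32-byte aligned after the subtraction, so the aligned `vmovdqa` store in that listing is safe.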

@Nirhar
Contributor Author

Nirhar commented Jul 9, 2024

@phoebewang sorry, I think I made a mistake during the IR reduction. I have updated the bug description with the problematic IR. The problem is similar to #77730, except that the vector is now <32 x i16> instead of <32 x i8>.

Nirhar pushed a commit to Nirhar/llvm-project that referenced this issue Jul 9, 2024
This patch ports the commit a6614ec to
SelectionDAG TypeLegalization.

Fixes llvm#98044
Nirhar pushed a commit to Nirhar/llvm-project that referenced this issue Jul 10, 2024
This patch ports the commit a6614ec to
SelectionDAG TypeLegalization.

Fixes llvm#98044
Nirhar pushed a commit to Nirhar/llvm-project that referenced this issue Jul 12, 2024
This patch sets the alignment of store instructions generated during type
legalization of extractelement instruction, after considering stack alignment.

Fixes llvm#98044
Nirhar pushed a commit to Nirhar/llvm-project that referenced this issue Jul 26, 2024
This patch sets the alignment of store instructions generated during type
legalization of extractelement instruction, after considering stack alignment.

Fixes llvm#98044
Nirhar pushed a commit to Nirhar/llvm-project that referenced this issue Jul 29, 2024
…ype when Stack is non-realignable

This patch sets the alignment of store instructions generated during type
legalization of extractelement instruction, after considering stack
alignment, if the stack is not realignable.

Fixes llvm#98044
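
The idea described in this commit message can be sketched roughly as follows. This is not the LLVM code from the patch: the function names `spill_slot_alignment` and `store_mnemonic` are hypothetical, and the model only captures the rule stated above, namely that the alignment recorded on the store is limited by the stack alignment when the stack cannot be realigned.

```python
# Sketch (hypothetical names, not the actual LLVM implementation):
# pick the alignment to record on a store to a stack temporary.

def spill_slot_alignment(preferred, stack_align, can_realign_stack):
    """Alignment (in bytes) that may safely be assumed for the stack slot."""
    if can_realign_stack:
        # The prologue can realign rsp, so the preferred alignment holds.
        return preferred
    # Otherwise the slot is only as aligned as the stack itself.
    return min(preferred, stack_align)


def store_mnemonic(known_align, vector_bytes=32):
    """Use the aligned store only when the known alignment covers the vector."""
    return "vmovaps" if known_align >= vector_bytes else "vmovups"


if __name__ == "__main__":
    # The case from this issue: 32-byte vector, 16-byte stack, no realignment.
    align = spill_slot_alignment(32, 16, can_realign_stack=False)
    print(align, store_mnemonic(align))
```

With the alignment clamped to 16, an instruction selector in this simplified model would fall back to an unaligned store rather than emitting `vmovaps` on a slot that may not be 32-byte aligned.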
EugeneZelenko added the llvm:SelectionDAG (SelectionDAGISel as well) label and removed the backend:X86 label on Jul 29, 2024