Some shufflevectors should emit as a single shld/shrd instruction #145276

@Validark

Zig Godbolt LLVM Godbolt

This code:

define dso_local <8 x i8> @foo(<8 x i8> %0, <8 x i8> %1) local_unnamed_addr {
Entry:
  %2 = shufflevector <8 x i8> %0, <8 x i8> %1, <8 x i32> <i32 15, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6>
  ret <8 x i8> %2
}

Compiles to the following on znver5:

.LCPI0_0:
        .byte   15                              # 0xf
        .byte   0                               # 0x0
        .byte   2                               # 0x2
        .byte   4                               # 0x4
        .byte   6                               # 0x6
        .byte   8                               # 0x8
        .byte   10                              # 0xa
        .byte   12                              # 0xc
        .zero   1
        .zero   1
        .zero   1
        .zero   1
        .zero   1
        .zero   1
        .zero   1
        .zero   1
foo:                                    # @foo
        vpunpcklbw      xmm0, xmm0, xmm1        # xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
        vpshufb xmm0, xmm0, xmmword ptr [rip + .LCPI0_0] # xmm0 = xmm0[15,0,2,4,6,8,10,12,u,u,u,u,u,u,u,u]
        ret

Zig trunk for some reason ends up with the following:

.LCPI0_0:
        .byte   23
        .byte   0
        .byte   1
        .byte   2
        .byte   3
        .byte   4
        .byte   5
        .byte   6
foo:
        vpbroadcastq    xmm2, qword ptr [rip + .LCPI0_0]
        vpermt2b        xmm0, xmm2, xmm1
        ret

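It should be a single funnel shift (AVX512-VBMI2, which znver5 supports), because the result is just the 64-bit value (%0 << 8) | (%1 >> 56):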

foo:
        vpshldq xmm0, xmm0, xmm1, 8
        ret
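To double-check the equivalence, here is a minimal scalar sketch in C. It models the shufflevector mask from the IR above on byte arrays and compares it with a 64-bit funnel shift left by 8. The function names (`shuffle_ref`, `fshl64_by8`, `shuffle_matches_fshl`) are made up for illustration, and the comparison assumes a little-endian host such as x86, so that vector element 0 is the low byte of the 64-bit word.

```c
#include <stdint.h>
#include <string.h>

/* Reference model of the mask <15, 0, 1, 2, 3, 4, 5, 6>:
   indices 0..7 select from a (%0), indices 8..15 select from b (%1). */
static void shuffle_ref(const uint8_t a[8], const uint8_t b[8], uint8_t out[8]) {
    static const int mask[8] = {15, 0, 1, 2, 3, 4, 5, 6};
    for (int i = 0; i < 8; i++)
        out[i] = mask[i] < 8 ? a[mask[i]] : b[mask[i] - 8];
}

/* 64-bit funnel shift left by 8: the high 64 bits of the
   128-bit concatenation (a:b) shifted left by 8. */
static uint64_t fshl64_by8(uint64_t a, uint64_t b) {
    return (a << 8) | (b >> 56);
}

/* Returns 1 if the shuffle model and the funnel shift agree.
   Assumption: little-endian host, so element 0 is the low byte. */
static int shuffle_matches_fshl(const uint8_t a[8], const uint8_t b[8]) {
    uint8_t expect[8], got[8];
    uint64_t wa, wb, wr;

    shuffle_ref(a, b, expect);

    memcpy(&wa, a, sizeof wa);
    memcpy(&wb, b, sizeof wb);
    wr = fshl64_by8(wa, wb);
    memcpy(got, &wr, sizeof wr);

    return memcmp(expect, got, sizeof expect) == 0;
}
```

This only checks the low 64 bits, which is all an `<8 x i8>` result occupies. `vpshldq` applies the same operation to each 64-bit lane, so the upper lane of `xmm0` holds a value that is not part of the result.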
