s390x: missed optimization to vec_unpackl #129576

Closed
folkertdev opened this issue Mar 3, 2025 · 0 comments

https://godbolt.org/z/Wxc8x8Tax

This LLVM IR

define range(i32 -32768, 32768) <4 x i32> @unpackh(<8 x i16> %a) unnamed_addr {
start:
  %0 = shufflevector <8 x i16> %a, <8 x i16> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %1 = sext <4 x i16> %0 to <4 x i32>
  ret <4 x i32> %1
}

define range(i32 -32768, 32768) <4 x i32> @unpackl(<8 x i16> %a) unnamed_addr {
start:
  %0 = shufflevector <8 x i16> %a, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %1 = sext <4 x i16> %0 to <4 x i32>
  ret <4 x i32> %1
}

optimizes to

unpackh:
        vuphh   %v24, %v24
        br      %r14

unpackl:
        vmrlg   %v0, %v24, %v24
        vuphh   %v24, %v0
        br      %r14

This is already very good, and optimal for unpackh, but not for unpackl. The equivalent C using the vector builtins produces a single instruction for both:

https://godbolt.org/z/xfobea8ee

vector signed int foo(vector signed short a) { 
    return vec_unpackh(a);
}

vector signed int unpack(vector signed short a) { 
    return vec_unpackl(a);
}

optimizes to

foo:
        vuphh   %v24, %v24
        br      %r14

unpack:
        vuplhw  %v24, %v24
        br      %r14

So it looks like a final step is missed where the vmrlg + vuphh pair can be rewritten to just vuplhw (and similarly for the other vector element types). Alternatively, the shufflevector that selects elements 4..7 followed by the sext could be recognized directly. Either way, this seems achievable.
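To make the claimed equivalence concrete, here is a minimal scalar sketch in portable C. It is only a model: the struct types and the `model_*` functions are hypothetical names made up for illustration, and the instruction semantics are written down as I understand them (element 0 is the leftmost element, vuphh sign-extends elements 0..3, vuplhw sign-extends elements 4..7, vmrlg concatenates the low doublewords of its two operands).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar stand-ins for a 128-bit vector register (hypothetical types).
   Index 0 is the leftmost element, matching the IR element order. */
typedef struct { int16_t h[8]; } v8i16;
typedef struct { int32_t w[4]; } v4i32;

/* Model of vmrlg: halfwords 4..7 of a, followed by halfwords 4..7 of b. */
v8i16 model_vmrlg(v8i16 a, v8i16 b) {
    v8i16 r;
    for (int i = 0; i < 4; i++) {
        r.h[i] = a.h[4 + i];
        r.h[4 + i] = b.h[4 + i];
    }
    return r;
}

/* Model of vuphh: sign-extend halfword elements 0..3 to words. */
v4i32 model_vuphh(v8i16 a) {
    v4i32 r;
    for (int i = 0; i < 4; i++)
        r.w[i] = a.h[i];
    return r;
}

/* Model of vuplhw: sign-extend halfword elements 4..7 to words. */
v4i32 model_vuplhw(v8i16 a) {
    v4i32 r;
    for (int i = 0; i < 4; i++)
        r.w[i] = a.h[4 + i];
    return r;
}

/* Returns 1 if the two-instruction and one-instruction sequences agree. */
int unpackl_sequences_agree(v8i16 a) {
    v4i32 two_step = model_vuphh(model_vmrlg(a, a));
    v4i32 one_step = model_vuplhw(a);
    return memcmp(two_step.w, one_step.w, sizeof one_step.w) == 0;
}

/* Small self-check on one sample input, including the i16 extremes. */
void unpackl_self_check(void) {
    v8i16 a = {{ 1, -2, 3, -4, 32767, -32768, 0, -1 }};
    v4i32 r = model_vuplhw(a);
    assert(unpackl_sequences_agree(a));
    assert(r.w[0] == 32767 && r.w[1] == -32768);
}
```

Calling `unpackl_self_check()` from a test `main` exercises the model. It says nothing about how the backend should implement the fold, only that the two sequences compute the same lanes.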

This came up in the context of the Rust standard library.

cc @uweigand

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this issue Mar 15, 2025
Generate more efficient code for zero or sign extensions where
the source is a subvector generated via SHUFFLE_VECTOR.

Specifically, recognize patterns corresponding to (series of)
VECTOR UNPACK instructions, or the VECTOR SIGN EXTEND TO
DOUBLEWORD instruction.

As a special case, also handle zero or sign extensions of a
vector element to i128.

Fixes: llvm/llvm-project#129576
Fixes: llvm/llvm-project#129899
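
As a rough illustration of the kind of pattern the commit message describes, the sketch below classifies a shuffle mask as selecting the first or second half of the source vector, which is the condition under which a single unpack-high or unpack-low would apply. This is not the LLVM implementation: `classify_half_mask` and the enum are hypothetical names, and the sketch covers only a single unpack step, not a series of unpacks or the i128 special case.

```c
#include <assert.h>
#include <stddef.h>

enum unpack_kind { UNPACK_NONE, UNPACK_HIGH, UNPACK_LOW };

/* Classify a shuffle mask that picks n_out elements from an n_in-element
   source. Only masks selecting exactly the first half (indices 0..n_out-1)
   or the second half (indices n_out..n_in-1) in order are accepted.
   Element numbering follows the IR: index 0 is the first element. */
enum unpack_kind classify_half_mask(const int *mask, size_t n_out, size_t n_in)
{
    size_t start;

    if (n_out == 0 || n_in != 2 * n_out || mask[0] < 0)
        return UNPACK_NONE;

    if ((size_t)mask[0] == 0)
        start = 0;
    else if ((size_t)mask[0] == n_out)
        start = n_out;
    else
        return UNPACK_NONE;

    /* Every index must continue the contiguous run from start. */
    for (size_t i = 0; i < n_out; i++)
        if (mask[i] < 0 || (size_t)mask[i] != start + i)
            return UNPACK_NONE;

    return start == 0 ? UNPACK_HIGH : UNPACK_LOW;
}

/* The two masks from the IR in this issue. */
void check_issue_masks(void)
{
    const int high[4] = { 0, 1, 2, 3 };
    const int low[4]  = { 4, 5, 6, 7 };
    assert(classify_half_mask(high, 4, 8) == UNPACK_HIGH);
    assert(classify_half_mask(low, 4, 8) == UNPACK_LOW);
}
```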