Inverted movemasks result in redundant logic #89533

Closed
Validark opened this issue Apr 21, 2024 · 2 comments · Fixed by #90050

I wrote this tokenize function: (https://zig.godbolt.org/z/oYosTb1zK)

export fn tokenize(source: [*]const u8) extern struct { start: [*]const u8, end: [*]const u8 } {
    var cur = source[0..];
    const start = cur;

    while (true) {
        const V = @Vector(@bitSizeOf(usize), u8);
        const vec: V = cur[0..@sizeOf(V)].*;

        const identifier_bitstring = ~(@as(usize, @bitCast(vec == @as(V, @splat('_')))));

        cur = cur[@ctz(identifier_bitstring)..];
        if (identifier_bitstring != 0) break;
    }

    // our token span is start..end
    const end = cur;
    return .{ .start = start, .end = end };
}

Next I made the following change:

-       const identifier_bitstring = ~(@as(usize, @bitCast(vec == @as(V, @splat('_')))));
+       const identifier_bitstring =  (@as(usize, @bitCast(vec != @as(V, @splat('_')))));

Unfortunately, this results in different emit, even though the two forms compute the same mask.

First version (Zen 4):

.LCPI0_1:
        .byte   95
tokenize1:
        vpbroadcastb    zmm0, byte ptr [rip + .LCPI0_1]
        mov     rax, rdi
        mov     rdx, rdi
.LBB0_1:
        vmovdqu64       zmm1, zmmword ptr [rdx]
        mov     rcx, rdx
        vpcmpneqb       k1, zmm1, zmm0
        vpcmpeqb        k0, zmm1, zmm0 ; do the same work, but this time not inverted, so we can use jb rather than je?
        kmovq   rdx, k1
        tzcnt   rdx, rdx
        add     rdx, rcx
        kortestq        k0, k0
        jb      .LBB0_1
        vzeroupper
        ret

Second version (Zen 4):

.LCPI1_1:
        .byte   95
tokenize2:
        vpbroadcastb    zmm0, byte ptr [rip + .LCPI1_1]
        mov     rax, rdi
        mov     rdx, rdi
.LBB1_1:
        vpcmpneqb       k0, zmm0, zmmword ptr [rdx]
        mov     rcx, rdx
        kmovq   rdx, k0
        tzcnt   rdx, rdx
        add     rdx, rcx
        kortestq        k0, k0
        je      .LBB1_1
        vzeroupper
        ret

First version (Zen 3):

.LCPI0_1:
        .byte   95
tokenize1:
        vpbroadcastb    ymm0, byte ptr [rip + .LCPI0_1]
        mov     rax, rdi
        mov     rdx, rdi
.LBB0_1:
        mov     rcx, rdx
        vpcmpeqb        ymm2, ymm0, ymmword ptr [rcx + 32]
        vpcmpeqb        ymm1, ymm0, ymmword ptr [rdx]
        vpmovmskb       esi, ymm2
        vpmovmskb       edx, ymm1
        shl     rsi, 32
        or      rsi, rdx
        mov     rdx, rsi ; preserve non-inverted rsi so we can cmp against -1 later??
        not     rdx
        tzcnt   rdx, rdx
        add     rdx, rcx
        cmp     rsi, -1
        je      .LBB0_1
        vzeroupper
        ret

Second version (Zen 3):

.LCPI1_1:
        .byte   95
tokenize2:
        vpbroadcastb    ymm0, byte ptr [rip + .LCPI1_1]
        mov     rax, rdi
        mov     rdx, rdi
.LBB1_1:
        mov     rcx, rdx
        vpcmpeqb        ymm2, ymm0, ymmword ptr [rcx + 32]
        vpcmpeqb        ymm1, ymm0, ymmword ptr [rdx]
        vpmovmskb       esi, ymm2
        vpmovmskb       edx, ymm1
        not     esi
        not     edx ; do 2 not's before combining these bitstrings instead of just doing 1??
        shl     rsi, 32
        or      rsi, rdx
        tzcnt   rdx, rsi
        add     rdx, rcx
        test    rsi, rsi ; use inverted value instead of preserving the non-inverted value and doing cmp -1??
        je      .LBB1_1
        vzeroupper
        ret


@RKSimon added the missed-optimization and llvm:SelectionDAG labels and removed the new issue label on Apr 21, 2024
@RKSimon RKSimon self-assigned this Apr 21, 2024

RKSimon commented Apr 21, 2024

define i64 @PR89533(<64 x i8> %a0) {
  %cmp = icmp ne <64 x i8> %a0, <i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95, i8 95>
  %mask = bitcast <64 x i1> %cmp to i64
  %tz = tail call i64 @llvm.cttz.i64(i64 %mask, i1 false)
  ret i64 %tz
}
Optimized vector-legalized selection DAG: %bb.0 'PR89533:'
SelectionDAG has 24 nodes:
  t0: ch,glue = EntryToken
  t29: v32i8 = BUILD_VECTOR Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>, Constant:i8<95>
                t2: v32i8,ch = CopyFromReg t0, Register:v32i8 %0
              t36: v32i8 = X86ISD::PCMPEQ t2, t29
            t39: i32 = X86ISD::MOVMSK t36
          t40: i32 = xor t39, Constant:i32<-1>
        t23: i64 = zero_extend t40
                  t4: v32i8,ch = CopyFromReg t0, Register:v32i8 %1
                t32: v32i8 = X86ISD::PCMPEQ t4, t29
              t41: i32 = X86ISD::MOVMSK t32
            t42: i32 = xor t41, Constant:i32<-1>
          t24: i64 = any_extend t42
        t26: i64 = shl t24, Constant:i8<32>
      t27: i64 = or t23, t26
    t11: i64 = cttz t27
  t14: ch,glue = CopyToReg t0, Register:i64 $rax, t11
  t15: ch = X86ISD::RET_GLUE t14, TargetConstant:i32<0>, Register:i64 $rax, t14:1

We might be able to handle this in foldLogicTreeOfShifts.


RKSimon commented Apr 24, 2024

Or even simpler:

define i64 @notpair(i32 %a0, i32 %a1) {
  %n0 = xor i32 %a0, -1
  %n1 = xor i32 %a1, -1
  %x0 = zext i32 %n0 to i64
  %x1 = zext i32 %n1 to i64
  %hi = shl i64 %x1, 32
  %r = or i64 %hi, %x0
  ret i64 %r
}

define i64 @not(i32 %a0, i32 %a1) {
  %x0 = zext i32 %a0 to i64
  %x1 = zext i32 %a1 to i64
  %hi = shl i64 %x1, 32
  %r = or i64 %hi, %x0
  %n = xor i64 %r, -1
  ret i64 %n
}
notpair: # @notpair
  notl %edi
  notl %esi
  shlq $32, %rsi
  leaq (%rsi,%rdi), %rax
  retq
not: # @not
  movl %edi, %eax
  shlq $32, %rsi
  orq %rsi, %rax
  notq %rax
  retq

RKSimon added a commit to RKSimon/llvm-project that referenced this issue Apr 25, 2024
…d_pair(x,y)) style patterns

(Sorry, not an actual build_pair node, just a similar pattern.)

For cases where we're concatenating 2 integers into a double width integer, see if both integer sources are NOT patterns.

We could take this further and handle all logic ops with constant operands, but I just wanted to handle the case reported on llvm#89533 initially.

Fixes llvm#89533
RKSimon added a commit that referenced this issue Apr 26, 2024
…d_pair(x,y)) style patterns (#90050)

(Sorry, not an actual build_pair node, just a similar pattern.)

For cases where we're concatenating 2 integers into a double width integer, see if both integer sources are NOT patterns.

We could take this further and handle all logic ops with constant operands, but I just wanted to handle the case reported on #89533 initially.

Fixes #89533