Skip to content

Generate better code for std::bit_ceil on x86 #60802

@kazutakahirata

Description

@kazutakahirata

Compile:

// clang -std=c++20 -march=skylake -O2
#include <bit>

unsigned bit_ceil(unsigned X) {
  return std::bit_ceil(X);
}

I get:

  %sub.i.i = add i32 %X, -1
  %0 = tail call i32 @llvm.ctlz.i32(i32 %sub.i.i, i1 false), !range !5
  %sub2.i.i = sub nuw nsw i32 32, %0
  %shl.i.i = shl nuw i32 1, %sub2.i.i
  %or.cond.inv.i.i = icmp ugt i32 %X, 1
  %retval.0.i.i = select i1 %or.cond.inv.i.i, i32 %shl.i.i, i32 1
  8d 47 ff                   lea    -0x1(%rdi),%eax
  f3 0f bd c0                lzcnt  %eax,%eax
  f6 d8                      neg    %al
  b9 01 00 00 00             mov    $0x1,%ecx
  c4 e2 79 f7 c1             shlx   %eax,%ecx,%eax
  83 ff 02                   cmp    $0x2,%edi
  0f 42 c1                   cmovb  %ecx,%eax

We could drop cmp and cmovb like so:

  8d 47 ff                   lea    -0x1(%rdi),%eax
  f3 0f bd c0                lzcnt  %eax,%eax
  f6 d8                      neg    %al
  b9 01 00 00 00             mov    $0x1,%ecx
  c4 e2 79 f7 c1             shlx   %eax,%ecx,%eax

This shorter sequence handles input 0 and 1 correctly:

input   0    1    2
-------------------
lea    -1    0    1
lzcnt   0   32   31
neg     0  -32  -31
&0x1f   0    0    1
shlx    1    1    2

Note that shlx masks the shift count with 0x1f, which is something that the LLVM IR doesn't know, so this issue probably belongs to the x86 backend.

I've empirically verified the equivalence for all possible values of uint32_t.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions