Use three-operand LEA for select of constants on some x86 architectures #61365

Closed
kazutakahirata opened this issue Mar 13, 2023 · 3 comments

Comments

@kazutakahirata
Contributor

kazutakahirata commented Mar 13, 2023

Compile:

// clang -O2 -march=znver3
unsigned select_unsigned_lt_10_8_13(unsigned X) {
  return X < 10 ? 8 : 13;
}

I get:

  31 c0                      xor    %eax,%eax
  83 ff 0a                   cmp    $0xa,%edi
  0f 93 c0                   setae  %al
  8d 04 80                   lea    (%rax,%rax,4),%eax
  83 c8 08                   or     $0x8,%eax

We could generate:

  31 c0                      xor    %eax,%eax
  83 ff 0a                   cmp    $0xa,%edi
  0f 93 c0                   setae  %al
  8d 44 80 08                lea    8(%rax,%rax,4),%eax

saving one instruction and 2 bytes on those x86 architectures where three-operand LEAs are not discouraged.
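As a minimal sketch of why the fold is safe (the function names below are hypothetical and only model the two instruction sequences in C): after `setae`, the register holds 0 or 1, so the LEA result is 0 or 5. Neither value shares a set bit with 8, which means `or $8` and adding a displacement of 8 give the same result.

```c
#include <assert.h>
#include <limits.h>

/* Models the current sequence: setae; lea (%rax,%rax,4); or $0x8. */
unsigned select_with_or(unsigned x) {
    unsigned b = (x >= 10);   /* setae %al: 1 if x >= 10, else 0 */
    unsigned t = b + b * 4;   /* lea (%rax,%rax,4),%eax: 0 or 5 */
    return t | 8;             /* or $0x8,%eax */
}

/* Models the proposed sequence: setae; lea 8(%rax,%rax,4). */
unsigned select_with_lea_disp(unsigned x) {
    unsigned b = (x >= 10);
    return 8 + b + b * 4;     /* displacement 8 replaces the OR */
}

/* Reference semantics taken from the C function in the report. */
unsigned select_reference(unsigned x) {
    return x < 10 ? 8 : 13;
}

/* Checks that both models match the reference around the threshold
   and at the top of the unsigned range. */
void check_sequences(void) {
    for (unsigned x = 0; x < 64; ++x) {
        assert(select_with_or(x) == select_reference(x));
        assert(select_with_lea_disp(x) == select_reference(x));
    }
    assert(select_with_or(UINT_MAX) == select_reference(UINT_MAX));
    assert(select_with_lea_disp(UINT_MAX) == select_reference(UINT_MAX));
}
```

Calling `check_sequences()` exercises both models. The equivalence depends on `(t & 8) == 0` for every value `t` can take, so it would not carry over to constants where the OR and the LEA result overlap in a bit.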

I thought the x86 backend might be intentionally avoiding the three-operand LEA for -march=znver3, but that doesn't seem to be the case. For return X < 10 ? 9 : 12, I get:

  31 c0                      xor    %eax,%eax
  83 ff 0a                   cmp    $0xa,%edi
  0f 93 c0                   setae  %al
  8d 44 40 09                lea    0x9(%rax,%rax,2),%eax

Even on those x86 architectures where three-operand LEAs are discouraged, this transformation might still be worthwhile when optimizing for size.
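Both examples fit one LEA because the difference between the two constants (13 - 8 = 5 and 12 - 9 = 3) can be formed from a 0/1 value using the x86 addressing mode `disp(base, index, scale)` with a scale of 1, 2, 4, or 8. The sketch below enumerates those differences. It is derived only from the addressing-mode shape, it is not the backend's actual check, and `diff_fits_lea` and `select_low_high` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* True if b * diff, with b in {0,1}, can be formed as scale*b or
   b + scale*b with scale in {1,2,4,8}, i.e. diff is one of
   1, 2, 3, 4, 5, 8, 9. */
bool diff_fits_lea(unsigned diff) {
    static const unsigned scales[] = {1, 2, 4, 8};
    for (int i = 0; i < 4; ++i) {
        if (diff == scales[i] || diff == scales[i] + 1)
            return true;
    }
    return false;
}

/* Models "lea low(%rax,%rax,scale)" style selection: returns low when
   b == 0 and low + diff when b == 1. Assumes diff fits the LEA forms. */
unsigned select_low_high(unsigned b, unsigned low, unsigned diff) {
    assert(b <= 1);
    assert(diff_fits_lea(diff));
    return low + b * diff;
}
```

For instance, `select_low_high(1, 8, 5)` models the 8/13 case and `select_low_high(1, 9, 3)` the 9/12 case, while a difference such as 6 or 7 is rejected by `diff_fits_lea` and would need more than one instruction in this model.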

@llvmbot
Member

llvmbot commented Mar 13, 2023

@llvm/issue-subscribers-backend-x86

@RKSimon
Collaborator

RKSimon commented Mar 13, 2023

This looks like it has something to do with the different condition codes coming from the IR:

define i32 @select_unsigned_lt_10_8_13j(i32 %0) {
  %2 = icmp ult i32 %0, 10
  %3 = select i1 %2, i32 8, i32 13
  ret i32 %3
}

define i32 @select_unsigned_lt_10_9_12j(i32 %0) {
  %2 = icmp ugt i32 %0, 9
  %3 = select i1 %2, i32 12, i32 9
  ret i32 %3
}
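To make the difference between the two IR functions concrete, here is a small C model (function names are hypothetical). In the first function the select is keyed on `ult` with the smaller constant in the true arm; in the second it is keyed on `ugt` with the larger constant in the true arm. For unsigned values `x > 9` is the same as `x >= 10`, so both reduce to `low + (x >= 10) * diff`. The sketch only shows that the two shapes are semantically alike and makes no claim about where the backend's handling diverges.

```c
#include <assert.h>

/* Mirrors: %2 = icmp ult i32 %0, 10 ; select i1 %2, i32 8, i32 13 */
unsigned select_ult_8_13(unsigned x) {
    return x < 10 ? 8u : 13u;
}

/* Mirrors: %2 = icmp ugt i32 %0, 9 ; select i1 %2, i32 12, i32 9 */
unsigned select_ugt_12_9(unsigned x) {
    return x > 9 ? 12u : 9u;
}

/* Common shape after normalizing the predicate to (x >= 10):
   low when the predicate is false, low + diff when it is true. */
unsigned select_normalized(unsigned x, unsigned low, unsigned diff) {
    unsigned b = (x >= 10);   /* what setae materializes */
    return low + b * diff;
}

/* Checks that both IR shapes match the normalized form. */
void check_forms_agree(void) {
    for (unsigned x = 0; x < 32; ++x) {
        assert(select_ult_8_13(x) == select_normalized(x, 8, 5));
        assert(select_ugt_12_9(x) == select_normalized(x, 9, 3));
    }
}
```

Calling `check_forms_agree()` confirms that inverting the `ult` predicate and swapping the select arms yields the same shape as the `ugt` version, which is consistent with both assembly listings using `setae`.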

kazutakahirata added a commit that referenced this issue Mar 24, 2023
This patch precommits a test for:

#61365
@kazutakahirata
Contributor Author

Just posted: https://reviews.llvm.org/D146787
