
optimize reciprocal square root with fast-math (x86) #21274

Open
@rotateright

Description

Bugzilla Link 20900
Version trunk
OS All
Blocks #31672
CC @hfinkel,@RKSimon,@tycho

Extended Description

$ ./clang -v
clang version 3.6.0 (217530)
Target: x86_64-apple-darwin13.3.0
Thread model: posix

$ cat rsqrt.c
#include <math.h>
float reciprocal_square_root(float x) {
    return 1.0f / sqrtf(x);
}

$ ./clang -O2 -ffast-math -S -o - rsqrt.c
...
sqrtss %xmm0, %xmm1
movss LCPI0_0(%rip), %xmm0
divss %xmm1, %xmm0


This should be optimized to use 'rsqrtss'.

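ICC 14 does this at -O2. It computes an 'rsqrtss' estimate refined by one Newton-Raphson step. The constants 0x40400000 and 0xbf000000 are 3.0f and -0.5f: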

    rsqrtss   %xmm0, %xmm2
    mulss     %xmm2, %xmm0
    mulss     %xmm2, %xmm0
    movss     L_2il0floatpacket.2(%rip), %xmm1
    mulss     %xmm1, %xmm2
    subss     L_2il0floatpacket.1(%rip), %xmm0
    mulss     %xmm2, %xmm0
    ret       

L_2il0floatpacket.1:
.long 0x40400000
L_2il0floatpacket.2:
.long 0xbf000000
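The ICC sequence above can be sketched in C with SSE intrinsics. This sketch assumes an x86 target and <xmmintrin.h>. It is a hypothetical hand-written rendering for illustration, not ICC's or LLVM's actual code, and the names rsqrt_nr and float_from_bits are made up. Reading the listing, 'rsqrtss' produces an estimate r. The remaining instructions then compute (x*r*r - 3.0f) * (r * -0.5f). That is one Newton-Raphson step for 1/sqrt(x): r' = 0.5 * r * (3 - x*r*r).

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h> /* SSE intrinsics; x86/x86-64 only */

/* Reinterpret a 32-bit pattern (e.g. a .long from the listing) as a float. */
static float float_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical rendering of the ICC 14 output: rsqrtss estimate plus one
   Newton-Raphson step, using 0x40400000 == 3.0f and 0xbf000000 == -0.5f. */
static float rsqrt_nr(float x) {
    assert(x > 0.0f && isfinite(x)); /* sketch handles positive finite x only */
    __m128 vx = _mm_set_ss(x);
    __m128 r  = _mm_rsqrt_ss(vx);                 /* rsqrtss: r ~ 1/sqrt(x)   */
    __m128 t  = _mm_mul_ss(_mm_mul_ss(vx, r), r); /* two mulss: x*r*r         */
    __m128 h  = _mm_mul_ss(r, _mm_set_ss(-0.5f)); /* mulss by -0.5f: -r/2     */
    t = _mm_sub_ss(t, _mm_set_ss(3.0f));          /* subss 3.0f: x*r*r - 3    */
    return _mm_cvtss_f32(_mm_mul_ss(t, h));       /* final mulss: refined r'  */
}
```

One Newton-Raphson step roughly doubles the number of correct bits, which brings the ~12-bit estimate close to full float precision. The result is still not guaranteed to match 1.0f / sqrtf(x) exactly, which is why this transform only fits under -ffast-math.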
