[X86] Failure to use PHADDD on Intel CPUs on the second to last step of a v8i32 pairwise reduction #39267

|  |  |
| --- | --- |
| Bugzilla Link | [39920](https://llvm.org/bz39920) |
| Resolution | FIXED |
| Resolved on | May 09, 2019 11:14 |
| Version | trunk |
| OS | Windows NT |
| Blocks | llvm/llvm-project#35132  |
| CC | @adibiagio,@topperc,@RKSimon,@rotateright |
| Fixed by commit(s) | r360360 |

## Extended Description 
I think we should use PHADDD for the first reduction step of this on Intel CPUs:

    define fastcc i32 @pairwise_reduction4i32(<4 x i32> %rdx, i32 %f1) {
      %rdx.shuf.1.0 = shufflevector <4 x i32> %rdx, <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
      %rdx.shuf.1.1 = shufflevector <4 x i32> %rdx, <4 x i32> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
      %bin.rdx8 = add <4 x i32> %rdx.shuf.1.0, %rdx.shuf.1.1
      %rdx.shuf.2.0 = shufflevector <4 x i32> %bin.rdx8, <4 x i32> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
      %rdx.shuf.2.1 = shufflevector <4 x i32> %bin.rdx8, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
      %bin.rdx9 = add <4 x i32> %rdx.shuf.2.0, %rdx.shuf.2.1

      %r = extractelement <4 x i32> %bin.rdx9, i32 0
      ret i32 %r
    }
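
For readers less used to shufflevector-based reductions, here is a rough scalar model in C of what the IR computes. It is only a sketch: the `v4i32` struct and the `shuffle`, `add`, and `pairwise_reduction4i32_model` names are made up for illustration, undef lanes are modelled as 0 (an assumption, any value would do), the unused `%f1` argument is dropped, and `uint32_t` is used so the adds wrap like the IR `add` does.

```c
#include <stdint.h>

/* Hypothetical scalar stand-in for <4 x i32>. */
typedef struct { uint32_t lane[4]; } v4i32;

/* Single-source shufflevector; a negative index stands for undef,
   which this sketch arbitrarily models as 0. */
static v4i32 shuffle(v4i32 v, int i0, int i1, int i2, int i3) {
    const int idx[4] = { i0, i1, i2, i3 };
    v4i32 r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = idx[i] < 0 ? 0u : v.lane[idx[i]];
    return r;
}

/* Lane-wise wrapping add, like `add <4 x i32>`. */
static v4i32 add(v4i32 a, v4i32 b) {
    v4i32 r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

uint32_t pairwise_reduction4i32_model(v4i32 rdx) {
    /* Step 1: even lanes + odd lanes -> lane 0 = v0+v1, lane 1 = v2+v3. */
    v4i32 shuf_1_0 = shuffle(rdx, 0, 2, -1, -1);
    v4i32 shuf_1_1 = shuffle(rdx, 1, 3, -1, -1);
    v4i32 bin_rdx8 = add(shuf_1_0, shuf_1_1);

    /* Step 2: lane 0 + lane 1 of the partial sums. */
    v4i32 shuf_2_0 = shuffle(bin_rdx8, 0, -1, -1, -1);
    v4i32 shuf_2_1 = shuffle(bin_rdx8, 1, -1, -1, -1);
    v4i32 bin_rdx9 = add(shuf_2_0, shuf_2_1);

    return bin_rdx9.lane[0];  /* extractelement ..., i32 0 */
}
```

Step 1 is the pairwise pattern that a horizontal add produces directly in its low two lanes.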

This is the assembly we get with SSE4.1:

        pshufd  $232, %xmm0, %xmm1      # xmm1 = xmm0[0,2,2,3]
        pshufd  $237, %xmm0, %xmm0      # xmm0 = xmm0[1,3,2,3]
        paddd   %xmm1, %xmm0
        pshufd  $229, %xmm0, %xmm1      # xmm1 = xmm0[1,1,2,3]
        paddd   %xmm0, %xmm1
        movd    %xmm1, %eax
        retq


PHADDD uses 2 shuffles internally on Intel CPUs, but as you can see, the assembly we emitted also uses 2 shuffles for the first step. So I don't think we saved anything by avoiding PHADDD.
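
To make the comparison concrete, the following C sketch models the three instructions involved as scalar code and checks that replacing the first `pshufd`/`pshufd`/`paddd` group with a single `phaddd` yields the same result. The `xmm` struct and the function names are hypothetical, and `reduce_with_phaddd` is only one possible alternative sequence, not necessarily what the compiler emits after the fix.

```c
#include <stdint.h>

/* Hypothetical scalar stand-in for a 128-bit register holding 4 x i32. */
typedef struct { uint32_t lane[4]; } xmm;

/* pshufd: result lane i takes source lane (imm >> 2*i) & 3. */
static xmm pshufd(xmm src, unsigned imm) {
    xmm r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = src.lane[(imm >> (2 * i)) & 3u];
    return r;
}

/* paddd: lane-wise wrapping add, returns dst + src. */
static xmm paddd(xmm dst, xmm src) {
    for (int i = 0; i < 4; ++i)
        dst.lane[i] += src.lane[i];
    return dst;
}

/* phaddd: adjacent pairs of dst fill the low half, pairs of src the high half. */
static xmm phaddd(xmm dst, xmm src) {
    xmm r;
    r.lane[0] = dst.lane[0] + dst.lane[1];
    r.lane[1] = dst.lane[2] + dst.lane[3];
    r.lane[2] = src.lane[0] + src.lane[1];
    r.lane[3] = src.lane[2] + src.lane[3];
    return r;
}

/* The sequence from the report: two shuffles and an add for step 1. */
uint32_t reduce_emitted(xmm x0) {
    xmm x1 = pshufd(x0, 232);   /* x1 = x0[0,2,2,3] */
    x0 = pshufd(x0, 237);       /* x0 = x0[1,3,2,3] */
    x0 = paddd(x0, x1);
    x1 = pshufd(x0, 229);       /* x1 = x0[1,1,2,3] */
    x1 = paddd(x1, x0);
    return x1.lane[0];          /* movd */
}

/* Assumed alternative: one phaddd for step 1, step 2 unchanged. */
uint32_t reduce_with_phaddd(xmm x0) {
    x0 = phaddd(x0, x0);        /* lanes 0,1 = v0+v1, v2+v3 */
    xmm x1 = pshufd(x0, 229);
    x1 = paddd(x1, x0);
    return x1.lane[0];
}
```

Both functions return lane 0 of the final add, so they agree whenever the low two lanes after step 1 agree. The upper lanes differ between the two sequences but are never read.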
