Sub-optimal codegen for float newtypes #32031

It seems that floating-point types wrapped in newtypes are passed to functions in general-purpose registers (GPRs) rather than in SIMD registers, the way plain `f32` and `f64` are. Consider this [code example in playpen](http://is.gd/5oW5Nn).

Without inlining, `add_f32()` and `add_f64()` each compile to a single instruction (plus the return), while `add_newtype_{f32|f64}()` first have to move their arguments from general-purpose registers into SIMD registers, perform the addition, and then move the result back to a GPR.
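For readers who cannot open the playpen link, here is a minimal sketch of the kind of code being compared. It is a reconstruction, not the linked code: the function names come from the text above, while the wrapper names `F32` and `F64` and the use of `#[inline(never)]` are assumptions made for illustration.

```rust
// Hypothetical newtype wrappers; the names F32 and F64 are assumptions.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32(pub f32);

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F64(pub f64);

// Plain float versions: arguments and result are bare f32/f64 values.
#[inline(never)]
pub fn add_f32(a: f32, b: f32) -> f32 {
    a + b
}

#[inline(never)]
pub fn add_f64(a: f64, b: f64) -> f64 {
    a + b
}

// Newtype versions: same arithmetic, but on single-field wrapper structs.
#[inline(never)]
pub fn add_newtype_f32(a: F32, b: F32) -> F32 {
    F32(a.0 + b.0)
}

#[inline(never)]
pub fn add_newtype_f64(a: F64, b: F64) -> F64 {
    F64(a.0 + b.0)
}
```

Building this in release mode and comparing the assembly of the four functions should show whether the newtype versions still take the detour through GPRs; the result may differ between compiler versions and targets.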

With inlining, the situation is better but still not optimal. Once again, the functions defined on plain types work directly in SIMD registers, and the loop now gets unrolled by a factor of 10. For the newtypes, the accumulator is still kept in a GPR, and the loop is unrolled only by a factor of 5. Here, the accumulator is moved from the GPR to a SIMD register only at the start of a loop iteration and moved back at the end (after 5 additions have been performed).
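The inlined case can be sketched as an accumulator loop like the one below. The exact loop in the playpen example is not reproduced here, so the loop shape and the names `Wrapped`, `add_plain`, `add_wrapped`, `sum_plain`, and `sum_wrapped` are all assumptions chosen for illustration.

```rust
// Hypothetical single-field wrapper around f64; the name is an assumption.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Wrapped(pub f64);

// Addition helpers that the compiler is asked to inline into the loops.
#[inline(always)]
fn add_plain(a: f64, b: f64) -> f64 {
    a + b
}

#[inline(always)]
fn add_wrapped(a: Wrapped, b: Wrapped) -> Wrapped {
    Wrapped(a.0 + b.0)
}

// Accumulates 0 + 1 + ... + (n - 1) using a plain f64 accumulator.
pub fn sum_plain(n: u32) -> f64 {
    let mut acc = 0.0;
    for i in 0..n {
        acc = add_plain(acc, i as f64);
    }
    acc
}

// Same accumulation, but the accumulator is the newtype.
pub fn sum_wrapped(n: u32) -> Wrapped {
    let mut acc = Wrapped(0.0);
    for i in 0..n {
        acc = add_wrapped(acc, Wrapped(i as f64));
    }
    acc
}
```

Both functions compute the same value, so any difference in the generated loops (which register holds the accumulator, how far the loop is unrolled) comes from how the newtype is represented rather than from the arithmetic itself.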

