Added missing LLVM_EXTERNAL_VISIBILITY #32

katyo · 2020-08-01T16:41:52Z

Quick fix for #31

katyo · 2020-08-04T03:21:19Z

closed due to closing #31

Combine sequences such as: ```llvm %pn1 = phi [init1, %BB1], [%op1, %BB2] %pn2 = phi [init2, %BB1], [%op2, %BB2] %op1 = binop %pn1, constant1 %op2 = binop %pn2, constant2 %rdx = binop %op1, %op2 ``` Into: ```llvm %phi_combined = phi [init_combined, %BB1], [%op_combined, %BB2] %rdx_combined = binop %phi_combined, constant_combined ``` This allows us to simplify interleaved reductions, for example as introduced by the loop vectorizer. The anecdotal example for this is the loop below: ```c float foo() { float q = 1.f; for (int i = 0; i < 1000; ++i) q *= .99f; return q; } ``` Which currently gets lowered explicitly such as (on AArch64, interleaved by four): ```gas .LBB0_1: fmul v0.4s, v0.4s, v1.4s fmul v2.4s, v2.4s, v1.4s fmul v3.4s, v3.4s, v1.4s fmul v4.4s, v4.4s, v1.4s subs w8, w8, #32 b.ne .LBB0_1 ``` But with this patch lowers trivially: ```gas foo: mov w8, llvm#5028 movk w8, llvm#14389, lsl #16 fmov s0, w8 ret ```

…lvm#152156) With this new A320 in-order core, we follow adding the FeatureUseFixedOverScalableIfEqualCost feature to A510 and A520 (llvm#132246), which reaps the same code generation benefits of preferring fixed over scalable when the cost is equal. So when we have: ``` void foo(float* a, float* b, float* dst, unsigned n) { for (unsigned i = 0; i < n; ++i) dst[i] = a[i] + b[i]; } ``` When compiling without the feature enabled, we get: ``` ... ld1b { z0.b }, p0/z, [x0, x10] ld1b { z2.b }, p0/z, [x1, x10] add x12, x0, x10 ldr z1, [x12, #1, mul vl] add x12, x1, x10 ldr z3, [x12, #1, mul vl] fadd z0.s, z2.s, z0.s add x12, x2, x10 fadd z1.s, z3.s, z1.s dech x11 st1b { z0.b }, p0, [x2, x10] incb x10, all, mul #2 str z1, [x12, #1, mul vl] ... ``` When compiling with, we get: ``` ... ldp q0, q1, [x12, #-16] ldp q2, q3, [x11, #-16] subs x13, x13, #8 fadd v0.4s, v2.4s, v0.4s fadd v1.4s, v3.4s, v1.4s add x11, x11, #32 add x12, x12, #32 stp q0, q1, [x10, #-16] add x10, x10, #32 ... ```

Added missing LLVM_EXTERNAL_VISIBILITY

eb6c635

katyo closed this Aug 4, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Added missing LLVM_EXTERNAL_VISIBILITY #32

Added missing LLVM_EXTERNAL_VISIBILITY #32

Uh oh!

katyo commented Aug 1, 2020

Uh oh!

katyo commented Aug 4, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Added missing LLVM_EXTERNAL_VISIBILITY #32

Added missing LLVM_EXTERNAL_VISIBILITY #32

Uh oh!

Conversation

katyo commented Aug 1, 2020

Uh oh!

katyo commented Aug 4, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant