Skip to content

Conversation

@katyo
Copy link

@katyo katyo commented Aug 1, 2020

Quick fix for #31

@katyo
Copy link
Author

katyo commented Aug 4, 2020

closed due to closing #31

@katyo katyo closed this Aug 4, 2020
espressif-bot pushed a commit that referenced this pull request Jul 1, 2025
Combine sequences such as:
```llvm
  %pn1 = phi [init1, %BB1], [%op1, %BB2]
  %pn2 = phi [init2, %BB1], [%op2, %BB2]
  %op1 = binop %pn1, constant1
  %op2 = binop %pn2, constant2
  %rdx = binop %op1, %op2
```
Into:
```llvm
  %phi_combined = phi [init_combined, %BB1], [%op_combined, %BB2]
  %rdx_combined = binop %phi_combined, constant_combined
```

This allows us to simplify interleaved reductions, for example as
introduced by the loop vectorizer.

The anecdotal example for this is the loop below:
```c
float foo() {
  float q = 1.f;
  for (int i = 0; i < 1000; ++i)
    q *= .99f;
  return q;
}
```
Which currently gets lowered explicitly such as (on AArch64,
interleaved by four):
```gas
.LBB0_1:
  fmul    v0.4s, v0.4s, v1.4s
  fmul    v2.4s, v2.4s, v1.4s
  fmul    v3.4s, v3.4s, v1.4s
  fmul    v4.4s, v4.4s, v1.4s
  subs    w8, w8, #32
  b.ne    .LBB0_1
```
But with this patch lowers trivially:
```gas
foo:
  mov     w8, llvm#5028
  movk    w8, llvm#14389, lsl #16
  fmov    s0, w8
  ret
```
espressif-bot pushed a commit that referenced this pull request Aug 7, 2025
…lvm#152156)

With this new A320 in-order core, we follow adding the
FeatureUseFixedOverScalableIfEqualCost feature to A510 and A520
(llvm#132246), which reaps the same code generation benefits of preferring
fixed over scalable when the cost is equal.

So when we have:
```
void foo(float* a, float* b, float* dst, unsigned n) {
    for (unsigned i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

When compiling without the feature enabled, we get:
```
...
    ld1b    { z0.b }, p0/z, [x0, x10]
    ld1b    { z2.b }, p0/z, [x1, x10]
    add     x12, x0, x10
    ldr     z1, [x12, #1, mul vl]
    add     x12, x1, x10
    ldr     z3, [x12, #1, mul vl]
    fadd    z0.s, z2.s, z0.s
    add     x12, x2, x10
    fadd    z1.s, z3.s, z1.s
    dech    x11
    st1b    { z0.b }, p0, [x2, x10]
    incb    x10, all, mul #2
    str     z1, [x12, #1, mul vl]
...
```

When compiling with, we get:
```
...
  	ldp	    q0, q1, [x12, #-16]
	ldp	    q2, q3, [x11, #-16]
	subs	x13, x13, #8
	fadd	v0.4s, v2.4s, v0.4s
	fadd	v1.4s, v3.4s, v1.4s
	add	    x11, x11, #32
	add	    x12, x12, #32
	stp	    q0, q1, [x10, #-16]
	add	    x10, x10, #32

...
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant