[RISCV] Lower fixed vectors extract_vector_elt through stack at high LMUL
This is the extract side of D159332. The goal is to avoid non-linear costing on patterns where an entire vector is split back into scalars. This is an idiomatic pattern for SLP.
Each vslide operation is linear in LMUL on common hardware. (For instance, the sifive-x280 cost model models slides this way.) If we do VL unique extracts, each with a cost linear in LMUL, the overall cost is O(LMUL^2) * VLEN/ETYPE. To avoid the degenerate case, fall back to the stack if we're beyond LMUL2.
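As a rough illustration of the cost argument above, here is a toy model in C++. It is a sketch under assumed unit costs, not LLVM's actual cost model: the function names are hypothetical, one slide is assumed to cost LMUL units, a whole-vector store is assumed to cost LMUL units, and a scalar load is assumed to cost 1 unit.

```cpp
#include <cassert>
#include <cstdint>

// Toy cost model (assumption, not taken from LLVM): all costs are in
// abstract units, with one vslidedown costing `lmul` units.

// Number of elements in a register group: LMUL * VLEN / ETYPE.
constexpr std::uint64_t numElements(std::uint64_t lmul, std::uint64_t vlenBits,
                                    std::uint64_t etypeBits) {
  return lmul * vlenBits / etypeBits;
}

// Exploding the vector with one slide per element: each of the
// LMUL * VLEN / ETYPE extracts costs LMUL, so the total is
// LMUL^2 * VLEN / ETYPE.
constexpr std::uint64_t slideExplodeCost(std::uint64_t lmul,
                                         std::uint64_t vlenBits,
                                         std::uint64_t etypeBits) {
  return numElements(lmul, vlenBits, etypeBits) * lmul;
}

// Exploding through the stack: one store of the whole group (assumed to
// cost LMUL) plus one scalar load per element (assumed to cost 1). This
// only holds if the stack slot is reused across extracts.
constexpr std::uint64_t stackExplodeCost(std::uint64_t lmul,
                                         std::uint64_t vlenBits,
                                         std::uint64_t etypeBits) {
  return lmul + numElements(lmul, vlenBits, etypeBits);
}

// Doubling LMUL quadruples the slide-based cost in this model.
static_assert(slideExplodeCost(8, 128, 64) == 4 * slideExplodeCost(4, 128, 64),
              "slide-based explode is quadratic in LMUL");
```

With the assumed values VLEN=128, ETYPE=64 and LMUL=8 (16 elements), this model gives 128 units for the slide-based explode and 24 units for the stack-based one. The absolute numbers mean nothing; only the growth rates are the point.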
There's a subtlety here. For this to work, we're *relying* on an optimization in LegalizeDAG which tries to reuse the stack slot from a previous extract. In practice, this appears to trigger for patterns within a block, but if we ended up with an explode idiom split across multiple blocks, we'd still be in quadratic territory. I don't think that variant is fixable within SDAG.
It's tempting to think we can do better than going through the stack, but well, I haven't found it yet if it exists. Here are the results for sifive-x280 on all the variants I wrote (all 16 x i64 with V):
output/sifive-x280/linear_decomp_with_slidedown.mca:Total Cycles: 20703
output/sifive-x280/linear_decomp_with_vrgather.mca:Total Cycles: 23903
output/sifive-x280/naive_linear_with_slidedown.mca:Total Cycles: 21604
output/sifive-x280/naive_linear_with_vrgather.mca:Total Cycles: 22804
output/sifive-x280/recursive_decomp_with_slidedown.mca:Total Cycles: 15204
output/sifive-x280/recursive_decomp_with_vrgather.mca:Total Cycles: 18404
output/sifive-x280/stack_by_vreg.mca:Total Cycles: 12104
output/sifive-x280/stack_element_by_element.mca:Total Cycles: 4304
I am deliberately excluding scalable vectors. It functionally works, but frankly, the code quality for an idiomatic explode loop is so terrible either way that it felt better to leave that for future work.
Differential Revision: https://reviews.llvm.org/D159375