Skip to content

Conversation

wangpc-pp
Copy link
Contributor

Almost all CPUs have added this.

We may enable it by default and remove this feature.

Fixes #134272.

@llvmbot
Copy link
Member

llvmbot commented Apr 11, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Pengcheng Wang (wangpc-pp)

Changes

Almost all CPUs have added this.

We may enable it by default and remove this feature.

Fixes #134272.


Full diff: https://github.com/llvm/llvm-project/pull/135318.diff

1 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVProcessors.td (+2-1)
diff --git a/llvm/lib/Target/RISCV/RISCVProcessors.td b/llvm/lib/Target/RISCV/RISCVProcessors.td
index 9d48adeec5e86..07a938db3e258 100644
--- a/llvm/lib/Target/RISCV/RISCVProcessors.td
+++ b/llvm/lib/Target/RISCV/RISCVProcessors.td
@@ -86,7 +86,8 @@ class RISCVTuneProcessorModel<string n,
                               list<SubtargetFeature> f = []>
     : ProcessorModel<n, m, f,tunef>;
 
-defvar GenericTuneFeatures = [TuneOptimizedNF2SegmentLoadStore];
+defvar GenericTuneFeatures = [TuneOptimizedNF2SegmentLoadStore,
+                              TuneNoDefaultUnroll];
 
 def GENERIC_RV32 : RISCVProcessorModel<"generic-rv32",
                                        NoSchedModel,

@topperc
Copy link
Collaborator

topperc commented Apr 11, 2025

We got a net improvement in SPEC2017 on our out of order cores by using the default heuristic intead of TuneNoDefaultUnroll. I don't remember the exact number now. We probably need to do some more tuning.

That doesn't need to block this patch though.

@lukel97
Copy link
Contributor

lukel97 commented Apr 11, 2025

I've started a SPEC run on the BPI-F3 with this, I'll post the results soon

@lukel97
Copy link
Contributor

lukel97 commented Apr 13, 2025

Looks like this causes quite a few regressions on the BPI-F3, rva22u64_v -O3 -flto: https://lnt.lukelau.me/db_default/v4/nts/419

The code size increases are pretty big. Maybe the unrolling is too aggressive?

@wangpc-pp
Copy link
Contributor Author

Looks like this causes quite a few regressions on the BPI-F3, rva22u64_v -O3 -flto: https://lnt.lukelau.me/db_default/v4/nts/419

The code size increases are pretty big. Maybe the unrolling is too aggressive?

Oops...This is surprising. I will do more investigations before we go back to this PR.

@wangpc-pp
Copy link
Contributor Author

Looks like this causes quite a few regressions on the BPI-F3, rva22u64_v -O3 -flto: https://lnt.lukelau.me/db_default/v4/nts/419
The code size increases are pretty big. Maybe the unrolling is too aggressive?

Oops...This is surprising. I will do more investigations before we go back to this PR.

It may be too aggressive for in-order cores. So I set the runtime unrolling count to 4 for in-order cores just like ARM and AArch64. Can you issue another run to see the effect?

Almost all CPUs have added this.

We may enable it by default and remove this feature.

Fixes llvm#134272.
@wangpc-pp wangpc-pp force-pushed the main-riscv-default-unroll branch from 2677953 to b45bf77 Compare June 17, 2025 08:19
@wangpc-pp
Copy link
Contributor Author

Gentle ping. :-)

@mikhailramalho
Copy link
Member

I can start a run tonight @wangpc-pp, I'll post the results tomorrow

@mikhailramalho
Copy link
Member

https://lnt.lukelau.me/db_default/v4/nts/665?compare_to=656

Some big regression in code size with rva22u64_v -O3 -flto. Only regression in exec time was 523.xalancbmk_r but we know that benchmark is usually noisy

@wangpc-pp
Copy link
Contributor Author

https://lnt.lukelau.me/db_default/v4/nts/665?compare_to=656

Some big regression in code size with rva22u64_v -O3 -flto. Only regression in exec time was 523.xalancbmk_r but we know that benchmark is usually noisy

The code size regressions are expected, the performance has no observable change possibly because Spacemit X60 is just a dual-issues in-order core.
I don't know if there are some data from other CPUs?

@mikhailramalho
Copy link
Member

https://lnt.lukelau.me/db_default/v4/nts/665?compare_to=656
Some big regression in code size with rva22u64_v -O3 -flto. Only regression in exec time was 523.xalancbmk_r but we know that benchmark is usually noisy

The code size regressions are expected, the performance has no observable change possibly because Spacemit X60 is just a dual-issues in-order core. I don't know if there are some data from other CPUs?

I can run this PR on a VisionFive V2 that I have available if you'd like, but it doesn't have RVV.

@wangpc-pp
Copy link
Contributor Author

https://lnt.lukelau.me/db_default/v4/nts/665?compare_to=656
Some big regression in code size with rva22u64_v -O3 -flto. Only regression in exec time was 523.xalancbmk_r but we know that benchmark is usually noisy

The code size regressions are expected, the performance has no observable change possibly because Spacemit X60 is just a dual-issues in-order core. I don't know if there are some data from other CPUs?

I can run this PR on a VisionFive V2 that I have available if you'd like, but it doesn't have RVV.

That'll be appreciated!

@mikhailramalho
Copy link
Member

https://lnt.lukelau.me/db_default/v4/nts/665?compare_to=656
Some big regression in code size with rva22u64_v -O3 -flto. Only regression in exec time was 523.xalancbmk_r but we know that benchmark is usually noisy

The code size regressions are expected, the performance has no observable change possibly because Spacemit X60 is just a dual-issues in-order core. I don't know if there are some data from other CPUs?

I can run this PR on a VisionFive V2 that I have available if you'd like, but it doesn't have RVV.

That'll be appreciated!

Hi, I ran the tests this weekend on the visionfive2. This is SPEC built with -march=rv64gc -fuse-ld=lld -O3 -flto: https://lnt.lukelau.me/db_default/v4/nts/684?compare_to=686. Unfortunately, there is no perf data available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RISCV] llvm::is_contained is suboptimal compared to X86/AArch64
5 participants