Conversation

@buggfg commented Sep 9, 2025

We recommend enabling dfa-jump-thread by default. It is a mature optimization: GCC has supported it since 9.1.0, and both GCC and ICX enable it by default at -O2.

With it enabled, we measured a 13% performance improvement on the CoreMark benchmark on x86 (Intel i9-11900K, Rocket Lake), and a 15% improvement on the KunMingHu FPGA. We also verified the correctness of this pass using SPEC 2017.
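For illustration, here is a minimal sketch of the pattern this pass targets (hypothetical IR, not taken from CoreMark): the switch operand is a loop-carried phi whose incoming values are constants, so each iteration can jump directly to the next state's block instead of re-dispatching through the switch.

```llvm
define void @state_machine(i32 %n) {
entry:
  br label %loop

loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %latch ]
  %state = phi i32 [ 0, %entry ], [ %state.next, %latch ]
  ; Re-dispatching through this switch on every iteration is what
  ; DFAJumpThreading eliminates: on each path the next state is a
  ; known constant, so the loop can thread straight to the next case.
  switch i32 %state, label %latch [
    i32 0, label %case0
    i32 1, label %case1
  ]

case0:
  br label %latch

case1:
  br label %latch

latch:
  ; The next state is a constant on every incoming edge.
  %state.next = phi i32 [ 1, %case0 ], [ 0, %case1 ], [ 0, %loop ]
  %i.next = add i32 %i, 1
  %done = icmp eq i32 %i.next, %n
  br i1 %done, label %exit, label %loop

exit:
  ret void
}
```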

Co-Authored-By: YinZd <[email protected]>
Co-Authored-By: ict-ql <[email protected]>
Co-Authored-By: Lin Wang <[email protected]>

github-actions bot commented Sep 9, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository, in which case you can instead tag reviewers by name in a comment, using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "pinging" the PR, i.e. adding a comment saying "Ping". The common courtesy ping rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@efriedma-quic (Collaborator) commented:

See previous discussion on #83033.

nikic requested a review from XChy on September 10, 2025.
@nikic (Contributor) commented Sep 10, 2025

This pass currently does not have a maintainer. We should have one if we enable it by default.

@XChy What is your opinion on the current quality of this pass?

@buggfg Beyond running SPEC, have you performed any due diligence to ensure this pass is ready for default enablement? Have you performed any targeted fuzzing for this enablement?

I'll provide new compile-time numbers.

@XChy (Member) commented Sep 10, 2025

The current implementation is conservative and optimizes only one switch. After numerous fixes, this pass has become more stable and safe. For me, it's acceptable to enable this pass now if there is no severe compile-time regression, and enabling it will uncover more hidden problems.
My main concern is that compile time may increase massively if we optimize multiple switches in the future, though we don't allow that currently.
In any case, I am not against enabling DFAJumpThreading by default.

@nikic (Contributor) commented Sep 10, 2025

Compile-time: https://llvm-compile-time-tracker.com/compare.php?from=d685508bec02a676383b284d268fe8a2e4cbf7f3&to=ba220372f5c68b12c15ce2efbccf4edc9ea300f1&stat=instructions:u

Generally looks okay, with some outliers that need to be investigated (libclamav_nsis_LZMADecode.c +95%, NativeInlineSiteSymbol.cpp +33%). These might be fine, assuming the increase is due to unavoidable second-order effects of the transformation being performed.

@buggfg (Author) commented Sep 11, 2025

> This pass currently does not have a maintainer. We should have one if we enable it by default.
>
> @XChy What is your opinion on the current quality of this pass?
>
> @buggfg Beyond running SPEC, have you performed any due diligence to ensure this pass is ready for default enablement? Have you performed any targeted fuzzing for this enablement?
>
> I'll provide new compile-time numbers.

Hi, I'm really glad to hear that both reviewers support enabling DFAJumpThreading by default. I haven't had the chance to do any additional validation beyond the SPEC 2017 benchmarks yet. However, I believe enabling this optimization by default is essential for the embedded domain; it will also help increase LLVM's impact and uncover other potential issues. :)

@XChy (Member) commented Sep 27, 2025

@nikic, I can be a candidate for maintainer, as I am familiar with this pass's codebase and actively involved in it. But it would be better to have someone else co-maintain; I'm concerned about only one person working on it. @dybv-sc @UsmanNadeem, would you volunteer too?

@UsmanNadeem (Contributor) commented:

@XChy Yes, I can volunteer to be a co-maintainer as well.

We are also interested in seeing DFAJumpThreading enabled by default, and I think some others have shown interest in it as well. See my previous patch: #83033

@XChy (Member) commented Oct 1, 2025

> Generally looks okay, with some outliers that need to be investigated (libclamav_nsis_LZMADecode.c +95%, NativeInlineSiteSymbol.cpp +33%)

Will look into these cases.

@XChy (Member) commented Oct 1, 2025

I profiled libclamav_nsis_LZMADecode with valgrind and found that SLPVectorizer becomes slower after DFAJumpThreading: the time spent in SLPVectorizer::BoUpSLP::buildTreeRec increases from 10% to 37% of the total.

Before enabling DFAJumpThreading:
[profile screenshot]

After:
[profile screenshot]

@XChy (Member) commented Oct 3, 2025

I believe #161632 has resolved most of the outliers. The bottleneck mostly lies in the later pipeline optimizations that run on the duplicated blocks, rather than in DFAJumpThreading itself.

XChy requested a review from UsmanNadeem on October 7, 2025.
@XChy (Member) commented Oct 7, 2025

After some effort on reducing compile time, we now have better numbers: https://llvm-compile-time-tracker.com/compare.php?from=ed113e7904943565b4cd05588f6b639e40187510&to=2b901ca2e1b77d2b7a31cbcb57a921aa662341f9&stat=instructions:u.
@nikic, are you satisfied with the current compile time?

@nikic (Contributor) commented Oct 7, 2025

> After some effort on reducing compile time, we now have better numbers: https://llvm-compile-time-tracker.com/compare.php?from=ed113e7904943565b4cd05588f6b639e40187510&to=2b901ca2e1b77d2b7a31cbcb57a921aa662341f9&stat=instructions:u. @nikic, are you satisfied with the current compile time?

Yeah, those results look great.

I also started a compile-time run on llvm-opt-benchmark, with these results: dtcxzyw/llvm-opt-benchmark#2906 (comment). If you have time, looking at the big outlier there (openssl/smime.ll at +175%) would be good.

@nikic (Contributor) commented Oct 7, 2025

@zyw-bot csmith-fuzz

@XChy (Member) commented Oct 8, 2025

> looking at the big outlier there (openssl/smime.ll at +175%) would be good.

#162447 resolves it partially, but the main cause remains: the number of phi nodes increases massively, which in turn slows down SLPVectorizer. I haven't come up with a solution within DFAJumpThreading yet; maybe we can redesign the SSA-updating method there.

@nikic (Contributor) commented Oct 10, 2025

@XChy Can you share what the before/after IR looks like for that case?

@XChy (Member) commented Oct 10, 2025

> @XChy Can you share what the before/after IR looks like for that case?

@nikic Sure, see https://gist.github.com/XChy/9fd567c44be7f5097b7edb973ef236f8.

@XChy (Member) commented Oct 10, 2025

The IR file may be too big to read, so to illustrate more clearly: imagine a threadable path BB1 -> BB2 -> BB3, where the terminator of BB2 is a switch with many successors. Duplicating BB2 requires inserting a phi node into each of those successors for every instruction cloned along the path (see the sketch below).
The only workaround I've come up with is to create a common block containing the switch and insert the shared phis into that common block, though I don't think it's a good solution, since it may require inserting additional branches.
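To make that concrete, a minimal sketch (hypothetical IR; the block and value names are illustrative, not taken from the gist):

```llvm
; Before threading, %v is defined once on the path BB1 -> BB2 -> BB3:
BB1:
  %v = add i32 %x, 1
  br label %BB2

BB2:                                 ; mid-path block with a wide switch
  switch i32 %s, label %succ0 [
    i32 0, label %BB3
    i32 1, label %succ1
    i32 2, label %succ2
  ]

; After the path is duplicated, BB2.clone keeps the same switch, so every
; leftover successor gains a second predecessor and needs a phi merging
; the original and cloned values, one per cloned instruction:
succ1:
  %v.merge = phi i32 [ %v, %BB2 ], [ %v.clone, %BB2.clone ]
  br label %succ1.body
```

With M instructions cloned along the path and N successors left unthreaded, that is on the order of N x M new phi nodes, which matches the blowup SLPVectorizer later trips over.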

@nikic (Contributor) commented Oct 10, 2025

Would it make sense to limit the number of phi nodes?

Looking at this example, I'm also somewhat surprised that we perform the optimization in a case where the result still has multiple large dispatch switches -- is that expected/profitable?

@XChy (Member) commented Oct 10, 2025

> Would it make sense to limit the number of phi nodes?

Yes, but the problem is that it's hard to predict the exact number of inserted phi nodes before threading. As a coarse proxy, we could set a threshold on the number of unduplicated successors.

> Looking at this example, I'm also somewhat surprised that we perform the optimization in a case where the result still has multiple large dispatch switches -- is that expected/profitable?

The multiple large dispatch switches are not the targets we thread over; each is more like a sub-switch under the main switch, and it is semantically correct to duplicate such switches. For a small sub-switch, duplication is as profitable as for ordinary branches, but for a sub-switch this big I am not sure without testing on real benchmarks. If the threaded path is cold, I'd guess it's unprofitable due to the increase in code size (compares and jump tables?).

@XChy (Member) commented Oct 10, 2025

It occurs to me that the cost models of some targets, like X86/AArch64, assume branches have the basic code-size cost, so the duplication cost of switches is not estimated correctly. We could adjust the threading cost there.
