
Phoebe/parallelism test/rayon par iter only #2552


Draft: wants to merge 3 commits into master

gefjon (Contributor) commented Apr 4, 2025

For testing/benchmarking purposes. Follow-up to #2543.

This experiment is intended to test the theory that our synchronization overhead comes from moving work off the "main thread" to Rayon workers via `scope` and `spawn`, not from `par_iter` parallel iteration. If that is the case, we should see synchronization overhead (i.e. `futex_wait` time) decrease relative to the control, similar to the previous experiment, but also see improved CPU utilization across multiple cores.
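A rough, std-only way to picture the cost this theory is about: the sketch below compares running a job on the calling thread with handing the same job to a long-lived worker thread and blocking until it replies. A plain `std::sync::mpsc` channel and one worker thread stand in for Rayon's pool, and `work`, `run_inline` and `run_handoff` are hypothetical names. It does not reproduce Rayon's scheduler, only the shape of a handoff where the caller parks while it waits (on Linux, that kind of parking typically shows up as futex waits).

```rust
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical unit of work standing in for one query's inner loop.
fn work(xs: &[u64]) -> u64 {
    xs.iter().sum()
}

// Shape A: run every job on the calling thread.
fn run_inline(data: &[u64], rounds: u32) -> (u64, Duration) {
    let start = Instant::now();
    let mut total = 0;
    for _ in 0..rounds {
        total += work(data);
    }
    (total, start.elapsed())
}

// Shape B: hand every job to a long-lived worker thread and block on the
// reply, loosely like submitting work to a pool and waiting for it.
fn run_handoff(data: Arc<Vec<u64>>, rounds: u32) -> (u64, Duration) {
    let (job_tx, job_rx) = mpsc::channel::<Arc<Vec<u64>>>();
    let (res_tx, res_rx) = mpsc::channel::<u64>();
    // The worker exits once `job_tx` is dropped and the job channel closes.
    let worker = thread::spawn(move || {
        for job in job_rx {
            res_tx.send(work(&job)).unwrap();
        }
    });

    let start = Instant::now();
    let mut total = 0;
    for _ in 0..rounds {
        job_tx.send(Arc::clone(&data)).unwrap();
        // The caller parks here until the worker replies.
        total += res_rx.recv().unwrap();
    }
    let elapsed = start.elapsed();

    drop(job_tx);
    worker.join().unwrap();
    (total, elapsed)
}
```

Calling `run_inline(&data, 10_000)` and `run_handoff(Arc::clone(&data), 10_000)` on the same `Arc<Vec<u64>>` should give equal totals; the elapsed times depend entirely on the machine, so no numbers are claimed here.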

gefjon added 3 commits April 2, 2025 15:25
For testing/benchmarking purposes.
One theory about our performance is that we're spending a lot of time context-switching
between Tokio and Rayon threads.
This build will be used in the first of a series of experiments
to investigate that overhead.
In this patch, we just do sequential evaluation on the Tokio worker threads
where the rest of our code runs, instead of sending stuff to Rayon.
Rayon use is mostly, but not entirely, removed from Core.
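A minimal sketch of what swapping a `par_iter`/`fold`/`reduce_with` pipeline for `std::iter::Iterator` looks like. `Row`, `total_bytes` and `max_bytes` are hypothetical; the Rayon pipeline appears only in a comment, since the actual call sites in Core aren't reproduced in this PR.

```rust
// Hypothetical row type; Core's real types are not shown in this PR.
struct Row {
    bytes: u64,
}

// Sequential replacement for a Rayon pipeline of roughly this shape:
//
//     rows.par_iter()
//         .fold(|| 0u64, |acc, r| acc + r.bytes)
//         .reduce_with(|a, b| a + b)
//         .unwrap_or(0)
//
// With a single thread there is only one accumulator, so `fold` already
// produces the final answer and the `reduce_with` merge step goes away.
fn total_bytes(rows: &[Row]) -> u64 {
    rows.iter().fold(0u64, |acc, r| acc + r.bytes)
}

// A bare `reduce_with` (no preceding `fold`) maps onto `Iterator::reduce`;
// both return `None` for an empty input.
fn max_bytes(rows: &[Row]) -> Option<u64> {
    rows.iter().map(|r| r.bytes).reduce(|a, b| a.max(b))
}
```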

The next two steps I am interested in:
- Use parallelism, but on Tokio workers rather than Rayon threads.
  I.e. replace `par_iter`, `fold` and `reduce_with` calls with Tokio-isms,
  instead of this patch's `std::iter::Iterator` versions.
- (Not discussed in the meeting) continue using `par_iter` and friends,
  but invoke the "outer loop" from Tokio threads.
  I.e. retain use of `par_iter`, `fold` and `reduce_with`,
  but remove calls to `rayon::scope` or `rayon::spawn`.
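The second option keeps the `fold`/`reduce_with` shape. As a rough std-only analogue (not Rayon itself: `std::thread::scope` spawns fresh OS threads and does no work stealing), each chunk is folded into its own partial result and the partials are then merged. In this sketch the calling thread folds one chunk itself rather than only waiting. `chunked_sum` and its `workers` parameter are hypothetical.

```rust
use std::thread;

// Hypothetical helper: a std-only analogue of
//     xs.par_iter().fold(|| 0u64, |a, &x| a + x).reduce_with(|a, b| a + b)
// Returns `None` for empty input, like `reduce_with`.
fn chunked_sum(xs: &[u64], workers: usize) -> Option<u64> {
    if xs.is_empty() {
        return None;
    }
    let workers = workers.max(1);
    // Ceiling division, so at most `workers` chunks are produced.
    let chunk_len = (xs.len() + workers - 1) / workers;
    let mut chunks = xs.chunks(chunk_len);
    // Non-empty input guarantees at least one chunk.
    let first = chunks.next().unwrap();

    let total = thread::scope(|s| {
        // "fold": each remaining chunk gets its own partial sum on a scoped thread.
        let handles: Vec<_> = chunks
            .map(|c| s.spawn(move || c.iter().fold(0u64, |a, &x| a + x)))
            .collect();
        // The calling thread folds the first chunk instead of sitting idle.
        let mine = first.iter().fold(0u64, |a, &x| a + x);
        // "reduce_with": merge the partial sums.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .fold(mine, |a, b| a + b)
    });
    Some(total)
}
```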
This experiment is intended to test the theory that our synchronization overhead
is coming from moving work off the "main thread" to Rayon workers via `scope` and `spawn`,
not from `par_iter` parallel iteration.
If that is the case, we should see synchronization overhead (i.e. `futex_wait` time)
decrease relative to the control, similar to the previous experiment,
but also see improved CPU utilization across multiple cores.