Skip to content

Conversation

@mrsharm
Copy link
Member

@mrsharm mrsharm commented Jul 29, 2024

Fixes #94076 - there will never be a full improvement vs. WKS but this gets us down from a 18.75% regression to a 6.25% one in terms of Startup time.

The main issue here is that DATAS calls change_heap_count during the startup phase without having access to the yp_spin_count provided by SetYieldProcessorScalingFactor set by the YieldProcessor and this delays the startup.

Initial Data

Ran multiple yp_spin_count configurations on a 28 core Windows machine (which seemed to have repro'd the issue, as well):

application WKS vs. DATAS w/o Any Changes (32 * 28 = 896 Initial Spin) WKS vs. DATAS w/ YP Spin = 10 WKS vs. DATAS w/ YP Spin = 100 WKS vs. DATAS w/ YP Spin = 400
Start Time (ms) 18.75% 6.25% 10.94% 17.19%

Based on this, a YP Spin Count Unit of 10 seems to be a viable option.

Performance Results

Benchmark Name datas_baseline datas_fix Comparison %
Stage1Aot 73.500 68.500 -6.803
  • The baseline was the commit right before the change was made.
  • This PR improves the start-up time compared to both WKS and SVR by approx. the same amount vs. the baseline.

Observing the results from 4 iterations on a 28-core machine for the Stage1Aot scenario, we observe that there were no regressions (all within a percentage diff of < 2%).

Benchmark Name Average of Max heap size / Comparison % Average of RPS / Comparison % Average of Mean latency / Comparison %
Stage1Aot 1.780 -0.103 0.000

For all the ASP.NET benchmarks, we found that only for Stage1Aot is where we trigger a GC before the YieldProcessor Measurement is received by the GC. We found this by observing the timestamps of:

  1. The start of the first GC.
  2. The first time we observe a YieldProcessorMeasurement event.
  3. The startup time end of the benchmark.

The perfview args to get the appropriate events are:

/BufferSizeMB:256 /StackCompression /KernelEvents:Process /ClrEvents:GC,Threading /Providers:"Benchmarks" /NoGui /NoNGenRundown".

@ghost ghost added the area-GC-coreclr label Jul 29, 2024
@mrsharm mrsharm marked this pull request as draft July 29, 2024 17:39

#ifdef MULTIPLE_HEAPS
#ifdef DYNAMIC_HEAP_COUNT
yp_spin_count_unit = 10;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to set the low initial spin counts unconditionally for all flavors?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah would be ideal to set it unconditionally, unless there are other perf concerns.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do some more testing with SVR and a low initial yp spin count to see if there is any impact. Changed my mind and currently isolated it to just the DATAS case to minimize impact but will get more data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just took a look at more data and concluded that for WKS and SVR, we aren't exhibiting the case where we don't have the Yield Processor Factor before the first GC that was the cause of the regression. We can possibly consider setting an default agnostic of flavor after doing more analysis on other benchmarks.

@Maoni0
Copy link
Member

Maoni0 commented Jul 29, 2024

this is actually very different from what I tried. I just set the first few GC's spin count to 10 but kept the same values as soon SetYieldProcessorScalingFactor is called. this is a much bigger change since you are just actually changing original_spin_count_unit completely which means when SetYieldProcessorScalingFactor is called.it's setting a very different value from before, which means it's a much bigger change than just changing the first few GC's spin count value.

changing the spin count should be done very carefully - I made some attempts before but didn't come up with something that worked well yet. the tiny benchmarks could show great improvement but real world scenarios are much more complicated. I would limit this PR to just fixing the startup problem.

@mrsharm mrsharm marked this pull request as ready for review August 6, 2024 18:32
@mrsharm mrsharm merged commit a562a9f into dotnet:main Aug 7, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Sep 7, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[GC] DATAS Startup time significantly worse than with workstation GC

4 participants