Skip to content

[StackColoring] Change the StackColoring logic + enables it to handle spills #143800

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

Ralender
Copy link
Collaborator

Here is a patch set I have been working on to reduce stack usage.
It starts with a bunch if trivial NFC-like commits, that can be easily handled separately
Then a replacement of the StackColoring algorithm, with all the original analysis and assumption kept.
Finally changes to allow the new algo in StackColoring to operate on spill slots

I know that lifetime annotations are a mess, so I tried to not change any of the assumptions made by the old StackColoring.

here is some examples of the current stack coloring's shortcoming:

void use(...);
void example(int test) {
  int e[4];
  int a, b, c, d;
  if (test)
    return use(&e);
  else
    return use(&a, &b, &c, &d);
}

one of the int will be merged with the array, and thats it.
when all 4 ints could share the space with the array

void bar(char *, int);
void foo(bool var) {
A: {
  char z[4096];
  bar(z, 0);
}

  char *p;
  char x[4096];
  char y[4096];
  if (var) {
    p = x;
  } else {
    bar(y, 1);
    p = y + 1024;
  }
B:
  bar(p, 2);
}

The current StackColoring is not capable of re-using the same storage for all 3 of these arrays
Even if its the example given in the comments of StackColoring.

Both those issues are fixed by the Algorithm change

StackColoring had TODO's about this issue and handling spill slots at the same time as regular slots

This patch set adds 2 options used to do the testing.
-mllvm -new-stack-coloring, this swaps out the algorithm for stack-coloring.
-mllvm -merged-stack-coloring, this moves stack-coloring after regalloc instead of stack-slot-coloring

I have not yet updated tests because all of the work is behind options
and didn't added golden tests YET because the code has kept changing. But I obviously will add tests. before any merging.

Testing Done on this patch:

  • Basic correctness checked on:
    • x86_64-linux-gnu
    • i386-linux-gnu
    • x86_64-windows-pc
    • aarch64-linux-gnu
    • arm-linux-gnueabihf
    • riscv64-linux-gnu
    • I need to do basic correctness checking on targets where the stack grows up.
  • Deeper testing and benchmarking for x86_64-linux-gnu:
    • Passes the correctness part of the llvm-test-suite
    • All Benchmark and example presented.

Benchmark results (mean):
Note: all benchmarks are done with release with assert build, on the llvm-test-suite
With max-candidates=unlimited, because it doesn't seem to have much impact on compile-time.

  • exec time: 0%
  • compile-time: 0%
  • binary size: +0.1%
    I suspect this is caused by some frequently accessed slot not being close to the sp,
    leading to larger variants of the instruction being used. But I need to investigate
  • prologepilog.NumBytesStackSpace: -1.5% and no regressions

Note: The cleanup of the git history was only partial for now, its intended mostly help review.

Copy link

github-actions bot commented Jun 11, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@thanm
Copy link
Contributor

thanm commented Jun 11, 2025

I am willing to take a look, but for various reasons I won't be able to dig in for a couple of months (I am away from my work computers). Hope you can make progress without my eyeballs for now. Thanks.

@efriedma-quic
Copy link
Collaborator

The cleanup commits look reasonable; please submit them separately.

The IR representation of lifetimes has some pretty significant issues (see #45725 (comment) , #51838, #132085, #43674). Will this cause those issues to be exposed more frequently?

@Ralender
Copy link
Collaborator Author

Ralender commented Jun 12, 2025

The cleanup commits look reasonable; please submit them separately.

Yes,

The IR representation of lifetimes has some pretty significant issues (see #45725 (comment) , #51838, #132085, #43674). Will this cause those issues to be exposed more frequently?

Here are my thoughts on the problems in general, I think there is 2 distinct issues:
Pointer comparisons self-consistency:
The only viable solution I see is tagging allocas, that need to stay distinct from each other and ensuring alias analysis and StackColoring respect it. and whenever the IR folds a icmp based on distinct provenance, it tags the allocas as distinct from each other. this keeps folding icmp simple. and it would not be legal to remove a tag from an alloca under any circumstances. I/We need to experiment, but it may not have much of negative impact.

IR transformation moving accesses out of lifetime, causing UB:
I have only seen it for loads, so maybe loads outside of lifetime could be defined as poison instead of UB, as has been proposed.
maybe an other solution is to say that lifetime annotation only have meaning for captures, and accesses outside of it are legal. alias analysis could still assume that a call outside of lifetime cannot access an escaped pointer, but potential load or store outside of lifetime would cause extension of the lifetime. Captures scopes cannot be changed by optimizations and everything else should analyzable even if we may need to be pessimist when multiple objects could be accessed by a load. the consequence is more complex analysis to figure out the real lifetime, and being pessimist while doing so may cause some regressions.

About the question, original question:
The old and new StackColoring should always agree on whether 2 slots can overlap or not.
I checked it at some point by having an assert during development (it was removed since).
That said, the new algorithm reduces stack space but getting more overlaps between slots.
So some slots that would not have overlapped with the old algo because of different decisions may overlap now.
Spills are unaffected by all the lifetime issues.

@Ralender
Copy link
Collaborator Author

After a bit of heuristics tuning, I got rid of the regression in average on code-size
here are the new results:

exec-time: 0%
compile-time: 0%
code-size: -0.1%
prologepilog.NumBytesStackSpace: -1.4% and no regressions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants