Skip to content

[Coro] Amortize debug info processing cost in CoroSplit #109032

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

artempyanykh
Copy link
Contributor

@artempyanykh artempyanykh commented Sep 17, 2024

More details about this stack are in this topic on discourse.

TL;DR: In large modules with a bunch of coroutines the cost of debug info metadata processing can get very high (e.g. adding over a minute to the compile time). This stack does some refactoring around function cloning to significantly speed up the pass.

This makes CoroSplit pass almost as fast with -g2 as it is with -g1 on
the sample cpp file used with other parts of this stack:

Baseline IdentityMD set Prebuilt CommonDI MetadataPred (fin)
CoroSplitPass 306ms 221ms 68ms 3.8ms
CoroCloner 101ms 72ms 0.5ms 0.5ms
CollectCommonDI - - 63ms -
Speed up 1x 1.4x 4.5x 80x

Each commit is individually buildable and reviewable. I can extract them into separate PRs if the overall direction makes sense.


The final version has been merged in a series of PRs (❗️marks the salient ones):

  1. [NFC][Coro] Add helpers for coro cloning with a TimeTraceScope #112948
  2. [NFC][Utils] Extract CloneFunctionAttributesInto from CloneFunctionInto #112976
  3. [Utils] Extract CollectDebugInfoForCloning from CloneFunctionInto #114537
  4. [NFC][Utils] Remove DebugInfoFinder parameter from CloneBasicBlock #118620
  5. [NFC][Utils] Clone basic blocks after we're done with metadata in CloneFunctionInto #118621
  6. [NFC][Utils] Extract BuildDebugInfoMDMap from CloneFunctionInto #118622
  7. [NFC][Utils] Extract CloneFunctionMetadataInto from CloneFunctionInto #118623
  8. [NFC][Utils] Extract CloneFunctionBodyInto from CloneFunctionInto #118624
  9. [NFC][Utils] Eliminate DISubprogram set from BuildDebugInfoMDMap #118625
  10. ❗️ [Utils] Identity map module-level debug info on first use in CloneFunction* #118627
  11. [Coro] Prebuild a module-level debug info set and share it between all coroutine clones #118628
  12. [NFC][Cloning] Make ClonedModule case more obvious in CollectDebugInfoForCloning #129143
  13. [NFC][Cloning] Simplify the flow in FindDebugInfoToIdentityMap #129144
  14. [NFC][Cloning] Add a helper to collect debug info from instructions #129145
  15. [NFC][Cloning] Make DifferentModule case more obvious in CollectDebugInfoForCloning #129146
  16. ❗️ [NFC][Cloning] Replace IdentityMD set with a predicate in ValueMapper #129147
  17. ❗️ [NFC][Cloning] Replace DIFinder usage in CloneFunctionInto with a MetadataPredicate #129148
  18. [NFC][Coro] Use CloneFunctionInto for coroutine cloning instead of CloneFunction<Part> #129149
  19. [NFC][Coro] Remove now unused CommonDebugInfo in CoroSplit #129150
  20. [NFC][Cloning] Remove now unused FindDebugInfoToIdentityMap #129151
  21. [NFC][Cloning] Remove now unused CollectDebugInfoForCloning #129152
  22. [NFC][Cloning] Clean up comments in CloneFunctionInto #129153
  23. [NFC][Cloning] Move DebugInfoFinder decl closer to its place of usage #129154

Copy link

github-actions bot commented Sep 17, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@adrian-prantl adrian-prantl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like generally a good idea and the implementation looks clean as well.
What are the behavior changes we expect from this patch series?

@artempyanykh
Copy link
Contributor Author

artempyanykh commented Sep 27, 2024

This sounds like generally a good idea and the implementation looks clean as well.

Thanks for reviewing @adrian-prantl!

What are the behavior changes we expect from this patch series?

  • I expect no functional changes, only faster compile times for code with coroutines.
  • The new DebugInfoCache analysis is not conditioned on the presence of coroutines in a module but it's fast and the analysis results can be used to speed up function cloning in other places than just coroutines.
  • I ran a compile time benchmark on a few thousand internal source files:
    • There were no perf regressions, code without coroutines had neutral perf impact.
    • Source files that use coroutines compiled 7% faster on avg, but there were many cases where it was 25%, 50% and even 65% faster. The larger the module is and the more coroutines it has, the greater the speed up.

@felipepiovezan
Copy link
Contributor

Thanks for working on this and for making each commit reviewable / commitable on its own, that's really great work!
I will start looking at them asap, wish I had seen this sooner! :)

@artempyanykh
Copy link
Contributor Author

Thanks @felipepiovezan! There's some commentary around the commits in this discourse topic, in case it's useful.

@artempyanykh
Copy link
Contributor Author

@adrian-prantl, @felipepiovezan have you had a chance to take a closer look? Could you advise on what would be the best way to proceed with reviewing this PR?

Copy link
Collaborator

@adrian-prantl adrian-prantl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felipepiovezan Can you apply this patch to the LLVM in swiftlang and run the tests with it as a sanity check?

@felipepiovezan
Copy link
Contributor

felipepiovezan commented Oct 17, 2024

Hi @artempyanykh, while I follow Adrian's suggestions, you could extract the first commit into a separate PR and we'll start reviewing/merging the NFC stuff one by one

@felipepiovezan
Copy link
Contributor

Hi @artempyanykh, while I follow Adrian's suggestions, you could extract the first commit into a separate PR and we'll start reviewing/merging the NFC stuff one by one

The second commit can also be a PR on its own right now, it doesn't depend on the first one AFAICT, so we can review them in parallel


// Copy all attributes other than those stored in the AttributeList. We need
// to remap the parameter indices of the AttributeList.
// Copy all attributes other than those stored in the AttributeList. We need
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this comment is making a lot of sense as a function-level comment. To be honest even the original comment was confusing: which AttributeList? If you know the answer to this, could you take the chance provided by this NFC commit to improve the comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This second commit could also use a brief commit message describing why this change is useful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comment refers to the AttributeList field (the only such) of a Function which has e.g. parameters and return value attributes. I updated the comment in #112976 to (hopefully) make it a bit clearer.

DISubprogram *SPClonedWithinModule = nullptr;
if (Changes < CloneFunctionChangeType::DifferentModule) {
SPClonedWithinModule = F.getSubprogram();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVM's coding guidelines prescribe not using {} around single line blocks

@@ -205,6 +205,11 @@ void CloneAndPruneFunctionInto(Function *NewFunc, const Function *OldFunc,
const char *NameSuffix = "",
ClonedCodeInfo *CodeInfo = nullptr);

/// Process debug information from function's subprogram attachment.
DISubprogram *ProcessSubprogramAttachment(const Function &F,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on "Process"? This is more about collecting debug info, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. This function hydrates the passed DebugInfoFinder by visiting relevant components of the function's subprogram attachment. I'll update the name/comment when we get to extracting that commit in a separate PR.

Copy link
Contributor

@felipepiovezan felipepiovezan Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, now that the other two PRs are open & approved, let's proceed with this commit!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, yes! @felipepiovezan a silly question -- now that those PRs are approved, how/when do they get merged? Stacking unmerged PRs on top of each other doesn't have the best UX in Github.

Copy link
Contributor

@felipepiovezan felipepiovezan Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think the best option here is to merge the two PRs that are approved (#112976 and #112948) . Both of those stand on their own and don't depend on each other, so it's fine to merge in w/e order, right?

And then you can grab the third commit (or more, if the subsequent commits are also independent of each other) of this PR and open a new PR (or multiple PRs, if you did the () before), so it will look fine on top of main. My understanding is that the fourth commit depends on the third though.

Regarding what to do about this umbrella PR in the mean time, you have two options: leave this as is (with the understanding that it is not meant to be merged, merely serve as a reference for all the work in the proposal), or rebase on top of main, which will get rid of the two-already merged commits and force push to your branch artempyanykh:fast-coro-upstream, which will mean this PR serves a reference of the work that's left to be merged, also with the understand that we're not merging it.

With all that stack, we'd never have to stack unmerged PRs on top of each other, does that make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, do you have commit access?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think the best option here is to merge the two PRs that are approved (#112976 and #112948) . Both of those stand on their own and don't depend on each other, so it's fine to merge in w/e order, right?

That's right, yes.

And then you can grab the third commit (or more, if the subsequent commits are also independent of each other) of this PR and open a new PR (or multiple PRs, if you did the () before), so it will look fine on top of main. My understanding is that the fourth commit depends on the third though.

After the first 2 are merged, extracting the third one in a separate PR is straightforward. With the rest of the stack there's quite a bit of sequencing, so we may need to rinse and repeat.

rebase on top of main, which will get rid of the two-already merged commits and force push to your branch artempyanykh:fast-coro-upstream, which will mean this PR serves a reference of the work that's left to be merged, also with the understand that we're not merging it

+1 to this. I was planning to keep rebasing this stack anyway to make sure things keep working end-to-end while the chunks are being extracted and merged.

By the way, do you have commit access?

No, not sure what the process for this is. Why?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, the LLVM dev conference happened last week and we were mostly out.

No, not sure what the process for this is. Why?

Just wondering if you needed help merging the open PRs. I've merged the first two for you.
The process for getting commit access is described here: https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access
If you don't want to do that, you can always let the reviewers know you need someone to press the merge button for you

Copy link
Contributor Author

@artempyanykh artempyanykh Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering if you needed help merging the open PRs. I've merged the first two for you.

Thank you @felipepiovezan, appreciate the help! I also emailed Chris and got the commit rights.

I'll go ahead and extract the next PR now.

@artempyanykh
Copy link
Contributor Author

@felipepiovezan I extracted the first 2 commits into #112948 and #112976.

felipepiovezan pushed a commit that referenced this pull request Oct 30, 2024
A helper (2 overloads) that consolidates corocloner creation and the
actual cloning. The helpers create a TimeTraceScope to make it easier to
see how long the cloning takes.

Extracted from #109032 (commit 1)
felipepiovezan pushed a commit that referenced this pull request Oct 30, 2024
…to (#112976)

This patch is a part of step-by-step refactoring of CloneFunctionInto.
The goal is to extract reusable pieces out of it that will be later used
to optimize function cloning e.g. in coroutine processing.

Extracted from #109032 (commit 2)
@artempyanykh
Copy link
Contributor Author

I extracted the next commit into #114537
Also, rebased this PR

smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this pull request Nov 3, 2024
…112948)

A helper (2 overloads) that consolidates corocloner creation and the
actual cloning. The helpers create a TimeTraceScope to make it easier to
see how long the cloning takes.

Extracted from llvm#109032 (commit 1)
@artempyanykh
Copy link
Contributor Author

With #114537 out, there’s quite a bit of sequencing in the remaining 12 commits. I can combine some of the remaining commits into larger PRs if it makes the review logistics a bit easier. @felipepiovezan @adrian-prantl please let me know what you think.

@adrian-prantl
Copy link
Collaborator

With #114537 out, there’s quite a bit of sequencing in the remaining 12 commits. I can combine some of the remaining commits into larger PRs if it makes the review logistics a bit easier.

On the contrary, 12 PRs with a trivial NFC refactor will be easier to review than 1 large one.

@artempyanykh
Copy link
Contributor Author

Rebased on trunk

@artempyanykh
Copy link
Contributor Author

On the contrary, 12 PRs with a trivial NFC refactor will be easier to review than 1 large one.

I agree. Just being mindful of sequencing and iteration time. After the review of #114537 is complete, I'll try using spr or stack-pr to parallelize the rest of the stack a bit.

artempyanykh added a commit that referenced this pull request Nov 20, 2024
…14537)

Summary:
Consolidate the logic in a single function. We do an extra pass over
Instructions but this is necessary to untangle things and extract
metadata cloning in a future diff.

Test Plan:
```
$ ninja check-llvm-unit check-llvm
[211/213] Running the LLVM regression tests

Testing Time: 106.06s

Total Discovered Tests: 62601
  Skipped          :    17 (0.03%)
  Unsupported      :  2518 (4.02%)
  Passed           : 59911 (95.70%)
  Expectedly Failed:   155 (0.25%)
[212/213] Running lit suite 

Testing Time: 12.47s

Total Discovered Tests: 8474
  Skipped:   17 (0.20%)
  Passed : 8457 (99.80%)
```

Extracted from #109032 (commit 3) (there are more refactors and cleanups
in subsequent commits)
@felipepiovezan
Copy link
Contributor

Hi @artempyanykh, just checking the status of my reviews after the Thanksgiving break; AFAICT there are no new patches in this series, right?

@artempyanykh
Copy link
Contributor Author

Thanks for checking @felipepiovezan! I sent the rest of the stack as PRs via stack-pr: the root is at #118620. They still need to land in order, but could be reviewed separately/concurrently. Bottom half of the stack are refactorings. More interesting changes start at #118627.

Summary:
Extract the logic to build up a metadata map to use in metadata cloning
into a separate function.

Test Plan:
ninja check-llvm-unit check-llvm

stack-info: PR: llvm#118622, branch: users/artempyanykh/fast-coro-upstream/3
Summary:
The new API expects the caller to populate the VMap. We need it this way
for a subsequent change around coroutine cloning.

Test Plan:
ninja check-llvm-unit check-llvm

stack-info: PR: llvm#118623, branch: users/artempyanykh/fast-coro-upstream/4
Summary:
This and previously extracted `CloneFunction*Into` functions will be used in later diffs.

Test Plan:
ninja check-llvm-unit check-llvm

stack-info: PR: llvm#118624, branch: users/artempyanykh/fast-coro-upstream/5
Summary:
Previously, we'd add all SPs distinct from the cloned one into a set.
Then when cloning a local scope we'd check if it's from one of those
'distinct' SPs by checking if it's in the set. We don't need to do that.
We can just check against the cloned SP directly and drop the set.

Test Plan:
ninja check-llvm-unit check-llvm

stack-info: PR: llvm#118625, branch: users/artempyanykh/fast-coro-upstream/6
…ction*

Summary:
To avoid cloning module-level debug info (owned by the module rather
than the function), CloneFunction implementation used to eagerly
identity map such debug info into ValueMap's MD map. In larger modules
with meaningful volume of debug info this gets very expensive.

By passing such debug info metadata via an IdentityMD set for the
ValueMapper to map on first use, we get several benefits:

1. Mapping metadata is not cheap, particularly because of tracking. When
   cloning a Function we identity map lots of global module-level
   metadata to avoid cloning it, while only a fraction of it is actually
   used by the function. Mapping on first use is a lot faster for
   modules with meaningful amount of debug info.

2. Eagerly identity mapping metadata makes it harder to cache
   module-level data (e.g. a set of metadata nodes in a \a DICompileUnit).
   With this patch we can cache certain module-level metadata
   calculations to speed things up further.

Anecdata from compiling a sample cpp file with full debug info shows that this moderately speeds up
CoroSplitPass which is one of the heavier users of cloning:

|                 | Baseline | IdentityMD set |
|-----------------|----------|----------------|
| CoroSplitPass   | 306ms    | 221ms          |
| CoroCloner      | 101ms    | 72ms           |
| Speed up        | 1x       | 1.4x           |

Test Plan:
ninja check-llvm-unit
ninja check-llvm

stack-info: PR: llvm#118627, branch: users/artempyanykh/fast-coro-upstream/8
…l coroutine clones

Summary:
CoroCloner, by calling into CloneFunctionInto, does a lot of repeated
work priming DIFinder and building a list of common module-level debug
info metadata. For programs compiled with full debug info this can get
very expensive.

This diff builds the data once and shares it between all clones.

Anecdata for a sample cpp source file compiled with full debug info:

|                 | Baseline | IdentityMD set | Prebuilt CommonDI (cur.) |
|-----------------|----------|----------------|--------------------------|
| CoroSplitPass   | 306ms    | 221ms          | 68ms                     |
| CoroCloner      | 101ms    | 72ms           | 0.5ms                    |
| CollectGlobalDI | -        | -              | 63ms                     |
| Speed up        | 1x       | 1.4x           | 4.5x                     |

Note that CollectCommonDebugInfo happens once *per coroutine* rather than per clone.

Test Plan:
ninja check-llvm-unit
ninja check-llvm

Compiled a sample internal source file, checked time trace output for scope timings.

stack-info: PR: llvm#118628, branch: users/artempyanykh/fast-coro-upstream/9
Summary:
The analysis simply primes and caches DebugInfoFinders for each DICompileUnit in a module. This
allows (future) callers like CoroSplitPass to compute global debug info metadata (required for
coroutine function cloning) much faster. Specifically, pay the price of DICompileUnit processing
only once per compile unit, rather than once per coroutine.

Test Plan:
Added a smoke test for the new analysis
ninja check-llvm-unit check-llvm

stack-info: PR: llvm#118629, branch: users/artempyanykh/fast-coro-upstream/10
Summary:
We can use a DebugInfoFinder from DebugInfoCache which is already primed on a compile unit to speed
up collection of module-level debug info.

The pass could likely be another 2x+ faster if we avoid rebuilding the set of global debug
info. This needs further massaging of CloneFunction and ValueMapper, though, and can be done
incrementally on top of this.

Comparing performance of CoroSplitPass at various points in this stack, this is anecdata from a sample
cpp file compiled with full debug info:
|                 | Baseline | IdentityMD set | Prebuilt CommonDI | Cached CU DIFinder (cur.) |
|-----------------|----------|----------------|-------------------|---------------------------|
| CoroSplitPass   | 306ms    | 221ms          | 68ms              | 17ms                      |
| CoroCloner      | 101ms    | 72ms           | 0.5ms             | 0.5ms                     |
| CollectGlobalDI | -        | -              | 63ms              | 13ms                      |
| Speed up        | 1x       | 1.4x           | 4.5x              | 18x                       |

Test Plan:
ninja check-llvm-unit
ninja check-llvm

Compiled a sample cpp file with time trace to get the avg. duration of the pass and inner scopes.

stack-info: PR: llvm#118630, branch: users/artempyanykh/fast-coro-upstream/11
@llvmbot
Copy link
Member

llvmbot commented Dec 9, 2024

@llvm/pr-subscribers-debuginfo
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-coroutines

@llvm/pr-subscribers-llvm-analysis

Author: Artem Pianykh (artempyanykh)

Changes

More details about this stack are in this topic on discourse.

TL;DR: In large modules with a bunch of coroutines the cost of debug info metadata processing can get very high (e.g. adding over a minute to the compile time). This stack does some refactoring around function cloning and caches some module-level calculations to speed up the pass.

Anecdata for a sample C++ source file:

Baseline / 0 IdentityMD set Prebuilt GlobalDI Cached CU DIFinder
CoroSplitPass 306ms 221ms 68ms 17ms
CoroCloner 101ms 72ms 63ms 0.5ms
CollectGlobalDI - - 63ms 13ms
Overall speed up 1x 1.4x 4.5x 18x

Each commit is individually buildable and reviewable. I can extract them into separate PRs if the overall direction makes sense.


Patch is 67.56 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/109032.diff

27 Files Affected:

  • (added) llvm/include/llvm/Analysis/DebugInfoCache.h (+50)
  • (modified) llvm/include/llvm/IR/DebugInfo.h (+3-1)
  • (modified) llvm/include/llvm/Transforms/Coroutines/ABI.h (+9-4)
  • (modified) llvm/include/llvm/Transforms/Utils/Cloning.h (+42-13)
  • (modified) llvm/include/llvm/Transforms/Utils/ValueMapper.h (+49-18)
  • (modified) llvm/lib/Analysis/CGSCCPassManager.cpp (+7)
  • (modified) llvm/lib/Analysis/CMakeLists.txt (+1)
  • (added) llvm/lib/Analysis/DebugInfoCache.cpp (+47)
  • (modified) llvm/lib/Passes/PassBuilder.cpp (+1)
  • (modified) llvm/lib/Passes/PassRegistry.def (+1)
  • (modified) llvm/lib/Transforms/Coroutines/CoroCloner.h (+20-11)
  • (modified) llvm/lib/Transforms/Coroutines/CoroSplit.cpp (+82-15)
  • (modified) llvm/lib/Transforms/Utils/CloneFunction.cpp (+111-80)
  • (modified) llvm/lib/Transforms/Utils/ValueMapper.cpp (+15-4)
  • (modified) llvm/test/Other/new-pass-manager.ll (+1)
  • (modified) llvm/test/Other/new-pm-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-lto-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-pgo-preinline.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll (+1)
  • (modified) llvm/unittests/Analysis/CGSCCPassManagerTest.cpp (+3-1)
  • (modified) llvm/unittests/Analysis/CMakeLists.txt (+1)
  • (added) llvm/unittests/Analysis/DebugInfoCacheTest.cpp (+211)
diff --git a/llvm/include/llvm/Analysis/DebugInfoCache.h b/llvm/include/llvm/Analysis/DebugInfoCache.h
new file mode 100644
index 00000000000000..dbd6802c99ea01
--- /dev/null
+++ b/llvm/include/llvm/Analysis/DebugInfoCache.h
@@ -0,0 +1,50 @@
+//===- llvm/Analysis/DebugInfoCache.h - debug info cache --------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file contains an analysis that builds a cache of debug info for each
+// DICompileUnit in a module.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_ANALYSIS_DEBUGINFOCACHE_H
+#define LLVM_ANALYSIS_DEBUGINFOCACHE_H
+
+#include "llvm/IR/DebugInfo.h"
+#include "llvm/IR/PassManager.h"
+
+namespace llvm {
+
+/// Processes and caches debug info for each DICompileUnit in a module.
+///
+/// The result of the analysis is a set of DebugInfoFinders primed on their
+/// respective DICompileUnit. Such DebugInfoFinders can be used to speed up
+/// function cloning which otherwise requires an expensive traversal of
+/// DICompileUnit-level debug info. See an example usage in CoroSplit.
+class DebugInfoCache {
+public:
+  using DIFinderCache = SmallDenseMap<const DICompileUnit *, DebugInfoFinder>;
+  DIFinderCache Result;
+
+  DebugInfoCache(const Module &M);
+
+  bool invalidate(Module &, const PreservedAnalyses &,
+                  ModuleAnalysisManager::Invalidator &);
+};
+
+class DebugInfoCacheAnalysis
+    : public AnalysisInfoMixin<DebugInfoCacheAnalysis> {
+  friend AnalysisInfoMixin<DebugInfoCacheAnalysis>;
+  static AnalysisKey Key;
+
+public:
+  using Result = DebugInfoCache;
+  Result run(Module &M, ModuleAnalysisManager &);
+};
+} // namespace llvm
+
+#endif
diff --git a/llvm/include/llvm/IR/DebugInfo.h b/llvm/include/llvm/IR/DebugInfo.h
index 73f45c3769be44..11907fbb7f20b3 100644
--- a/llvm/include/llvm/IR/DebugInfo.h
+++ b/llvm/include/llvm/IR/DebugInfo.h
@@ -120,11 +120,13 @@ class DebugInfoFinder {
   /// Process subprogram.
   void processSubprogram(DISubprogram *SP);
 
+  /// Process a compile unit.
+  void processCompileUnit(DICompileUnit *CU);
+
   /// Clear all lists.
   void reset();
 
 private:
-  void processCompileUnit(DICompileUnit *CU);
   void processScope(DIScope *Scope);
   void processType(DIType *DT);
   bool addCompileUnit(DICompileUnit *CU);
diff --git a/llvm/include/llvm/Transforms/Coroutines/ABI.h b/llvm/include/llvm/Transforms/Coroutines/ABI.h
index 0b2d405f3caec4..2cf614b6bb1e2a 100644
--- a/llvm/include/llvm/Transforms/Coroutines/ABI.h
+++ b/llvm/include/llvm/Transforms/Coroutines/ABI.h
@@ -15,6 +15,7 @@
 #ifndef LLVM_TRANSFORMS_COROUTINES_ABI_H
 #define LLVM_TRANSFORMS_COROUTINES_ABI_H
 
+#include "llvm/Analysis/DebugInfoCache.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/Transforms/Coroutines/CoroShape.h"
 #include "llvm/Transforms/Coroutines/MaterializationUtils.h"
@@ -53,7 +54,8 @@ class BaseABI {
   // Perform the function splitting according to the ABI.
   virtual void splitCoroutine(Function &F, coro::Shape &Shape,
                               SmallVectorImpl<Function *> &Clones,
-                              TargetTransformInfo &TTI) = 0;
+                              TargetTransformInfo &TTI,
+                              const DebugInfoCache *DICache) = 0;
 
   Function &F;
   coro::Shape &Shape;
@@ -73,7 +75,8 @@ class SwitchABI : public BaseABI {
 
   void splitCoroutine(Function &F, coro::Shape &Shape,
                       SmallVectorImpl<Function *> &Clones,
-                      TargetTransformInfo &TTI) override;
+                      TargetTransformInfo &TTI,
+                      const DebugInfoCache *DICache) override;
 };
 
 class AsyncABI : public BaseABI {
@@ -86,7 +89,8 @@ class AsyncABI : public BaseABI {
 
   void splitCoroutine(Function &F, coro::Shape &Shape,
                       SmallVectorImpl<Function *> &Clones,
-                      TargetTransformInfo &TTI) override;
+                      TargetTransformInfo &TTI,
+                      const DebugInfoCache *DICache) override;
 };
 
 class AnyRetconABI : public BaseABI {
@@ -99,7 +103,8 @@ class AnyRetconABI : public BaseABI {
 
   void splitCoroutine(Function &F, coro::Shape &Shape,
                       SmallVectorImpl<Function *> &Clones,
-                      TargetTransformInfo &TTI) override;
+                      TargetTransformInfo &TTI,
+                      const DebugInfoCache *DICache) override;
 };
 
 } // end namespace coro
diff --git a/llvm/include/llvm/Transforms/Utils/Cloning.h b/llvm/include/llvm/Transforms/Utils/Cloning.h
index 3c8f2cbfaa9b81..9b256f9b4d6890 100644
--- a/llvm/include/llvm/Transforms/Utils/Cloning.h
+++ b/llvm/include/llvm/Transforms/Utils/Cloning.h
@@ -182,6 +182,29 @@ void CloneFunctionAttributesInto(Function *NewFunc, const Function *OldFunc,
                                  ValueMapTypeRemapper *TypeMapper = nullptr,
                                  ValueMaterializer *Materializer = nullptr);
 
+/// Clone OldFunc's metadata into NewFunc.
+///
+/// The caller is expected to populate \p VMap beforehand and set an appropriate
+/// \p RemapFlag.
+///
+/// NOTE: This function doesn't clone !llvm.dbg.cu when cloning into a different
+/// module. Use CloneFunctionInto for that behavior.
+void CloneFunctionMetadataInto(Function *NewFunc, const Function *OldFunc,
+                               ValueToValueMapTy &VMap, RemapFlags RemapFlag,
+                               ValueMapTypeRemapper *TypeMapper = nullptr,
+                               ValueMaterializer *Materializer = nullptr,
+                               const MetadataSetTy *IdentityMD = nullptr);
+
+/// Clone OldFunc's body into NewFunc.
+void CloneFunctionBodyInto(Function *NewFunc, const Function *OldFunc,
+                           ValueToValueMapTy &VMap, RemapFlags RemapFlag,
+                           SmallVectorImpl<ReturnInst *> &Returns,
+                           const char *NameSuffix = "",
+                           ClonedCodeInfo *CodeInfo = nullptr,
+                           ValueMapTypeRemapper *TypeMapper = nullptr,
+                           ValueMaterializer *Materializer = nullptr,
+                           const MetadataSetTy *IdentityMD = nullptr);
+
 void CloneAndPruneIntoFromInst(Function *NewFunc, const Function *OldFunc,
                                const Instruction *StartingInst,
                                ValueToValueMapTy &VMap, bool ModuleLevelChanges,
@@ -202,7 +225,7 @@ void CloneAndPruneIntoFromInst(Function *NewFunc, const Function *OldFunc,
 ///
 void CloneAndPruneFunctionInto(Function *NewFunc, const Function *OldFunc,
                                ValueToValueMapTy &VMap, bool ModuleLevelChanges,
-                               SmallVectorImpl<ReturnInst*> &Returns,
+                               SmallVectorImpl<ReturnInst *> &Returns,
                                const char *NameSuffix = "",
                                ClonedCodeInfo *CodeInfo = nullptr);
 
@@ -220,6 +243,13 @@ DISubprogram *CollectDebugInfoForCloning(const Function &F,
                                          CloneFunctionChangeType Changes,
                                          DebugInfoFinder &DIFinder);
 
+/// Based on \p Changes and \p DIFinder populate \p MD with debug info that
+/// needs to be identity mapped during Metadata cloning.
+void FindDebugInfoToIdentityMap(MetadataSetTy &MD,
+                                CloneFunctionChangeType Changes,
+                                DebugInfoFinder &DIFinder,
+                                DISubprogram *SPClonedWithinModule);
+
 /// This class captures the data input to the InlineFunction call, and records
 /// the auxiliary results produced by it.
 class InlineFunctionInfo {
@@ -341,32 +371,31 @@ void updateProfileCallee(
 /// Find the 'llvm.experimental.noalias.scope.decl' intrinsics in the specified
 /// basic blocks and extract their scope. These are candidates for duplication
 /// when cloning.
-void identifyNoAliasScopesToClone(
-    ArrayRef<BasicBlock *> BBs, SmallVectorImpl<MDNode *> &NoAliasDeclScopes);
+void identifyNoAliasScopesToClone(ArrayRef<BasicBlock *> BBs,
+                                  SmallVectorImpl<MDNode *> &NoAliasDeclScopes);
 
 /// Find the 'llvm.experimental.noalias.scope.decl' intrinsics in the specified
 /// instruction range and extract their scope. These are candidates for
 /// duplication when cloning.
-void identifyNoAliasScopesToClone(
-    BasicBlock::iterator Start, BasicBlock::iterator End,
-    SmallVectorImpl<MDNode *> &NoAliasDeclScopes);
+void identifyNoAliasScopesToClone(BasicBlock::iterator Start,
+                                  BasicBlock::iterator End,
+                                  SmallVectorImpl<MDNode *> &NoAliasDeclScopes);
 
 /// Duplicate the specified list of noalias decl scopes.
 /// The 'Ext' string is added as an extension to the name.
 /// Afterwards, the ClonedScopes contains the mapping of the original scope
 /// MDNode onto the cloned scope.
 /// Be aware that the cloned scopes are still part of the original scope domain.
-void cloneNoAliasScopes(
-    ArrayRef<MDNode *> NoAliasDeclScopes,
-    DenseMap<MDNode *, MDNode *> &ClonedScopes,
-    StringRef Ext, LLVMContext &Context);
+void cloneNoAliasScopes(ArrayRef<MDNode *> NoAliasDeclScopes,
+                        DenseMap<MDNode *, MDNode *> &ClonedScopes,
+                        StringRef Ext, LLVMContext &Context);
 
 /// Adapt the metadata for the specified instruction according to the
 /// provided mapping. This is normally used after cloning an instruction, when
 /// some noalias scopes needed to be cloned.
-void adaptNoAliasScopes(
-    llvm::Instruction *I, const DenseMap<MDNode *, MDNode *> &ClonedScopes,
-    LLVMContext &Context);
+void adaptNoAliasScopes(llvm::Instruction *I,
+                        const DenseMap<MDNode *, MDNode *> &ClonedScopes,
+                        LLVMContext &Context);
 
 /// Clone the specified noalias decl scopes. Then adapt all instructions in the
 /// NewBlocks basicblocks to the cloned versions.
diff --git a/llvm/include/llvm/Transforms/Utils/ValueMapper.h b/llvm/include/llvm/Transforms/Utils/ValueMapper.h
index 743cfeb7ef3f02..b8d612f11d519f 100644
--- a/llvm/include/llvm/Transforms/Utils/ValueMapper.h
+++ b/llvm/include/llvm/Transforms/Utils/ValueMapper.h
@@ -15,6 +15,7 @@
 #define LLVM_TRANSFORMS_UTILS_VALUEMAPPER_H
 
 #include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/simple_ilist.h"
 #include "llvm/IR/ValueHandle.h"
 #include "llvm/IR/ValueMap.h"
@@ -35,6 +36,7 @@ class Value;
 
 using ValueToValueMapTy = ValueMap<const Value *, WeakTrackingVH>;
 using DbgRecordIterator = simple_ilist<DbgRecord>::iterator;
+using MetadataSetTy = SmallPtrSet<const Metadata *, 16>;
 
 /// This is a class that can be implemented by clients to remap types when
 /// cloning constants and instructions.
@@ -136,6 +138,18 @@ inline RemapFlags operator|(RemapFlags LHS, RemapFlags RHS) {
 /// alternate \a ValueToValueMapTy and \a ValueMaterializer and returns a ID to
 /// pass into the schedule*() functions.
 ///
+/// NOTE: \c IdentityMD is used by CloneFunction* to directly specify metadata
+/// that should be identity mapped (and hence not cloned). The metadata will be
+/// identity mapped in \c VM on first use. There are several reasons for doing
+/// it this way rather than eagerly identity mapping metadata nodes in \c VM:
+/// 1. Mapping metadata is not cheap, particularly because of tracking.
+/// 2. When cloning a Function we identity map lots of global module-level
+///    metadata to avoid cloning it, while only a fraction of it is actually
+///    used by the function. Mapping on first use is a lot faster for modules
+///    with meaningful amount of debug info.
+/// 3. Eagerly identity mapping metadata makes it harder to cache module-level
+///    data (e.g. a set of metadata nodes in a \a DICompileUnit).
+///
 /// TODO: lib/Linker really doesn't need the \a ValueHandle in the \a
 /// ValueToValueMapTy.  We should template \a ValueMapper (and its
 /// implementation classes), and explicitly instantiate on two concrete
@@ -152,7 +166,8 @@ class ValueMapper {
 public:
   ValueMapper(ValueToValueMapTy &VM, RemapFlags Flags = RF_None,
               ValueMapTypeRemapper *TypeMapper = nullptr,
-              ValueMaterializer *Materializer = nullptr);
+              ValueMaterializer *Materializer = nullptr,
+              const MetadataSetTy *IdentityMD = nullptr);
   ValueMapper(ValueMapper &&) = delete;
   ValueMapper(const ValueMapper &) = delete;
   ValueMapper &operator=(ValueMapper &&) = delete;
@@ -218,8 +233,10 @@ class ValueMapper {
 inline Value *MapValue(const Value *V, ValueToValueMapTy &VM,
                        RemapFlags Flags = RF_None,
                        ValueMapTypeRemapper *TypeMapper = nullptr,
-                       ValueMaterializer *Materializer = nullptr) {
-  return ValueMapper(VM, Flags, TypeMapper, Materializer).mapValue(*V);
+                       ValueMaterializer *Materializer = nullptr,
+                       const MetadataSetTy *IdentityMD = nullptr) {
+  return ValueMapper(VM, Flags, TypeMapper, Materializer, IdentityMD)
+      .mapValue(*V);
 }
 
 /// Lookup or compute a mapping for a piece of metadata.
@@ -231,7 +248,9 @@ inline Value *MapValue(const Value *V, ValueToValueMapTy &VM,
 ///     \c MD.
 ///  3. Else if \c MD is a \a ConstantAsMetadata, call \a MapValue() and
 ///     re-wrap its return (returning nullptr on nullptr).
-///  4. Else, \c MD is an \a MDNode.  These are remapped, along with their
+///  4. Else if \c MD is in \c IdentityMD then add an identity mapping for it
+///     and return it.
+///  5. Else, \c MD is an \a MDNode.  These are remapped, along with their
 ///     transitive operands.  Distinct nodes are duplicated or moved depending
 ///     on \a RF_MoveDistinctNodes.  Uniqued nodes are remapped like constants.
 ///
@@ -240,16 +259,20 @@ inline Value *MapValue(const Value *V, ValueToValueMapTy &VM,
 inline Metadata *MapMetadata(const Metadata *MD, ValueToValueMapTy &VM,
                              RemapFlags Flags = RF_None,
                              ValueMapTypeRemapper *TypeMapper = nullptr,
-                             ValueMaterializer *Materializer = nullptr) {
-  return ValueMapper(VM, Flags, TypeMapper, Materializer).mapMetadata(*MD);
+                             ValueMaterializer *Materializer = nullptr,
+                             const MetadataSetTy *IdentityMD = nullptr) {
+  return ValueMapper(VM, Flags, TypeMapper, Materializer, IdentityMD)
+      .mapMetadata(*MD);
 }
 
 /// Version of MapMetadata with type safety for MDNode.
 inline MDNode *MapMetadata(const MDNode *MD, ValueToValueMapTy &VM,
                            RemapFlags Flags = RF_None,
                            ValueMapTypeRemapper *TypeMapper = nullptr,
-                           ValueMaterializer *Materializer = nullptr) {
-  return ValueMapper(VM, Flags, TypeMapper, Materializer).mapMDNode(*MD);
+                           ValueMaterializer *Materializer = nullptr,
+                           const MetadataSetTy *IdentityMD = nullptr) {
+  return ValueMapper(VM, Flags, TypeMapper, Materializer, IdentityMD)
+      .mapMDNode(*MD);
 }
 
 /// Convert the instruction operands from referencing the current values into
@@ -263,8 +286,10 @@ inline MDNode *MapMetadata(const MDNode *MD, ValueToValueMapTy &VM,
 inline void RemapInstruction(Instruction *I, ValueToValueMapTy &VM,
                              RemapFlags Flags = RF_None,
                              ValueMapTypeRemapper *TypeMapper = nullptr,
-                             ValueMaterializer *Materializer = nullptr) {
-  ValueMapper(VM, Flags, TypeMapper, Materializer).remapInstruction(*I);
+                             ValueMaterializer *Materializer = nullptr,
+                             const MetadataSetTy *IdentityMD = nullptr) {
+  ValueMapper(VM, Flags, TypeMapper, Materializer, IdentityMD)
+      .remapInstruction(*I);
 }
 
 /// Remap the Values used in the DbgRecord \a DR using the value map \a
@@ -272,8 +297,10 @@ inline void RemapInstruction(Instruction *I, ValueToValueMapTy &VM,
 inline void RemapDbgRecord(Module *M, DbgRecord *DR, ValueToValueMapTy &VM,
                            RemapFlags Flags = RF_None,
                            ValueMapTypeRemapper *TypeMapper = nullptr,
-                           ValueMaterializer *Materializer = nullptr) {
-  ValueMapper(VM, Flags, TypeMapper, Materializer).remapDbgRecord(M, *DR);
+                           ValueMaterializer *Materializer = nullptr,
+                           const MetadataSetTy *IdentityMD = nullptr) {
+  ValueMapper(VM, Flags, TypeMapper, Materializer, IdentityMD)
+      .remapDbgRecord(M, *DR);
 }
 
 /// Remap the Values used in the DbgRecords \a Range using the value map \a
@@ -283,8 +310,9 @@ inline void RemapDbgRecordRange(Module *M,
                                 ValueToValueMapTy &VM,
                                 RemapFlags Flags = RF_None,
                                 ValueMapTypeRemapper *TypeMapper = nullptr,
-                                ValueMaterializer *Materializer = nullptr) {
-  ValueMapper(VM, Flags, TypeMapper, Materializer)
+                                ValueMaterializer *Materializer = nullptr,
+                                const MetadataSetTy *IdentityMD = nullptr) {
+  ValueMapper(VM, Flags, TypeMapper, Materializer, IdentityMD)
       .remapDbgRecordRange(M, Range);
 }
 
@@ -297,16 +325,19 @@ inline void RemapDbgRecordRange(Module *M,
 inline void RemapFunction(Function &F, ValueToValueMapTy &VM,
                           RemapFlags Flags = RF_None,
                           ValueMapTypeRemapper *TypeMapper = nullptr,
-                          ValueMaterializer *Materializer = nullptr) {
-  ValueMapper(VM, Flags, TypeMapper, Materializer).remapFunction(F);
+                          ValueMaterializer *Materializer = nullptr,
+                          const MetadataSetTy *IdentityMD = nullptr) {
+  ValueMapper(VM, Flags, TypeMapper, Materializer, IdentityMD).remapFunction(F);
 }
 
 /// Version of MapValue with type safety for Constant.
 inline Constant *MapValue(const Constant *V, ValueToValueMapTy &VM,
                           RemapFlags Flags = RF_None,
                           ValueMapTypeRemapper *TypeMapper = nullptr,
-                          ValueMaterializer *Materializer = nullptr) {
-  return ValueMapper(VM, Flags, TypeMapper, Materializer).mapConstant(*V);
+                          ValueMaterializer *Materializer = nullptr,
+                          const MetadataSetTy *IdentityMD = nullptr) {
+  return ValueMapper(VM, Flags, TypeMapper, Materializer, IdentityMD)
+      .mapConstant(*V);
 }
 
 } // end namespace llvm
diff --git a/llvm/lib/Analysis/CGSCCPassManager.cpp b/llvm/lib/Analysis/CGSCCPassManager.cpp
index 948bc2435ab275..3ba085cdb0be8b 100644
--- a/llvm/lib/Analysis/CGSCCPassManager.cpp
+++ b/llvm/lib/Analysis/CGSCCPassManager.cpp
@@ -14,6 +14,7 @@
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/iterator_range.h"
+#include "llvm/Analysis/DebugInfoCache.h"
 #include "llvm/Analysis/LazyCallGraph.h"
 #include "llvm/IR/Constant.h"
 #include "llvm/IR/InstIterator.h"
@@ -139,6 +140,11 @@ ModuleToPostOrderCGSCCPassAdaptor::run(Module &M, ModuleAnalysisManager &AM) {
   // Get the call graph for this module.
   LazyCallGraph &CG = AM.getResult<LazyCallGraphAnalysis>(M);
 
+  // Prime DebugInfoCache.
+  // TODO: Currently, the only user is CoroSplitPass. Consider running
+  // conditionally.
+  AM.getResult<DebugInfoCacheAnalysis>(M);
+
   // Get Function analysis manager from its proxy.
   FunctionAnalysisManager &FAM =
       AM.getCachedResult<FunctionAnalysisManagerModuleProxy>(M)->getManager();
@@ -350,6 +356,7 @@ ModuleToPostOrderCGSCCPassAdaptor::run(Module &M, ModuleAnalysisManager &AM) {
   // analysis proxies by handling them above and in any nested pass managers.
   PA.preserveSet<AllAnalysesOn<LazyCallGraph::SCC>>();
   PA.preserve<LazyCallGraphAnalysis>();
+  PA.preserve<DebugI...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Dec 9, 2024

@llvm/pr-subscribers-llvm-ir

Author: Artem Pianykh (artempyanykh)

Changes

More details about this stack are in this topic on discourse.

TL;DR: In large modules with a bunch of coroutines the cost of debug info metadata processing can get very high (e.g. adding over a minute to the compile time). This stack does some refactoring around function cloning and caches some module-level calculations to speed up the pass.

Anecdata for a sample C++ source file:

Baseline / 0 IdentityMD set Prebuilt GlobalDI Cached CU DIFinder
CoroSplitPass 306ms 221ms 68ms 17ms
CoroCloner 101ms 72ms 63ms 0.5ms
CollectGlobalDI - - 63ms 13ms
Overall speed up 1x 1.4x 4.5x 18x

Each commit is individually buildable and reviewable. I can extract them into separate PRs if the overall direction makes sense.


Patch is 67.56 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/109032.diff

27 Files Affected:

  • (added) llvm/include/llvm/Analysis/DebugInfoCache.h (+50)
  • (modified) llvm/include/llvm/IR/DebugInfo.h (+3-1)
  • (modified) llvm/include/llvm/Transforms/Coroutines/ABI.h (+9-4)
  • (modified) llvm/include/llvm/Transforms/Utils/Cloning.h (+42-13)
  • (modified) llvm/include/llvm/Transforms/Utils/ValueMapper.h (+49-18)
  • (modified) llvm/lib/Analysis/CGSCCPassManager.cpp (+7)
  • (modified) llvm/lib/Analysis/CMakeLists.txt (+1)
  • (added) llvm/lib/Analysis/DebugInfoCache.cpp (+47)
  • (modified) llvm/lib/Passes/PassBuilder.cpp (+1)
  • (modified) llvm/lib/Passes/PassRegistry.def (+1)
  • (modified) llvm/lib/Transforms/Coroutines/CoroCloner.h (+20-11)
  • (modified) llvm/lib/Transforms/Coroutines/CoroSplit.cpp (+82-15)
  • (modified) llvm/lib/Transforms/Utils/CloneFunction.cpp (+111-80)
  • (modified) llvm/lib/Transforms/Utils/ValueMapper.cpp (+15-4)
  • (modified) llvm/test/Other/new-pass-manager.ll (+1)
  • (modified) llvm/test/Other/new-pm-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-lto-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-pgo-preinline.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll (+1)
  • (modified) llvm/unittests/Analysis/CGSCCPassManagerTest.cpp (+3-1)
  • (modified) llvm/unittests/Analysis/CMakeLists.txt (+1)
  • (added) llvm/unittests/Analysis/DebugInfoCacheTest.cpp (+211)
diff --git a/llvm/include/llvm/Analysis/DebugInfoCache.h b/llvm/include/llvm/Analysis/DebugInfoCache.h
new file mode 100644
index 00000000000000..dbd6802c99ea01
--- /dev/null
+++ b/llvm/include/llvm/Analysis/DebugInfoCache.h
@@ -0,0 +1,50 @@
+//===- llvm/Analysis/DebugInfoCache.h - debug info cache --------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file contains an analysis that builds a cache of debug info for each
+// DICompileUnit in a module.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_ANALYSIS_DEBUGINFOCACHE_H
+#define LLVM_ANALYSIS_DEBUGINFOCACHE_H
+
+#include "llvm/IR/DebugInfo.h"
+#include "llvm/IR/PassManager.h"
+
+namespace llvm {
+
+/// Processes and caches debug info for each DICompileUnit in a module.
+///
+/// The result of the analysis is a set of DebugInfoFinders primed on their
+/// respective DICompileUnit. Such DebugInfoFinders can be used to speed up
+/// function cloning which otherwise requires an expensive traversal of
+/// DICompileUnit-level debug info. See an example usage in CoroSplit.
+class DebugInfoCache {
+public:
+  using DIFinderCache = SmallDenseMap<const DICompileUnit *, DebugInfoFinder>;
+  DIFinderCache Result;
+
+  DebugInfoCache(const Module &M);
+
+  bool invalidate(Module &, const PreservedAnalyses &,
+                  ModuleAnalysisManager::Invalidator &);
+};
+
+class DebugInfoCacheAnalysis
+    : public AnalysisInfoMixin<DebugInfoCacheAnalysis> {
+  friend AnalysisInfoMixin<DebugInfoCacheAnalysis>;
+  static AnalysisKey Key;
+
+public:
+  using Result = DebugInfoCache;
+  Result run(Module &M, ModuleAnalysisManager &);
+};
+} // namespace llvm
+
+#endif
diff --git a/llvm/include/llvm/IR/DebugInfo.h b/llvm/include/llvm/IR/DebugInfo.h
index 73f45c3769be44..11907fbb7f20b3 100644
--- a/llvm/include/llvm/IR/DebugInfo.h
+++ b/llvm/include/llvm/IR/DebugInfo.h
@@ -120,11 +120,13 @@ class DebugInfoFinder {
   /// Process subprogram.
   void processSubprogram(DISubprogram *SP);
 
+  /// Process a compile unit.
+  void processCompileUnit(DICompileUnit *CU);
+
   /// Clear all lists.
   void reset();
 
 private:
-  void processCompileUnit(DICompileUnit *CU);
   void processScope(DIScope *Scope);
   void processType(DIType *DT);
   bool addCompileUnit(DICompileUnit *CU);
diff --git a/llvm/include/llvm/Transforms/Coroutines/ABI.h b/llvm/include/llvm/Transforms/Coroutines/ABI.h
index 0b2d405f3caec4..2cf614b6bb1e2a 100644
--- a/llvm/include/llvm/Transforms/Coroutines/ABI.h
+++ b/llvm/include/llvm/Transforms/Coroutines/ABI.h
@@ -15,6 +15,7 @@
 #ifndef LLVM_TRANSFORMS_COROUTINES_ABI_H
 #define LLVM_TRANSFORMS_COROUTINES_ABI_H
 
+#include "llvm/Analysis/DebugInfoCache.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/Transforms/Coroutines/CoroShape.h"
 #include "llvm/Transforms/Coroutines/MaterializationUtils.h"
@@ -53,7 +54,8 @@ class BaseABI {
   // Perform the function splitting according to the ABI.
   virtual void splitCoroutine(Function &F, coro::Shape &Shape,
                               SmallVectorImpl<Function *> &Clones,
-                              TargetTransformInfo &TTI) = 0;
+                              TargetTransformInfo &TTI,
+                              const DebugInfoCache *DICache) = 0;
 
   Function &F;
   coro::Shape &Shape;
@@ -73,7 +75,8 @@ class SwitchABI : public BaseABI {
 
   void splitCoroutine(Function &F, coro::Shape &Shape,
                       SmallVectorImpl<Function *> &Clones,
-                      TargetTransformInfo &TTI) override;
+                      TargetTransformInfo &TTI,
+                      const DebugInfoCache *DICache) override;
 };
 
 class AsyncABI : public BaseABI {
@@ -86,7 +89,8 @@ class AsyncABI : public BaseABI {
 
   void splitCoroutine(Function &F, coro::Shape &Shape,
                       SmallVectorImpl<Function *> &Clones,
-                      TargetTransformInfo &TTI) override;
+                      TargetTransformInfo &TTI,
+                      const DebugInfoCache *DICache) override;
 };
 
 class AnyRetconABI : public BaseABI {
@@ -99,7 +103,8 @@ class AnyRetconABI : public BaseABI {
 
   void splitCoroutine(Function &F, coro::Shape &Shape,
                       SmallVectorImpl<Function *> &Clones,
-                      TargetTransformInfo &TTI) override;
+                      TargetTransformInfo &TTI,
+                      const DebugInfoCache *DICache) override;
 };
 
 } // end namespace coro
diff --git a/llvm/include/llvm/Transforms/Utils/Cloning.h b/llvm/include/llvm/Transforms/Utils/Cloning.h
index 3c8f2cbfaa9b81..9b256f9b4d6890 100644
--- a/llvm/include/llvm/Transforms/Utils/Cloning.h
+++ b/llvm/include/llvm/Transforms/Utils/Cloning.h
@@ -182,6 +182,29 @@ void CloneFunctionAttributesInto(Function *NewFunc, const Function *OldFunc,
                                  ValueMapTypeRemapper *TypeMapper = nullptr,
                                  ValueMaterializer *Materializer = nullptr);
 
+/// Clone OldFunc's metadata into NewFunc.
+///
+/// The caller is expected to populate \p VMap beforehand and set an appropriate
+/// \p RemapFlag.
+///
+/// NOTE: This function doesn't clone !llvm.dbg.cu when cloning into a different
+/// module. Use CloneFunctionInto for that behavior.
+void CloneFunctionMetadataInto(Function *NewFunc, const Function *OldFunc,
+                               ValueToValueMapTy &VMap, RemapFlags RemapFlag,
+                               ValueMapTypeRemapper *TypeMapper = nullptr,
+                               ValueMaterializer *Materializer = nullptr,
+                               const MetadataSetTy *IdentityMD = nullptr);
+
+/// Clone OldFunc's body into NewFunc.
+void CloneFunctionBodyInto(Function *NewFunc, const Function *OldFunc,
+                           ValueToValueMapTy &VMap, RemapFlags RemapFlag,
+                           SmallVectorImpl<ReturnInst *> &Returns,
+                           const char *NameSuffix = "",
+                           ClonedCodeInfo *CodeInfo = nullptr,
+                           ValueMapTypeRemapper *TypeMapper = nullptr,
+                           ValueMaterializer *Materializer = nullptr,
+                           const MetadataSetTy *IdentityMD = nullptr);
+
 void CloneAndPruneIntoFromInst(Function *NewFunc, const Function *OldFunc,
                                const Instruction *StartingInst,
                                ValueToValueMapTy &VMap, bool ModuleLevelChanges,
@@ -202,7 +225,7 @@ void CloneAndPruneIntoFromInst(Function *NewFunc, const Function *OldFunc,
 ///
 void CloneAndPruneFunctionInto(Function *NewFunc, const Function *OldFunc,
                                ValueToValueMapTy &VMap, bool ModuleLevelChanges,
-                               SmallVectorImpl<ReturnInst*> &Returns,
+                               SmallVectorImpl<ReturnInst *> &Returns,
                                const char *NameSuffix = "",
                                ClonedCodeInfo *CodeInfo = nullptr);
 
@@ -220,6 +243,13 @@ DISubprogram *CollectDebugInfoForCloning(const Function &F,
                                          CloneFunctionChangeType Changes,
                                          DebugInfoFinder &DIFinder);
 
+/// Based on \p Changes and \p DIFinder populate \p MD with debug info that
+/// needs to be identity mapped during Metadata cloning.
+void FindDebugInfoToIdentityMap(MetadataSetTy &MD,
+                                CloneFunctionChangeType Changes,
+                                DebugInfoFinder &DIFinder,
+                                DISubprogram *SPClonedWithinModule);
+
 /// This class captures the data input to the InlineFunction call, and records
 /// the auxiliary results produced by it.
 class InlineFunctionInfo {
@@ -341,32 +371,31 @@ void updateProfileCallee(
 /// Find the 'llvm.experimental.noalias.scope.decl' intrinsics in the specified
 /// basic blocks and extract their scope. These are candidates for duplication
 /// when cloning.
-void identifyNoAliasScopesToClone(
-    ArrayRef<BasicBlock *> BBs, SmallVectorImpl<MDNode *> &NoAliasDeclScopes);
+void identifyNoAliasScopesToClone(ArrayRef<BasicBlock *> BBs,
+                                  SmallVectorImpl<MDNode *> &NoAliasDeclScopes);
 
 /// Find the 'llvm.experimental.noalias.scope.decl' intrinsics in the specified
 /// instruction range and extract their scope. These are candidates for
 /// duplication when cloning.
-void identifyNoAliasScopesToClone(
-    BasicBlock::iterator Start, BasicBlock::iterator End,
-    SmallVectorImpl<MDNode *> &NoAliasDeclScopes);
+void identifyNoAliasScopesToClone(BasicBlock::iterator Start,
+                                  BasicBlock::iterator End,
+                                  SmallVectorImpl<MDNode *> &NoAliasDeclScopes);
 
 /// Duplicate the specified list of noalias decl scopes.
 /// The 'Ext' string is added as an extension to the name.
 /// Afterwards, the ClonedScopes contains the mapping of the original scope
 /// MDNode onto the cloned scope.
 /// Be aware that the cloned scopes are still part of the original scope domain.
-void cloneNoAliasScopes(
-    ArrayRef<MDNode *> NoAliasDeclScopes,
-    DenseMap<MDNode *, MDNode *> &ClonedScopes,
-    StringRef Ext, LLVMContext &Context);
+void cloneNoAliasScopes(ArrayRef<MDNode *> NoAliasDeclScopes,
+                        DenseMap<MDNode *, MDNode *> &ClonedScopes,
+                        StringRef Ext, LLVMContext &Context);
 
 /// Adapt the metadata for the specified instruction according to the
 /// provided mapping. This is normally used after cloning an instruction, when
 /// some noalias scopes needed to be cloned.
-void adaptNoAliasScopes(
-    llvm::Instruction *I, const DenseMap<MDNode *, MDNode *> &ClonedScopes,
-    LLVMContext &Context);
+void adaptNoAliasScopes(llvm::Instruction *I,
+                        const DenseMap<MDNode *, MDNode *> &ClonedScopes,
+                        LLVMContext &Context);
 
 /// Clone the specified noalias decl scopes. Then adapt all instructions in the
 /// NewBlocks basicblocks to the cloned versions.
diff --git a/llvm/include/llvm/Transforms/Utils/ValueMapper.h b/llvm/include/llvm/Transforms/Utils/ValueMapper.h
index 743cfeb7ef3f02..b8d612f11d519f 100644
--- a/llvm/include/llvm/Transforms/Utils/ValueMapper.h
+++ b/llvm/include/llvm/Transforms/Utils/ValueMapper.h
@@ -15,6 +15,7 @@
 #define LLVM_TRANSFORMS_UTILS_VALUEMAPPER_H
 
 #include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/simple_ilist.h"
 #include "llvm/IR/ValueHandle.h"
 #include "llvm/IR/ValueMap.h"
@@ -35,6 +36,7 @@ class Value;
 
 using ValueToValueMapTy = ValueMap<const Value *, WeakTrackingVH>;
 using DbgRecordIterator = simple_ilist<DbgRecord>::iterator;
+using MetadataSetTy = SmallPtrSet<const Metadata *, 16>;
 
 /// This is a class that can be implemented by clients to remap types when
 /// cloning constants and instructions.
@@ -136,6 +138,18 @@ inline RemapFlags operator|(RemapFlags LHS, RemapFlags RHS) {
 /// alternate \a ValueToValueMapTy and \a ValueMaterializer and returns a ID to
 /// pass into the schedule*() functions.
 ///
+/// NOTE: \c IdentityMD is used by CloneFunction* to directly specify metadata
+/// that should be identity mapped (and hence not cloned). The metadata will be
+/// identity mapped in \c VM on first use. There are several reasons for doing
+/// it this way rather than eagerly identity mapping metadata nodes in \c VM:
+/// 1. Mapping metadata is not cheap, particularly because of tracking.
+/// 2. When cloning a Function we identity map lots of global module-level
+///    metadata to avoid cloning it, while only a fraction of it is actually
+///    used by the function. Mapping on first use is a lot faster for modules
+///    with meaningful amount of debug info.
+/// 3. Eagerly identity mapping metadata makes it harder to cache module-level
+///    data (e.g. a set of metadata nodes in a \a DICompileUnit).
+///
 /// TODO: lib/Linker really doesn't need the \a ValueHandle in the \a
 /// ValueToValueMapTy.  We should template \a ValueMapper (and its
 /// implementation classes), and explicitly instantiate on two concrete
@@ -152,7 +166,8 @@ class ValueMapper {
 public:
   ValueMapper(ValueToValueMapTy &VM, RemapFlags Flags = RF_None,
               ValueMapTypeRemapper *TypeMapper = nullptr,
-              ValueMaterializer *Materializer = nullptr);
+              ValueMaterializer *Materializer = nullptr,
+              const MetadataSetTy *IdentityMD = nullptr);
   ValueMapper(ValueMapper &&) = delete;
   ValueMapper(const ValueMapper &) = delete;
   ValueMapper &operator=(ValueMapper &&) = delete;
@@ -218,8 +233,10 @@ class ValueMapper {
 inline Value *MapValue(const Value *V, ValueToValueMapTy &VM,
                        RemapFlags Flags = RF_None,
                        ValueMapTypeRemapper *TypeMapper = nullptr,
-                       ValueMaterializer *Materializer = nullptr) {
-  return ValueMapper(VM, Flags, TypeMapper, Materializer).mapValue(*V);
+                       ValueMaterializer *Materializer = nullptr,
+                       const MetadataSetTy *IdentityMD = nullptr) {
+  return ValueMapper(VM, Flags, TypeMapper, Materializer, IdentityMD)
+      .mapValue(*V);
 }
 
 /// Lookup or compute a mapping for a piece of metadata.
@@ -231,7 +248,9 @@ inline Value *MapValue(const Value *V, ValueToValueMapTy &VM,
 ///     \c MD.
 ///  3. Else if \c MD is a \a ConstantAsMetadata, call \a MapValue() and
 ///     re-wrap its return (returning nullptr on nullptr).
-///  4. Else, \c MD is an \a MDNode.  These are remapped, along with their
+///  4. Else if \c MD is in \c IdentityMD then add an identity mapping for it
+///     and return it.
+///  5. Else, \c MD is an \a MDNode.  These are remapped, along with their
 ///     transitive operands.  Distinct nodes are duplicated or moved depending
 ///     on \a RF_MoveDistinctNodes.  Uniqued nodes are remapped like constants.
 ///
@@ -240,16 +259,20 @@ inline Value *MapValue(const Value *V, ValueToValueMapTy &VM,
 inline Metadata *MapMetadata(const Metadata *MD, ValueToValueMapTy &VM,
                              RemapFlags Flags = RF_None,
                              ValueMapTypeRemapper *TypeMapper = nullptr,
-                             ValueMaterializer *Materializer = nullptr) {
-  return ValueMapper(VM, Flags, TypeMapper, Materializer).mapMetadata(*MD);
+                             ValueMaterializer *Materializer = nullptr,
+                             const MetadataSetTy *IdentityMD = nullptr) {
+  return ValueMapper(VM, Flags, TypeMapper, Materializer, IdentityMD)
+      .mapMetadata(*MD);
 }
 
 /// Version of MapMetadata with type safety for MDNode.
 inline MDNode *MapMetadata(const MDNode *MD, ValueToValueMapTy &VM,
                            RemapFlags Flags = RF_None,
                            ValueMapTypeRemapper *TypeMapper = nullptr,
-                           ValueMaterializer *Materializer = nullptr) {
-  return ValueMapper(VM, Flags, TypeMapper, Materializer).mapMDNode(*MD);
+                           ValueMaterializer *Materializer = nullptr,
+                           const MetadataSetTy *IdentityMD = nullptr) {
+  return ValueMapper(VM, Flags, TypeMapper, Materializer, IdentityMD)
+      .mapMDNode(*MD);
 }
 
 /// Convert the instruction operands from referencing the current values into
@@ -263,8 +286,10 @@ inline MDNode *MapMetadata(const MDNode *MD, ValueToValueMapTy &VM,
 inline void RemapInstruction(Instruction *I, ValueToValueMapTy &VM,
                              RemapFlags Flags = RF_None,
                              ValueMapTypeRemapper *TypeMapper = nullptr,
-                             ValueMaterializer *Materializer = nullptr) {
-  ValueMapper(VM, Flags, TypeMapper, Materializer).remapInstruction(*I);
+                             ValueMaterializer *Materializer = nullptr,
+                             const MetadataSetTy *IdentityMD = nullptr) {
+  ValueMapper(VM, Flags, TypeMapper, Materializer, IdentityMD)
+      .remapInstruction(*I);
 }
 
 /// Remap the Values used in the DbgRecord \a DR using the value map \a
@@ -272,8 +297,10 @@ inline void RemapInstruction(Instruction *I, ValueToValueMapTy &VM,
 inline void RemapDbgRecord(Module *M, DbgRecord *DR, ValueToValueMapTy &VM,
                            RemapFlags Flags = RF_None,
                            ValueMapTypeRemapper *TypeMapper = nullptr,
-                           ValueMaterializer *Materializer = nullptr) {
-  ValueMapper(VM, Flags, TypeMapper, Materializer).remapDbgRecord(M, *DR);
+                           ValueMaterializer *Materializer = nullptr,
+                           const MetadataSetTy *IdentityMD = nullptr) {
+  ValueMapper(VM, Flags, TypeMapper, Materializer, IdentityMD)
+      .remapDbgRecord(M, *DR);
 }
 
 /// Remap the Values used in the DbgRecords \a Range using the value map \a
@@ -283,8 +310,9 @@ inline void RemapDbgRecordRange(Module *M,
                                 ValueToValueMapTy &VM,
                                 RemapFlags Flags = RF_None,
                                 ValueMapTypeRemapper *TypeMapper = nullptr,
-                                ValueMaterializer *Materializer = nullptr) {
-  ValueMapper(VM, Flags, TypeMapper, Materializer)
+                                ValueMaterializer *Materializer = nullptr,
+                                const MetadataSetTy *IdentityMD = nullptr) {
+  ValueMapper(VM, Flags, TypeMapper, Materializer, IdentityMD)
       .remapDbgRecordRange(M, Range);
 }
 
@@ -297,16 +325,19 @@ inline void RemapDbgRecordRange(Module *M,
 inline void RemapFunction(Function &F, ValueToValueMapTy &VM,
                           RemapFlags Flags = RF_None,
                           ValueMapTypeRemapper *TypeMapper = nullptr,
-                          ValueMaterializer *Materializer = nullptr) {
-  ValueMapper(VM, Flags, TypeMapper, Materializer).remapFunction(F);
+                          ValueMaterializer *Materializer = nullptr,
+                          const MetadataSetTy *IdentityMD = nullptr) {
+  ValueMapper(VM, Flags, TypeMapper, Materializer, IdentityMD).remapFunction(F);
 }
 
 /// Version of MapValue with type safety for Constant.
 inline Constant *MapValue(const Constant *V, ValueToValueMapTy &VM,
                           RemapFlags Flags = RF_None,
                           ValueMapTypeRemapper *TypeMapper = nullptr,
-                          ValueMaterializer *Materializer = nullptr) {
-  return ValueMapper(VM, Flags, TypeMapper, Materializer).mapConstant(*V);
+                          ValueMaterializer *Materializer = nullptr,
+                          const MetadataSetTy *IdentityMD = nullptr) {
+  return ValueMapper(VM, Flags, TypeMapper, Materializer, IdentityMD)
+      .mapConstant(*V);
 }
 
 } // end namespace llvm
diff --git a/llvm/lib/Analysis/CGSCCPassManager.cpp b/llvm/lib/Analysis/CGSCCPassManager.cpp
index 948bc2435ab275..3ba085cdb0be8b 100644
--- a/llvm/lib/Analysis/CGSCCPassManager.cpp
+++ b/llvm/lib/Analysis/CGSCCPassManager.cpp
@@ -14,6 +14,7 @@
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/iterator_range.h"
+#include "llvm/Analysis/DebugInfoCache.h"
 #include "llvm/Analysis/LazyCallGraph.h"
 #include "llvm/IR/Constant.h"
 #include "llvm/IR/InstIterator.h"
@@ -139,6 +140,11 @@ ModuleToPostOrderCGSCCPassAdaptor::run(Module &M, ModuleAnalysisManager &AM) {
   // Get the call graph for this module.
   LazyCallGraph &CG = AM.getResult<LazyCallGraphAnalysis>(M);
 
+  // Prime DebugInfoCache.
+  // TODO: Currently, the only user is CoroSplitPass. Consider running
+  // conditionally.
+  AM.getResult<DebugInfoCacheAnalysis>(M);
+
   // Get Function analysis manager from its proxy.
   FunctionAnalysisManager &FAM =
       AM.getCachedResult<FunctionAnalysisManagerModuleProxy>(M)->getManager();
@@ -350,6 +356,7 @@ ModuleToPostOrderCGSCCPassAdaptor::run(Module &M, ModuleAnalysisManager &AM) {
   // analysis proxies by handling them above and in any nested pass managers.
   PA.preserveSet<AllAnalysesOn<LazyCallGraph::SCC>>();
   PA.preserve<LazyCallGraphAnalysis>();
+  PA.preserve<DebugI...
[truncated]

@artempyanykh
Copy link
Contributor Author

Rebase with latest changes.

@artempyanykh
Copy link
Contributor Author

artempyanykh commented Apr 21, 2025

This has now been fully merged. I updated the description with references to relevant PRs. The final implementation turned out faster than the initial one: in terms of the CoroSplit pass runtime on a sample cpp file with -g2: 306ms -> 17ms @ 18x initially vs 306ms -> 3.8ms @ 80x now which is almost as fast as -g1 .

Thanks @felipepiovezan, @TylerNowicki, and @adrian-prantl for reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants