Skip to content

Reland "[ThinLTO] Populate declaration import status except for distributed ThinLTO under a default-off new option" #95482

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 27 commits into from
Jun 20, 2024

Conversation

mingmingl-llvm
Copy link
Contributor

@mingmingl-llvm mingmingl-llvm commented Jun 13, 2024

Make FunctionsToImportTy an unordered_map rather than DenseMap. Credit goes to jvoung@ for the 'DenseMap -> unordered_map' change. This is a reland of #92718

  • DenseMap allocates space for a large number of key/value pairs and wastes space when the number of elements are small.
    • While init bucket size is zero, it quickly allocates buckets for 64 elements when the number of elements is small (for example, 3 or 4 elements). programmer manual also mentions it could waste space.
  • Experiments show FunctionsToImportTy.size() is smaller than 4 for multiple binaries with high indexing ram usage. unordered_map grows factor is at most 2 in llvm libc for insert operations.

With this change, ComputeCrossModuleImport ram increase is smaller than 0.5G on a couple of binaries with high indexing ram usage. A wider range of (pre-release) tests pass.

Original commit message
The goal is to populate declaration import status if a new flag -import-declaration is on.

  • For in-process ThinLTO, the declaration status is visible to backend function-import pass, so FunctionImporter::importFunctions should read the import status and be no-op for declaration summaries. Basically, the postlink pipeline is updated to keep its current behavior (import definitions), but not updated to handle declaration summaries. Two use cases (better call-graph sort or cross-module auto-init) would use this bit differently.

  • For distributed ThinLTO, the declaration status is not serialized to bitcode. As discussed, [ThinLTO][Bitcode] Generate import type in bitcode #87600 will do this.

…eSummary::GVFlags

- This change doesn't set the bit in the summaries, just make the code changes.
Update this one accordingly
1. Use unordered_map for 'FunctionsToImportTy'.
2. Use reference not value when iterating containers.
@mingmingl-llvm mingmingl-llvm marked this pull request as ready for review June 20, 2024 02:36
@mingmingl-llvm mingmingl-llvm requested review from teresajohnson and removed request for teresajohnson June 20, 2024 02:36
Copy link
Contributor

@jvoung jvoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mingmingl-llvm
Copy link
Contributor Author

thanks!

@mingmingl-llvm mingmingl-llvm merged commit 8d9db94 into main Jun 20, 2024
9 checks passed
@mingmingl-llvm mingmingl-llvm deleted the users/minglotus-6/spr/summary2 branch June 20, 2024 17:50
mingmingl-llvm added a commit that referenced this pull request Jul 3, 2024
…97360)

#95482 is a reland of
#88024.
#95482 keeps indexing memory
usage reasonable by using unordered_map and doesn't make other changes
to originally reviewed code.

While discussing possible ways to minimize indexing memory usage, Teresa
asked whether I need `ExportSetTy` as a map or a set is sufficient. This
PR implements the idea. It uses a set rather than a map to track exposed
ValueInfos.

Currently, `ExportLists` has two use cases, and neither needs to track a
ValueInfo's import/export status. So using a set is sufficient and
correct.
1) In both in-process and distributed ThinLTO, it's used to decide if a
function or global variable is visible [1] from another module after importing
creates additional cross-module references.
     * If a cross-module call edge is seen today, the callee must be visible
       to another module without keeping track of its export status already.
       For instance, this [2] is how callees of direct calls get exported.
2) For in-process ThinLTO [3], it's used to compute lto cache key.
     * The cache key computation already hashes [4] 'ImportList' , and 'ExportList' is
        determined by 'ImportList'. So it's fine to not track 'import type' for export list.

[1] https://github.com/llvm/llvm-project/blob/66cd8ec4c08252ebc73c82e4883a8da247ed146b/llvm/lib/LTO/LTO.cpp#L1815-L1819
[2] https://github.com/llvm/llvm-project/blob/66cd8ec4c08252ebc73c82e4883a8da247ed146b/llvm/lib/LTO/LTO.cpp#L1783-L1794
[3] https://github.com/llvm/llvm-project/blob/66cd8ec4c08252ebc73c82e4883a8da247ed146b/llvm/lib/LTO/LTO.cpp#L1494-L1496
[4] https://github.com/llvm/llvm-project/blob/b76100e220591fab2bf0a4917b216439f7aa4b09/llvm/lib/LTO/LTO.cpp#L194-L222
kbluck pushed a commit to kbluck/llvm-project that referenced this pull request Jul 6, 2024
…lvm#97360)

llvm#95482 is a reland of
llvm#88024.
llvm#95482 keeps indexing memory
usage reasonable by using unordered_map and doesn't make other changes
to originally reviewed code.

While discussing possible ways to minimize indexing memory usage, Teresa
asked whether I need `ExportSetTy` as a map or a set is sufficient. This
PR implements the idea. It uses a set rather than a map to track exposed
ValueInfos.

Currently, `ExportLists` has two use cases, and neither needs to track a
ValueInfo's import/export status. So using a set is sufficient and
correct.
1) In both in-process and distributed ThinLTO, it's used to decide if a
function or global variable is visible [1] from another module after importing
creates additional cross-module references.
     * If a cross-module call edge is seen today, the callee must be visible
       to another module without keeping track of its export status already.
       For instance, this [2] is how callees of direct calls get exported.
2) For in-process ThinLTO [3], it's used to compute lto cache key.
     * The cache key computation already hashes [4] 'ImportList' , and 'ExportList' is
        determined by 'ImportList'. So it's fine to not track 'import type' for export list.

[1] https://github.com/llvm/llvm-project/blob/66cd8ec4c08252ebc73c82e4883a8da247ed146b/llvm/lib/LTO/LTO.cpp#L1815-L1819
[2] https://github.com/llvm/llvm-project/blob/66cd8ec4c08252ebc73c82e4883a8da247ed146b/llvm/lib/LTO/LTO.cpp#L1783-L1794
[3] https://github.com/llvm/llvm-project/blob/66cd8ec4c08252ebc73c82e4883a8da247ed146b/llvm/lib/LTO/LTO.cpp#L1494-L1496
[4] https://github.com/llvm/llvm-project/blob/b76100e220591fab2bf0a4917b216439f7aa4b09/llvm/lib/LTO/LTO.cpp#L194-L222
mingmingl-llvm added a commit that referenced this pull request Jul 9, 2024
#87600 was reverted in order to
revert
6262763.
Now #95482 is fix forward for
6262763.
This patch is a reland for
#87600

**Changes on top of original patch**
In `llvm/include/llvm/IR/ModuleSummaryIndex.h`, make the type of
`GVSummaryPtrSet` an `unordered_set` which is more memory efficient when
the number of elements is smaller than 128 [1]

**Original commit message**

For distributed ThinLTO, the LTO indexing step generates combined
summary for each module, and postlink pipeline reads the combined
summary which stores the information for link-time optimization.

This patch populates the 'import type' of a summary in bitcode, and
updates bitcode reader to parse the bit correctly.

[1]
https://github.com/llvm/llvm-project/blob/393eff4e02e7ab3d234d246a8d6912c8e745e6f9/llvm/lib/Support/SmallPtrSet.cpp#L43
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
…ibuted ThinLTO under a default-off new option" (llvm#95482)

Make `FunctionsToImportTy` an `unordered_map` rather than `DenseMap`.
Credit goes to jvoung@ for the 'DenseMap -> unordered_map' change. This
is a reland of llvm#92718

* `DenseMap` allocates space for a large number of key/value pairs and
wastes space when the number of elements are small.
* While init bucket size is zero [1], it quickly allocates buckets for 64 elements [2]
when the number of elements is small (for example, 3 or 4 elements). The programmer
manual [3] also mentions it could waste space.
* Experiments show `FunctionsToImportTy.size()` is smaller than 4 for
multiple binaries with high indexing ram usage. `unordered_map` grows
factor is at most 2 in llvm libc [4] for insert operations.
 
With this change, `ComputeCrossModuleImport` ram increase is smaller
than 0.5G on a couple of binaries with high indexing ram usage. A wider
range of (pre-release) tests pass.

[1] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L431-L432 
[2] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L849
[3] https://llvm.org/docs/ProgrammersManual.html#llvm-adt-densemap-h
[4] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/libcxx/include/__hash_table#L1525-L1526

**Original commit message** 
The goal is to populate `declaration` import status if a new flag
`-import-declaration` is on.

* For in-process ThinLTO, the `declaration` status is visible to backend
`function-import` pass, so `FunctionImporter::importFunctions` should
read the import status and be no-op for declaration summaries.
Basically, the postlink pipeline is updated to keep its current behavior
(import definitions), but not updated to handle `declaration` summaries.
Two use cases ([better call-graph
sort](https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5)
or [cross-module
auto-init](llvm#87597 (comment)))
would use this bit differently.

* For distributed ThinLTO, the `declaration` status is not serialized to
bitcode. As discussed, llvm#87600
will do this.
aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024
llvm#87600 was reverted in order to
revert
llvm@6262763.
Now llvm#95482 is fix forward for
llvm@6262763.
This patch is a reland for
llvm#87600

**Changes on top of original patch**
In `llvm/include/llvm/IR/ModuleSummaryIndex.h`, make the type of
`GVSummaryPtrSet` an `unordered_set` which is more memory efficient when
the number of elements is smaller than 128 [1]

**Original commit message**

For distributed ThinLTO, the LTO indexing step generates combined
summary for each module, and postlink pipeline reads the combined
summary which stores the information for link-time optimization.

This patch populates the 'import type' of a summary in bitcode, and
updates bitcode reader to parse the bit correctly.

[1]
https://github.com/llvm/llvm-project/blob/393eff4e02e7ab3d234d246a8d6912c8e745e6f9/llvm/lib/Support/SmallPtrSet.cpp#L43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants