#[inline] causes duplicated symbols in the final binary #105771

Closed
EFanZh opened this issue Dec 16, 2022 · 6 comments
Labels
C-bug Category: This is a bug.

Comments

@EFanZh
Contributor

EFanZh commented Dec 16, 2022

I noticed that if a function is marked #[inline] in an upstream crate, downstream crates do not reuse the instance that upstream has already generated. Instead, each downstream crate generates its own instance, which produces duplicated symbols in the binary file and bloats its size.

Note that this happens when a function is marked #[inline] but is not actually inlined, so the function gets a symbol entry of its own.

To reproduce, you can make a project with the following dependency graph:

    +----------+
    | upstream |
    +----------+
      /  |  \
     /   |   \
+------+ | +-------+
| left | | | right |
+------+ | +-------+
     \   |   /
      \  |  /
   +------------+
   | downstream |
   +------------+

where a fairly complex inline function (complex enough to prevent it from being inlined) is defined in the upstream crate, and the upstream, left and right crates all call it. The downstream crate collects the symbols generated in all of the crates. I wrote a shell script that generates such a project locally (usage: <SCRIPT> <PROJECT PATH>):

#!/bin/sh -ex

mkdir -p "$1"
cd "$1"

# Upstream.

cargo new --lib upstream
echo '#[inline]
pub fn inline_function(x: u32) {
    if x != 42 {
        if x % 2 == 0 {
            inline_function(x / 2)
        } else {
            inline_function(x * 3 + 1)
        }
    }

    std::hint::black_box(x);
}

pub fn instance(n: u32) {
    inline_function(n)
}' > 'upstream/src/lib.rs'


# Left and right.

for x in left right; do
    cargo new --lib "$x"
    cargo add --manifest-path "$x/Cargo.toml" --path upstream

    echo 'pub fn instance(x: u32) {
    upstream::inline_function(x)
}' > "$x/src/lib.rs"
done

# Downstream.

cargo init
cargo add --path 'upstream'
cargo add --path 'left'
cargo add --path 'right'

echo '#[no_mangle]
extern "C" fn entry(x: u32) {
    upstream::instance(x);
    left::instance(x);
    right::instance(x);
}' > "src/lib.rs"

# Build.

cargo rustc --lib --crate-type cdylib --release

You can inspect the resulting binary using llvm-nm and llvm-objdump. In my case, llvm-nm gives me the following output:

0000000000003ee0 t __ZN3top15inline_function17hcd434eca693902bbE
0000000000003f20 t __ZN3top15inline_function17hcd434eca693902bbE
0000000000003f60 t __ZN3top15inline_function17hcd434eca693902bbE
0000000000003f90 t __ZN3top8instance17hc175d5b93e3a6803E
0000000000003f50 t __ZN4left8instance17h5cc7aa4e08943d83E
0000000000003f10 t __ZN5right8instance17hcefb8e0167b5830aE
0000000000003eb0 T _entry

Note the three duplicated __ZN3top15inline_function17hcd434eca693902bbE symbols. Using llvm-objdump, I got three copies of the same assembly code:

...
0000000000003ee0 <__ZN3top15inline_function17hcd434eca693902bbE>:
    3ee0: 55                           	pushq	%rbp
    3ee1: 48 89 e5                     	movq	%rsp, %rbp
    3ee4: 53                           	pushq	%rbx
    3ee5: 50                           	pushq	%rax
    3ee6: 89 fb                        	movl	%edi, %ebx
    3ee8: 83 ff 2a                     	cmpl	$42, %edi
    3eeb: 74 13                        	je	0x3f00 <__ZN3top15inline_function17hcd434eca693902bbE+0x20>
...
0000000000003f20 <__ZN3top15inline_function17hcd434eca693902bbE>:
    3f20: 55                           	pushq	%rbp
    3f21: 48 89 e5                     	movq	%rsp, %rbp
    3f24: 53                           	pushq	%rbx
    3f25: 50                           	pushq	%rax
    3f26: 89 fb                        	movl	%edi, %ebx
    3f28: 83 ff 2a                     	cmpl	$42, %edi
    3f2b: 74 13                        	je	0x3f40 <__ZN3top15inline_function17hcd434eca693902bbE+0x20>
...
0000000000003f60 <__ZN3top15inline_function17hcd434eca693902bbE>:
    3f60: 55                           	pushq	%rbp
    3f61: 48 89 e5                     	movq	%rsp, %rbp
    3f64: 53                           	pushq	%rbx
    3f65: 50                           	pushq	%rax
    3f66: 89 fb                        	movl	%edi, %ebx
    3f68: 83 ff 2a                     	cmpl	$42, %edi
    3f6b: 74 13                        	je	0x3f80 <__ZN3top15inline_function17hcd434eca693902bbE+0x20>
...
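
For larger binaries, spotting repeated names by eye gets tedious. The following is a minimal sketch, not part of the original report, of a helper that counts repeated symbol names in llvm-nm style output. The function name `duplicated_symbols` is hypothetical, and the sketch assumes each defined symbol is printed as `<address> <type> <name>`, as in the listing above.

```rust
use std::collections::BTreeMap;

/// Returns every symbol name that occurs more than once in `nm_output`,
/// together with its number of occurrences, sorted by name.
///
/// Assumption: defined symbols are printed as `<address> <type> <name>`.
/// Lines with a different number of fields (for example undefined
/// symbols, which have no address) are skipped.
fn duplicated_symbols(nm_output: &str) -> Vec<(String, usize)> {
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for line in nm_output.lines() {
        let fields: Vec<&str> = line.split_whitespace().collect();
        // Only count lines made of exactly three fields.
        if let [_address, _kind, name] = fields.as_slice() {
            *counts.entry(*name).or_insert(0) += 1;
        }
    }
    counts
        .into_iter()
        .filter(|&(_, count)| count > 1)
        .map(|(name, count)| (name.to_string(), count))
        .collect()
}
```

Fed the llvm-nm output shown earlier, this should report the inline_function symbol with a count of 3 and nothing else.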

If I remove the #[inline] attribute in the upstream crate, there are no duplicated symbols.
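
Building on that observation, one possible source-level workaround is to keep only a thin wrapper #[inline] and move the heavy body into a function that is neither generic nor #[inline]. This is a sketch under assumptions, not something proposed in this thread: the names `checksum` and `checksum_slow` and the function body are hypothetical stand-ins, and it assumes a build without LTO, where a non-generic `#[inline(never)]` function should be compiled only in the crate that defines it.

```rust
/// Thin public entry point (hypothetical). It is small enough that
/// duplicating or inlining it into downstream crates costs very little.
#[inline]
pub fn checksum(data: &[u8]) -> u32 {
    // Cheap fast path that is worth inlining into callers.
    if data.is_empty() {
        return 0;
    }
    checksum_slow(data)
}

/// Heavy body (hypothetical). It is not generic and is marked
/// `#[inline(never)]`, so downstream crates are expected to call the
/// instance compiled in this crate instead of generating their own copy.
#[inline(never)]
fn checksum_slow(data: &[u8]) -> u32 {
    data.iter()
        .fold(0u32, |acc, &byte| acc.rotate_left(5) ^ u32::from(byte))
}
```

The trade-off is the one described in this issue: the heavy body can no longer be inlined or specialized in downstream crates.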

"fat" LTO seems to be able to merge the duplicated symbols, but not all projects can enable this option. Is it possible to fix this problem even if "fat" LTO is not used? Also, codegen-units=1 does not seem to help.

@EFanZh added the C-bug label on Dec 16, 2022
@bjorn3
Member

bjorn3 commented Dec 16, 2022

This is expected. #[inline] functions are codegened as local functions in each codegen unit, just like generic functions. Not doing this for #[inline] would require every #[inline] function to be codegened once as a regular function, with every other crate then using it with the available_externally linkage. However, this is only supported by LLVM, and it comes at an optimization penalty, as it is no longer possible to specialize the function when arguments are fixed. Fixing it for generics is not possible without linker assistance. The linker assistance in question is "identical code folding" and can be enabled for lld with --icf=safe.
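
As a rough sketch of how the --icf=safe suggestion could be wired into the final crate, a build script can emit Cargo's `cargo:rustc-link-arg` directive. The helper name `emit_icf_link_args` is hypothetical. The sketch assumes the linker is invoked through a C compiler driver that accepts `-fuse-ld=lld` and that lld is installed; neither is confirmed for the setup in this issue.

```rust
use std::io::{self, Write};

/// Writes build-script directives asking Cargo to link with lld and to
/// enable identical code folding.
///
/// Assumptions: the linker is driven by a C compiler that understands
/// `-fuse-ld=lld` and forwards `-Wl,` options to lld.
fn emit_icf_link_args(out: &mut impl Write) -> io::Result<()> {
    // Select lld as the linker.
    writeln!(out, "cargo:rustc-link-arg=-fuse-ld=lld")?;
    // Forward the identical-code-folding option mentioned above.
    writeln!(out, "cargo:rustc-link-arg=-Wl,--icf=safe")?;
    Ok(())
}
```

In the `build.rs` of the crate that produces the final binary (downstream in the reproduction), `fn main` would call `emit_icf_link_args(&mut std::io::stdout())`. Whether the three copies are actually folded would still need to be checked with llvm-nm.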

@EFanZh
Contributor Author

EFanZh commented Dec 16, 2022

Thanks, I will try the identical code folding option. But I am still wondering whether a hybrid approach can be adopted, maybe something like:

  • Upstream still generates inline functions as regular functions.
  • If downstream crates prefer inlining or specialization, do it.
  • If downstream crates decide no inlining or specialization is necessary, use the function from upstream.

Or more aggressively, ignore specialization entirely:

  • Upstream still generates inline functions as regular functions.
  • If downstream crates prefer inlining, do it.
  • If downstream crates decide no inlining is necessary, use the function from upstream.

@bjorn3
Member

bjorn3 commented Dec 16, 2022

That is literally what available_externally does. It provides the source of the function to LLVM, but at the same time tells LLVM that if the function isn't inlined, it can import the function rather than codegen it locally. available_externally is necessary because we don't know whether LLVM will inline the function, and LLVM can't go back and ask for the LLVM IR of an arbitrary function after the fact.

@EFanZh
Contributor Author

EFanZh commented Dec 16, 2022

Is there an option to enable this behavior? There are situations where I am willing to trade runtime performance for smaller binary size.

@bjorn3
Member

bjorn3 commented Dec 16, 2022

There isn't at the moment. I opened #89154 last year to suggest adding this as an option, but haven't received any feedback yet.

@Enselic
Member

Enselic commented Jul 13, 2024

Triage: Thanks for reporting. Let's close as duplicate of / handled by #89154.

@Enselic closed this as not planned on Jul 13, 2024