Skip to content

Fix bugs in and improve name mangling #8875

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

alexcrichton
Copy link
Member

These commits fix bugs related to identically named statics in functions of implementations in various situations. The commit messages have most of the information about what bugs are being fixed and why.

As a bonus, while I was messing around with name mangling, I improved the backtraces we'll get in gdb by removing __extensions__ for the trait/type being implemented and by adding the method name as well. Yay!

@alexcrichton
Copy link
Member Author

The test I added will not compile with today's rustc for a large number of reasons, and this pull request depends on #8843

@alexcrichton
Copy link
Member Author

Hmm... the "improved" name mangling aspect of this isn't working as I thought, I'm currently investigating...

Regardless though the bug fixing changes are good to be looked at.

@huonw
Copy link
Member

huonw commented Aug 30, 2013

Does this fix #6602?

@alexcrichton
Copy link
Member Author

Aha, this should be good to go now. Here's a good before/after story for this code:

#[no_std];

pub fn foo() -> int {
    static foo_constant: int = 1;
    return foo_constant;
}

struct A;

trait B {
    fn bar(&self) -> int;
}

impl A {
    fn foo(&self) -> int {
        static foo_constant: int = 1;
        return foo_constant + foo();
    }
}

impl B for A {
    fn bar(&self) -> int {
        static bar_constant: int = 1;
        return bar_constant + self.foo();
    }
}

struct C<T>;

impl<T> C<T> {
    fn baz(&self) -> int {
        static baz_constant: int = 1;
        return baz_constant;
    }
}

#[start]
fn main(_: int, _: **u8, _: *u8) -> int {
    let a = A;
    a.bar();
    let c = C::<()>;
    c.baz();
    3
}

Before:

$ ./x86_64-apple-darwin/stage0/bin/rustc foo.rs -o foo && gobjdump -t foo | c++filt
warning: no debug symbols in executable (-arch x86_64)
BFD: foo: unknown load command 0x2a
BFD: foo: unknown load command 0x28
BFD: foo: unknown load command 0x29
BFD: foo: unknown load command 0x2b

foo:     file format mach-unsigned __int128-x86-64

SYMBOL TABLE:
0000000100000da0 long       0e SECT   01 0000 [.text] __extensions__::baz_216::_b12014513859d86::_0$x2e0
0000000100000de6 __float128       e SECT   01 0000 [.text] ___morestack
0000000100000e90 long       0e SECT   04 0000 [.const] foo::foo_constant::_15ddc065e7e4c082::_0$x2e0
0000000100000e98 long       0e SECT   04 0000 [.const] __extensions__::foo_constant::_15ddc065e7e4c082::_0$x2e0
0000000100000ea8 long       0e SECT   04 0000 [.const] __extensions__::baz_constant::_15ddc065e7e4c082::_0$x2e0
0000000100001040 long       0e SECT   09 0000 [.bss] __rust_mod_map
0000000100000ea0 __float128       0f SECT   04 0000 [.const] __extensions__::bar_constant::_15ddc065e7e4c082::_0$x2e0
0000000100000c10 __float128       0f SECT   01 0000 [.text] __extensions__::meth_211::foo::_b12014513859d86::_0$x2e0
0000000100000c70 __float128       0f SECT   01 0000 [.text] __extensions__::meth_213::bar::_b12014513859d86::_0$x2e0
0000000100000bd0 __float128       0f SECT   01 0000 [.text] foo::_b12014513859d86::_0$x2e0
0000000100000cd0 __float128       0f SECT   01 0000 [.text] main::_2de6c7ff3a55374a::_0$x2e0
0000000100000000 __float128       0f SECT   01 0010 [.text] __mh_execute_header
0000000100001020 __float128       0f SECT   08 0000 [.data] __rust_crate_map_toplevel
0000000100000d40 __float128       0f SECT   01 0000 [.text] _main
0000000100000eb0 __float128       0f SECT   04 0000 [.const] _rust_abi_version
0000000000000000 __float128       01 UND    00 0100 _upcall_del_stack
0000000000000000 __float128       01 UND    00 0100 _upcall_new_stack
0000000000000000 __float128       01 UND    00 0200 dyld_stub_binder

After:

$ ./x86_64-apple-darwin/stage1/bin/rustc foo.rs -o foo && gobjdump -t foo | c++filt
warning: no debug symbols in executable (-arch x86_64)
BFD: foo2: unknown load command 0x2a
BFD: foo2: unknown load command 0x28
BFD: foo2: unknown load command 0x29
BFD: foo2: unknown load command 0x2b

foo2:     file format mach-unsigned __int128-x86-64

SYMBOL TABLE:
0000000100000dd0 long       0e SECT   01 0000 [.text] C::baz_244::_82a6c98f66d35026_0$x2e0
0000000100000e12 __float128       e SECT   01 0000 [.text] ___morestack
0000000100000eb8 long       0e SECT   04 0000 [.const] foo::foo_constant::_3b576b6bfb7380a0_0$x2e0
0000000100000ec0 long       0e SECT   04 0000 [.const] A::foo::foo_constant::_3b576b6bfb7380a0_0$x2e0RY
0000000100000ed0 long       0e SECT   04 0000 [.const] C::baz::baz_constant::_3b576b6bfb7380a0_0$x2e0Op
0000000100001040 long       0e SECT   09 0000 [.bss] __rust_mod_map
0000000100000c30 __float128       0f SECT   01 0000 [.text] A::foo::_82a6c98f66d35026_0$x2e0a4
0000000100000ec8 __float128       0f SECT   04 0000 [.const] B::bar::bar_constant::_3b576b6bfb7380a0_0$x2e0FN
0000000100000c90 __float128       0f SECT   01 0000 [.text] B::bar::_82a6c98f66d35026_0$x2e0a5
0000000100000be0 __float128       0f SECT   01 0000 [.text] foo::_82a6c98f66d35026_0$x2e0
0000000100000cf0 __float128       0f SECT   01 0000 [.text] main::_8bb46282fac6e18_0$x2e0
0000000100000000 __float128       0f SECT   01 0010 [.text] __mh_execute_header
0000000100001020 __float128       0f SECT   08 0000 [.data] __rust_crate_map_toplevel
0000000100000d70 __float128       0f SECT   01 0000 [.text] _main
0000000100000ed8 __float128       0f SECT   04 0000 [.const] _rust_abi_version
0000000000000000 __float128       01 UND    00 0100 _upcall_del_stack
0000000000000000 __float128       01 UND    00 0100 _upcall_new_stack
0000000000000000 __float128       01 UND    00 0200 dyld_stub_binder

@jdm
Copy link
Contributor

jdm commented Aug 31, 2013

That is wonderful! ⛵

@huonw
Copy link
Member

huonw commented Aug 31, 2013

What happens if C::<int> and C::<i8> are both included? i.e. do different monomorphisations create different symbol names (if this is the correct thing to do?).

(I could imagine this could be important if one had static baz_constant: Option<T> = None.)

@alexcrichton
Copy link
Member Author

So here's a few things that I'm finding with this.

  1. In its current form, this doesn't actually bootstrap. I think that this is because there's another case or two where our symbols names are colliding (testing a fix for that now).
  2. We currently do name mangling similarly to the way that C++ does so. When I looked into it, I actually couldn't figure out why we did this. There's a comment about gas not accepting the names, but we're not using that to the best of my knowledge so this may have simply been a relic from awhile ago. Namely, for mangling, we sanitize characters like '<' to '$LT$', but I couldn't find any sort of "standard" for this that gdb/lldb follows. The backtrace would always have the literal $LT$ character. Knowing this, I tried just not mangling symbols at all and it turns out that it bootstraps and works just fine (at least on OSX). Unless we have a reason to continue C++ name mangling, I think that this may actually be the way to go.
  3. I decided to go all out and just use pprust to print names of things when generating names. For the test above now we get:
$ gobjdump -t foo
BFD: foo: unknown load command 0x2a
BFD: foo: unknown load command 0x28
BFD: foo: unknown load command 0x29
BFD: foo: unknown load command 0x2b

foo:     file format mach-o-x86-64

SYMBOL TABLE:
0000000100000dc0 l       0e SECT   01 0000 [.text] _C<T>::baz_247-ve3fc7435c4ba24c0Op
0000000100000e17 g       1e SECT   01 0000 [.text] ___morestack
0000000100000ec0 l       0e SECT   04 0000 [.const] _foo::foo_constant-vd2f173dfc35a5d36
0000000100000ec8 l       0e SECT   04 0000 [.const] _A::foo::foo_constant-vd2f173dfc35a5d36RY
0000000100000ed8 l       0e SECT   04 0000 [.const] _C<T>::baz::baz_constant-vd2f173dfc35a5d36Op
0000000100001040 l       0e SECT   09 0000 [.bss] __rust_mod_map
0000000100000c20 g       0f SECT   01 0000 [.text] _A::foo-ve3fc7435c4ba24c0RYa7
0000000100000c80 g       0f SECT   01 0000 [.text] _B[for A]::bar-ve3fc7435c4ba24c0FNa8
0000000100000ed0 g       0f SECT   04 0000 [.const] _B[for A]::bar::bar_constant-vd2f173dfc35a5d36FN
0000000100000000 g       0f SECT   01 0010 [.text] __mh_execute_header
0000000100001020 g       0f SECT   08 0000 [.data] __rust_crate_map_toplevel
0000000100000bd0 g       0f SECT   01 0000 [.text] _foo-ve3fc7435c4ba24c0
0000000100000d60 g       0f SECT   01 0000 [.text] _main
0000000100000ce0 g       0f SECT   01 0000 [.text] _main-vb9fcaaf7d4ea848c
0000000100000ee0 g       0f SECT   04 0000 [.const] _rust_abi_version
0000000000000000 g       01 UND    00 0100 _upcall_del_stack
0000000000000000 g       01 UND    00 0100 _upcall_new_stack
0000000000000000 g       01 UND    00 0200 dyld_stub_binder

Note the lack of c++filt. The leading underscore is an OSX thing and won't be present in gdb backtraces and such.

@huonw
Copy link
Member

huonw commented Aug 31, 2013

That doesn't sound too bad.

(Although, I do remember there were some issues with the android assembler and non-ascii characters in symbols.)

@alexcrichton
Copy link
Member Author

If that's the case, perhaps we could perform some odd normalization on android but let other platforms have the benefit of better symbol generation. I'd love to look into why there's the bug in the first place on android though (for a later time).

@alexcrichton
Copy link
Member Author

ok it's almost done with make check, I'm just gonna assume that it'll finish all the way now

@alexcrichton
Copy link
Member Author

Looks like I'm about to learn why we did sanitization in the first place. I'll investigate this soon.

Before, the path name for all items defined in methods of traits and impls never
took into account the name of the method. This meant that if you had two statics
of the same name in two different methods the statics would end up having the
same symbol named (even after mangling) because the path components leading to
the symbol were exactly the same (just __extensions__ and the static name).

It turns out that if you add the symbol "A" twice to LLVM, it automatically
makes the second one "A1" instead of "A". What this meant is that in local crate
compilations we never found this bug. Even across crates, this was never a
problem. The problem arises when you have generic methods that don't get
generated at compile-time of a library. If the statics were re-added to LLVM by
a client crate of a library in a different order, you would reference different
constants (the integer suffixes wouldn't be guaranteed to be the same).

This fixes the problem by adding the method name to symbol path when building
the ast_map. In doing so, two symbols in two different methods are disambiguated
against.
As with the previous commit, this is targeted at removing the possibility of
collisions between statics. The main use case here is when there's a
type-parametric function with an inner static that's compiled as a library.
Before this commit, any impl would generate a path item of "__extensions__".
This changes this identifier to be a "pretty name", which is either the last
element of the path of the trait implemented or the last element of the type's
path that's being implemented.  That doesn't quite cut it though, so the (trait,
type) pair is hashed and again used to append information to the symbol.

Essentially, __extensions__ was removed for something nicer for debugging, and
then some more information was added to symbol name by including a hash of the
trait being implemented and type it's being implemented for. This should prevent
colliding names for inner statics in regular functions with similar names.
@alexcrichton
Copy link
Member Author

Alright let's try this again. I've reverted back to C++ name-mangling so we can get pretty :: separators, but sadly this means that we can't get things like Trait[for Type] or even A<T>. I do think that this is an improvement, however.

r? the last commit? A few things changed a bit substantially there.

@huonw
Copy link
Member

huonw commented Sep 4, 2013

#7610 seems related too.

(Would it be possible to say have Trait__for_Type_... or something like that?)

Remove __extensions__ from method symbols as well as the meth_XXX. The XXX is
now used to append a few characters at the end of the name of the symbol.

Closes rust-lang#6602
@alexcrichton
Copy link
Member Author

perhaps, the problem is then you can run into collisions with absurdly named structs. I think that the type hash will cause it such that there isn't actually a naming collision. I figured we could try these out for a bit and then maybe move to that.

It's not a bad idea, though, especially if the convention for struct names is CamelCase and not with_underscores.

bors added a commit that referenced this pull request Sep 5, 2013
… r=huonw

These commits fix bugs related to identically named statics in functions of implementations in various situations. The commit messages have most of the information about what bugs are being fixed and why.

As a bonus, while I was messing around with name mangling, I improved the backtraces we'll get in gdb by removing `__extensions__` for the trait/type being implemented and by adding the method name as well. Yay!
@bors bors closed this Sep 5, 2013
flip1995 pushed a commit to flip1995/rust that referenced this pull request Jun 16, 2022
Fixes for `derive_partial_eq_without_eq`

fixes  rust-lang#8875

changelog: Don't lint `derive_partial_eq_without_eq` on non-public types
changelog: Better handle generics in `derive_partial_eq_without_eq`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants