Skip to content

Using this in constructor seems to be UB #2384

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
tom-anders opened this issue Jan 10, 2023 · 5 comments · Fixed by #2385
Closed

Using this in constructor seems to be UB #2384

tom-anders opened this issue Jan 10, 2023 · 5 comments · Fixed by #2385

Comments

@tom-anders
Copy link

tom-anders commented Jan 10, 2023

Input C/C++ Header

class Foo {
	Foo() { printf("%d", this);} 
	void bar() const { printf("%d", this);} 
}

When using this from Rust like this

let foo = Foo::new();
foo.bar();

Then the printf calls will print different numbers. This is probably expected behavior, but it doesn't seem to be documented (would've saved me 1 hour of debugging). For example, people may use the this pointer to pass the class as some listener interface to one of its fields.

@pvdrz
Copy link
Contributor

pvdrz commented Jan 10, 2023

Hmm this is a bit tricky as bindgen would generate code like this:

#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct Foo {
    pub _address: u8,
}
extern "C" {
    #[link_name = "\u{1}_ZN3Foo3barEv"]
    pub fn Foo_bar(this: *mut Foo);
}
extern "C" {
    #[link_name = "\u{1}_ZN3FooC1Ev"]
    pub fn Foo_Foo(this: *mut Foo);
}
impl Foo {
    #[inline]
    pub unsafe fn bar(&mut self) {
        Foo_bar(self)
    }
    #[inline]
    pub unsafe fn new() -> Self {
        let mut __bindgen_tmp = ::std::mem::MaybeUninit::uninit();
        Foo_Foo(__bindgen_tmp.as_mut_ptr());
        __bindgen_tmp.assume_init()
    }
}

Whether this code exhibits UB or not depends on whatever the Foo_Foo constructor does. If it actually initializes this: *mut FOO then it won't cause UB.

That being said, the reason for both pointers being different is actually our method wrappers. I don't have your code but I suspect that this:

let mut foo = ::std::mem::MaybeUninit::<Foo>::uninit();
Foo_Foo(foo.as_mut_ptr());
foo.assume_init();
Foo_bar(foo.as_mut_ptr());

should print the same value twice instead.

I don't see a proper way to fix it though as this happens because once Foo::new() is called, the Foo value will be moved as it is being returned by the constructor.

The best we could do is document this behavior as you said. Maybe even add a disclaimer saying that you shouldn't rely on the pointer location if you're using the generated constructors.

@tom-anders
Copy link
Author

Thanks for the quick response!

That being said, the reason for both pointers being different is actually our method wrappers. I don't have your code but I suspect that this:

let mut foo = ::std::mem::MaybeUninit::<Foo>::uninit();
Foo_Foo(foo.as_mut_ptr());
foo.assume_init();
Foo_bar(foo.as_mut_ptr());

should print the same value twice instead.

Yeah that works (and also doesn't seem to cause undefined behavior anymore when passing around this).

I don't see a proper way to fix it though as this happens because once Foo::new() is called, the Foo value will be moved as it is being returned by the constructor.

Does this mean the same problem happens when passing the Foo object around, e.g. moving it into a function argument?

The best we could do is document this behavior as you said. Maybe even add a disclaimer saying that you shouldn't rely on the pointer location if you're using the generated constructors.

Sounds good, maybe adding the workaround from above would also be useful.

@pvdrz
Copy link
Contributor

pvdrz commented Jan 10, 2023

Yeah that works (and also doesn't seem to cause undefined behavior anymore when passing around this).

This might be a bit nitpicky but getting different addresses for both pointers is not UB in the sense that this behavior is expected from the semantic of both C and Rust IMHO. The address of a value could change anytime the value is moved.

Does this mean the same problem happens when passing the Foo object around, e.g. moving it into a function argument?

Yes! In fact, this actually happens in C++ too, this code

#include<stdio.h>

class Foo {
    public:
	Foo() { printf("%d\n", this);} 
	void bar() const { printf("%d\n", this);} 
};

void aux(Foo foo) {
    foo.bar();
}

int main() {
    Foo foo = Foo();
    aux(foo);

    return 0;
}

will print different values.

Sounds good, maybe adding the workaround from above would also be useful.

Yeah this is going to be one of these funny things to explain as C++ constructors are "by reference" (they receive a pointer to the value to be initialized) meanwhile Rust constructors are "by value" (they return the initialized value).

@tom-anders
Copy link
Author

Ah yeah that makes sense. To get pointer stability, instead of passing this as the listener, I should probably just have a dedicated member allocated on the heap and pass that as the listener instead. That should give me address stability even when Rust or C++ move the containing class around

@tom-anders
Copy link
Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants