Skip to content

miri: improve current situation #160

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
phip1611 opened this issue Jun 23, 2023 · 4 comments · Fixed by #227
Closed

miri: improve current situation #160

phip1611 opened this issue Jun 23, 2023 · 4 comments · Fixed by #227

Comments

@phip1611
Copy link
Member

In #128, I added basic miri support. Due to the following issues, miri can't embrace all it's funcitionality:

  • Each test that uses the cast_tag (transitively) fails, as miri can't guarantee that the memory is valid, unfortunately
  • when something like struct_as_bytes() ... cast is done, there are alignment issues as the returned Vec is of course only aligned to a one byte boundary... so far, in tests this never was a problem, luckily. At least on x86.
@phip1611
Copy link
Member Author

phip1611 commented Jun 23, 2023

I'm 99.9% sure that the current crate is memory-safe. Unit tests, integration tests, at least a partial miri coverage. However, having miri on your side is a benefit, of course.

Update: This might not be true. When it comes to UB, one has to think about the "Rust abstract machine", not what the hardware in the end does.

@phip1611
Copy link
Member Author

phip1611 commented Jun 26, 2023

I created a minimal reproducer for the miri issue. cast_tag causes the problem.

I also tried to solve it by giving all tags a 'a lifetime. Didn't work

use std::mem::size_of;
use std::ops::Deref;
use ptr_meta::Pointee;

#[repr(C, align(8))]
struct Align8<T>(T);

impl<T> Deref for Align8<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

#[derive(Debug, PartialEq, Eq)]
#[repr(u32)]
enum TagType {
    Foo = 1,
    Dst = 2,
}

#[repr(C)]
struct BaseTag {
    typ: TagType,
    size: u32,
}

// Tag type: 0x1
#[repr(C)]
struct FooTag {
    base: BaseTag,
    val: u64,
}

// Tag type: 0x2
#[derive(ptr_meta::Pointee)]
#[repr(C)]
struct DstTag {
    base: BaseTag,
    // embedded string slice
    name: [u8],
}

impl DstTag {
    fn name(&self) -> &str {
        core::str::from_utf8(&self.name).unwrap()
    }
}

fn cast_tag<'a, T: ?Sized + Pointee<Metadata = usize> + 'a>(tag: &'a BaseTag) -> &'a T {
    let ptr = core::ptr::addr_of!(*tag);
    unsafe { &*ptr_meta::from_raw_parts(ptr.cast(), tag.size as usize - size_of::<BaseTag>()) }
}

mod tests {
    use super::*;

    #[test]
    fn test_parse_tag() {
        let bytes = Align8([
            // <begin of tag foo>
            TagType::Foo as u32 as u8, // tag foo: type
            0x0,
            0x0,
            0x0,
            16, // tag foo: size
            0x0,
            0x0,
            0x0,
            0x37, // tag foo: val
            0x13,
            0x0,
            0x0,
            0x0,
            0x0,
            0x0,
            0x0,
            // <end of tag foo>
            // <begin of tag dst>
            TagType::Dst as u32 as u8, // tag dst: type
            0x0,
            0x0,
            0x0,
            16, // tag dst: size
            0x0,
            0x0,
            0x0,
            b'h',
            b'e',
            b'l',
            b'l',
            b'o',
            b' ',
            b'!',
            b'\0', // tag foo: name
            // <end of tag dst>
        ]);

        // Parsing of Tag Foo
        let base_tag_ptr = bytes.as_ptr().cast::<BaseTag>();
        let base_tag_ref = unsafe { &*base_tag_ptr };
        assert_eq!(base_tag_ref.size, 16);
        assert_eq!(base_tag_ref.typ, TagType::Foo);
        let tag_foo = unsafe { &*base_tag_ptr.cast::<FooTag>() };
        assert_eq!(tag_foo.val, 0x1337);

        // Parsing of Tag Dst
        let base_tag_ptr = unsafe {
            bytes
                .as_ptr()
                .add(tag_foo.base.size as usize)
                .cast::<BaseTag>()
        };
        let base_tag_ref = unsafe { &*base_tag_ptr };
        assert_eq!(base_tag_ref.typ, TagType::Dst);
        assert_eq!(base_tag_ref.size, 16);
        let tag_dst = cast_tag::<DstTag>(base_tag_ref);
        assert_eq!(tag_dst.name(), "hello !\0");
    }
}

@phip1611
Copy link
Member Author

This patch solves the miri issues:

0a1
> use ptr_meta::Pointee;
3d3
< use ptr_meta::Pointee;
51,53c51,60
< fn cast_tag<'a, T: ?Sized + Pointee<Metadata = usize> + 'a>(tag: &'a BaseTag) -> &'a T {
<     let ptr = core::ptr::addr_of!(*tag);
<     unsafe { &*ptr_meta::from_raw_parts(ptr.cast(), tag.size as usize - size_of::<BaseTag>()) }
---
> fn cast_tag<'a, T: ?Sized + Pointee<Metadata = usize> + 'a>(
>     backing_mem: &'a [u8],
> ) -> &'a T {
>     let base_tag = unsafe { backing_mem.as_ptr().cast::<BaseTag>().as_ref() }.unwrap();
>     unsafe {
>         &*ptr_meta::from_raw_parts(
>             backing_mem.as_ptr().cast(),
>             base_tag.size as usize - size_of::<BaseTag>(),
>         )
>     }
97c104
<             // <end of tag dst>
---
>                    // <end of tag dst>
118c125,127
<         let tag_dst = cast_tag::<DstTag>(base_tag_ref);
---
>         let offset = base_tag_ptr.cast::<u8>() as usize - bytes.as_ptr() as usize;
>         let bytes_slices = &bytes[offset..];
>         let tag_dst = cast_tag::<DstTag>(bytes_slices);

@phip1611
Copy link
Member Author

phip1611 commented Aug 13, 2024

A lively discussion on Reddit (thanks everyone!) pointed out that my assumption was wrong. My assumption was that one can access memory outside the scope of a function if the caller level ensures that memory is valid. However, rustc only looks at functions and not the whole call tree to improve performance during compilation.

Here are the options I have:

  • go with approach B (see patch above)
  • make cast_tag a macro
  • Split the type Tag into CommonHeader and GenericTag, where the latter is a DST with a [u8] field after the common header. This ensures that every cast only operates on memory that the function also has access to by its inputs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant