Skip to content

Potentially redundant check for zero alignment in generated assembly #91438

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
alex65536 opened this issue Dec 1, 2021 · 10 comments · Fixed by #91569
Closed

Potentially redundant check for zero alignment in generated assembly #91438

alex65536 opened this issue Dec 1, 2021 · 10 comments · Fixed by #91569
Labels
A-codegen Area: Code generation C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@alex65536
Copy link

Consider the following code with custom DSTs:

pub trait My {
    fn f(&self) -> i32;
}

pub struct Wrapper<T: ?Sized>(i8, T);

pub struct My1<T: My + ?Sized> {
    field: i8,
    my: Wrapper<T>,
}

type DynMy1 = My1<dyn My>;

pub fn run(d: &DynMy1) -> &Wrapper<dyn My> {
    &d.my
}

The generated assembly for x86_64 is as follows (with my comments added):

example::run:
  mov     rdx, rsi                   ; Copy vtable pointer into that of result
  mov     rcx, qword ptr [rsi + 16]  ; Load alignment into rcx
  test    rcx, rcx                   ; Check if the alignment is non-zero (?)
  mov     eax, 1
  cmovne  rax, rcx                   ; If the alignment is zero, then the offset is 1, otherwise it's rcx
  add     rax, rdi                   ; Add offset to base pointer and store into the base pointer of result
  ret

This can be seen on Godbolt.

Why does the compiler check for zero alignment? It seems that it's impossible, as the alignment cannot be zero.

Meta

rustc --version --verbose:

rustc 1.56.0 (09c42c458 2021-10-18)
binary: rustc
commit-hash: 09c42c45858d5f3aedfa670698275303a3d19afa
commit-date: 2021-10-18
host: x86_64-unknown-linux-gnu
release: 1.56.0
LLVM version: 13.0.0
@alex65536 alex65536 added the C-bug Category: This is a bug. label Dec 1, 2021
@Stargateur
Copy link
Contributor

Stargateur commented Dec 1, 2021

Why alignment can't be zero ? Are you sure it's check for alignment ?

As I read the asm using https://icscsi.org/library/Documents/Cheat_Sheets/Programming%20-%20x64%20Instructions.pdf for ref:

test if rcx is zero, use 1 as default, move rcx if it was not zero. make sense for me but asm is not my field of expertise and even less vtable so I don't really understand why put 1 instead of just move rcx assuming the alignment should be correct but I guess I miss a crucial information about rust internal.

I guess ?Sized is why rust need to test since it's "maybe" sized

@workingjubilee
Copy link
Member

Adding

fn main() {
    println!("{}", std::mem::align_of::<Wrapper<dyn My>>());
}

gets me

error[E0277]: the size for values of type `dyn My` cannot be known at compilation time
   --> <source>:19:20
    |
19  |     println!("{}", std::mem::align_of::<Wrapper<dyn My>>());
    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
    |
    = help: within `Wrapper<dyn My>`, the trait `Sized` is not implemented for `dyn My`
note: required because it appears within the type `Wrapper<dyn My>`
   --> <source>:5:12
    |
5   | pub struct Wrapper<T: ?Sized>(i8, T);
    |            ^^^^^^^
note: required by a bound in `align_of`

So, at least as far as the compiler is concerned, there is no obvious reason it could not be a zero-sized type.
I could believe this is a missed optimization, where the compiler could infer that because of the i8 field, it has to be aligned to at least 1, but I do not believe it is a bug. It seems to satisfy the current reasoning around alignment and DSTs.

@alex65536
Copy link
Author

Why alignment can't be zero ?

As stated in https://doc.rust-lang.org/reference/type-layout.html,

Alignment is measured in bytes, and must be at least 1, and always a power of 2.

Are you sure it's check for alignment ?

Yes. The references to dyn Trait consist of two pointers, one of with is a base pointer (rdi in the assembly above) and the second one (rsi in the assembly above) is the pointer to vtable. The entry of vtable with index 2 (written as [rsi + 16] in assembly) is the alignment (as it can be seen here).

@alex65536
Copy link
Author

alex65536 commented Dec 2, 2021

So, at least as far as the compiler is concerned, there is no obvious reason it could not be a zero-sized type.

How does the compilation error above indicate that? Also, ZSTs has alignment 1, as you can check using

std::mem::align_of::<()>();

I could believe this is a missed optimization, where the compiler could infer that because of the i8 field, it has to be aligned to at least 1, but I do not believe it is a bug.

Why it's not a bug? The compiler generates a seemingly unnecessary instruction, which is an optimization issue. I agree that the issue is quite minor, though, and most probably it doesn't cause observable slowdown.

Maybe there are some reasons why the alignment in vtable can be zero, but it seems to contradict the documentation.

@Stargateur
Copy link
Contributor

Stargateur commented Dec 2, 2021

Yes. The references to dyn Trait consist of two pointers, one of with is a base pointer (rdi in the assembly above) and the second one (rsi in the assembly above) is the pointer to vtable. The entry of vtable with index 2 (written as [rsi + 16] in assembly) is the alignment (as it can be seen here).

I fail to see any explanation in the link that say that, COMMON_VTABLE_ENTRIES_ALIGN is about the alignment of the vtable not its position or whatever registry contain it. For me it's make more sense it's the size of the object or something, why this code would need the vtable anyway ?

It's probably search the alignement of T to know the offset of the strucutre:

pub struct My1<T: My + ?Sized> {
    field: i8, // <= default 1 alignment
    my: Wrapper<T>, // <= could dynamicly change alignment of My1 ?
}

Again just random guess.

@alex65536
Copy link
Author

I fail to see any explanation in the link that say that, COMMON_VTABLE_ENTRIES_ALIGN is about the alignment of the vtable not its position or whatever registry contain it. For me it's make more sense it's the size of the object or something, why this code would need the vtable anyway ?

It's the alignment of the type corresponding to this vtable, not the alignment of vtable itself.

It's probably search the alignement of T to know the offset of the strucutre:

Yes, it is. The issue is that while calculating the offset dynamically, it does redundantly check whether the alignment is zero.

@Stargateur
Copy link
Contributor

Yes, it is. The issue is that while calculating the offset dynamically, it does redundantly check whether the alignment is zero.

why do you suppose rust internal doesn't need this ? having a zero alignment could mean something like unknown alignment ? I mean you first need to prove test for zero is useless before say it's seem useless. We don't know for sure what Rust really need to do here, the type is dyn, maybe Sized. That not trivial, one should perfectly know Rust internal to know if this check is useless.

Sorry for my poor knowledge in this if my comment are just losing your time just ignore them, I just try to understand your point.

@alex65536
Copy link
Author

alex65536 commented Dec 2, 2021

why do you suppose rust internal doesn't need this ? having a zero alignment could mean something like unknown alignment ?

I didn't dive into Rust internals code, but I know that the compiler doesn't always check for zero alignment. Here's the example:

pub trait My {
    fn f(&self) -> i32;
}

pub struct My1<T: My + ?Sized> {
    field: i8,
    my: T,
}

type DynMy1 = My1<dyn My>;

pub fn run(d: &DynMy1) -> &dyn My {
    &d.my
}

gives just

example::run:
        mov     rdx, rsi
        mov     rax, qword ptr [rdx + 16]
        add     rax, rdi
        ret

But, even if I add the transparent wrapper, the check for zero is generated. See the following example:

pub trait My {
    fn f(&self) -> i32;
}

#[repr(transparent)]
pub struct Wrapper<T: ?Sized>(T);

pub struct My1<T: My + ?Sized> {
    field: i8,
    my: Wrapper<T>,
}

type DynMy1 = My1<dyn My>;

pub fn run(d: &DynMy1) -> &Wrapper<dyn My> {
    &d.my
}

The generated assembly becomes much more complex:

example::run:
        mov     rdx, rsi
        mov     rax, qword ptr [rdx + 16]
        test    rax, rax
        mov     ecx, 1
        cmove   rax, rcx
        add     rax, rdi
        ret

I don't have an idea on why the check happens. If it's omitted sometimes, then the compiler kind of knows that the alignment in vtables cannot be zero, yet it generates the check in a more complex example.

@Stargateur
Copy link
Contributor

I think the diff is caused by &dyn My versus &Wrapper<dyn My> but you use transparent so here I agree with you it's look strange. You convince me.

@inquisitivecrystal inquisitivecrystal added A-codegen Area: Code generation C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Dec 2, 2021
@erikdesjardins
Copy link
Contributor

This check is generated by the following code:

// Choose max of two known alignments (combined value must
// be aligned according to more restrictive of the two).
let align = match (
bx.const_to_opt_u128(sized_align, false),
bx.const_to_opt_u128(unsized_align, false),
) {
(Some(sized_align), Some(unsized_align)) => {
// If both alignments are constant, (the sized_align should always be), then
// pick the correct alignment statically.
bx.const_usize(std::cmp::max(sized_align, unsized_align) as u64)
}
_ => {
let cmp = bx.icmp(IntPredicate::IntUGT, sized_align, unsized_align);
bx.select(cmp, sized_align, unsized_align)
}
};
and is used to determine the runtime alignment of the my field, which is max(<align of Wrapper>, <align of DST field from vtable>).

In the general case, this check is required, because the alignment of Wrapper may be >1. You can see this if you change the field in Wrapper to an i16, which has 2-byte alignment: https://godbolt.org/z/r3dPchjGP

In the specific cases you've identified, where the alignment of Wrapper is 1, I agree that it is unnecessary. When loading alignment from the vtable, we should be adding range metadata to tell LLVM that it's nonzero.

(opened #91569 for this)

@bors bors closed this as completed in 4a7fb97 Dec 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants