Missed niche optimization #119055

Open
Jules-Bertholet opened this issue Dec 17, 2023 · 3 comments
Labels
A-layout Area: Memory layout of types C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Jules-Bertholet
Contributor

I tried this code:

```rust
#[repr(u8)]
enum Foo {
    A = 0,
}

#[repr(u8)]
enum Bar {
    A = 1,
}

enum Choice {
    F(Foo),
    B(Bar),
}

fn main() {
    dbg!(core::mem::size_of::<Choice>());
}
```

I expected to see this happen: Choice has size 1

Instead, this happened: Choice has size 2
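A hand-written sketch of the 1-byte layout being asked for. PackedChoice, pack and unpack are hypothetical names, not compiler output. Because Foo's tag byte is always 0 and Bar's is always 1, the inner tag byte alone identifies the outer variant, so no separate discriminant byte is needed.

```rust
use std::mem::size_of;

#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Foo {
    A = 0,
}

#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Bar {
    A = 1,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Choice {
    F(Foo),
    B(Bar),
}

/// Hypothetical packed form: a single byte that is exactly the inner tag.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(transparent)]
struct PackedChoice(u8);

impl PackedChoice {
    fn pack(c: Choice) -> Self {
        // Store the inner enum's own tag; the outer variant is implied by it.
        match c {
            Choice::F(f) => PackedChoice(f as u8),
            Choice::B(b) => PackedChoice(b as u8),
        }
    }

    fn unpack(self) -> Choice {
        // Tag 0 can only come from Foo, tag 1 only from Bar.
        match self.0 {
            0 => Choice::F(Foo::A),
            1 => Choice::B(Bar::A),
            other => unreachable!("pack never produces tag byte {other}"),
        }
    }
}

fn main() {
    assert_eq!(size_of::<PackedChoice>(), 1);
    for c in [Choice::F(Foo::A), Choice::B(Bar::A)] {
        assert_eq!(PackedChoice::pack(c).unpack(), c);
    }
    // The compiler-chosen layout, for comparison with the 1-byte sketch.
    println!("size_of::<Choice>() = {}", size_of::<Choice>());
}
```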

Meta

Rust version: 1.74.1

@rustbot label A-layout T-compiler

@Jules-Bertholet Jules-Bertholet added the C-bug Category: This is a bug. label Dec 17, 2023
@rustbot rustbot added needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. A-layout Area: Memory layout of types T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 17, 2023
@saethlin saethlin added C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such and removed C-bug Category: This is a bug. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Dec 18, 2023
@programmerjake
Member

This optimization should also apply when variants carry data:

```rust
// could be size 2 since A and B's discriminants are in the same spot with non-overlapping contiguous values, but currently isn't
enum Top {
    A(A),
    B(B),
}

#[repr(u8)]
enum A {
    A0(u8) = 0,
    A1(u8) = 1,
}

#[repr(u8)]
enum B {
    B2(u8) = 2,
    B3(u8) = 3,
}
```
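To make the "same spot" argument concrete, here is a hand-written sketch of a 2-byte Top. The pack/unpack encoding is hypothetical, not what rustc would generate. It relies on the documented layout of primitive-representation enums with fields: a union of repr(C) structs that each begin with the u8 tag. Byte 0 therefore holds A's or B's tag, and a range check on it recovers the outer variant.

```rust
use std::mem::{size_of, transmute};

#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum A {
    A0(u8) = 0,
    A1(u8) = 1,
}

#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum B {
    B2(u8) = 2,
    B3(u8) = 3,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Top {
    A(A),
    B(B),
}

/// Hypothetical 2-byte encoding of Top: just the inner enum's own bytes.
fn pack(t: Top) -> [u8; 2] {
    // SAFETY: every variant of A and B is laid out as (tag: u8, payload: u8),
    // so the value is exactly 2 fully initialized bytes.
    unsafe {
        match t {
            Top::A(a) => transmute::<A, [u8; 2]>(a),
            Top::B(b) => transmute::<B, [u8; 2]>(b),
        }
    }
}

/// Decode: the outer variant is chosen by a range check on the shared tag byte.
fn unpack(bytes: [u8; 2]) -> Top {
    match bytes[0] {
        // SAFETY: 0..=1 are exactly A's valid tags; byte 1 may be any u8.
        0..=1 => Top::A(unsafe { transmute::<[u8; 2], A>(bytes) }),
        // SAFETY: 2..=3 are exactly B's valid tags; byte 1 may be any u8.
        2..=3 => Top::B(unsafe { transmute::<[u8; 2], B>(bytes) }),
        other => unreachable!("no Top value uses tag byte {other}"),
    }
}

fn main() {
    assert_eq!(size_of::<A>(), 2);
    assert_eq!(size_of::<B>(), 2);
    let samples = [Top::A(A::A0(7)), Top::A(A::A1(8)), Top::B(B::B2(9)), Top::B(B::B3(10))];
    for t in samples {
        assert_eq!(unpack(pack(t)), t);
    }
    assert_eq!(pack(Top::B(B::B3(10))), [3, 10]);
    println!("size_of::<Top>() today: {}", size_of::<Top>());
}
```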

@aochagavia
Contributor

I was debugging this today together with @holodorum and observed an unexpected layout difference for the simplest possible inhabited enum: enum Univariant { Value }. When switching from #[repr(Rust)] to #[repr(u8)], the layout's variants field becomes Variants::Multiple, even though there's only a single variant.

Is this intended? I'm not that familiar with the compiler code, but at least based on the name (i.e. Multiple) it seems wrong to me. I think this could be causing calculate_filling_niche_layout to return early at its check that all enum variants have a layout of Variants::Single. Could anyone comment on this hypothesis?

Full output of #[rustc_layout(debug)]

#[repr(Rust)]:

```text
Layout {
    size: Size(0 bytes),
    align: AbiAndPrefAlign {
        abi: Align(1 bytes),
        pref: Align(8 bytes),
    },
    backend_repr: Memory {
        sized: true,
    },
    fields: Arbitrary {
        offsets: [],
        memory_index: [],
    },
    largest_niche: None,
    uninhabited: false,
    variants: Single {
        index: 0,
    },
    max_repr_align: None,
    unadjusted_abi_align: Align(1 bytes),
    randomization_seed: 11354830321609605625,
}
```

#[repr(u8)]:

```text
Layout {
    size: Size(1 bytes),
    align: AbiAndPrefAlign {
        abi: Align(1 bytes),
        pref: Align(8 bytes),
    },
    backend_repr: Scalar(
        Initialized {
            value: Int(
                I8,
                false,
            ),
            valid_range: 0..=0,
        },
    ),
    fields: Arbitrary {
        offsets: [
            Size(0 bytes),
        ],
        memory_index: [
            0,
        ],
    },
    largest_niche: Some(
        Niche {
            offset: Size(0 bytes),
            value: Int(
                I8,
                false,
            ),
            valid_range: 0..=0,
        },
    ),
    uninhabited: false,
    variants: Multiple {
        tag: Initialized {
            value: Int(
                I8,
                false,
            ),
            valid_range: 0..=0,
        },
        tag_encoding: Direct,
        tag_field: 0,
        variants: [
            Layout {
                size: Size(1 bytes),
                align: AbiAndPrefAlign {
                    abi: Align(1 bytes),
                    pref: Align(8 bytes),
                },
                backend_repr: Memory {
                    sized: true,
                },
                fields: Arbitrary {
                    offsets: [],
                    memory_index: [],
                },
                largest_niche: None,
                uninhabited: false,
                variants: Single {
                    index: 0,
                },
                max_repr_align: None,
                unadjusted_abi_align: Align(1 bytes),
                randomization_seed: 11354830321609605625,
            },
        ],
    },
    max_repr_align: None,
    unadjusted_abi_align: Align(1 bytes),
    randomization_seed: 4262916569509659634,
}
```

@the8472
Member

the8472 commented Apr 26, 2025

I think Variants::Multiple is used because the layout of a repr(u8) enum is guaranteed, so it must encode its tag even if there's only a single variant. Variants::Single simply has no tag encoding.
Perhaps the naming is suboptimal in this case.
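A small stable-Rust sketch of that difference, using only std. Univariant mirrors the enum from the previous comment, and UnivariantU8 is a hypothetical name for its repr(u8) twin. The repr(u8) version always occupies one byte holding its tag. The repr(Rust) version needs no tag, and on current compilers it is zero-sized, matching the Size(0 bytes) in the debug output above. That size is not a language guarantee, so the sketch only prints it.

```rust
use std::mem::size_of;

// Default representation: the compiler may drop the tag entirely.
#[allow(dead_code)]
enum Univariant {
    Value,
}

// Primitive representation: the tag is a u8 with a guaranteed layout,
// even though only the value 0 is ever stored.
#[allow(dead_code)]
#[repr(u8)]
enum UnivariantU8 {
    Value = 0,
}

fn main() {
    // Guaranteed by #[repr(u8)]: one byte holding the tag.
    assert_eq!(size_of::<UnivariantU8>(), 1);
    assert_eq!(UnivariantU8::Value as u8, 0);

    // Not guaranteed by the language, so only reported here.
    println!("repr(Rust) Univariant: {} bytes", size_of::<Univariant>());
}
```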

The normal niche-filling logic can shift the variant-index→tag mapping around to accommodate whatever niche is available. Handling repr(u8) enums, whose tag values are fixed, will require some new logic to check that those values don't overlap.
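One possible shape for that overlap check, as a sketch under simplifying assumptions. tag_ranges_disjoint is a hypothetical helper, not a rustc function. Each outer variant contributes the inclusive range of fixed tag values its repr(u8) payload can hold at a common offset, and the check passes if no two ranges overlap. Wrap-around ranges, which rustc's niche ranges can have, are ignored here. Non-contiguous tag sets are treated conservatively by their covering range.

```rust
use std::ops::RangeInclusive;

/// Hypothetical check: true if no two (non-empty) tag ranges overlap.
fn tag_ranges_disjoint(ranges: &[RangeInclusive<u8>]) -> bool {
    let mut sorted: Vec<&RangeInclusive<u8>> = ranges.iter().collect();
    // After sorting by start, only neighbours can overlap.
    sorted.sort_by_key(|r| *r.start());
    sorted.windows(2).all(|w| w[0].end() < w[1].start())
}

fn main() {
    // Choice: Foo's tag is 0, Bar's tag is 1.
    assert!(tag_ranges_disjoint(&[0..=0, 1..=1]));
    // Top: A's tags are 0..=1, B's tags are 2..=3 (order does not matter).
    assert!(tag_ranges_disjoint(&[2..=3, 0..=1]));
    // A hypothetical overlapping pair: tag 1 would be ambiguous.
    assert!(!tag_ranges_disjoint(&[0..=1, 1..=2]));
}
```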

Probably the trickiest part would be the tag↔discriminant codegen, which would need a switch or lookup table for Choice, because its discriminant values don't necessarily match the fixed tags of the inner enums.
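A sketch of the two decoding strategies mentioned, written as ordinary Rust rather than compiler codegen. It assumes a hypothetical variation of the issue's example in which Foo's tag is 5 and Bar's is 3. With those tags, the stored byte no longer equals Choice's discriminant (0 for F, 1 for B). FOO_TAG, BAR_TAG, discriminant_by_switch and TAG_TO_DISCR are illustrative names.

```rust
// Hypothetical tags: they no longer coincide with Choice's variant indices.
#[allow(dead_code)]
#[repr(u8)]
enum Foo {
    A = 5,
}

#[allow(dead_code)]
#[repr(u8)]
enum Bar {
    A = 3,
}

const FOO_TAG: u8 = Foo::A as u8;
const BAR_TAG: u8 = Bar::A as u8;

/// Strategy 1: a switch from the stored tag byte to Choice's discriminant
/// (0 = F, 1 = B). None means the byte is not a valid Choice.
fn discriminant_by_switch(tag: u8) -> Option<u8> {
    match tag {
        FOO_TAG => Some(0),
        BAR_TAG => Some(1),
        _ => None,
    }
}

/// Strategy 2: a 256-entry lookup table filled in at compile time.
const TAG_TO_DISCR: [Option<u8>; 256] = {
    let mut table = [None; 256];
    table[FOO_TAG as usize] = Some(0);
    table[BAR_TAG as usize] = Some(1);
    table
};

fn main() {
    // Both strategies agree on every possible byte value.
    for tag in 0..=255u8 {
        assert_eq!(discriminant_by_switch(tag), TAG_TO_DISCR[tag as usize]);
    }
    assert_eq!(discriminant_by_switch(5), Some(0));
    assert_eq!(TAG_TO_DISCR[3], Some(1));
}
```

When the tags are a contiguous run (as with A's 0..=1 and B's 2..=3), a range comparison is cheaper than either strategy. A table or switch is the general fallback when the tags are scattered.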

6 participants