Optimise and bug-fix to_digit #132428

Closed as not planned
@daniel-pfeiffer

Description

I tried this code, with absurd radices:

'0'.to_digit(0);
'0'.to_digit(1);

I expected to see this happen: panic

Instead, this happened: None for radix 0 and Some(0) for radix 1.
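
To see why those two results come out, here is a minimal model of a digit lookup that asserts only the upper radix bound. It is a sketch under that assumption, not the actual `core` source, and the name `to_digit_upper_bound_only` is hypothetical.

```rust
/// Hypothetical stand-in for a digit lookup that only guards the upper
/// radix bound. This is an illustration, not the `core` implementation.
pub const fn to_digit_upper_bound_only(c: char, radix: u32) -> Option<u32> {
    assert!(radix <= 36, "radix is too high (maximum 36)");
    // '0'..='9' map to 0..=9; anything below '0' wraps to a huge value.
    let mut digit = (c as u32).wrapping_sub('0' as u32);
    if radix > 10 && digit >= 10 {
        // Fold ASCII letters to lower case, then map 'a'..='z' to 10..=35.
        digit = (c as u32 | 0b10_0000)
            .wrapping_sub('a' as u32)
            .saturating_add(10);
    }
    // With radix 0 nothing is ever smaller than the radix, so every char
    // yields None; with radix 1 only the digit 0 passes this test.
    if digit < radix { Some(digit) } else { None }
}
```

With no lower bound, radix 0 and radix 1 simply fall through to the final comparison instead of being rejected.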

Solution

This function was quite convoluted, which led to unnecessary operations. Radix bounds checking was forgotten at the lower end.

On the other hand, radix bounds checking often does not need to be repeated for every single digit, so I propose a checked to_digit plus an unchecked helper, the latter in 2 variants. The consequences for existing callers are listed at the bottom:

pub const fn to_digit(self, radix: u32) -> Option<u32> {
    assert!(radix >= 2, "to_digit: radix is too low (minimum 2)");
    assert!(radix <= 36, "to_digit: radix is too high (maximum 36)");
    self.to_digit_unchecked(radix)
}

/// ### Soundness
///
/// Callers of this function are responsible for ensuring that this precondition is satisfied:
///
/// - The radix must be between 2 and 36 inclusive.
///
/// Otherwise the returned value is unspecified and is not guaranteed to stay the same in future versions.
#[inline]
// semi-branchless variant
pub const fn to_digit_unchecked(self, radix: u32) -> Option<u32> {
    let is_digit = (self <= '9') as u32;
    let digit =
        is_digit * // branchless if
            // If not a digit, a number greater than radix will be created.
            (self as u32).wrapping_sub('0' as u32)
        + (1 - is_digit) * // branchless else
            // Force the 6th bit to be set to ensure ascii is lower case.
            (self as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10);
    // FIXME: once then_some is const fn, use it here
    if digit < radix { Some(digit) } else { None }
}
// conventional variant
pub const fn to_digit_unchecked(self, radix: u32) -> Option<u32> {
    let digit =
        match self {
            // If not a digit, a number greater than radix will be created.
            ..='9' => (self as u32).wrapping_sub('0' as u32),
            // Force the 6th bit to be set to ensure ascii is lower case.
            _ => (self as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10)
        };
    /* Or if you prefer this style
        if self <= '9' {
            // If not a digit, a number greater than radix will be created.
            (self as u32).wrapping_sub('0' as u32)
        } else {
            // Force the 6th bit to be set to ensure ascii is lower case.
            (self as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10)
        };
    */
    // FIXME: once then_some is const fn, use it here
    if digit < radix { Some(digit) } else { None }
}
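
To try the proposal outside of `core`, both variants can be written as free functions and compared exhaustively. The sketch below assumes the hypothetical names `digit_branchless`, `digit_conventional`, `digit_checked` and `variants_agree`, with `c` standing in for `self`. It also checks both variants against the standard `char::to_digit` for radices 2 to 36.

```rust
/// Semi-branchless variant as a free function (`c` replaces `self`).
pub const fn digit_branchless(c: char, radix: u32) -> Option<u32> {
    let is_digit = (c <= '9') as u32;
    // Exactly one of the two products is non-zero, so the sum cannot overflow.
    let digit = is_digit * (c as u32).wrapping_sub('0' as u32)
        + (1 - is_digit)
            * (c as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10);
    if digit < radix { Some(digit) } else { None }
}

/// Conventional variant with an ordinary `if`/`else`.
pub const fn digit_conventional(c: char, radix: u32) -> Option<u32> {
    let digit = if c <= '9' {
        // Chars below '0' wrap around to a value above any valid radix.
        (c as u32).wrapping_sub('0' as u32)
    } else {
        // Setting bit 0b10_0000 folds ASCII upper case to lower case.
        (c as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10)
    };
    if digit < radix { Some(digit) } else { None }
}

/// Checked wrapper: validates the radix, then delegates.
pub const fn digit_checked(c: char, radix: u32) -> Option<u32> {
    assert!(radix >= 2, "to_digit: radix is too low (minimum 2)");
    assert!(radix <= 36, "to_digit: radix is too high (maximum 36)");
    digit_conventional(c, radix)
}

/// Compares both variants, and `char::to_digit`, over every `char`
/// and every radix in 2..=36.
pub fn variants_agree() -> bool {
    (2..=36u32).all(|radix| {
        ('\0'..=char::MAX).all(|c| {
            let expected = c.to_digit(radix);
            digit_branchless(c, radix) == expected
                && digit_conventional(c, radix) == expected
        })
    })
}
```

The exhaustive loop covers roughly 39 million (char, radix) pairs, which is still quick enough for a one-off check.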

I have checked all callers of to_digit to see where it is already safe, or can be made safe, to switch to to_digit_unchecked. At each such call site, a comment should probably point to the place that guarantees a valid radix:

bounds already checked outside of loop, so can switch:

  • library/core/src/num/mod.rs 2x

literal base in a variable, only valid values, so can switch:

  • compiler/rustc_session/src/errors.rs 1x
  • rustc_parse/src/lexer/mod.rs 1x

in a loop, so a bounds check should be added before the loop, and then these can switch:

  • library/core/src/net/parser.rs 2x
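
The pattern of checking the radix once and then using the unchecked lookup per digit could look roughly like this. It is a sketch only: `digit_unchecked` and `parse_u32` are hypothetical names and are not taken from the files listed above.

```rust
/// Hypothetical per-digit helper with no radix validation of its own.
/// The caller must guarantee 2 <= radix <= 36.
const fn digit_unchecked(c: char, radix: u32) -> Option<u32> {
    let digit = if c <= '9' {
        (c as u32).wrapping_sub('0' as u32)
    } else {
        (c as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10)
    };
    if digit < radix { Some(digit) } else { None }
}

/// Parses an unsigned number, validating `radix` once before the loop.
/// Returns None on empty input, an invalid digit, or overflow.
pub fn parse_u32(s: &str, radix: u32) -> Option<u32> {
    // This single check is what guarantees a valid radix for the loop below.
    assert!((2..=36).contains(&radix), "radix must be in 2..=36");
    if s.is_empty() {
        return None;
    }
    let mut value: u32 = 0;
    for c in s.chars() {
        let digit = digit_unchecked(c, radix)?;
        value = value.checked_mul(radix)?.checked_add(digit)?;
    }
    Some(value)
}
```

The assert runs once per call rather than once per character, which is the saving the proposal is after.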

literal radices, where the compiler can hopefully eliminate the asserts, so there is no need to switch, though doing so might save compile time:

  • compiler/rustc_lexer/src/unescape.rs 4x
  • librustdoc/html/render/print_item.rs 6x
  • rustc_parse_format/src/lib.rs 1x
  • many test files
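
For the literal-radix case, the following sketch shows the two situations involved. `digit_checked`, `HEX_F` and `hex_value` are hypothetical names. In a `const` item the asserts are evaluated at compile time. In an ordinary call, whether they disappear is up to the optimizer.

```rust
/// Hypothetical checked lookup, written as a free function.
pub const fn digit_checked(c: char, radix: u32) -> Option<u32> {
    assert!(radix >= 2 && radix <= 36, "radix must be in 2..=36");
    let digit = if c <= '9' {
        (c as u32).wrapping_sub('0' as u32)
    } else {
        (c as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10)
    };
    if digit < radix { Some(digit) } else { None }
}

// Evaluated entirely at compile time: the assert costs nothing at run time,
// and an out-of-range literal radix here would be a compile error.
pub const HEX_F: Option<u32> = digit_checked('f', 16);

/// Run-time call with a literal radix. After inlining, the condition
/// 2 <= 16 <= 36 is a constant, so the assert can be folded away, but
/// that is an optimization and not a language guarantee.
pub fn hex_value(c: char) -> Option<u32> {
    digit_checked(c, 16)
}
```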

Labels: C-feature-request (Category: A feature request, i.e. not implemented / a PR), T-libs (Relevant to the library team, which will review and decide on the PR/issue)