Description
I tried this code, with absurd radices:
'0'.to_digit(0);
'0'.to_digit(1);
I expected to see this happen: both calls panic
Instead, this happened: they return None and Some(0), respectively
Solution
This function was quite convoluted, which led to unnecessary operations. The radix bounds check was forgotten at the lower end.
On the other hand, the radix bounds check often does not need to be repeated for every single digit, so I propose splitting it out into an unchecked function, shown in 2 variants. The consequences for existing callers are listed at the bottom:
pub const fn to_digit(self, radix: u32) -> Option<u32> {
    assert!(radix >= 2, "to_digit: radix is too low (minimum 2)");
    assert!(radix <= 36, "to_digit: radix is too high (maximum 36)");
    self.to_digit_unchecked(radix)
}
/// ### Soundness
///
/// Callers of this function are responsible for ensuring that this precondition is satisfied:
///
/// - The radix must be between 2 and 36 inclusive.
///
/// Otherwise, the returned value may be meaningless and is not guaranteed to stay the same in future versions.
#[inline]
// semi-branchless variant
pub const fn to_digit_unchecked(selfie: char, radix: u32) -> Option<u32> {
    let is_digit = (selfie <= '9') as u32;
    let digit =
        is_digit * // branchless if
            // If not a digit, a number greater than radix will be created.
            (selfie as u32).wrapping_sub('0' as u32)
        + (1 - is_digit) * // branchless else
            // Force the 6th bit to be set to ensure ascii is lower case.
            (selfie as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10);
    // FIXME: once then_some is const fn, use it here
    if digit < radix { Some(digit) } else { None }
}
// conventional variant
pub const fn to_digit_unchecked(self, radix: u32) -> Option<u32> {
    let digit =
        match self {
            // If not a digit, a number greater than radix will be created.
            ..='9' => (self as u32).wrapping_sub('0' as u32),
            // Force the 6th bit to be set to ensure ascii is lower case.
            _ => (self as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10)
        };
    /* Or if you prefer this style
        if self <= '9' {
            // If not a digit, a number greater than radix will be created.
            (self as u32).wrapping_sub('0' as u32)
        } else {
            // Force the 6th bit to be set to ensure ascii is lower case.
            (self as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10)
        };
    */
    // FIXME: once then_some is const fn, use it here
    if digit < radix { Some(digit) } else { None }
}
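To make the two variants easier to compare, here is a minimal sketch that restates them as free functions. The names to_digit_branchless, to_digit_conventional and to_digit_checked are made up for this illustration; the real methods would live on char. The comparisons go through u32 casts, which should not change the behaviour. For every radix in 2..=36, both variants are expected to agree with the existing char::to_digit.

```rust
/// Semi-branchless variant as a free function (hypothetical name).
/// Precondition: 2 <= radix <= 36.
pub const fn to_digit_branchless(c: char, radix: u32) -> Option<u32> {
    let code = c as u32;
    let is_digit = (code <= '9' as u32) as u32;
    // Exactly one of the two products is kept; the other is multiplied by 0.
    let digit = is_digit * code.wrapping_sub('0' as u32)
        + (1 - is_digit)
            // Setting bit 0b10_0000 maps ASCII upper case to lower case.
            * (code | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10);
    if digit < radix { Some(digit) } else { None }
}

/// Conventional variant as a free function (hypothetical name).
/// Precondition: 2 <= radix <= 36.
pub const fn to_digit_conventional(c: char, radix: u32) -> Option<u32> {
    let code = c as u32;
    let digit = if code <= '9' as u32 {
        // Anything below '0' wraps around to a huge value and is rejected below.
        code.wrapping_sub('0' as u32)
    } else {
        // Non-letters wrap to a huge value; saturating_add keeps it huge.
        (code | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10)
    };
    if digit < radix { Some(digit) } else { None }
}

/// Checked wrapper (hypothetical name): validates the radix at both ends.
pub const fn to_digit_checked(c: char, radix: u32) -> Option<u32> {
    assert!(radix >= 2, "to_digit: radix is too low (minimum 2)");
    assert!(radix <= 36, "to_digit: radix is too high (maximum 36)");
    to_digit_conventional(c, radix)
}
```

With this sketch, to_digit_checked('0', 0) and to_digit_checked('0', 1) both panic, while valid radices behave as before.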
I have checked all callers of to_digit to see where it is already safe, or can be made safe, to switch to to_digit_unchecked. Each switch should perhaps come with a comment pointing to the place that guarantees a valid radix:
bounds already checked outside of loop, so can switch:
- library/core/src/num/mod.rs 2x
literal base in a variable, only valid values, so can switch:
- compiler/rustc_session/src/errors.rs 1x
- rustc_parse/src/lexer/mod.rs 1x
in a loop, should add bounds check and switch:
- library/core/src/net/parser.rs 2x
literal radices, where the compiler can hopefully eliminate the asserts, so there is no need to switch, though doing so might save compile time:
- compiler/rustc_lexer/src/unescape.rs 4x
- librustdoc/html/render/print_item.rs 6x
- rustc_parse_format/src/lib.rs 1x
- many test files
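
For the "in a loop" cases, the idea is to validate the radix once and then use the unchecked conversion for each character. A minimal sketch of that pattern follows. It assumes a hypothetical caller parse_u32 (not the actual code in library/core/src/net/parser.rs) and a free-function stand-in for the proposed to_digit_unchecked:

```rust
/// Stand-in for the proposed `char::to_digit_unchecked` (conventional variant).
/// Precondition: 2 <= radix <= 36.
const fn to_digit_unchecked(c: char, radix: u32) -> Option<u32> {
    let code = c as u32;
    let digit = if code <= '9' as u32 {
        code.wrapping_sub('0' as u32)
    } else {
        (code | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10)
    };
    if digit < radix { Some(digit) } else { None }
}

/// Hypothetical caller: parses `s` as an unsigned number in `radix`.
/// Returns None for an empty string, an invalid digit, or overflow.
fn parse_u32(s: &str, radix: u32) -> Option<u32> {
    // The radix is validated once here, so the loop below can skip the asserts.
    assert!((2..=36).contains(&radix), "radix must be in 2..=36");
    if s.is_empty() {
        return None;
    }
    let mut value: u32 = 0;
    for c in s.chars() {
        // Valid radix guaranteed by the assert above.
        let digit = to_digit_unchecked(c, radix)?;
        value = value.checked_mul(radix)?.checked_add(digit)?;
    }
    Some(value)
}
```

The comment inside the loop is the kind of note suggested above: it points to the place that guarantees a valid radix.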