-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Clean up string->{float, int} implementation #18536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -215,7 +214,9 @@ fn each_reexport(d: rbml::Doc, f: |rbml::Doc| -> bool) -> bool { | |||
|
|||
fn variant_disr_val(d: rbml::Doc) -> Option<ty::Disr> { | |||
reader::maybe_get_doc(d, tag_disr_val).and_then(|val_doc| { | |||
reader::with_doc_data(val_doc, |data| u64::parse_bytes(data, 10u)) | |||
reader::with_doc_data(val_doc, |data| { | |||
from_str(String::from_utf8_lossy(data).as_slice()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is non-trivially more expensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah - any suggestions on an improvement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is, the old version will just read the string and only handle values between b'0'
and b'9'
(automatically ensuring unicode-validity), while this needs to validate as unicode, possibly allocating a whole new string, and then reparse as a number (i.e. revalidating that each byte lies between the two bounds).
One way to address part of the above is: invalid UTF-8 implies invalid number, so str::from_utf8(data).and_then(from_str)
should work (avoids allocating, at least).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while this needs to validate as unicode, possibly allocating a whole new string, and then reparse as a number (i.e. revalidating that each byte lies between the two bounds).
So do you think we might still need a couple of &[u8] to number functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, not sure.
I have a failing test - fixing... |
Looks like some fantastic cleanup to me, thanks @bjz! |
@alexcrichton removed the |
Ugh - I am now failing on |
We now have a really simple function signature: pub fn from_str_radix_float<T: Float>(src: &str, radix: uint) -> Option<T> By removing some of the arguments, we remove the possibility of some invalid states.
This is now covered by `FromStrRadix::from_str_radix`
Just fixed shootout-pfib |
|
||
match char::to_digit(c, radix) { | ||
// The significand to accumulate | ||
let mut sig = if is_positive { _0 } else { -_1 }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let mut sig = _0;
This removes the |
That's just in line with the rest of the stdlib. from_str already operates on strings instead of bytes. Removing duplicate functions such as parse_bytes makes the interface more consistent. The old parse_bytes wasn't very useful anyway because you still had to first find the end of the input yourself. |
I really think that the parsing API deserves a rethink. For now I am happier having a simpler API that will be easier to update in the future. |
I have been meaning to clean up
std::num::strconv
for a long time. This module is too generic and hard to understand, both for maintainers and users.I have specialized the underlying string->{float, int} implementations into two separate underlying functions. This will allow it to be more easily optimized in the future, and untangles the logic to make it more easier to understand.
The function signatures were also simplified to take only a source string and a radix.
{f32, f64}::from_str_hex
was deprecated in favor ofFromStrRadix::from_str_radix
.r? @aturon @alexcrichton