Skip to content

Clean up string->{float, int} implementation #18536

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 4, 2014
Merged

Conversation

brendanzab
Copy link
Member

I have been meaning to clean up std::num::strconv for a long time. This module is too generic and hard to understand, both for maintainers and users.

I have specialized the underlying string->{float, int} implementations into two separate underlying functions. This will allow it to be more easily optimized in the future, and untangles the logic to make it more easier to understand.

The function signatures were also simplified to take only a source string and a radix. {f32, f64}::from_str_hex was deprecated in favor of FromStrRadix::from_str_radix.

r? @aturon @alexcrichton

@@ -215,7 +214,9 @@ fn each_reexport(d: rbml::Doc, f: |rbml::Doc| -> bool) -> bool {

fn variant_disr_val(d: rbml::Doc) -> Option<ty::Disr> {
reader::maybe_get_doc(d, tag_disr_val).and_then(|val_doc| {
reader::with_doc_data(val_doc, |data| u64::parse_bytes(data, 10u))
reader::with_doc_data(val_doc, |data| {
from_str(String::from_utf8_lossy(data).as_slice())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is non-trivially more expensive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah - any suggestions on an improvement?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is, the old version will just read the string and only handle values between b'0' and b'9' (automatically ensuring unicode-validity), while this needs to validate as unicode, possibly allocating a whole new string, and then reparse as a number (i.e. revalidating that each byte lies between the two bounds).

One way to address part of the above is: invalid UTF-8 implies invalid number, so str::from_utf8(data).and_then(from_str) should work (avoids allocating, at least).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while this needs to validate as unicode, possibly allocating a whole new string, and then reparse as a number (i.e. revalidating that each byte lies between the two bounds).

So do you think we might still need a couple of &[u8] to number functions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, not sure.

@brendanzab
Copy link
Member Author

I have a failing test - fixing...

@alexcrichton
Copy link
Member

Looks like some fantastic cleanup to me, thanks @bjz!

@brendanzab brendanzab changed the title Numeric string->{float, int} implementation cleanups Clean up string->{float, int} implementation and remove strconv module Nov 3, 2014
@brendanzab
Copy link
Member Author

@alexcrichton removed the strconv module entirely now.

@brendanzab
Copy link
Member Author

Ugh - I am now failing on run-pass/exponential-notation.rs. There is no way to print out binary exponent floating point values with using fmt. I guess we will have to put off removing strconv then...

@brendanzab brendanzab changed the title Clean up string->{float, int} implementation and remove strconv module Clean up string->{float, int} implementation Nov 3, 2014
We now have a really simple function signature:

    pub fn from_str_radix_float<T: Float>(src: &str, radix: uint) -> Option<T>

By removing some of the arguments, we remove the possibility of some invalid states.
This is now covered by `FromStrRadix::from_str_radix`
@brendanzab
Copy link
Member Author

Just fixed shootout-pfib

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Nov 4, 2014
@bors bors merged commit 8bd37e6 into rust-lang:master Nov 4, 2014
@brendanzab brendanzab deleted the strconv branch November 6, 2014 01:51

match char::to_digit(c, radix) {
// The significand to accumulate
let mut sig = if is_positive { _0 } else { -_1 };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let mut sig = _0;

@SimonSapin
Copy link
Contributor

This removes the parse_bytes functions, but the replacement requires an extraneous UTF-8 check. It seems unfortunate to pay a runtime cost for no particular reason.

@mahkoh
Copy link
Contributor

mahkoh commented Nov 6, 2014

That's just in line with the rest of the stdlib. from_str already operates on strings instead of bytes. Removing duplicate functions such as parse_bytes makes the interface more consistent. The old parse_bytes wasn't very useful anyway because you still had to first find the end of the input yourself.

@brendanzab
Copy link
Member Author

I really think that the parsing API deserves a rethink. For now I am happier having a simpler API that will be easier to update in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants