Skip to content

Tweak from_utf8_lossy to return a new MaybeOwned enum #12098

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Feb 8, 2014

Conversation

lilyball
Copy link
Contributor

@lilyball lilyball commented Feb 7, 2014

MaybeOwned allows from_utf8_lossy to avoid allocation if there are no
invalid bytes in the input.

Before:

test str::bench::from_utf8_lossy_100_ascii                      ... bench:       183 ns/iter (+/- 5)
test str::bench::from_utf8_lossy_100_invalid                    ... bench:       341 ns/iter (+/- 15)
test str::bench::from_utf8_lossy_100_multibyte                  ... bench:       227 ns/iter (+/- 13)
test str::bench::from_utf8_lossy_invalid                        ... bench:       102 ns/iter (+/- 4)
test str::bench::is_utf8_100_ascii                              ... bench:         2 ns/iter (+/- 0)
test str::bench::is_utf8_100_multibyte                          ... bench:         2 ns/iter (+/- 0)

Now:

test str::bench::from_utf8_lossy_100_ascii                      ... bench:        96 ns/iter (+/- 4)
test str::bench::from_utf8_lossy_100_invalid                    ... bench:       318 ns/iter (+/- 10)
test str::bench::from_utf8_lossy_100_multibyte                  ... bench:       105 ns/iter (+/- 2)
test str::bench::from_utf8_lossy_invalid                        ... bench:       105 ns/iter (+/- 2)
test str::bench::is_utf8_100_ascii                              ... bench:         2 ns/iter (+/- 0)
test str::bench::is_utf8_100_multibyte                          ... bench:         2 ns/iter (+/- 0)

@lilyball
Copy link
Contributor Author

lilyball commented Feb 7, 2014

@huonw I tried out the MaybeOwned you suggested, and it's a decent performance improvement in the valid-utf8 case.

@@ -495,7 +495,7 @@ pub struct Display<'a, P> {

impl<'a, P: GenericPath> fmt::Show for Display<'a, P> {
fn fmt(d: &Display<P>, f: &mut fmt::Formatter) -> fmt::Result {
d.with_str(|s| f.pad(s))
fmt::Show::fmt(&d.as_maybe_owned().as_slice(), f)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need to be rebased, the formatters are now methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it you're referring to #12066, which hasn't landed yet. Guess I'll have to wait for that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, whoops... forgot that small detail. ;)

(It's on the bots now, fwiw.)

@lilyball
Copy link
Contributor Author

lilyball commented Feb 8, 2014

@huonw I've pushed my changes. Everything but rebasing for fmt has been addressed.

@lilyball
Copy link
Contributor Author

lilyball commented Feb 8, 2014

@huonw Rebased on top of the fmt changes

None => return Slice(unsafe { cast::transmute(v) }),
Some(i) => i
};

static REPLACEMENT: &'static [u8] = bytes!(0xEF, 0xBF, 0xBD); // U+FFFD in UTF-8
let mut i = 0u;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we start at firstbad?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh geeze. Of course we should. That's a nice bit of performance I was throwing away.

@lilyball
Copy link
Contributor Author

lilyball commented Feb 8, 2014

Fixed

unsafe { raw::push_bytes(&mut res, v.slice_to(firstbad)) };
}

let mut lastgood = firstbad;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like a comment about why having lastgood pointing a known bad byte is ok. (To check my understanding: the while loop below will always trigger on the first run, so on the first pass lastgood == i_ and nothing is copied.)

MaybeOwned allows from_utf8_lossy to avoid allocation if there are no
invalid bytes in the input.
Declare a `type SendStr = MaybeOwned<'static>` to ease readibility of
types that needed the old SendStr behavior.

Implement all the traits for MaybeOwned that SendStr used to implement.
@lilyball
Copy link
Contributor Author

lilyball commented Feb 8, 2014

Renamed lastgood to subseqidx and gave it a comment.

bors added a commit that referenced this pull request Feb 8, 2014
MaybeOwned allows from_utf8_lossy to avoid allocation if there are no
invalid bytes in the input.

Before:
```
test str::bench::from_utf8_lossy_100_ascii                      ... bench:       183 ns/iter (+/- 5)
test str::bench::from_utf8_lossy_100_invalid                    ... bench:       341 ns/iter (+/- 15)
test str::bench::from_utf8_lossy_100_multibyte                  ... bench:       227 ns/iter (+/- 13)
test str::bench::from_utf8_lossy_invalid                        ... bench:       102 ns/iter (+/- 4)
test str::bench::is_utf8_100_ascii                              ... bench:         2 ns/iter (+/- 0)
test str::bench::is_utf8_100_multibyte                          ... bench:         2 ns/iter (+/- 0)
```

Now:
```
test str::bench::from_utf8_lossy_100_ascii                      ... bench:        96 ns/iter (+/- 4)
test str::bench::from_utf8_lossy_100_invalid                    ... bench:       318 ns/iter (+/- 10)
test str::bench::from_utf8_lossy_100_multibyte                  ... bench:       105 ns/iter (+/- 2)
test str::bench::from_utf8_lossy_invalid                        ... bench:       105 ns/iter (+/- 2)
test str::bench::is_utf8_100_ascii                              ... bench:         2 ns/iter (+/- 0)
test str::bench::is_utf8_100_multibyte                          ... bench:         2 ns/iter (+/- 0)
```
@bors bors closed this Feb 8, 2014
@bors bors merged commit 1d17c21 into rust-lang:master Feb 8, 2014
@lilyball lilyball deleted the from_utf8_lossy_tweak branch February 8, 2014 21:26
bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 25, 2022
…r=jonas-schievink

fix: show `macro_rules` snippet in blocks

fixes rust-lang/rust-analyzer#12092
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants