UTF8 perf work #7696

Merged
merged 3 commits into from
Jul 12, 2013
Conversation

glinscott
Contributor

Moves multibyte code to its own function to make char_range_at
easier to inline, and faster for single and multibyte chars.

Benchmarked reading example.json 100 times, 1.18s before, 1.08s
after.

Also, optimize str::is_utf8 for the single and multibyte case
Before:
is_utf8_ascii: 272.355162 ms
is_utf8_multibyte: 167.337334 ms

After:
is_utf8_ascii: 218.088049 ms
is_utf8_multibyte: 134.836722 ms
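
The first change described above, splitting the multibyte decoding out of char_range_at so the single-byte case stays small enough to inline, might look roughly like the sketch below. It is written in present-day Rust rather than the 2013 syntax of the actual patch, and the `CharRange` struct and `multibyte_char_range_at` helper are illustrative names assumed for this example, not taken from the merged commits.

```rust
/// Hypothetical result type: the decoded character and the byte offset
/// at which the next character starts.
#[derive(Debug, PartialEq)]
pub struct CharRange {
    pub ch: char,
    pub next: usize,
}

/// Fast path. The body is only a compare and a branch, so it is cheap
/// to inline at every call site. `i` must lie on a character boundary.
#[inline]
pub fn char_range_at(s: &str, i: usize) -> CharRange {
    debug_assert!(s.is_char_boundary(i));
    let b = s.as_bytes()[i];
    if b < 0x80 {
        // Single-byte (ASCII) character: no further decoding needed.
        return CharRange { ch: b as char, next: i + 1 };
    }
    multibyte_char_range_at(s, i)
}

/// Slow path, deliberately kept out of line so it does not bloat the
/// inlined fast path. Relies on `s` being valid UTF-8, which `&str`
/// guarantees.
#[inline(never)]
fn multibyte_char_range_at(s: &str, i: usize) -> CharRange {
    let bytes = s.as_bytes();
    let first = bytes[i];
    // The leading byte determines the sequence length.
    let width: usize = if first < 0xE0 {
        2
    } else if first < 0xF0 {
        3
    } else {
        4
    };
    // Keep only the payload bits of the leading byte.
    let mut val = u32::from(first) & (0x7Fu32 >> width);
    // Each continuation byte contributes its low six bits.
    for &b in &bytes[i + 1..i + width] {
        val = (val << 6) | u32::from(b & 0x3F);
    }
    let ch = char::from_u32(val).expect("a &str always holds valid UTF-8");
    CharRange { ch, next: i + width }
}
```

Walking a string then amounts to calling `char_range_at(s, i)` repeatedly and continuing from the returned `next` offset until it reaches `s.len()`.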

@bluss
Member

bluss commented Jul 10, 2013

Having an ASCII fast path is crucial. Later we have to add a check in is_utf8 to reject overlong encodings (for example, code points in the ASCII range encoded with more than one byte).
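
As a rough illustration of the two points in this comment, the sketch below validates a byte slice with an ASCII fast path and also rejects overlong encodings. It is a minimal modern-Rust sketch, not the code from this pull request; the per-width minimum code point table and the extra surrogate and upper-range checks are additions assumed here for completeness.

```rust
/// Returns true if `v` is well-formed UTF-8.
pub fn is_utf8(v: &[u8]) -> bool {
    let mut i = 0;
    while i < v.len() {
        let b0 = v[i];
        // ASCII fast path: one compare per byte, no further decoding.
        if b0 < 0x80 {
            i += 1;
            continue;
        }
        // Sequence width and the smallest code point that legitimately
        // needs that many bytes. Anything below `min` is overlong.
        let (width, min): (usize, u32) = match b0 {
            0xC0..=0xDF => (2, 0x80),
            0xE0..=0xEF => (3, 0x800),
            0xF0..=0xF7 => (4, 0x1_0000),
            // Stray continuation byte or invalid leading byte.
            _ => return false,
        };
        // Truncated sequence at the end of the input.
        if v.len() - i < width {
            return false;
        }
        let mut cp = u32::from(b0) & (0x7Fu32 >> width);
        for &b in &v[i + 1..i + width] {
            // Continuation bytes must look like 10xxxxxx.
            if b & 0xC0 != 0x80 {
                return false;
            }
            cp = (cp << 6) | u32::from(b & 0x3F);
        }
        // Reject overlong forms, surrogates, and values past U+10FFFF.
        if cp < min || cp > 0x10FFFF || (0xD800..=0xDFFF).contains(&cp) {
            return false;
        }
        i += width;
    }
    true
}
```

For example, the two-byte sequence `[0xC0, 0x80]` decodes to code point 0, which is below the two-byte minimum of 0x80, so the function returns false for it.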

Moves multibyte code to its own function to make char_range_at
easier to inline, and faster for single and multibyte chars.

Benchmarked reading example.json 100 times, 1.18s before, 1.08s
after.
Manually unroll the multibyte loops, and optimize for the single
byte chars.
Before:
is_utf8_ascii:          272.355162 ms
is_utf8_multibyte:      167.337334 ms

After:
is_utf8_ascii:          218.088049 ms
is_utf8_multibyte:      134.836722 ms
@glinscott
Contributor Author

Updated with is_utf8 bench tests. Substantial improvement for ascii, and a good improvement for multibyte.

bors added a commit that referenced this pull request Jul 12, 2013
@bors bors closed this Jul 12, 2013
@bors bors merged commit 8926b31 into rust-lang:master Jul 12, 2013