-
Notifications
You must be signed in to change notification settings - Fork 13.3k
String searching is slow #14107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Labels
I-slow
Issue: Problems and improvements with respect to performance of generated code.
Comments
bors
added a commit
that referenced
this issue
May 16, 2014
This changes the previously naive string searching algorithm to a two-way search like glibc, which should be faster on average while still maintaining worst case linear time complexity. This fixes #14107. Note that I don't think this should be merged yet, as this is the only approach to speeding up search I've tried - it's worth considering options like Boyer-Moore or adding a bad character shift table to this. However, the benchmarks look quite good so far: test str::bench::bench_contains_bad_naive ... bench: 290 ns/iter (+/- 12) from 1309 ns/iter (+/- 36) test str::bench::bench_contains_equal ... bench: 479 ns/iter (+/- 10) from 137 ns/iter (+/- 2) test str::bench::bench_contains_short_long ... bench: 2844 ns/iter (+/- 105) from 5473 ns/iter (+/- 14) test str::bench::bench_contains_short_short ... bench: 55 ns/iter (+/- 4) from 57 ns/iter (+/- 6) Except for the case specifically designed to be optimal for the naive case (`bench_contains_equal`), this gets as good or better performance as the previous code.
This is for the case where the needle is a single character. In the case above the rust version is as fast or faster than the C version. #include <string.h>
#include <unistd.h>
int main(void) {
static char haystack[1024 * 1024];
read(0, haystack, sizeof(haystack) - 1);
const char *needle = "_";
for (size_t i = 0; i < 10000000; i++) {
size_t j = (size_t)strstr(haystack, needle);
asm volatile("" : : "r"(j), "r"(&needle) : "memory");
}
} #![feature(core, test, io)]
extern crate test;
fn main() {
let haystack = std::old_io::stdin().read_to_string().unwrap();
let mut needle = "_";
for _ in range(0, 10000000) {
test::black_box(&mut needle);
test::black_box(haystack.contains(needle));
}
} Maybe there is a bug in the case for short needles. |
flip1995
pushed a commit
to flip1995/rust
that referenced
this issue
Apr 3, 2025
close rust-lang#2177 changelog: [`manual_dangling_ptr`]: new lint
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
The current implement of functions like
StrSlice.contains
are naive string searches, not using a more intelligent algorithm like Boyer-Moore (with a comment referring one to #1932).Unfortunately that means we're trounced when performing operations like that. e.g.
General notes:
strstr
: strstr.c and str-two-way.hThe text was updated successfully, but these errors were encountered: