bpo-24821: Fixed the slowing down to 25 times in the searching of some #505

Merged
5 changes: 5 additions & 0 deletions Doc/whatsnew/3.7.rst
@@ -138,6 +138,11 @@ Optimizations
in method calls being faster up to 20%. (Contributed by Yury Selivanov and
INADA Naoki in :issue:`26110`.)

* Searching some unlucky Unicode characters (like Ukrainian capital "Є")
in a string was up to 25 times slower than searching other characters.
Now it is only 3 times slower in the worst case.
(Contributed by Serhiy Storchaka in :issue:`24821`.)

* Fast implementation from standard C library is now used for functions
:func:`~math.tgamma`, :func:`~math.lgamma`, :func:`~math.erf` and
:func:`~math.erfc` in the :mod:`math` module. (Contributed by Serhiy
Expand Down
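Why "Є" is unlucky can be illustrated with a small sketch. "Є" is U+0404, so its low byte (0x04) equals the high byte of every character in the Cyrillic block (U+04xx). When a 2-byte string is scanned with `memchr()` for that low byte, every Cyrillic character produces a false positive. The helper below is hypothetical (`count_false_positives` is not part of CPython) and assumes 2-byte storage; it only counts how often a byte-wise scan stops on a character that is not the one being searched for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not CPython code: count how many times memchr() on
   the low byte of `ch` stops inside a 16-bit unit that is not `ch`. */
static size_t
count_false_positives(const uint16_t *s, size_t n, uint16_t ch)
{
    const unsigned char *base = (const unsigned char *)s;
    const unsigned char *p = base;
    const unsigned char *end = base + n * sizeof(uint16_t);
    unsigned char needle = (unsigned char)(ch & 0xff);
    size_t misses = 0;

    while (p < end) {
        const unsigned char *hit = memchr(p, needle, (size_t)(end - p));
        if (hit == NULL)
            break;
        /* Round down to the 16-bit unit that contains the matching byte. */
        size_t idx = (size_t)(hit - base) / sizeof(uint16_t);
        if (s[idx] == ch)
            break;          /* genuine match: a real search would stop here */
        misses++;           /* the byte matched, the character did not */
        p = base + (idx + 1) * sizeof(uint16_t);
    }
    return misses;
}
```

For a buffer filled with "я" (U+044F), searching for U+0404 yields one false positive per character, while searching for "Ж" (U+0416, low byte 0x16) yields none, because the byte 0x16 never occurs in the buffer.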
3 changes: 3 additions & 0 deletions Misc/NEWS
@@ -10,6 +10,9 @@ What's New in Python 3.7.0 alpha 1?
Core and Builtins
-----------------

- bpo-24821: Fixed a slowdown of up to 25 times when searching for some
unlucky Unicode characters.

- bpo-29894: The deprecation warning is emitted if __complex__ returns an
instance of a strict subclass of complex. In future versions of Python
this can be an error.
46 changes: 40 additions & 6 deletions Objects/stringlib/fastsearch.h
@@ -32,14 +32,20 @@
#define STRINGLIB_BLOOM(mask, ch) \
((mask & (1UL << ((ch) & (STRINGLIB_BLOOM_WIDTH -1)))))

#if STRINGLIB_SIZEOF_CHAR == 1
# define MEMCHR_CUT_OFF 15
#else
# define MEMCHR_CUT_OFF 40
#endif

Py_LOCAL_INLINE(Py_ssize_t)
STRINGLIB(find_char)(const STRINGLIB_CHAR* s, Py_ssize_t n, STRINGLIB_CHAR ch)
{
const STRINGLIB_CHAR *p, *e;

p = s;
e = s + n;
- if (n > 10) {
+ if (n > MEMCHR_CUT_OFF) {
#if STRINGLIB_SIZEOF_CHAR == 1
p = memchr(s, ch, n);
if (p != NULL)
@@ -48,24 +54,36 @@ STRINGLIB(find_char)(const STRINGLIB_CHAR* s, Py_ssize_t n, STRINGLIB_CHAR ch)
#else
/* use memchr if we can choose a needle without two many likely
false positives */
const STRINGLIB_CHAR *s1, *e1;
unsigned char needle = ch & 0xff;
/* If looking for a multiple of 256, we'd have too
many false positives looking for the '\0' byte in UCS2
and UCS4 representations. */
if (needle != 0) {
- while (p < e) {
+ do {
void *candidate = memchr(p, needle,
(e - p) * sizeof(STRINGLIB_CHAR));
if (candidate == NULL)
return -1;
s1 = p;
p = (const STRINGLIB_CHAR *)
_Py_ALIGN_DOWN(candidate, sizeof(STRINGLIB_CHAR));
if (*p == ch)
return (p - s);
/* False positive */
p++;
if (p - s1 > MEMCHR_CUT_OFF)
continue;
if (e - p <= MEMCHR_CUT_OFF)
break;
e1 = p + MEMCHR_CUT_OFF;
while (p != e1) {
if (*p == ch)
return (p - s);
p++;
}
}
- return -1;
Member

How about assert(e - p <= MEMCHR_CUT_OFF) here?

Member Author

It is obvious if a do-while loop is used.

Member

Oh, yes. I was commenting on the previous code.

+ while (e - p > MEMCHR_CUT_OFF);
}
#endif
}
@@ -86,7 +104,7 @@ STRINGLIB(rfind_char)(const STRINGLIB_CHAR* s, Py_ssize_t n, STRINGLIB_CHAR ch)
it doesn't seem as optimized as memchr(), but is still quite
faster than our hand-written loop below */

- if (n > 10) {
+ if (n > MEMCHR_CUT_OFF) {
#if STRINGLIB_SIZEOF_CHAR == 1
p = memrchr(s, ch, n);
if (p != NULL)
@@ -95,24 +113,38 @@ STRINGLIB(rfind_char)(const STRINGLIB_CHAR* s, Py_ssize_t n, STRINGLIB_CHAR ch)
#else
/* use memrchr if we can choose a needle without two many likely
false positives */
const STRINGLIB_CHAR *s1;
Py_ssize_t n1;
unsigned char needle = ch & 0xff;
/* If looking for a multiple of 256, we'd have too
many false positives looking for the '\0' byte in UCS2
and UCS4 representations. */
if (needle != 0) {
- while (n > 0) {
+ do {
void *candidate = memrchr(s, needle,
n * sizeof(STRINGLIB_CHAR));
if (candidate == NULL)
return -1;
n1 = n;
p = (const STRINGLIB_CHAR *)
_Py_ALIGN_DOWN(candidate, sizeof(STRINGLIB_CHAR));
n = p - s;
if (*p == ch)
return n;
/* False positive */
if (n1 - n > MEMCHR_CUT_OFF)
continue;
if (n <= MEMCHR_CUT_OFF)
break;
s1 = p - MEMCHR_CUT_OFF;
while (p > s1) {
p--;
if (*p == ch)
return (p - s);
}
n = p - s;
}
- return -1;
+ while (n > MEMCHR_CUT_OFF);
}
#endif
}
@@ -126,6 +158,8 @@ STRINGLIB(rfind_char)(const STRINGLIB_CHAR* s, Py_ssize_t n, STRINGLIB_CHAR ch)
return -1;
}

#undef MEMCHR_CUT_OFF

Py_LOCAL_INLINE(Py_ssize_t)
FASTSEARCH(const STRINGLIB_CHAR* s, Py_ssize_t n,
const STRINGLIB_CHAR* p, Py_ssize_t m,
Expand Down
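The patched forward search in `find_char` above can be read as a hybrid: after each `memchr()` false positive that arrived quickly, the code scans up to `MEMCHR_CUT_OFF` characters with a plain loop before calling `memchr()` again, which bounds the per-call overhead. The following is a minimal standalone sketch of that control flow, assuming 2-byte characters. The names `find_char16` and `CUT_OFF` are hypothetical, the alignment is done with index arithmetic instead of `_Py_ALIGN_DOWN`, and the final plain loop stands in for the fallback code that the diff does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CUT_OFF 40  /* mirrors MEMCHR_CUT_OFF for multi-byte characters */

/* Hypothetical standalone version of the patched forward search. */
static ptrdiff_t
find_char16(const uint16_t *s, ptrdiff_t n, uint16_t ch)
{
    const uint16_t *p = s;
    const uint16_t *e = s + n;
    unsigned char needle = (unsigned char)(ch & 0xff);

    assert(n >= 0);
    /* A zero low byte would match the high byte of most text: skip memchr. */
    if (n > CUT_OFF && needle != 0) {
        do {
            const unsigned char *candidate =
                memchr(p, needle, (size_t)(e - p) * sizeof(uint16_t));
            if (candidate == NULL)
                return -1;
            const uint16_t *s1 = p;
            /* Align the byte pointer down to its 16-bit character. */
            p = s + (size_t)(candidate - (const unsigned char *)s)
                        / sizeof(uint16_t);
            if (*p == ch)
                return p - s;
            p++;                        /* false positive */
            if (p - s1 > CUT_OFF)
                continue;               /* memchr skipped far: it pays off */
            if (e - p <= CUT_OFF)
                break;                  /* short tail: finish with the loop */
            const uint16_t *e1 = p + CUT_OFF;
            while (p != e1) {           /* bounded plain scan */
                if (*p == ch)
                    return p - s;
                p++;
            }
        } while (e - p > CUT_OFF);
    }
    /* Plain loop for short inputs and for the remaining tail. */
    for (; p < e; p++) {
        if (*p == ch)
            return p - s;
    }
    return -1;
}
```

Because `continue` inside a do-while jumps to the loop condition, every exit from the loop (via `break` or the condition) leaves at most `CUT_OFF` characters for the final plain loop, which is the property the suggested `assert(e - p <= MEMCHR_CUT_OFF)` in the review discussion would have checked.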