-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
GH-107596: Specialize str[int]
#107597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-107596: Specialize str[int]
#107597
Conversation
@@ -1360,6 +1350,19 @@ _Py_Specialize_BinarySubscr( | |||
PySlice_Check(sub) ? SPEC_FAIL_SUBSCR_TUPLE_SLICE : SPEC_FAIL_OTHER); | |||
goto fail; | |||
} | |||
if (container_type == &PyUnicode_Type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be a good idea to check if the string is all ASCII, so we don't immediately deoptimise in that case? Or just support all characters by calling unicode_char
in the opcode case, it's the last branch anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think I'll just change the specialization to only support ASCII strings, rather than bothering with other widths. It's quick and easy to check, since strings know their encoding.
Well, I think I know which new benchmark is responsible for this showing up so prominently in the failure stats... I'm going to modify this to only support ASCII strings, and see if that further improves things. |
I'm going to revert that last change. Turns out the TOML benchmark that's hitting this so hard is indeed getting mostly ASCII characters out of "barely-Unicode" strings. So the original code is probably fine. |
I'm happy with the benchmarks and stats. |
This handles cases where a string (with any internal representation) is indexed by a medium nonnegative integer, resulting in an ASCII character.
Running the benchamarks/stats now...
str[int]
#107596