
gh-133968: Add PyUnicodeWriter_WriteASCII() function #133973


Merged: 8 commits merged into python:main from vstinner:write_ascii on May 29, 2025

Conversation

@vstinner commented May 13, 2025 (Member)

Replace most PyUnicodeWriter_WriteUTF8() calls with PyUnicodeWriter_WriteASCII().
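In practice this is a mechanical substitution at call sites that write literal ASCII strings. An illustrative example of such a hunk (not a literal excerpt from this PR; writer is an existing PyUnicodeWriter in scope):

// Before: the writer has to scan the bytes for non-ASCII sequences
// before it can take the fast path.
PyUnicodeWriter_WriteUTF8(writer, "false", 5);

// After: the caller guarantees pure ASCII, so no scan is needed.
PyUnicodeWriter_WriteASCII(writer, "false", 5);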


📚 Documentation preview 📚: https://cpython-previews--133973.org.readthedocs.build/

Replace most PyUnicodeWriter_WriteUTF8() calls with
PyUnicodeWriter_WriteASCII().
@vstinner (Member Author)

JSON benchmark: #133832 (comment)

Benchmark                       | ref     | change
--------------------------------|---------|-----------------------
encode 100 booleans             | 7.15 us | 6.54 us: 1.09x faster
encode 100 integers             | 11.6 us | 11.7 us: 1.01x slower
encode 100 "ascii" strings      | 13.4 us | 13.2 us: 1.02x faster
encode escaped string len=128   | 1.11 us | 1.10 us: 1.01x faster
encode 1000 booleans            | 39.3 us | 32.9 us: 1.19x faster
encode Unicode string len=1000  | 4.93 us | 4.94 us: 1.00x slower
encode 10000 booleans           | 343 us  | 286 us: 1.20x faster
encode ascii string len=10000   | 28.5 us | 28.8 us: 1.01x slower
encode escaped string len=9984  | 38.7 us | 38.9 us: 1.00x slower
encode Unicode string len=10000 | 42.6 us | 42.4 us: 1.00x faster
Geometric mean                  | (ref)   | 1.02x faster

Benchmark hidden because not significant (11): encode 100 floats, encode ascii string len=100, encode Unicode string len=100, encode 1000 integers, encode 1000 floats, encode 1000 "ascii" strings, encode ascii string len=1000, encode escaped string len=896, encode 10000 integers, encode 10000 floats, encode 10000 "ascii" strings

A speedup of up to 1.20x for encoding booleans is interesting, given that these strings are very short: "true" (4 characters) and "false" (5 characters).

@vstinner (Member Author)

The PyUnicodeWriter_WriteASCII() function is faster than PyUnicodeWriter_WriteUTF8(), but its behavior is undefined if the input string contains non-ASCII characters.

@serhiy-storchaka: What do you think of this function?
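For context, a minimal usage sketch of the proposed function; build_true_string() is a hypothetical helper, while the writer calls are the Python 3.14 PyUnicodeWriter API:

#include <Python.h>

// Hypothetical helper showing the proposed API. The caller promises
// that the bytes are pure ASCII: passing non-ASCII bytes to
// PyUnicodeWriter_WriteASCII() is undefined behavior.
static PyObject *
build_true_string(void)
{
    PyUnicodeWriter *writer = PyUnicodeWriter_Create(0);
    if (writer == NULL) {
        return NULL;
    }
    if (PyUnicodeWriter_WriteASCII(writer, "true", 4) < 0) {
        PyUnicodeWriter_Discard(writer);
        return NULL;
    }
    return PyUnicodeWriter_Finish(writer);  // new reference to "true"
}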

@vstinner (Member Author)

cc @ZeroIntensity

@ZeroIntensity left a comment (Member)

Some nits

@serhiy-storchaka (Member)

Well, we had _PyUnicodeWriter_WriteASCIIString for reasons.

But unicode_decode_utf8_writer is already optimized for ASCII. Can it be optimized even more? In theory, it can be made almost as fast as _PyUnicodeWriter_WriteASCIIString.

We can add a private _PyUnicodeWriter_WriteASCII for now, to avoid a regression in JSON encoding, and then try to squeeze nanoseconds from PyUnicodeWriter_WriteUTF8. If we fail, we can add a public PyUnicodeWriter_WriteASCII.

Co-authored-by: Peter Bierma <[email protected]>
@vstinner (Member Author)

But unicode_decode_utf8_writer is already optimized for ASCII. Can it be optimized even more?

I don't think that it can become as fast as, or faster than, a function which takes an ASCII string as argument. If we know that the input string is ASCII, there is no need to scan it for non-ASCII characters, and we can take the fast path directly.

You're right that the UTF-8 decoder is already highly optimized.

@vstinner (Member Author)

In short:

  • PyUnicodeWriter_WriteUTF8() calls ascii_decode(), which is an efficient ASCII decoder.
  • PyUnicodeWriter_WriteASCII() calls memcpy().

It's hard to beat memcpy() performance!
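To make the contrast concrete, here is a self-contained toy model of the two paths (hypothetical names and buffer handling, not the actual CPython internals):

#include <stddef.h>
#include <string.h>

// Toy model: `buf` stands in for the writer's internal UCS1 buffer.
// The UTF-8 path must prove the input is ASCII before copying.
static size_t
write_utf8_path(unsigned char *buf, const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if ((unsigned char)s[i] & 0x80) {
            return i;  // would fall back to the full UTF-8 decoder here
        }
    }
    memcpy(buf, s, n);  // fast path, taken only after the scan
    return n;
}

// The ASCII path trusts the caller: no scan, just one memcpy().
static size_t
write_ascii_path(unsigned char *buf, const char *s, size_t n)
{
    memcpy(buf, s, n);
    return n;
}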

@serhiy-storchaka (Member)

Yes, although it was close, at least for moderately large strings. Could it be optimized even more? I don't know.

But the decision about PyUnicodeWriter_WriteASCII should be made by the C API Working Group. I'm not sure of my own opinion yet. This API is unsafe.

@vstinner (Member Author)

I created issue capi-workgroup/decisions#65.

@vstinner (Member Author)

Benchmark:

write_utf8 size=10: Mean +- std dev: 153 ns +- 1 ns
write_utf8 size=100: Mean +- std dev: 174 ns +- 1 ns
write_utf8 size=1,000: Mean +- std dev: 279 ns +- 0 ns
write_utf8 size=10,000: Mean +- std dev: 1.36 us +- 0.00 us

write_ascii size=10: Mean +- std dev: 141 ns +- 0 ns
write_ascii size=100: Mean +- std dev: 149 ns +- 0 ns
write_ascii size=1,000: Mean +- std dev: 176 ns +- 3 ns
write_ascii size=10,000: Mean +- std dev: 690 ns +- 8 ns

On long strings (10,000 bytes), PyUnicodeWriter_WriteASCII() is up to 2x faster (1.36 us => 690 ns) than PyUnicodeWriter_WriteUTF8().

# Compare the write_utf8() and write_ascii() methods of the
# PyUnicodeWriter wrapper exposed by CPython's _testcapi module.
from _testcapi import PyUnicodeWriter
import pyperf

range_100 = range(100)

def bench_write_utf8(text, size):
    # 10 unrolled calls x 100 iterations = 1,000 writes per run,
    # matching inner_loops=1_000 below.
    writer = PyUnicodeWriter(0)
    for _ in range_100:
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)

def bench_write_ascii(text, size):
    # Same pattern as bench_write_utf8(), using write_ascii().
    writer = PyUnicodeWriter(0)
    for _ in range_100:
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)

runner = pyperf.Runner()
sizes = (10, 100, 1_000, 10_000)

for size in sizes:
    text = b'x' * size
    runner.bench_func(f'write_utf8 size={size:,}', bench_write_utf8, text, size,
                      inner_loops=1_000)

for size in sizes:
    text = b'x' * size
    runner.bench_func(f'write_ascii size={size:,}', bench_write_ascii, text, size,
                      inner_loops=1_000)

@encukou commented May 15, 2025 (Member)

Do we know where the bottleneck is for long strings?
Would it make sense to have a version of find_first_nonascii that checks and copies in the same loop?

@vstinner (Member Author)

Do we know where the bottleneck is for long strings?

WriteUTF8() has to check for non-ASCII characters: this check has a cost. That's the bottleneck.

Would it make sense to have a version of find_first_nonascii that checks and copies in the same loop?

Maybe, I don't know if it would be faster.

@vstinner (Member Author)

Would it make sense to have a version of find_first_nonascii that checks and copies in the same loop?

I tried, but failed to modify the code to copy while reading (while checking whether the string is pure ASCII). The code is quite complicated.
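For the record, the combined loop would look roughly like this. This is a hypothetical sketch, not CPython code; the real ascii_decode also handles alignment and the writer's buffer growth, which is where it gets complicated:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical check-and-copy loop: copy word by word while testing
// the 0x80 bit of every byte, so the input is only traversed once.
static size_t
copy_and_check_ascii(unsigned char *dst, const char *src, size_t n)
{
    const uint64_t mask = UINT64_C(0x8080808080808080);
    size_t i = 0;
    for (; i + sizeof(uint64_t) <= n; i += sizeof(uint64_t)) {
        uint64_t word;
        memcpy(&word, src + i, sizeof(word));  // safe unaligned load
        if (word & mask) {
            return i;  // non-ASCII byte in this word: switch to UTF-8 decoding
        }
        memcpy(dst + i, &word, sizeof(word));
    }
    for (; i < n; i++) {  // remaining tail, byte by byte
        if ((unsigned char)src[i] & 0x80) {
            return i;
        }
        dst[i] = (unsigned char)src[i];
    }
    return n;  // the whole input was ASCII and has been copied
}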

vstinner and others added 3 commits May 15, 2025 21:41
@picnixz left a comment (Member)

I'm happy to have this function public. I always preferred using the faster variants of the writer API when writing hardcoded strings, but they were private.

@ZeroIntensity left a comment (Member)

Sorry for the late review, LGTM as well.

@vstinner (Member Author)

I created issue capi-workgroup/decisions#65.

The C API Working Group voted in favor of adding the function.

@vstinner vstinner enabled auto-merge (squash) May 29, 2025 14:40
@vstinner vstinner merged commit f49a07b into python:main May 29, 2025
39 checks passed
@vstinner vstinner deleted the write_ascii branch May 29, 2025 14:54
vstinner added a commit to vstinner/cpython that referenced this pull request May 31, 2025
…3973)

Replace most PyUnicodeWriter_WriteUTF8() calls with
PyUnicodeWriter_WriteASCII().

Unrelated change to please the linter: remove an unused
import in test_ctypes.

Co-authored-by: Peter Bierma <[email protected]>
Co-authored-by: Bénédikt Tran <[email protected]>
(cherry picked from commit f49a07b)

bedevere-app bot commented May 31, 2025

GH-134974 is a backport of this pull request to the 3.14 branch.
