
gh-95382: Use cache for indentations in the JSON encoder #118636


Merged


@serhiy-storchaka (Member) commented May 6, 2024

This is a continuation of #118105. It keeps a cache of the newline+indentation and comma+newline+indentation strings for every indentation level.

For example:

$ ./python -m timeit -s 'import json; a = [[[{"key": "value"}]*10]*10]*10' 'json.dumps(a, indent=2)'
Before: 200 loops, best of 5: 1.79 msec per loop
After:  200 loops, best of 5: 1.25 msec per loop
$ ./python -m timeit -s 'import json; a = [[[list(range(10))]*10]*10]*10' 'json.dumps(a, indent=2)'
Before: 50 loops, best of 5: 5.45 msec per loop
After:  50 loops, best of 5: 4.78 msec per loop
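To make the idea concrete, here is a minimal sketch in plain C rather than the CPython API. It assumes a layout inferred from the indices used in the review below: entry 0 holds "\n" (base level 0, which the public interface always uses), entry 2k - 1 holds the item separator followed by the newline+indent for level k, and entry 2k holds the newline+indent for level k. The type and function names (IndentCache, cache_init, cache_grow, get_item_separator, get_newline_indent, cache_free) are hypothetical; the PR itself stores Python str objects in a list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the indent cache using plain C strings.
 * Assumed layout (base level 0):
 *   items[0]      = "\n"
 *   items[2k - 1] = item_sep + items[2k]    (item separator for level k)
 *   items[2k]     = items[2k - 2] + indent  (newline+indent for level k) */
typedef struct {
    const char *indent;    /* e.g. "  " for indent=2 */
    const char *item_sep;  /* e.g. "," */
    char **items;
    size_t size, capacity;
} IndentCache;

/* Return a newly allocated a + b, or NULL on allocation failure. */
static char *concat(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    char *r = malloc(la + lb + 1);
    if (r != NULL) {
        memcpy(r, a, la);
        memcpy(r + la, b, lb + 1);  /* also copies the terminating NUL */
    }
    return r;
}

/* Make room for `extra` more entries. */
static int reserve(IndentCache *c, size_t extra) {
    if (c->size + extra <= c->capacity) return 0;
    size_t ncap = c->capacity ? c->capacity : 4;
    while (ncap < c->size + extra) ncap *= 2;
    char **p = realloc(c->items, ncap * sizeof *p);
    if (p == NULL) return -1;
    c->items = p;
    c->capacity = ncap;
    return 0;
}

/* Create a cache holding only the level-0 entry "\n". */
static int cache_init(IndentCache *c, const char *indent, const char *item_sep) {
    c->indent = indent;
    c->item_sep = item_sep;
    c->items = NULL;
    c->size = c->capacity = 0;
    if (reserve(c, 1) < 0) return -1;
    c->items[0] = concat("\n", "");
    if (c->items[0] == NULL) {
        free(c->items);
        c->items = NULL;
        return -1;
    }
    c->size = 1;
    return 0;
}

/* Append the two entries for `level`. Levels are entered one at a time,
 * so the cache must currently end at level - 1. */
static int cache_grow(IndentCache *c, size_t level) {
    assert(level > 0 && c->size == 2 * level - 1);
    if (reserve(c, 2) < 0) return -1;
    char *nl = concat(c->items[2 * (level - 1)], c->indent);
    char *sep = nl ? concat(c->item_sep, nl) : NULL;
    if (sep == NULL) {
        free(nl);  /* free(NULL) is a no-op */
        return -1;
    }
    c->items[c->size++] = sep;  /* index 2 * level - 1 */
    c->items[c->size++] = nl;   /* index 2 * level */
    return 0;
}

/* Item separator for `level` (>= 1); grows the cache on first use. */
static const char *get_item_separator(IndentCache *c, size_t level) {
    assert(level > 0);
    if (2 * level > c->size && cache_grow(c, level) < 0) return NULL;
    assert(2 * level < c->size);
    return c->items[2 * level - 1];
}

/* Newline+indent for a level that is already cached (read-only lookup). */
static const char *get_newline_indent(const IndentCache *c, size_t level) {
    return c->items[2 * level];
}

static void cache_free(IndentCache *c) {
    for (size_t i = 0; i < c->size; i++) free(c->items[i]);
    free(c->items);
    c->items = NULL;
    c->size = c->capacity = 0;
}
```

With indent "  " and separator ",", get_item_separator(&c, 1) yields ",\n  " and get_newline_indent(&c, 1) yields "\n  ". Each entry is built once, the first time its level is reached, and every later container at that depth reuses it; growing by one level costs two concatenations based on the previous level's entry.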

The effect is strongest when many deep, narrow trees grow from the root:

$ ./python -m timeit -s 'import json; a = [[[["nested"]]]]*1000' 'json.dumps(a, indent=2)'
Before: 100 loops, best of 5: 3.39 msec per loop
After:  200 loops, best of 5: 1.85 msec per loop

It is weakest (possibly no difference at all) when the tree consists of a few large, flat lists or dicts.
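A rough way to see why the tree shape matters is to count how many indentation strings have to be built. The sketch below is a simplified cost model, not a measurement of CPython: it assumes that without a cache every container builds two strings (newline+indent and item separator for its items), while with the cache each distinct nesting depth builds them only once. The function names are hypothetical, and the model only covers a root list holding identical chains of nested containers, as in the benchmarks above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost model, not CPython code: count indentation strings
 * (newline+indent and item separator) built while encoding a root list
 * that holds `trees` identical chains, each `depth` containers deep.
 * A flat list of scalars is trees = 0. */

/* Number of containers: the root plus every list in every chain. */
static size_t count_containers(size_t trees, size_t depth) {
    return 1 + trees * depth;
}

/* Number of distinct container depths: the root plus one per chain level. */
static size_t count_depths(size_t trees, size_t depth) {
    return trees > 0 ? 1 + depth : 1;
}

/* Assumption: without a cache, every container builds its own two strings. */
static size_t strings_without_cache(size_t trees, size_t depth) {
    return 2 * count_containers(trees, depth);
}

/* Assumption: with a cache, each depth builds its two strings only once. */
static size_t strings_with_cache(size_t trees, size_t depth) {
    return 2 * count_depths(trees, depth);
}
```

For a = [[[["nested"]]]]*1000 the model gives 4001 containers but only 5 depths, so 8002 string constructions versus 10. For three large flat lists under the root (trees = 3, depth = 1) it gives 8 versus 4, which is negligible next to the cost of encoding their items, and for a single flat list both counts are 2. The measured speedups are far smaller than these ratios, presumably because building indentation strings is only part of the encoding work.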

@serhiy-storchaka force-pushed the json-encode-indent-cache branch from 8e82b42 to 2e915fd on May 6, 2024 08:46
if (self->indent != Py_None) {
newline_indent = _create_newline_indent(self->indent, indent_level);
if (newline_indent == NULL) {
indent_cache = create_indent_cache(self, indent_level);
Contributor commented:

When called via the public interface, indent_level is always zero. Removing the indent_level argument from the internal implementation would simplify this call and a few other parts of the code.

(This is probably not worth a separate PR; I leave it up to you.)

write_newline_indent(_PyUnicodeWriter *writer,
                     Py_ssize_t indent_level, PyObject *indent_cache)
{
    PyObject *newline_indent = PyList_GET_ITEM(indent_cache, indent_level * 2);
Contributor commented:

Suggested change:
-    PyObject *newline_indent = PyList_GET_ITEM(indent_cache, indent_level * 2);
+    assert(indent_level * 2 < PyList_GET_SIZE(indent_cache));
+    PyObject *newline_indent = PyList_GET_ITEM(indent_cache, indent_level * 2);

Member Author commented:

The assert above is placed after the cache is conditionally resized. There it is useful to check that we always get the right cache size, regardless of which branch was taken. But here we do not change the cache.

return NULL;
}
}
assert(indent_level * 2 < PyList_GET_SIZE(indent_cache));
Contributor commented:

Suggested change:
-    assert(indent_level * 2 < PyList_GET_SIZE(indent_cache));
+    assert(indent_level * 2 - 1 < PyList_GET_SIZE(indent_cache));

Member Author commented:

indent_level * 2 < PyList_GET_SIZE(indent_cache) is the stronger condition; it implies the suggested one.
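The implication can be checked mechanically. In this sketch (hypothetical helper names, with k the indentation level and n the cache size), an exhaustive loop over small ranges confirms that 2k < n implies 2k - 1 < n, while n = 2k satisfies only the weaker condition. Under the layout suggested by the indices in the snippets (item separator at 2k - 1, newline+indent at 2k), the stronger check also guarantees that the newline+indent entry for the same level is present.

```c
#include <assert.h>
#include <stdbool.h>

/* The two candidate assertions from the review, as predicates over the
 * indentation level k (>= 1, as on the item-separator path) and the cache
 * size n. Signed arithmetic avoids unsigned wrap-around in 2k - 1. */
static bool author_condition(long k, long n)   { return k * 2 < n; }
static bool reviewer_condition(long k, long n) { return k * 2 - 1 < n; }

/* True if, for every k in [1, max_k] and n in [0, max_n],
 * author_condition(k, n) implies reviewer_condition(k, n). */
static bool author_implies_reviewer(long max_k, long max_n) {
    for (long k = 1; k <= max_k; k++) {
        for (long n = 0; n <= max_n; n++) {
            if (author_condition(k, n) && !reviewer_condition(k, n)) {
                return false;
            }
        }
    }
    return true;
}
```

The converse fails exactly when n == 2k: index 2k - 1 exists, but index 2k does not, so the weaker assert would pass even though the newline+indent entry for that level is missing.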

@eendebakpt (Contributor) left a comment:

The PR adds some extra code to create and use the cache, but in return it simplifies error handling in the methods that use the cache. The performance improvement is somewhat larger than I expected, which is nice, although it is fair to say that performance when writing JSON with indent != None is probably not the most important case.

@serhiy-storchaka enabled auto-merge (squash) on November 12, 2024 16:57
@serhiy-storchaka merged commit 6b2a196 into python:main on Nov 12, 2024
37 checks passed
picnixz pushed a commit to picnixz/cpython that referenced this pull request Dec 8, 2024
2 participants