Skip to content

gh-128118: Improve performance of copy.copy by using a fast lookup for atomic and container types #128119

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 30, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 9 additions & 18 deletions Lib/copy.py
Original file line number Diff line number Diff line change
Expand Up @@ -67,13 +67,15 @@ def copy(x):

cls = type(x)

copier = _copy_dispatch.get(cls)
if copier:
return copier(x)
if cls in _copy_atomic_types:
return x
if cls in _copy_builtin_containers:
return cls.copy(x)


if issubclass(cls, type):
# treat it as a regular class:
return _copy_immutable(x)
return x

copier = getattr(cls, "__copy__", None)
if copier is not None:
Expand All @@ -98,23 +100,12 @@ def copy(x):
return _reconstruct(x, None, *rv)


_copy_dispatch = d = {}

def _copy_immutable(x):
return x
for t in (types.NoneType, int, float, bool, complex, str, tuple,
_copy_atomic_types = {types.NoneType, int, float, bool, complex, str, tuple,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, would performance be better if we use a frozenset instead of a set? (and is it possible?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, I'll benchmark a bit later. A frozenset should not require any locking, so perhaps there is a difference

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this moment the set and frozenset have the same implementation for __contains__:

cpython/Objects/setobject.c

Lines 2529 to 2531 in 3bd7730

static PyMethodDef frozenset_methods[] = {
SET___CONTAINS___METHODDEF
FROZENSET_COPY_METHODDEF

cpython/Objects/setobject.c

Lines 2416 to 2420 in 3bd7730

static PyMethodDef set_methods[] = {
SET_ADD_METHODDEF
SET_CLEAR_METHODDEF
SET___CONTAINS___METHODDEF
SET_COPY_METHODDEF

so there is no performance difference. In the future however, for the free-threading build one could remove the critical section for the frozenset implementation here:

cpython/Objects/setobject.c

Lines 2198 to 2207 in 3bd7730

static int
set_contains(PyObject *self, PyObject *key)
{
PySetObject *so = _PySet_CAST(self);
return _PySet_Contains(so, key);
}
/*[clinic input]
@critical_section
@coexist

Using a frozenset is possible, but this would add a bit of time to the import. On my system %timeit frozenset(_copy_atomic_types) is about 300 ns

Even faster than a setwould be a data structure that looks only at the id of the objects involved (the set will use rich compare if no match is found, but that is not needed as all objects involved are singletons), but that is not available in cpython I believe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but that is not available in cpython I believe.

That's right, it's not available.


Up to you if you want to make the free-threaded build faster in the future, but we should probably check the performances on this build. For now, let's keep the set for now (hopefully you'll rememeber this)

bytes, frozenset, type, range, slice, property,
types.BuiltinFunctionType, types.EllipsisType,
types.NotImplementedType, types.FunctionType, types.CodeType,
weakref.ref, super):
d[t] = _copy_immutable

d[list] = list.copy
d[dict] = dict.copy
d[set] = set.copy
d[bytearray] = bytearray.copy

del d, t
weakref.ref, super}
_copy_builtin_containers = {list, dict, set, bytearray}

def deepcopy(x, memo=None, _nil=[]):
"""Deep copy operation on arbitrary Python objects.
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Improve performance of :func:`copy.copy` by 30% via
a fast path for atomic types and container types.
Loading