Description
This is in part motivated by #402.
It is also an attempt to avoid the inefficiencies in python/cpython#27738.
It also relates to #132.
It is also needed to implement python/cpython#98260 efficiently.
Almost all objects end up on a freelist when de-allocated, about half in an explicit freelist, and the other half in an ob_malloc freelist.
However, the amount of indirection and overhead to get from _Py_Dealloc to adding something to the freelist can be huge. To free an int the following happens:
- _Py_Dealloc calls PyLongType.tp_dealloc (via a function pointer, just to prevent the compiler doing its job 😞)
- PyLongType.tp_dealloc calls PyObject_Free (again via function pointer)
- PyObject_Free calls _PyObject_Free (again via function pointer)
- _PyObject_Free calls pymalloc_free which:
  - Does a radix tree search to check that the object belongs to ob_malloc
  - Finds the pool to which the object belongs
  - Adds the object to the pool's freelist
  - Does some pool management if the pool is now empty, or was previously full.
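As a rough illustration of the call chain listed above, here is a toy model in C that counts the function-pointer hops taken to free one object. Everything prefixed with toy_ is a hypothetical stand-in named after the step it imitates; none of it is CPython source, and the real functions do far more work.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: counts hops made through function pointers. */
static int indirect_calls = 0;
static int objects_freed = 0;

typedef struct toy_type {
    void (*tp_dealloc)(void *);
    void (*tp_free)(void *);
} toy_type;

typedef struct toy_object {
    const toy_type *ob_type;
} toy_object;

/* Stand-in for pymalloc_free: the radix tree search, pool lookup and
   freelist push would happen here. */
static void toy_pymalloc_free(void *p) { (void)p; objects_freed++; }

/* Stand-in for _PyObject_Free: a direct call to the step above. */
static void toy_raw_object_free(void *p) { toy_pymalloc_free(p); }

/* Stand-in for the pointer that PyObject_Free dispatches through. */
static void (*toy_allocator_free)(void *) = toy_raw_object_free;

/* Stand-in for PyObject_Free. */
static void toy_object_free(void *p) {
    indirect_calls++;
    toy_allocator_free(p);
}

/* Stand-in for the int type's tp_dealloc. */
static void toy_long_dealloc(void *p) {
    toy_object *op = p;
    indirect_calls++;
    op->ob_type->tp_free(p);
}

static const toy_type toy_long_type = { toy_long_dealloc, toy_object_free };

/* Stand-in for _Py_Dealloc. */
static void toy_dealloc(toy_object *op) {
    assert(op != NULL);
    indirect_calls++;
    op->ob_type->tp_dealloc(op);
}
```

Calling toy_dealloc() on an object whose ob_type is &toy_long_type goes through three function pointers before the memory reaches the allocator, which matches the three "via function pointer" steps in the list.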
We want to do two things to improve performance.
- Get from Py_DECREF() to PyObject_Free more efficiently
- Get from PyObject_Free to putting the memory on the freelist more efficiently.
Getting from Py_DECREF() to PyObject_Free more efficiently
Rather than every extension class writing its own dealloc and free functions, types should set flags to indicate whether they:
- Are just bits of memory and need no dealloc, e.g. ints, floats.
- Need deallocation of the objects and memory they contain, but do not need finalization
- Have explicitly separate deallocation and finalization functions.
- Are legacy code, with a tp_dealloc function that can do anything.
We need two bits in tp_flags to express this.
For objects that are just lumps of memory we can set tp_dealloc to point to PyObject_Free, avoiding the extra indirection.
The other cases would get their own function pointers, but we can do some of the dispatching at class creation time, not at object deallocation time.
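A minimal sketch of how the four cases might be encoded in two flag bits and resolved once, at class creation time. The flag names and values, the toy_type layout, and toy_type_ready() are all assumptions made for illustration; toy_free() stands in for PyObject_Free. Real finalization has more to it (an object can be resurrected, for instance), which this sketch ignores.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical two-bit encoding; not real CPython flag names or values. */
enum {
    TOY_DEALLOC_LEGACY   = 0,  /* tp_dealloc can do anything */
    TOY_DEALLOC_MEMORY   = 1,  /* just memory, no dealloc needed */
    TOY_DEALLOC_CONTENTS = 2,  /* release contents, no finalization */
    TOY_DEALLOC_FINALIZE = 3,  /* separate finalization and deallocation */
    TOY_DEALLOC_MASK     = 3
};

typedef struct toy_type {
    unsigned long tp_flags;
    void (*tp_dealloc)(void *);   /* what the decref path ends up calling */
    void (*tp_clear)(void *);     /* releases the objects it contains */
    void (*tp_finalize)(void *);  /* runs the finalizer */
} toy_type;

typedef struct toy_object {
    toy_type *ob_type;
} toy_object;

/* Stand-in for PyObject_Free. */
static void toy_free(void *p) { free(p); }

static void toy_dealloc_contents(void *p) {
    toy_object *op = p;
    op->ob_type->tp_clear(p);
    toy_free(p);
}

static void toy_dealloc_finalize(void *p) {
    toy_object *op = p;
    op->ob_type->tp_finalize(p);
    op->ob_type->tp_clear(p);
    toy_free(p);
}

/* Runs once per class, so freeing an object needs no flag checks. */
static void toy_type_ready(toy_type *tp) {
    switch (tp->tp_flags & TOY_DEALLOC_MASK) {
    case TOY_DEALLOC_MEMORY:
        tp->tp_dealloc = toy_free;  /* no extra hop for ints, floats */
        break;
    case TOY_DEALLOC_CONTENTS:
        assert(tp->tp_clear != NULL);
        tp->tp_dealloc = toy_dealloc_contents;
        break;
    case TOY_DEALLOC_FINALIZE:
        assert(tp->tp_clear != NULL && tp->tp_finalize != NULL);
        tp->tp_dealloc = toy_dealloc_finalize;
        break;
    default:
        /* Legacy: keep whatever tp_dealloc the type supplied. */
        break;
    }
}
```

With this arrangement a type flagged as plain memory has its tp_dealloc pointing straight at the free function, so deallocating an instance costs a single indirect call.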
Getting from PyObject_Free to putting the memory on the freelist more efficiently
See #132 for implementation details of freelists.
We need to compute the size of the object quickly to determine the freelist to use.
Any class that uses the standard allocator PyType_GenericAlloc can have its size computed reliably.
Other classes would need to use the current generic approach, possibly with a few customizations.
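To make the size computation concrete, here is a hedged sketch of picking a freelist from a type's basic size and item size. The 16-byte alignment, the 32 size classes, and every toy_ name are assumptions for illustration only. The freelist design in #132 may differ, and a real implementation has to account for details this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ALIGNMENT 16       /* assumed size-class granularity */
#define TOY_NUM_FREELISTS 32   /* assumed: covers sizes up to 512 bytes */

typedef struct toy_type {
    size_t tp_basicsize;
    size_t tp_itemsize;
} toy_type;

/* Size of an instance with nitems variable-length items, rounded up
   to the alignment. */
static size_t toy_object_size(const toy_type *tp, size_t nitems) {
    size_t size = tp->tp_basicsize + nitems * tp->tp_itemsize;
    return (size + TOY_ALIGNMENT - 1) & ~(size_t)(TOY_ALIGNMENT - 1);
}

/* Index of the freelist for a rounded size, or -1 if it is too large. */
static int toy_freelist_index(size_t size) {
    assert(size > 0 && size % TOY_ALIGNMENT == 0);
    size_t index = size / TOY_ALIGNMENT - 1;
    return index < TOY_NUM_FREELISTS ? (int)index : -1;
}

typedef struct toy_block {
    struct toy_block *next;
} toy_block;

static toy_block *toy_freelists[TOY_NUM_FREELISTS];

/* Push freed memory onto its size class. Returns 1 on success, or 0 if
   the caller has to fall back to the generic path. */
static int toy_freelist_push(const toy_type *tp, size_t nitems, void *mem) {
    int index = toy_freelist_index(toy_object_size(tp, nitems));
    if (index < 0) {
        return 0;
    }
    toy_block *block = mem;
    block->next = toy_freelists[index];
    toy_freelists[index] = block;
    return 1;
}
```

Because the size follows from two fields of the type plus the item count, the free path can go from object to freelist with a little arithmetic and no search.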