More efficient deallocation. #482

Open
@markshannon

This is in part motivated by #402.
It is also an attempt to avoid the inefficiencies in python/cpython#27738.
It also relates to #132.
It is also needed to implement python/cpython#98260 efficiently.

Almost all objects end up on a freelist when deallocated: about half on an explicit freelist, and the other half on an ob_malloc freelist.
However, the amount of indirection and overhead between _Py_Dealloc and adding the memory to the freelist can be huge. To free an int, the following happens:

  • _Py_Dealloc calls PyLongType.tp_dealloc (via a function pointer, just to prevent the compiler doing its job 😞 )
  • PyLongType.tp_dealloc calls PyObject_Free (again via function pointer)
  • PyObject_Free calls _PyObject_Free (again via function pointer)
  • _PyObject_Free calls pymalloc_free which:
    • Does a radix tree search to check that the object belongs to ob_malloc
    • Finds the pool to which the object belongs
    • Adds the object to the pool's freelist
    • Does some pool management if the pool is now empty, or was previously full.
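
To make the cost of that chain concrete, here is a minimal toy model of it in C. Every name in it (`toy_dealloc`, `toy_int_dealloc`, `toy_object_free`, `toy_raw_free`, `indirect_calls`) is hypothetical and only stands in for the corresponding CPython step; the radix tree search and pool management are not modelled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: none of these names are CPython's. */
typedef struct toy_type toy_type;
typedef struct { toy_type *type; long value; } toy_object;
struct toy_type { void (*dealloc)(toy_object *); };

/* Counts the calls that go through a function pointer. */
static int indirect_calls = 0;

/* Stands in for pymalloc_free; the real one does far more work. */
static void toy_raw_free(void *p) { free(p); }

/* A replaceable allocator table, reached through a pointer. */
static struct { void (*free_fn)(void *); } toy_allocator = { toy_raw_free };

/* Stands in for PyObject_Free: forwards through the allocator table. */
static void toy_object_free(void *p) {
    indirect_calls++;
    toy_allocator.free_fn(p);
}

/* Stands in for the type's free slot. */
static void (*toy_tp_free)(void *) = toy_object_free;

/* Stands in for the type's tp_dealloc. */
static void toy_int_dealloc(toy_object *op) {
    indirect_calls++;
    toy_tp_free(op);
}

static toy_type toy_int_type = { toy_int_dealloc };

/* Stands in for _Py_Dealloc: dispatches on the object's type. */
static void toy_dealloc(toy_object *op) {
    assert(op != NULL);
    indirect_calls++;
    op->type->dealloc(op);
}

static toy_object *toy_int_new(long v) {
    toy_object *op = malloc(sizeof *op);
    if (op != NULL) {
        op->type = &toy_int_type;
        op->value = v;
    }
    return op;
}
```

Creating an object with `toy_int_new` and passing it to `toy_dealloc` makes three calls through function pointers before the memory is released, none of which the compiler can easily inline.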

We want to do two things to improve performance.

  1. Get from Py_DECREF() to PyObject_Free more efficiently.
  2. Get from PyObject_Free to putting the memory on the freelist more efficiently.

Getting from Py_DECREF() to PyObject_Free more efficiently

Rather than every extension class writing its own dealloc and free functions, types should set flags to indicate whether they:

  • Are just bits of memory and need no dealloc, e.g. ints, floats.
  • Need deallocation of the objects and memory they contain, but do not need finalization.
  • Have explicitly separate deallocation and finalization functions.
  • Are legacy code, with a tp_dealloc function that can do anything.

We need two bits in tp_flags to express this.

For objects that are just lumps of memory we can set tp_dealloc to point to PyObject_Free avoiding the extra indirection.
The other cases would get their own function pointers, but we can do some of the dispatching at class creation time, not at object deallocation time.
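
As a rough sketch of how the four cases could fit in two bits and be resolved once per class, consider the toy model below. The flag names and values (`TOY_DEALLOC_*`), the `toy_type` layout, and `toy_type_ready` are all assumptions made for illustration; they are not existing tp_flags values or CPython APIs, and plain `free` stands in for PyObject_Free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical two-bit field; these are not real tp_flags values. */
enum {
    TOY_DEALLOC_LEGACY   = 0u,  /* dealloc function may do anything */
    TOY_DEALLOC_PLAIN    = 1u,  /* just memory, e.g. ints, floats */
    TOY_DEALLOC_CONTENTS = 2u,  /* must release contents, no finalizer */
    TOY_DEALLOC_SPLIT    = 3u,  /* separate finalize and release steps */
    TOY_DEALLOC_MASK     = 3u
};

typedef struct toy_type toy_type;
typedef struct { toy_type *type; } toy_object;

struct toy_type {
    unsigned flags;
    void (*dealloc)(void *);        /* chosen once by toy_type_ready() */
    void (*release)(toy_object *);  /* drops what the object owns */
    void (*finalize)(toy_object *); /* user-visible finalizer */
    void (*legacy_dealloc)(void *); /* used only for legacy types */
};

/* Generic dealloc for types that own contents but have no finalizer. */
static void dealloc_contents(void *p) {
    toy_object *op = p;
    op->type->release(op);
    free(p);
}

/* Generic dealloc for types with separate finalize and release steps. */
static void dealloc_split(void *p) {
    toy_object *op = p;
    op->type->finalize(op);
    op->type->release(op);
    free(p);
}

/* Runs at class creation: the flag test happens here, not per object. */
static void toy_type_ready(toy_type *t) {
    switch (t->flags & TOY_DEALLOC_MASK) {
    case TOY_DEALLOC_PLAIN:    t->dealloc = free;              break;
    case TOY_DEALLOC_CONTENTS: t->dealloc = dealloc_contents;  break;
    case TOY_DEALLOC_SPLIT:    t->dealloc = dealloc_split;     break;
    default:                   t->dealloc = t->legacy_dealloc; break;
    }
}

/* Demo helper: counts how many objects had their contents released. */
static int released = 0;
static void count_release(toy_object *op) { (void)op; released++; }
```

In this model a plain type ends up with its dealloc slot pointing straight at the free function, so deallocating an instance costs one indirect call instead of several.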

Getting from PyObject_Free to putting the memory on the freelist more efficiently

See #132 for implementation details of freelists.

We need to compute the size of the object quickly to determine the freelist to use.
Any class that uses the standard allocator PyType_GenericAlloc can have its size computed reliably.
Other classes would need to use the current generic approach, possibly with a few customizations.
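
A minimal sketch of that size computation, assuming the object size is the type's basic size plus the item count times the item size, and that freelists are indexed by 16-byte size class up to a 512-byte limit. The names (`toy_type`, `toy_object_size`, `toy_freelist_index`) and both constants are assumptions for illustration; the sketch ignores details such as GC headers and alignment padding.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ALIGNMENT_SHIFT 4    /* assumed: 16-byte size classes */
#define TOY_SMALL_THRESHOLD 512  /* assumed: larger blocks skip freelists */

/* Only the two size fields matter for this sketch. */
typedef struct {
    size_t basicsize;  /* fixed part of every instance */
    size_t itemsize;   /* per-item part for variable-sized types */
} toy_type;

/* Size in bytes of an instance holding nitems items. */
static size_t toy_object_size(const toy_type *t, size_t nitems) {
    assert(t != NULL);
    return t->basicsize + nitems * t->itemsize;
}

/* Freelist index for a block of nbytes, or -1 if no freelist applies. */
static int toy_freelist_index(size_t nbytes) {
    if (nbytes == 0 || nbytes > TOY_SMALL_THRESHOLD) {
        return -1;
    }
    return (int)((nbytes - 1) >> TOY_ALIGNMENT_SHIFT);
}
```

With these assumptions, a type with a 24-byte basic size and 4-byte items holding one item is 28 bytes and maps to index 1, using only an add, a multiply, and a shift; no search is needed to find the right freelist.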
