Merged
46 commits
daf8e02
gh-139871: Update bytearray to contain PyBytesObject
cmaloney Oct 3, 2025
39b2d15
📜🤖 Added by blurb_it.
blurb-it[bot] Oct 14, 2025
86faf1d
Add bytearray.take_bytes
cmaloney Oct 3, 2025
a9328f4
Merge branch 'main' into bytearray_bytes
cmaloney Oct 15, 2025
e9f5ca9
Update Objects/bytearrayobject.c
cmaloney Oct 15, 2025
4784957
Review fixes
cmaloney Oct 15, 2025
db19def
Merge branch 'main' into bytearray_bytes
cmaloney Oct 15, 2025
451c302
Update Objects/bytearrayobject.c
cmaloney Oct 17, 2025
bab7151
Add tests around alloc and getsizeof that show clearing isn't working…
cmaloney Oct 15, 2025
cb2377c
Fix resizing to 0 length / clearing leaving one byte alloc
cmaloney Oct 15, 2025
20175f8
review fix: handle NULL return from from PyBytes_FromStringAndSize
cmaloney Oct 18, 2025
e485595
Add take_bytes to test_free_threading
cmaloney Oct 18, 2025
b5535d0
Missed line...
cmaloney Oct 18, 2025
7c6e8a8
Simplify getting out ob_bytes
cmaloney Oct 18, 2025
4e27d13
Include PyBytesObject in __alloc__ of bytearray.
cmaloney Oct 18, 2025
9887dad
Apply suggestion from @vstinner
cmaloney Oct 27, 2025
6e4b910
Don't multiply by sizeof(char) as it's always 1
cmaloney Oct 27, 2025
b6f8403
Rely on bytes for end of buffer NULL
cmaloney Oct 27, 2025
28cb8c5
Personal review fixes
cmaloney Oct 27, 2025
f03b895
Simplify resize error handling
cmaloney Oct 27, 2025
a45f3c2
Use right PyLong constructor
cmaloney Oct 27, 2025
c8943e3
Add a define for max bytearray size, comment size=0
cmaloney Oct 29, 2025
5bffb7e
Remove oold comment
cmaloney Oct 29, 2025
583ea4b
Update test_capi.test_bytearray for MemoryError vs OverflowError
cmaloney Oct 29, 2025
8ee14e6
More accurate size and alloc calculation
cmaloney Oct 29, 2025
d70e369
Comment and minor doc tweaks
cmaloney Oct 29, 2025
97be818
Apply suggestion from @vstinner
cmaloney Oct 29, 2025
99e49ef
Remove _PyByteArray_empty_string, add bytearray_reinit_from_bytes
cmaloney Oct 29, 2025
f4b62d9
Update Stable API concerns: restore _empty_string an dmove _PyBytesOb…
cmaloney Oct 29, 2025
48afb62
Restore _PyByteArray_empty_string in .c file
cmaloney Oct 29, 2025
8c81e03
remove line that shouldn't have been added
cmaloney Oct 29, 2025
313e78c
Apply suggestion from @encukou
cmaloney Oct 30, 2025
c028e2b
Remove original variable, no longer used
cmaloney Oct 30, 2025
a69b338
remove _PyByteArray_empty_string
cmaloney Oct 30, 2025
2a95118
Add take_bytes_n free-threading test
cmaloney Oct 30, 2025
9680e8a
Expand comment for ob_alloc
cmaloney Oct 31, 2025
02882af
Add note on memmove tradeoff
cmaloney Oct 31, 2025
b67d10c
Move suggested optimizing refactors to whatsnew
cmaloney Oct 31, 2025
6db8822
Remove unintended change
cmaloney Oct 31, 2025
fb84c14
Fix intermittent failure on deallocation from uninitialized memory
cmaloney Nov 4, 2025
0258891
PEP 7
cmaloney Nov 4, 2025
c470178
Tweak comment
cmaloney Nov 4, 2025
9681135
Add +1 so allocation is over max byte length
cmaloney Nov 7, 2025
442692a
Minor tweak for flow in whatsnew entry
cmaloney Nov 7, 2025
ee0d6d6
Apply suggestions from code review
cmaloney Nov 7, 2025
cc238c2
Merge branch 'main' into bytearray_bytes
encukou Nov 13, 2025
24 changes: 24 additions & 0 deletions Doc/library/stdtypes.rst
@@ -3173,6 +3173,30 @@ objects.

.. versionadded:: 3.14

   .. method:: take_bytes(n=None, /)

      Remove the first *n* bytes from the bytearray and return them as an immutable
      :class:`bytes`.
      By default (if *n* is ``None``), return all bytes and clear the bytearray.

      If *n* is negative, index from the end and take the first :func:`len`
      plus *n* bytes. If *n* is out of bounds, raise :exc:`IndexError`.

      Taking less than the full length will leave remaining bytes in the
      :class:`bytearray`, which requires a copy. If the remaining bytes should be
      discarded, use :meth:`~bytearray.resize` or :keyword:`del` to truncate,
      then call :meth:`~bytearray.take_bytes` without a size.

      .. impl-detail::

         Taking all bytes is a zero-copy operation.

      .. versionadded:: next

      See the :ref:`What's New <whatsnew315-bytearray-take-bytes>` entry for
      common code patterns which can be optimized with
      :meth:`bytearray.take_bytes`.

Since bytearray objects are sequences of integers (akin to a list), for a
bytearray object *b*, ``b[0]`` will be an integer, while ``b[0:1]`` will be
a bytearray object of length 1. (This contrasts with text strings, where
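
For readers on interpreters that predate this change, the semantics documented above can be approximated in pure Python. This is only a sketch: the helper name `take_bytes_compat` is hypothetical and not part of the PR, and unlike the real method it always copies, even when taking the whole buffer.

```python
import operator


def take_bytes_compat(buf: bytearray, n=None) -> bytes:
    """Approximate the documented bytearray.take_bytes(n) semantics.

    Hypothetical helper for illustration; always copies.
    """
    size = len(buf)
    if n is None:
        n = size                      # default: take everything
    else:
        n = operator.index(n)         # rejects floats with TypeError
        if n < 0:
            n += size                 # negative n counts from the end
        if n < 0 or n > size:
            raise IndexError("take_bytes_compat: n out of bounds")
    data = bytes(buf[:n])             # copy out the first n bytes
    del buf[:n]                       # drop them from the bytearray
    return data


ba = bytearray(b'abcdef')
head = take_bytes_compat(ba, 1)       # takes b'a'
rest = take_bytes_compat(ba)          # takes the remaining bytes
```

The cases mirror the ones exercised in `test_take_bytes` further down in this diff: a positive count, a negative count, `None`, and out-of-bounds values.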
80 changes: 80 additions & 0 deletions Doc/whatsnew/3.15.rst
@@ -307,6 +307,86 @@ Other language changes
  not only integers or floats, although this does not improve precision.
  (Contributed by Serhiy Storchaka in :gh:`67795`.)

.. _whatsnew315-bytearray-take-bytes:

* Added :meth:`bytearray.take_bytes(n=None, /) <bytearray.take_bytes>` to take
  bytes out of a :class:`bytearray` without copying. This enables optimizing code
  which must return :class:`bytes` after working with a mutable buffer of bytes
  such as data buffering, network protocol parsing, encoding, decoding,
  and compression. Common code patterns which can be optimized with
  :meth:`~bytearray.take_bytes` are listed below.

  (Contributed by Cody Maloney in :gh:`139871`.)

  .. list-table:: Suggested Optimizing Refactors
     :header-rows: 1

     * - Description
       - Old
       - New

     * - Return :class:`bytes` after working with :class:`bytearray`
       - .. code:: python

            def read() -> bytes:
                buffer = bytearray(1024)
                ...
                return bytes(buffer)

       - .. code:: python

            def read() -> bytes:
                buffer = bytearray(1024)
                ...
                return buffer.take_bytes()

     * - Empty a buffer getting the bytes
       - .. code:: python

            buffer = bytearray(1024)
            ...
            data = bytes(buffer)
            buffer.clear()

       - .. code:: python

            buffer = bytearray(1024)
            ...
            data = buffer.take_bytes()

     * - Split a buffer at a specific separator
       - .. code:: python

            buffer = bytearray(b'abc\ndef')
            n = buffer.find(b'\n')
            data = bytes(buffer[:n + 1])
            del buffer[:n + 1]
            assert data == b'abc\n'
            assert buffer == bytearray(b'def')

       - .. code:: python

            buffer = bytearray(b'abc\ndef')
            n = buffer.find(b'\n')
            data = buffer.take_bytes(n + 1)

     * - Split a buffer at a specific separator; discard after the separator
       - .. code:: python

            buffer = bytearray(b'abc\ndef')
            n = buffer.find(b'\n')
            data = bytes(buffer[:n])
            buffer.clear()
            assert data == b'abc'
            assert len(buffer) == 0

       - .. code:: python

            buffer = bytearray(b'abc\ndef')
            n = buffer.find(b'\n')
            buffer.resize(n)
            data = buffer.take_bytes()

* Many functions related to compiling or parsing Python code, such as
  :func:`compile`, :func:`ast.parse`, :func:`symtable.symtable`,
  and :func:`importlib.abc.InspectLoader.source_to_code`, now allow to pass
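
The "split a buffer at a separator" row above is the core of a line-oriented protocol parser. A rough sketch of how that pattern might look in a streaming reader follows; `iter_lines` and `_take` are hypothetical names introduced here, and the fallback branch (slice, then `del`) is the "Old" column from the table so the snippet also runs on interpreters without `take_bytes`.

```python
from typing import Iterable, Iterator


def _take(buf: bytearray, n: int) -> bytes:
    """Remove and return the first n bytes of buf (hypothetical helper)."""
    if hasattr(buf, "take_bytes"):
        return buf.take_bytes(n)      # new method added by this PR
    data = bytes(buf[:n])             # fallback: the "Old" pattern
    del buf[:n]
    return data


def iter_lines(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield newline-terminated lines from an iterable of byte chunks."""
    buf = bytearray()
    for chunk in chunks:
        buf += chunk
        # Emit every complete line currently buffered.
        while (i := buf.find(b"\n")) != -1:
            yield _take(buf, i + 1)   # include the separator
    if buf:
        yield _take(buf, len(buf))    # trailing partial line


lines = list(iter_lines([b"ab", b"c\nde", b"f\ngh"]))
```

Per the documentation in this PR, a partial take still copies the remaining bytes, so the benefit in a loop like this comes mainly from avoiding the intermediate `bytearray` slice.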
16 changes: 8 additions & 8 deletions Include/cpython/bytearrayobject.h
@@ -5,25 +5,25 @@
 /* Object layout */
 typedef struct {
     PyObject_VAR_HEAD
-    Py_ssize_t ob_alloc;   /* How many bytes allocated in ob_bytes */
+    /* How many bytes allocated in ob_bytes
+
+       In the current implementation this is equivalent to Py_SIZE(ob_bytes_object).
+       The value is always loaded and stored atomically for thread safety.
+       There are API compatibility concerns with removing so keeping for now. */
+    Py_ssize_t ob_alloc;
     char *ob_bytes;        /* Physical backing buffer */
     char *ob_start;        /* Logical start inside ob_bytes */
     Py_ssize_t ob_exports; /* How many buffer exports */
+    PyObject *ob_bytes_object;  /* PyBytes for zero-copy bytes conversion */
 } PyByteArrayObject;
 
-PyAPI_DATA(char) _PyByteArray_empty_string[];
-
 /* Macros and static inline functions, trading safety for speed */
 #define _PyByteArray_CAST(op) \
     (assert(PyByteArray_Check(op)), _Py_CAST(PyByteArrayObject*, op))
 
 static inline char* PyByteArray_AS_STRING(PyObject *op)
 {
-    PyByteArrayObject *self = _PyByteArray_CAST(op);
-    if (Py_SIZE(self)) {
-        return self->ob_start;
-    }
-    return _PyByteArray_empty_string;
+    return _PyByteArray_CAST(op)->ob_start;
 }
 #define PyByteArray_AS_STRING(self) PyByteArray_AS_STRING(_PyObject_CAST(self))

8 changes: 8 additions & 0 deletions Include/internal/pycore_bytesobject.h
@@ -60,6 +60,14 @@ PyAPI_FUNC(void)
_PyBytes_Repeat(char* dest, Py_ssize_t len_dest,
const char* src, Py_ssize_t len_src);

/* _PyBytesObject_SIZE gives the basic size of a bytes object; any memory allocation
   for a bytes object of length n should request _PyBytesObject_SIZE + n bytes.

   Using _PyBytesObject_SIZE instead of sizeof(PyBytesObject) saves
   3 or 7 bytes per bytes object allocation on a typical system.
*/
#define _PyBytesObject_SIZE (offsetof(PyBytesObject, ob_sval) + 1)

/* --- PyBytesWriter ------------------------------------------------------ */

struct PyBytesWriter {
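
The size arithmetic in the `_PyBytesObject_SIZE` comment can be observed from Python. The sketch below is CPython-specific and rests on two assumptions: that `sys.getsizeof(b"")` reports the header plus the one trailing NUL byte (that is, `_PyBytesObject_SIZE`), and that `sizeof(PyBytesObject)` would be that size rounded up to pointer alignment. The helper names are hypothetical.

```python
import struct
import sys

# Assumption: on CPython, the size of an empty bytes object is the
# header plus one NUL byte, i.e. what _PyBytesObject_SIZE describes.
BASE = sys.getsizeof(b"")


def bytes_alloc_size(n: int) -> int:
    """Size CPython reports for a bytes object of length n."""
    return BASE + n


def padded_struct_size(size: int) -> int:
    """Round size up to pointer alignment, as sizeof() typically would."""
    align = struct.calcsize("P")
    return -(-size // align) * align


# Bytes saved per allocation by not using the padded struct size.
saved = padded_struct_size(BASE) - BASE
sizes_match = all(
    sys.getsizeof(b"x" * n) == bytes_alloc_size(n) for n in (0, 1, 100)
)
```

On a typical 64-bit build `saved` should come out at the upper figure from the comment and on a 32-bit build at the lower one, though the exact values depend on the platform's struct layout.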
81 changes: 81 additions & 0 deletions Lib/test/test_bytes.py
@@ -1397,6 +1397,16 @@ def test_clear(self):
        b.append(ord('p'))
        self.assertEqual(b, b'p')

        # Cleared object should be empty.
        b = bytearray(b'abc')
        b.clear()
        self.assertEqual(b.__alloc__(), 0)
        base_size = sys.getsizeof(bytearray())
        self.assertEqual(sys.getsizeof(b), base_size)
        c = b.copy()
        self.assertEqual(c.__alloc__(), 0)
        self.assertEqual(sys.getsizeof(c), base_size)

    def test_copy(self):
        b = bytearray(b'abc')
        bb = b.copy()
@@ -1458,6 +1468,61 @@ def test_resize(self):
        self.assertRaises(MemoryError, bytearray().resize, sys.maxsize)
        self.assertRaises(MemoryError, bytearray(1000).resize, sys.maxsize)

    def test_take_bytes(self):
        ba = bytearray(b'ab')
        self.assertEqual(ba.take_bytes(), b'ab')
        self.assertEqual(len(ba), 0)
        self.assertEqual(ba, bytearray(b''))
        self.assertEqual(ba.__alloc__(), 0)
        base_size = sys.getsizeof(bytearray())
        self.assertEqual(sys.getsizeof(ba), base_size)

        # Positive and negative slicing.
        ba = bytearray(b'abcdef')
        self.assertEqual(ba.take_bytes(1), b'a')
        self.assertEqual(ba, bytearray(b'bcdef'))
        self.assertEqual(len(ba), 5)
        self.assertEqual(ba.take_bytes(-5), b'')
        self.assertEqual(ba, bytearray(b'bcdef'))
        self.assertEqual(len(ba), 5)
        self.assertEqual(ba.take_bytes(-3), b'bc')
        self.assertEqual(ba, bytearray(b'def'))
        self.assertEqual(len(ba), 3)
        self.assertEqual(ba.take_bytes(3), b'def')
        self.assertEqual(ba, bytearray(b''))
        self.assertEqual(len(ba), 0)

        # Take nothing from emptiness.
        self.assertEqual(ba.take_bytes(0), b'')
        self.assertEqual(ba.take_bytes(), b'')
        self.assertEqual(ba.take_bytes(None), b'')

        # Out of bounds, bad take value.
        self.assertRaises(IndexError, ba.take_bytes, -1)
        self.assertRaises(TypeError, ba.take_bytes, 3.14)
        ba = bytearray(b'abcdef')
        self.assertRaises(IndexError, ba.take_bytes, 7)

        # Offset between physical and logical start (ob_bytes != ob_start).
        ba = bytearray(b'abcde')
        del ba[:2]
        self.assertEqual(ba, bytearray(b'cde'))
        self.assertEqual(ba.take_bytes(), b'cde')

        # Overallocation at end.
        ba = bytearray(b'abcde')
        del ba[-2:]
        self.assertEqual(ba, bytearray(b'abc'))
        self.assertEqual(ba.take_bytes(), b'abc')
        ba = bytearray(b'abcde')
        ba.resize(4)
        self.assertEqual(ba.take_bytes(), b'abcd')

        # Take of a bytearray with references should fail.
        ba = bytearray(b'abc')
        with memoryview(ba) as mv:
            self.assertRaises(BufferError, ba.take_bytes)
        self.assertEqual(ba.take_bytes(), b'abc')
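
The last block of `test_take_bytes` leans on a rule that already applies to resizing: a `bytearray` with a live buffer export cannot change size. A minimal sketch of that existing behaviour, using only `bytearray.clear()` so it runs on released Python versions; that `take_bytes()` raises the same `BufferError` is what the test above asserts, not something this snippet demonstrates.

```python
ba = bytearray(b"abc")

with memoryview(ba) as mv:
    # While the memoryview is alive, the bytearray has an active export,
    # so any operation that would resize it is refused.
    try:
        ba.clear()
    except BufferError:
        blocked = True
    else:
        blocked = False
    first = mv[0]          # the view still sees the original data

# Leaving the with block releases the export; resizing works again.
ba.clear()
```

Refusing the operation keeps the exported view valid, which is why the test expects the data to still be retrievable once the `memoryview` is released.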

    def test_setitem(self):
        def setitem_as_mapping(b, i, val):
@@ -2564,6 +2629,18 @@ def zfill(b, a):
            c = a.zfill(0x400000)
            assert not c or c[-1] not in (0xdd, 0xcd)

        def take_bytes(b, a):  # MODIFIES!
            b.wait()
            c = a.take_bytes()
            assert not c or c[0] == 48  # '0'

        def take_bytes_n(b, a):  # MODIFIES!
            b.wait()
            try:
                c = a.take_bytes(10)
                assert c == b'0123456789'
            except IndexError: pass

        def check(funcs, a=None, *args):
            if a is None:
                a = bytearray(b'0' * 0x400000)
@@ -2625,6 +2702,10 @@ def check(funcs, a=None, *args):
        check([clear] + [startswith] * 10)
        check([clear] + [strip] * 10)

        check([clear] + [take_bytes] * 10)
        check([take_bytes_n] * 10, bytearray(b'0123456789' * 0x400))
        check([take_bytes_n] * 10, bytearray(b'0123456789' * 5))

        check([clear] + [contains] * 10)
        check([clear] + [subscript] * 10)
        check([clear2] + [ass_subscript2] * 10, None, bytearray(b'0' * 0x400000))
5 changes: 4 additions & 1 deletion Lib/test/test_capi/test_bytearray.py
@@ -1,3 +1,4 @@
+import sys
 import unittest
 from test.support import import_helper

@@ -55,7 +56,9 @@ def test_fromstringandsize(self):
         self.assertEqual(fromstringandsize(b'', 0), bytearray())
         self.assertEqual(fromstringandsize(NULL, 0), bytearray())
         self.assertEqual(len(fromstringandsize(NULL, 3)), 3)
-        self.assertRaises(MemoryError, fromstringandsize, NULL, PY_SSIZE_T_MAX)
+        self.assertRaises(OverflowError, fromstringandsize, NULL, PY_SSIZE_T_MAX)
+        self.assertRaises(OverflowError, fromstringandsize, NULL,
+                          PY_SSIZE_T_MAX-sys.getsizeof(b'') + 1)
 
         self.assertRaises(SystemError, fromstringandsize, b'abc', -1)
         self.assertRaises(SystemError, fromstringandsize, b'abc', PY_SSIZE_T_MIN)
2 changes: 1 addition & 1 deletion Lib/test/test_sys.py
@@ -1583,7 +1583,7 @@ def test_objecttypes(self):
         samples = [b'', b'u'*100000]
         for sample in samples:
             x = bytearray(sample)
-            check(x, vsize('n2Pi') + x.__alloc__())
+            check(x, vsize('n2PiP') + x.__alloc__())
         # bytearray_iterator
         check(iter(bytearray()), size('nP'))
         # bytes
@@ -0,0 +1,2 @@
Update :class:`bytearray` to use a :class:`bytes` under the hood as its buffer
and add :func:`bytearray.take_bytes` to take it out.