PEP 688: Making the buffer protocol accessible in Python #2549
Merged
18 commits
91927d5 First draft of Buffer PEP (JelleZijlstra)
616c6c1 make it render (JelleZijlstra)
ca71541 Fix typo (#2) (hauntsaninja)
49f6105 write some more (JelleZijlstra)
671a307 fixes (JelleZijlstra)
e570c79 should not -> does not require (JelleZijlstra)
c8cd916 PEP 688 (JelleZijlstra)
6140324 lint (JelleZijlstra)
b4f02bb rewording (JelleZijlstra)
0fe1f6a codeowners (JelleZijlstra)
f915584 more words, feedback from Adam (JelleZijlstra)
edcf640 reword first sentence (JelleZijlstra)
5dd8910 add note on bytes/bytearray (JelleZijlstra)
862105e Apply CAM feedback (JelleZijlstra)
abb36f2 More CAM feedback (JelleZijlstra)
052c49d missed one (JelleZijlstra)
d32e0c6 Feedback from Alex (JelleZijlstra)
9de8b60 More feedback (JelleZijlstra)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,275 @@ | ||
PEP: 688 | ||
Title: Making the buffer protocol accessible in Python | ||
Author: Jelle Zijlstra <[email protected]> | ||
Status: Draft | ||
Type: Standards Track | ||
Content-Type: text/x-rst | ||
Created: 23-Apr-2022 | ||
Python-Version: 3.12 | ||
|
||
|
||
Abstract | ||
======== | ||
|
||
This PEP proposes a mechanism for Python code to inspect whether a | ||
type supports the C-level buffer protocol. This allows type | ||
checkers to evaluate whether objects implement the protocol. | ||
|
||
|
||
Motivation | ||
========== | ||
|
||
The CPython C API provides a versatile mechanism for accessing the | ||
underlying memory of an object—the buffer protocol from :pep:`3118`. | ||
Functions that accept binary data are usually written to handle any | ||
object implementing the buffer protocol. For example, at the time of writing, | ||
there are around 130 functions in CPython using the Argument Clinic | ||
``Py_buffer`` type, which accepts the buffer protocol. | ||
|
||
Currently, there is no way for Python code to inspect whether an object | ||
supports the buffer protocol. Moreover, the static type system | ||
does not provide a type annotation to represent the protocol. | ||
This is a `common problem <https://github.com/python/typing/issues/593>`__ | ||
when writing type annotations for code that accepts generic buffers. | ||
|
||
|
||
Rationale | ||
========= | ||
|
||
Current options | ||
--------------- | ||
|
||
There are two current workarounds for annotating buffer types in | ||
the type system, but neither is adequate. | ||
|
||
First, the `current workaround <https://github.com/python/typeshed/blob/2a0fc1b582ef84f7a82c0beb39fa617de2539d3d/stdlib/_typeshed/__init__.pyi#L194>`__ | ||
for buffer types in typeshed is a type alias | ||
that lists well-known buffer types in the standard library, such as | ||
``bytes``, ``bytearray``, ``memoryview``, and ``array.array``. This | ||
approach works for the standard library, but it does not extend to | ||
third-party buffer types. | ||
|
||
Second, the `documentation <https://docs.python.org/3.10/library/typing.html#typing.ByteString>`__ | ||
for ``typing.ByteString`` currently states: | ||
|
||
This type represents the types ``bytes``, ``bytearray``, and | ||
``memoryview`` of byte sequences. | ||
|
||
As a shorthand for this type, ``bytes`` can be used to annotate | ||
arguments of any of the types mentioned above. | ||
|
||
Although this sentence has been in the documentation | ||
`since 2015 <https://github.com/python/cpython/commit/2a19d956ab92fc9084a105cc11292cb0438b322f>`__, | ||
the use of ``bytes`` to include these other types is not specified | ||
in any of the typing PEPs. Furthermore, this mechanism has a number of | ||
problems. It does not include all possible buffer types, and it | ||
makes the ``bytes`` type ambiguous in type annotations. After all, | ||
there are many operations that are valid on ``bytes`` objects, but | ||
not on ``memoryview`` objects, and it is perfectly possible for | ||
a function to accept ``bytes`` but not ``memoryview`` objects. | ||
A mypy user | ||
`reports <https://github.com/python/mypy/issues/12643#issuecomment-1105914159>`__ | ||
that this shortcut has caused significant problems for the ``psycopg`` project. | ||
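
The ambiguity is easy to reproduce at runtime. The sketch below is illustrative only: ``shout`` is a hypothetical function, not code from any project mentioned above. It is annotated to take ``bytes`` and relies on a method that ``memoryview`` does not provide, so a call that the ``bytes`` shorthand lets through fails when executed.

```python
def shout(data: bytes) -> bytes:
    # bytes.upper() exists; memoryview has no such method.
    return data.upper()


result = shout(b"abc")

try:
    # Accepted by a type checker that treats bytes as a shorthand
    # for memoryview, yet it fails at runtime.
    shout(memoryview(b"abc"))
except AttributeError as exc:
    failure = exc
else:
    failure = None
```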
|
||
Kinds of buffers | ||
---------------- | ||
|
||
The C buffer protocol supports | ||
`many options <https://docs.python.org/3.10/c-api/buffer.html#buffer-request-types>`__, | ||
affecting strides, contiguity, and support for writing to the buffer. Some of these | ||
options would be useful in the type system. For example, typeshed | ||
currently provides separate type aliases for writable and read-only | ||
buffers. | ||
|
||
However, in the C buffer protocol, these options cannot be | ||
queried directly on the type object. The only way to figure out | ||
whether an object supports a writable buffer is to actually | ||
ask for the buffer. For some types, such as ``memoryview``, | ||
whether the buffer is writable depends on the instance: | ||
some instances are read-only and others are not. As such, we propose to | ||
expose only whether a type implements the buffer protocol at | ||
all, not whether it supports more specific options such as | ||
writable buffers. | ||
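
As a small illustration of the instance-dependent behavior described above, the following sketch uses only the built-in ``memoryview`` API. Two objects of the same type report different writability, which is why the type object alone cannot answer the question.

```python
readonly_view = memoryview(b"xy")             # backed by immutable bytes
writable_view = memoryview(bytearray(b"xy"))  # backed by a mutable bytearray

# Same type, different answers: writability is a property of the instance.
flags = (readonly_view.readonly, writable_view.readonly)

# A writable view can also be turned into a read-only one (Python 3.8+).
frozen_view = writable_view.toreadonly()

writable_view[0] = ord("z")  # succeeds and mutates the bytearray
try:
    readonly_view[0] = ord("z")
except TypeError:
    write_rejected = True
else:
    write_rejected = False
```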
|
||
Specification | ||
============= | ||
|
||
types.Buffer | ||
------------ | ||
|
||
A new class, ``types.Buffer``, will be added. It cannot be instantiated or | ||
subclassed at runtime, but supports the ``__instancecheck__`` and | ||
``__subclasscheck__`` hooks. In CPython, these will check for the presence of the | ||
``bf_getbuffer`` slot in the type object: | ||
|
||
.. code-block:: pycon | ||
|
||
>>> from types import Buffer | ||
>>> isinstance(b"xy", Buffer) | ||
True | ||
>>> issubclass(bytes, Buffer) | ||
True | ||
>>> issubclass(memoryview, Buffer) | ||
True | ||
>>> isinstance("xy", Buffer) | ||
False | ||
>>> issubclass(str, Buffer) | ||
False | ||
|
||
The new class can also be used in type annotations: | ||
|
||
.. code-block:: python | ||
|
||
def need_buffer(b: Buffer) -> memoryview: | ||
return memoryview(b) | ||
|
||
need_buffer(b"xy") # ok | ||
need_buffer("xy") # rejected by static type checkers | ||
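
For comparison, a runtime check can be roughly approximated today without ``types.Buffer`` by asking for a buffer and seeing whether the request succeeds. The helper below is a sketch, and ``supports_buffer`` is a hypothetical name. It is not the mechanism proposed here: it inspects an instance rather than the ``bf_getbuffer`` slot on the type, and it actually acquires (and releases) a buffer.

```python
def supports_buffer(obj: object) -> bool:
    """Approximate ``isinstance(obj, Buffer)`` by requesting a buffer."""
    try:
        # memoryview() raises TypeError for objects without buffer support.
        # The with-block releases the buffer again immediately.
        with memoryview(obj):
            return True
    except TypeError:
        return False
```

Unlike the proposed class, this helper cannot be used in annotations, so it does nothing for static type checkers.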
|
||
Usage in stub files | ||
------------------- | ||
|
||
For static typing purposes, types defined in C extensions usually | ||
require stub files, as :pep:`described in PEP 484 <484#stub-files>`. | ||
In stub files, ``types.Buffer`` may be used as a base class to | ||
indicate that a class implements the buffer protocol. | ||
|
||
For example, ``memoryview`` may be declared as follows in a stub: | ||
|
||
.. code-block:: python | ||
|
||
class memoryview(types.Buffer, Sized, Sequence[int]): | ||
... | ||
|
||
The ``types.Buffer`` class does not require any special treatment | ||
by type checkers. | ||
|
||
Equivalent for older Python versions | ||
------------------------------------ | ||
|
||
New typing features are usually backported to older Python versions | ||
in the `typing_extensions <https://pypi.org/project/typing-extensions/>`_ | ||
package. Because the buffer protocol | ||
is accessible only in C, ``types.Buffer`` cannot be implemented | ||
in a pure-Python package like ``typing_extensions``. As a temporary | ||
workaround, a ``typing_extensions.Buffer`` | ||
`abstract base class <Buffer ABC_>`__ will be provided for Python versions | ||
that do not have ``types.Buffer`` available. | ||
|
||
For the benefit of | ||
static type checkers, ``typing_extensions.Buffer`` can be used as | ||
a base class in stubs to mark types as supporting the buffer protocol. | ||
For runtime uses, the ``ABC.register`` API can be used to register | ||
buffer classes with ``typing_extensions.Buffer``. | ||
|
||
When ``types.Buffer`` is available, ``typing_extensions`` should simply | ||
re-export it. Thus, users who register their buffer class manually | ||
with ``typing_extensions.Buffer.register`` should use a guard to make | ||
sure their code continues to work once ``types.Buffer`` is in the | ||
standard library. | ||
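
One possible shape for such a guard is sketched below. To keep the example self-contained, ``Buffer`` is a local stand-in for the proposed ``typing_extensions.Buffer`` ABC, and ``MyBuffer`` is a hypothetical extension type. The ``hasattr`` test is an assumption about how the guard could be spelled, not something this PEP prescribes.

```python
import abc
import types


class Buffer(abc.ABC):
    """Stand-in for the proposed typing_extensions.Buffer ABC."""


class MyBuffer:
    """Hypothetical extension type that implements the buffer protocol in C."""


# Register manually only while the ABC fallback is in use. Once
# types.Buffer exists it is re-exported instead, and it recognizes
# buffer types without any registration.
if not hasattr(types, "Buffer"):
    Buffer.register(MyBuffer)
```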
|
||
|
||
No special meaning for ``bytes`` | ||
-------------------------------- | ||
|
||
The special case stating that ``bytes`` may be used as a shorthand | ||
for other ``ByteString`` types will be removed from the ``typing`` | ||
documentation. | ||
With ``types.Buffer`` available as an alternative, there will be no good | ||
reason to allow ``bytes`` as a shorthand. | ||
We suggest that type checkers currently implementing this behavior | ||
should deprecate and eventually remove it. | ||
|
||
|
||
Backwards Compatibility | ||
======================= | ||
|
||
As the runtime changes in this PEP only add a new class, there are | ||
no backwards compatibility concerns. | ||
|
||
However, the recommendation to remove the special behavior for | ||
``bytes`` in type checkers does have a backwards compatibility | ||
impact on their users. An `experiment <https://github.com/python/mypy/pull/12661>`__ | ||
with mypy shows that several major open source projects that use it | ||
for type checking will see new errors if the ``bytes`` promotion | ||
is removed. Many of these errors can be fixed by improving | ||
the stubs in typeshed, as has already been done for the | ||
`builtins <https://github.com/python/typeshed/pull/7631>`__, | ||
`binascii <https://github.com/python/typeshed/pull/7677>`__, | ||
`pickle <https://github.com/python/typeshed/pull/7678>`__, and | ||
`re <https://github.com/python/typeshed/pull/7679>`__ modules. | ||
Overall, the change improves type safety and makes the type system | ||
more consistent, so we believe the migration cost is worth it. | ||
|
||
|
||
How to Teach This | ||
================= | ||
|
||
We will add notes pointing to ``types.Buffer`` in appropriate places in the | ||
documentation, such as `typing.readthedocs.io <https://typing.readthedocs.io/en/latest/>`__ | ||
and the `mypy cheat sheet <https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html>`__. | ||
Type checkers may provide additional pointers in their error messages. For example, | ||
when they encounter a buffer object being passed to a function that | ||
is annotated to only accept ``bytes``, the error message could include a note suggesting | ||
the use of ``types.Buffer`` instead. | ||
|
||
|
||
Reference Implementation | ||
======================== | ||
|
||
An implementation of ``types.Buffer`` is | ||
`available <https://github.com/python/cpython/compare/main...JelleZijlstra:typesbuffer?expand=1>`__ | ||
in the author's fork. | ||
|
||
|
||
Rejected Ideas | ||
============== | ||
|
||
Buffer ABC | ||
---------- | ||
|
||
An `earlier proposal <https://github.com/python/cpython/issues/71688>`__ suggested | ||
adding a ``collections.abc.Buffer`` | ||
`abstract base class <https://docs.python.org/3/glossary.html#term-abstract-base-class>`__ | ||
to represent buffer objects. This idea | ||
stalled because an ABC with no methods does not fit well into the ``collections.abc`` | ||
module. Furthermore, it required manual registration of buffer classes, including | ||
those in the standard library. This PEP's approach of using the ``__instancecheck__`` | ||
hook is more natural and does not require explicit registration. | ||
|
||
Nevertheless, the ABC proposal has the advantage that it does not require C changes. | ||
This PEP proposes to adopt a version of it in the third-party ``typing_extensions`` | ||
package for the benefit of users of older Python versions. | ||
|
||
Keep ``bytearray`` compatible with ``bytes`` | ||
-------------------------------------------- | ||
|
||
It has been suggested to remove the special case where ``memoryview`` is | ||
always compatible with ``bytes``, but keep it for ``bytearray``, because | ||
the two types have very similar interfaces. However, several standard | ||
library functions (e.g., ``re.compile`` and ``socket.getaddrinfo``) accept | ||
``bytes`` but not ``bytearray``. In most codebases, ``bytearray`` is also | ||
not a very common type. We prefer to have users spell out accepted types | ||
explicitly (or use ``Protocol`` from :pep:`544` if only a specific set of | ||
methods is required). | ||
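
As a sketch of the ``Protocol`` alternative, a function that only needs to decode its argument could declare just that method. ``SupportsDecode`` and ``to_text`` are hypothetical names chosen for illustration. ``runtime_checkable`` is used here only so the difference can be observed at runtime.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsDecode(Protocol):
    # The only capability the function below relies on.
    def decode(self, encoding: str = ..., errors: str = ...) -> str: ...


def to_text(data: SupportsDecode) -> str:
    return data.decode("utf-8")


# bytes and bytearray both provide decode(); memoryview does not.
accepted = (to_text(b"hi"), to_text(bytearray(b"hi")))
view_matches = isinstance(memoryview(b"hi"), SupportsDecode)
```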
|
||
|
||
Open Issues | ||
=========== | ||
|
||
Read-only and writable buffers | ||
------------------------------ | ||
|
||
To avoid making changes to the buffer protocol itself, this PEP currently | ||
does not provide a way to distinguish between read-only and writable buffers. | ||
That's unfortunate, because some APIs require a writable buffer, and one of | ||
the most common buffer types (``bytes``) is always read-only. | ||
Should we add a new mechanism in C to declare that a type implementing the | ||
buffer protocol is potentially writable? | ||
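
To make the problem concrete, the sketch below uses ``io.BytesIO.readinto``, one standard library API that needs a writable buffer. A single ``Buffer`` annotation would accept both calls, although only the first one works.

```python
import io

target = bytearray(4)
count = io.BytesIO(b"data").readinto(target)  # fills the bytearray in place

try:
    # bytes supports the buffer protocol but is always read-only.
    io.BytesIO(b"data").readinto(b"\x00" * 4)
except TypeError:
    rejected = True
else:
    rejected = False
```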
|
||
|
||
Copyright | ||
========= | ||
|
||
This document is placed in the public domain or under the | ||
CC0-1.0-Universal license, whichever is more permissive. |