From 91927d5f27fb75d046db8d58b8946bc049a5d957 Mon Sep 17 00:00:00 2001 From: Jelle Zijlstra Date: Thu, 21 Apr 2022 20:46:31 -0700 Subject: [PATCH 01/18] First draft of Buffer PEP --- pep-9999.rst | 229 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 229 insertions(+) create mode 100644 pep-9999.rst diff --git a/pep-9999.rst b/pep-9999.rst new file mode 100644 index 00000000000..b09ab716a07 --- /dev/null +++ b/pep-9999.rst @@ -0,0 +1,229 @@ +PEP: +Title: Making the buffer protocol accessible in Python +Author: Jelle Zijlstra +Sponsor: Jelle Zijlstra +Discussions-To: +Status: Draft +Type: Standards Track +Content-Type: text/x-rst +Created: +Python-Version: 3.12 +Post-History: + + +Abstract +======== + +This PEP proposes a mechanism to inspect in Python whether a type implements +the C-level buffer protocol. + + +Motivation +========== + +The CPython C API provides a versatile mechanism for accessing the +underlying memory of an object, the buffer protocol. Functions that +accept binary data are usually written to accept any object implementing +the buffer protocol. For example, there are about 130 functions in +CPython using the Argument Clinic ``Py_buffer`` type, which accepts +the buffer protocol. + +Currently, there is no way to inspect in Python whether an object +implements the buffer protocol. Relatedly, the static type system +does not provide a type annotation to represent the protocol. +This is a common problem when type annotating code that accepts +generic buffers. + + +Rationale +========= + +Current options +--------------- + +There are two current workarounds for annotating buffer types in +the type system, but neither is adequate. + +First, the current workaround for buffer types in typeshed is a type alias +that lists well-known buffer types in the standard library, such as +``bytes``, ``bytearray``, ``memoryview``, and ``array.array``. 
This +approach works for the standard library, but it does not work for +third-party buffer types. + +Second, the documentation for ``typing.ByteString`` currently states: + + This type represents the types ``bytes``, ``bytearray``, and + ``memoryview`` of byte sequences. + + As a shorthand for this type, ``bytes`` can be used to annotate + arguments of any of the types mentioned above. + +Although this sentence has been in the documentation since 2015, +the use of ``bytes`` to include these other types is not specified +in any of the typing PEPs. However, this mechanism has a number of +problems. It does not include all possible buffer types, and it +makes the ``bytes`` type ambiguous in type annotations. After all, +there are many operations that are valid on ``bytes`` objects, but +not on ``memoryview`` objects, and it is perfectly possible for +a function to accept ``bytes`` but not ``memoryview`` objects. + +Kinds of buffers +---------------- + +The C buffer protocol supports many options, affecting strides, +contiguity, and support for writing to the buffer. Some of these +options would be useful in the type system. For example, typeshed +currently provides separate type aliases for writable and read-only +buffers. + +However, in the C buffer protocol, these options cannot be +queried directly on the type object. The only way to figure out +whether an object supports a writable buffer is to actually +ask for the buffer. For some types, such as ``memoryview``, +whether the buffer is writable depends on the exact object: +some are read-only and others are not. As such, I propose to +support only whether a type implements the buffer protocol at +all, not whether it supports more specific options such as +writable buffers. + +Specification +============= + +types.Buffer +------------ + +A new class ``types.Buffer`` will be added. It cannot be instantiated or +subclassed, but supports the ``__instancecheck__`` and +``__subclasscheck__`` hooks. 
In CPython, these will check for the presence of the +``bf_getbuffer`` slot in the type object: + +.. code-block:: pycon + + >>> from types import Buffer + >>> isinstance(b"xy", Buffer) + True + >>> issubclass(bytes, Buffer) + True + >>> issubclass(memoryview, Buffer) + True + >>> isinstance("xy", Buffer) + False + >>> issubclass(str, Buffer) + False + +The new class can also be used in type annotations: + +.. code-block:: python + + def need_buffer(b: Buffer) -> memoryview: + return memoryview(b) + + need_buffer(b"xy") # ok + need_buffer("xy") # rejected by static type checkers + +Usage in stubs +-------------- + +For static typing purposes, types defined in C extensions usually +require stub files, as described in :pep:`484`. In stub files, +``types.Buffer`` may be used as a base class to indicate that a +class implements the buffer protocol. + +For example, ``memoryview`` may be declared as follows in a stub: + +.. code-block:: python + + class bytes(types.Buffer, Sequence[int]): + def decode(self, ...): ... + ... + +Static type checkers should not give any special treatment to +this class. + +Equivalent for older Python versions +------------------------------------ + +New typing features are usually backported to older Python versions +in the ``typing_extensions`` package. Because the buffer protocol +is accessible only in C, ``types.Buffer`` cannot be implemented +in a pure Python package. As a temporary workaround, a +``typing_extensions.Buffer`` ABC will be provided on Python versions +that do not have ``types.Buffer`` available. For the benefit of +static type checkers, ``typing_extensions.Buffer`` can be used as +a base class in stubs to mark types as supporting the buffer protocol. +For runtime uses, the ``ABC.register`` API can be used to register +buffer classes with ``typing_extensions.Buffer``. 
+ + +No special meaning for ``bytes`` +-------------------------------- + +The documentation for ``typing.ByteString`` currently states: + + This type represents the types ``bytes``, ``bytearray``, and + ``memoryview`` of byte sequences. + + As a shorthand for this type, ``bytes`` can be used to annotate + arguments of any of the types mentioned above. + +The behavior in the second paragraph was not specified in :pep:`484` +or any subsequent PEP. We propose to remove it from the documentation. +With ``types.Buffer`` available as an alternative, there is no good +reason to allow ``bytes`` as a shorthand. +Type checkers that implement this behavior should deprecate and +eventually remove it. + + +Backwards Compatibility +======================= + +[Describe potential impact and severity on pre-existing code.] + + +Security Implications +===================== + +None. + + +How to Teach This +================= + +[How to teach users, new and experienced, how to apply the PEP to their work.] + + +Reference Implementation +======================== + +[Link to any existing implementation and details about its state, e.g. proof-of-concept.] + + +Rejected Ideas +============== + +[Why certain ideas that were brought while discussing this PEP were not ultimately pursued.] + + +Open Issues +=========== + +[Any points that are still being decided/discussed.] + + +Footnotes +========= + +[A collection of footnotes cited in the PEP, and a place to list non-inline hyperlink targets.] + + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive. + +.. notes +.. https://github.com/python/cpython/commit/2a19d956ab92fc9084a105cc11292cb0438b322f (added ByteString verbiage) +.. https://github.com/python/typing/issues/593 +.. https://github.com/python/cpython/issues/71688 (proposed Buffer ABC) +.. 
https://github.com/python/mypy/issues/12643 (user report sad about current bytes behavior) From 616c6c1925916fa25edde894aca88a544b6558f8 Mon Sep 17 00:00:00 2001 From: Jelle Zijlstra Date: Thu, 21 Apr 2022 20:59:43 -0700 Subject: [PATCH 02/18] make it render --- pep-9999.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pep-9999.rst b/pep-9999.rst index b09ab716a07..0bd83880b3e 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -1,4 +1,4 @@ -PEP: +PEP: 9999 Title: Making the buffer protocol accessible in Python Author: Jelle Zijlstra Sponsor: Jelle Zijlstra From ca71541614f9c37cf361509aee6248f0b24593df Mon Sep 17 00:00:00 2001 From: Shantanu <12621235+hauntsaninja@users.noreply.github.com> Date: Thu, 21 Apr 2022 21:01:42 -0700 Subject: [PATCH 03/18] Fix typo (#2) --- pep-9999.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pep-9999.rst b/pep-9999.rst index 0bd83880b3e..490f1721f76 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -129,7 +129,7 @@ require stub files, as described in :pep:`484`. In stub files, ``types.Buffer`` may be used as a base class to indicate that a class implements the buffer protocol. -For example, ``memoryview`` may be declared as follows in a stub: +For example, ``bytes`` may be declared as follows in a stub: .. code-block:: python From 49f61053f37df9f30d83e545c2f910f6ad0bc8d2 Mon Sep 17 00:00:00 2001 From: Jelle Zijlstra Date: Sat, 23 Apr 2022 09:42:30 -0700 Subject: [PATCH 04/18] write some more --- pep-9999.rst | 109 +++++++++++++++++++++++++++++++-------------------- 1 file changed, 66 insertions(+), 43 deletions(-) diff --git a/pep-9999.rst b/pep-9999.rst index 490f1721f76..250df0e28be 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -15,24 +15,25 @@ Abstract ======== This PEP proposes a mechanism to inspect in Python whether a type implements -the C-level buffer protocol. +the C-level buffer protocol. This allows type checkers to check for +objects that implement the protocol. 
Motivation ========== The CPython C API provides a versatile mechanism for accessing the -underlying memory of an object, the buffer protocol. Functions that -accept binary data are usually written to accept any object implementing -the buffer protocol. For example, there are about 130 functions in -CPython using the Argument Clinic ``Py_buffer`` type, which accepts -the buffer protocol. +underlying memory of an object, the buffer protocol from :pep:`3118`. +Functions that accept binary data are usually written to accept any +object implementing the buffer protocol. For example, as I write this, +there are about 130 functions in CPython using the Argument Clinic +``Py_buffer`` type, which accepts the buffer protocol. Currently, there is no way to inspect in Python whether an object implements the buffer protocol. Relatedly, the static type system does not provide a type annotation to represent the protocol. -This is a common problem when type annotating code that accepts -generic buffers. +This is a `common problem __` +when type annotating code that accepts generic buffers. Rationale @@ -44,13 +45,15 @@ Current options There are two current workarounds for annotating buffer types in the type system, but neither is adequate. -First, the current workaround for buffer types in typeshed is a type alias +First, the `current workaround __` +for buffer types in typeshed is a type alias that lists well-known buffer types in the standard library, such as ``bytes``, ``bytearray``, ``memoryview``, and ``array.array``. This -approach works for the standard library, but it does not work for +approach works for the standard library, but it does not extend to third-party buffer types. -Second, the documentation for ``typing.ByteString`` currently states: +Second, the `documentation __` +for ``typing.ByteString`` currently states: This type represents the types ``bytes``, ``bytearray``, and ``memoryview`` of byte sequences. 
@@ -58,20 +61,25 @@ Second, the documentation for ``typing.ByteString`` currently states: As a shorthand for this type, ``bytes`` can be used to annotate arguments of any of the types mentioned above. -Although this sentence has been in the documentation since 2015, +Although this sentence has been in the documentation +`since 2015 __`, the use of ``bytes`` to include these other types is not specified -in any of the typing PEPs. However, this mechanism has a number of +in any of the typing PEPs. Furthermore, this mechanism has a number of problems. It does not include all possible buffer types, and it makes the ``bytes`` type ambiguous in type annotations. After all, there are many operations that are valid on ``bytes`` objects, but not on ``memoryview`` objects, and it is perfectly possible for a function to accept ``bytes`` but not ``memoryview`` objects. +A mypy user +`reports __` +that this shortcut has caused significant problems for the `psycopg` project. Kinds of buffers ---------------- -The C buffer protocol supports many options, affecting strides, -contiguity, and support for writing to the buffer. Some of these +The C buffer protocol supports +`many options `__, +affecting strides, contiguity, and support for writing to the buffer. Some of these options would be useful in the type system. For example, typeshed currently provides separate type aliases for writable and read-only buffers. @@ -80,8 +88,8 @@ However, in the C buffer protocol, these options cannot be queried directly on the type object. The only way to figure out whether an object supports a writable buffer is to actually ask for the buffer. For some types, such as ``memoryview``, -whether the buffer is writable depends on the exact object: -some are read-only and others are not. As such, I propose to +whether the buffer is writable depends on the instance: +some instances are read-only and others are not. 
As such, I propose to support only whether a type implements the buffer protocol at all, not whether it supports more specific options such as writable buffers. @@ -152,22 +160,17 @@ that do not have ``types.Buffer`` available. For the benefit of static type checkers, ``typing_extensions.Buffer`` can be used as a base class in stubs to mark types as supporting the buffer protocol. For runtime uses, the ``ABC.register`` API can be used to register -buffer classes with ``typing_extensions.Buffer``. +buffer classes with ``typing_extensions.Buffer``. When +``types.Buffer`` is available, ``typing_extensions`` should simply +re-export it. No special meaning for ``bytes`` -------------------------------- -The documentation for ``typing.ByteString`` currently states: - - This type represents the types ``bytes``, ``bytearray``, and - ``memoryview`` of byte sequences. - - As a shorthand for this type, ``bytes`` can be used to annotate - arguments of any of the types mentioned above. - -The behavior in the second paragraph was not specified in :pep:`484` -or any subsequent PEP. We propose to remove it from the documentation. +The special case stating that ``bytes`` may be used as a shorthand +for other ``ByteString`` types will be removed from the ``typing`` +documentation. With ``types.Buffer`` available as an alternative, there is no good reason to allow ``bytes`` as a shorthand. Type checkers that implement this behavior should deprecate and @@ -177,7 +180,14 @@ eventually remove it. Backwards Compatibility ======================= -[Describe potential impact and severity on pre-existing code.] +As the runtime changes in this PEP only add a new class, there are +no backwards compatibility concerns. + +However, the recommendation to remove the special behavior for +``bytes`` in type checkers does have backwards compatibility +impact on users. + +.. TODO: https://github.com/python/mypy/pull/12661 Security Implications @@ -189,7 +199,13 @@ None. 
How to Teach This ================= -[How to teach users, new and experienced, how to apply the PEP to their work.] +We will add notes pointing to ``types.Buffer`` to appropriate places in the +documentation, such as `typing.readthedocs.io __` +and the `mypy cheat sheet __`. +Type checkers may provide additional pointers in their error messages. For example, +when they encounter a place where a buffer object is passed to a function that +is annotated to only accept ``bytes``, the error message could include a note suggesting +to use ``types.Buffer`` instead. Reference Implementation @@ -201,19 +217,32 @@ Reference Implementation Rejected Ideas ============== -[Why certain ideas that were brought while discussing this PEP were not ultimately pursued.] +Buffer ABC +---------- + +An `earlier proposal __` suggested +adding a ``collections.abc.Buffer`` ABC to represent buffer objects. This idea +stalled because an ABC with no methods does not fit well into the ``collections.abc`` +module. Furthermore, it required manual registration of buffer classes, including +those in the standard library. This PEP's approach of using the ``__instancecheck__`` +hook is more natural and does not require explicit registration. +Nevertheless, the ABC proposal has the advantage that it does not require C changes, +and we are proposing to adopt a version of it in the third-party ``typing_extensions`` +package for the benefit of users of older versions of Python. Open Issues =========== -[Any points that are still being decided/discussed.] - +Read-only and writable buffers +------------------------------ -Footnotes -========= - -[A collection of footnotes cited in the PEP, and a place to list non-inline hyperlink targets.] +To avoid making changes to the buffer protocol itself, this PEP currently +does not provide a way to distinguish between read-only and writable buffers. 
+That's unfortunate, because some APIs require a writable buffer, and one of +the most common buffer types (``bytes``) is always read-only. +Should we add a new mechanism in C to declare that a type implementing the +buffer protocol is always read-only? Copyright @@ -221,9 +250,3 @@ Copyright This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive. - -.. notes -.. https://github.com/python/cpython/commit/2a19d956ab92fc9084a105cc11292cb0438b322f (added ByteString verbiage) -.. https://github.com/python/typing/issues/593 -.. https://github.com/python/cpython/issues/71688 (proposed Buffer ABC) -.. https://github.com/python/mypy/issues/12643 (user report sad about current bytes behavior) From 671a3078b3352f87aeb33aa079f90915e0fd5577 Mon Sep 17 00:00:00 2001 From: Jelle Zijlstra Date: Sat, 23 Apr 2022 09:45:02 -0700 Subject: [PATCH 05/18] fixes --- pep-9999.rst | 20 +++++++++----------- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/pep-9999.rst b/pep-9999.rst index 250df0e28be..5f0ca58aa68 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -2,13 +2,11 @@ PEP: 9999 Title: Making the buffer protocol accessible in Python Author: Jelle Zijlstra Sponsor: Jelle Zijlstra -Discussions-To: Status: Draft Type: Standards Track Content-Type: text/x-rst -Created: +Created: 23-Apr-2022 Python-Version: 3.12 -Post-History: Abstract @@ -32,7 +30,7 @@ there are about 130 functions in CPython using the Argument Clinic Currently, there is no way to inspect in Python whether an object implements the buffer protocol. Relatedly, the static type system does not provide a type annotation to represent the protocol. -This is a `common problem __` +This is a `common problem `__ when type annotating code that accepts generic buffers. @@ -45,14 +43,14 @@ Current options There are two current workarounds for annotating buffer types in the type system, but neither is adequate. 
-First, the `current workaround __` +First, the `current workaround `__ for buffer types in typeshed is a type alias that lists well-known buffer types in the standard library, such as ``bytes``, ``bytearray``, ``memoryview``, and ``array.array``. This approach works for the standard library, but it does not extend to third-party buffer types. -Second, the `documentation __` +Second, the `documentation `__ for ``typing.ByteString`` currently states: This type represents the types ``bytes``, ``bytearray``, and @@ -62,7 +60,7 @@ for ``typing.ByteString`` currently states: arguments of any of the types mentioned above. Although this sentence has been in the documentation -`since 2015 __`, +`since 2015 `__, the use of ``bytes`` to include these other types is not specified in any of the typing PEPs. Furthermore, this mechanism has a number of problems. It does not include all possible buffer types, and it @@ -71,7 +69,7 @@ there are many operations that are valid on ``bytes`` objects, but not on ``memoryview`` objects, and it is perfectly possible for a function to accept ``bytes`` but not ``memoryview`` objects. A mypy user -`reports __` +`reports `__ that this shortcut has caused significant problems for the `psycopg` project. Kinds of buffers @@ -200,8 +198,8 @@ How to Teach This ================= We will add notes pointing to ``types.Buffer`` to appropriate places in the -documentation, such as `typing.readthedocs.io __` -and the `mypy cheat sheet __`. +documentation, such as `typing.readthedocs.io `__ +and the `mypy cheat sheet `__. Type checkers may provide additional pointers in their error messages. 
For example, when they encounter a place where a buffer object is passed to a function that is annotated to only accept ``bytes``, the error message could include a note suggesting @@ -220,7 +218,7 @@ Rejected Ideas Buffer ABC ---------- -An `earlier proposal __` suggested +An `earlier proposal `__ suggested adding a ``collections.abc.Buffer`` ABC to represent buffer objects. This idea stalled because an ABC with no methods does not fit well into the ``collections.abc`` module. Furthermore, it required manual registration of buffer classes, including From e570c790ccea8bceab4aef9fa14fd6306d95daa8 Mon Sep 17 00:00:00 2001 From: Jelle Zijlstra Date: Sat, 23 Apr 2022 09:46:49 -0700 Subject: [PATCH 06/18] should not -> does not require --- pep-9999.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pep-9999.rst b/pep-9999.rst index 5f0ca58aa68..8d3bf111d72 100644 --- a/pep-9999.rst +++ b/pep-9999.rst @@ -143,8 +143,8 @@ For example, ``bytes`` may be declared as follows in a stub: def decode(self, ...): ... ... -Static type checkers should not give any special treatment to -this class. +The ``types.Buffer`` class does not require any special treatment +in type checkers. 
Equivalent for older Python versions ------------------------------------ From c8cd9167aef9f6e0b7502f79a0795e2d6053f093 Mon Sep 17 00:00:00 2001 From: Jelle Zijlstra Date: Sat, 23 Apr 2022 09:47:35 -0700 Subject: [PATCH 07/18] PEP 688 --- pep-9999.rst => pep-0688.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename pep-9999.rst => pep-0688.rst (99%) diff --git a/pep-9999.rst b/pep-0688.rst similarity index 99% rename from pep-9999.rst rename to pep-0688.rst index 8d3bf111d72..01b62b57aa9 100644 --- a/pep-9999.rst +++ b/pep-0688.rst @@ -1,4 +1,4 @@ -PEP: 9999 +PEP: 688 Title: Making the buffer protocol accessible in Python Author: Jelle Zijlstra Sponsor: Jelle Zijlstra From 614032441f4ae8af8e9d810223c54a47e8644e72 Mon Sep 17 00:00:00 2001 From: Jelle Zijlstra Date: Sat, 23 Apr 2022 09:49:29 -0700 Subject: [PATCH 08/18] lint --- pep-0688.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pep-0688.rst b/pep-0688.rst index 01b62b57aa9..015cf8ba000 100644 --- a/pep-0688.rst +++ b/pep-0688.rst @@ -70,7 +70,7 @@ not on ``memoryview`` objects, and it is perfectly possible for a function to accept ``bytes`` but not ``memoryview`` objects. A mypy user `reports `__ -that this shortcut has caused significant problems for the `psycopg` project. +that this shortcut has caused significant problems for the ``psycopg`` project. Kinds of buffers ---------------- From b4f02bb279008b959d3608c2652291610bbfebae Mon Sep 17 00:00:00 2001 From: Jelle Zijlstra Date: Sat, 23 Apr 2022 09:56:46 -0700 Subject: [PATCH 09/18] rewording --- pep-0688.rst | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/pep-0688.rst b/pep-0688.rst index 015cf8ba000..b0135ec9dab 100644 --- a/pep-0688.rst +++ b/pep-0688.rst @@ -171,7 +171,7 @@ for other ``ByteString`` types will be removed from the ``typing`` documentation. With ``types.Buffer`` available as an alternative, there is no good reason to allow ``bytes`` as a shorthand. 
-Type checkers that implement this behavior should deprecate and +We suggest that type checkers that implement this behavior should deprecate and eventually remove it. @@ -183,9 +183,11 @@ no backwards compatibility concerns. However, the recommendation to remove the special behavior for ``bytes`` in type checkers does have backwards compatibility -impact on users. - -.. TODO: https://github.com/python/mypy/pull/12661 +impact on users. An `experiment `__ +with mypy shows that several major open source projects type +checked with mypy will see new errors if the ``bytes`` promotion +is removed. Nevertheless, the change improves overall type safety, +so we believe the migration cost is worth it. Security Implications From 0fe1f6aff6abe56232d69db0eb4c3f8aab92cf03 Mon Sep 17 00:00:00 2001 From: Jelle Zijlstra Date: Sat, 23 Apr 2022 10:36:00 -0700 Subject: [PATCH 10/18] codeowners --- .github/CODEOWNERS | 1 + 1 file changed, 1 insertion(+) diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 5ac9e42e6b5..3662785b57a 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -568,6 +568,7 @@ pep-0684.rst @ericsnowcurrently pep-0685.rst @brettcannon pep-0686.rst @methane pep-0687.rst @encukou +pep-0688.rst @jellezijlstra # ... # pep-0754.txt # ... 
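Note on the runtime behaviour described in the PEP text of the patches above: the draft says that the only way to learn whether an object supports a writable buffer is to actually ask for the buffer, and that on older Python versions buffer classes would have to be registered by hand through ``ABC.register``. The sketch below illustrates both points using only the standard library. It is not the reference implementation: ``supports_buffer``, ``is_writable_buffer`` and the local ``Buffer`` class are hypothetical names chosen for illustration, and the local ``Buffer`` merely stands in for the proposed ``typing_extensions.Buffer``.

```python
from abc import ABC


def supports_buffer(obj: object) -> bool:
    """Return True if ``obj`` can export a buffer (probed by requesting one)."""
    try:
        # memoryview() requests a buffer through the C-level protocol and
        # raises TypeError for objects that do not implement it.
        with memoryview(obj):
            return True
    except TypeError:
        return False


def is_writable_buffer(obj: object) -> bool:
    """Return True if the buffer exported by this particular object is writable."""
    try:
        with memoryview(obj) as view:
            return not view.readonly
    except TypeError:
        return False


class Buffer(ABC):
    """Hypothetical stand-in for the proposed ``typing_extensions.Buffer`` ABC."""


# Manual registration, as the ABC-based fallback requires. Third-party
# buffer types would each need their own register() call.
for buffer_type in (bytes, bytearray, memoryview):
    Buffer.register(buffer_type)
```

Writability is a property of the instance rather than the type here: a ``memoryview`` over ``bytes`` is read-only, while one over a ``bytearray`` is writable, which is the reason the draft gives for exposing only whether a type implements the protocol at all.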
From f9155848a42872ebbfff7b19676d4f0c91e60fa3 Mon Sep 17 00:00:00 2001 From: Jelle Zijlstra Date: Sat, 23 Apr 2022 12:14:37 -0700 Subject: [PATCH 11/18] more words, feedback from Adam --- pep-0688.rst | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/pep-0688.rst b/pep-0688.rst index b0135ec9dab..2eb60a024f9 100644 --- a/pep-0688.rst +++ b/pep-0688.rst @@ -1,7 +1,6 @@ PEP: 688 Title: Making the buffer protocol accessible in Python Author: Jelle Zijlstra -Sponsor: Jelle Zijlstra Status: Draft Type: Standards Track Content-Type: text/x-rst @@ -23,7 +22,7 @@ Motivation The CPython C API provides a versatile mechanism for accessing the underlying memory of an object, the buffer protocol from :pep:`3118`. Functions that accept binary data are usually written to accept any -object implementing the buffer protocol. For example, as I write this, +object implementing the buffer protocol. For example, at the time of writing, there are about 130 functions in CPython using the Argument Clinic ``Py_buffer`` type, which accepts the buffer protocol. @@ -87,7 +86,7 @@ queried directly on the type object. The only way to figure out whether an object supports a writable buffer is to actually ask for the buffer. For some types, such as ``memoryview``, whether the buffer is writable depends on the instance: -some instances are read-only and others are not. As such, I propose to +some instances are read-only and others are not. As such, we propose to support only whether a type implements the buffer protocol at all, not whether it supports more specific options such as writable buffers. @@ -186,7 +185,12 @@ However, the recommendation to remove the special behavior for impact on users. An `experiment `__ with mypy shows that several major open source projects type checked with mypy will see new errors if the ``bytes`` promotion -is removed. Nevertheless, the change improves overall type safety, +is removed. 
However, many of these errors can be fixed by improving +the stubs in typeshed, as already done for the +`builtins `__, +`binascii `__, +`pickle `__ modules. +Overall, the change improves overall type safety, so we believe the migration cost is worth it. @@ -211,7 +215,9 @@ to use ``types.Buffer`` instead. Reference Implementation ======================== -[Link to any existing implementation and details about its state, e.g. proof-of-concept.] +An implementation of ``types.Buffer`` is +`available `__ +in the author's fork. Rejected Ideas From edcf64033ea40bed104012b959b8af6623322696 Mon Sep 17 00:00:00 2001 From: Jelle Zijlstra Date: Sat, 23 Apr 2022 18:10:47 -0700 Subject: [PATCH 12/18] reword first sentence --- pep-0688.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/pep-0688.rst b/pep-0688.rst index 2eb60a024f9..cd50b1857ea 100644 --- a/pep-0688.rst +++ b/pep-0688.rst @@ -11,9 +11,9 @@ Python-Version: 3.12 Abstract ======== -This PEP proposes a mechanism to inspect in Python whether a type implements -the C-level buffer protocol. This allows type checkers to check for -objects that implement the protocol. +This PEP proposes a mechanism for Python code to inspect whether a +type implements the C-level buffer protocol. This allows type +checkers to check for objects that implement the protocol. Motivation From 5dd89107f8cdf3c80418df082b4c0aa1bbebeb8b Mon Sep 17 00:00:00 2001 From: Jelle Zijlstra Date: Sun, 24 Apr 2022 06:46:58 -0700 Subject: [PATCH 13/18] add note on bytes/bytearray --- pep-0688.rst | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/pep-0688.rst b/pep-0688.rst index cd50b1857ea..9e026646aef 100644 --- a/pep-0688.rst +++ b/pep-0688.rst @@ -189,7 +189,8 @@ is removed. However, many of these errors can be fixed by improving the stubs in typeshed, as already done for the `builtins `__, `binascii `__, -`pickle `__ modules. +`pickle `__, and +`re `__ modules. 
Overall, the change improves overall type safety, so we believe the migration cost is worth it. @@ -236,6 +237,18 @@ Nevertheless, the ABC proposal has the advantage that it does not require C chan and we are proposing to adopt a version of it in the third-party ``typing_extensions`` package for the benefit of users of older versions of Python. +Keep ``bytearray`` compatible with ``bytes`` +-------------------------------------------- + +It has been suggested to remove the special case where ``memoryview`` is +always compatible with ``bytes``, but keep it for ``bytearray``, because +the two types have very similar interfaces. However, several standard +library functions (e.g., ``re.compile`` and ``socket.getaddrinfo``) accept +``bytes`` but not ``bytearray``. In most codebases, ``bytearray`` is also +not a very common type. We prefer to have users spell out accepted types +explicitly (or use ``Protocol`` from :pep:`544` if only a specific set of +methods is required). + Open Issues =========== From 862105e47a01abd8af4beb90130209675cadb56a Mon Sep 17 00:00:00 2001 From: Jelle Zijlstra Date: Sun, 24 Apr 2022 06:55:02 -0700 Subject: [PATCH 14/18] Apply CAM feedback --- pep-0688.rst | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/pep-0688.rst b/pep-0688.rst index 9e026646aef..8355f2d72d1 100644 --- a/pep-0688.rst +++ b/pep-0688.rst @@ -13,24 +13,24 @@ Abstract This PEP proposes a mechanism for Python code to inspect whether a type implements the C-level buffer protocol. This allows type -checkers to check for objects that implement the protocol. +checkers to evaluate whether objects that implement the protocol. Motivation ========== The CPython C API provides a versatile mechanism for accessing the -underlying memory of an object, the buffer protocol from :pep:`3118`. -Functions that accept binary data are usually written to accept any +underlying memory of an object—the buffer protocol from :pep:`3118`. 
+Functions that accept binary data are usually written to handle any
 object implementing the buffer protocol. For example, at the time of writing,
 there are about 130 functions in CPython using the Argument Clinic
 ``Py_buffer`` type, which accepts the buffer protocol.
 
-Currently, there is no way to inspect in Python whether an object
-implements the buffer protocol. Relatedly, the static type system
+Currently, there is no way for Python code to inspect whether an object
+supports the buffer protocol. Moreover, the static type system
 does not provide a type annotation to represent the protocol.
 This is a `common problem `__
-when type annotating code that accepts generic buffers.
+when type-annotating code that accepts generic buffers.
 
 
 Rationale
@@ -87,7 +87,7 @@ whether an object supports a writable buffer is to actually
 ask for the buffer. For some types, such as ``memoryview``,
 whether the buffer is writable depends on the instance:
 some instances are read-only and others are not. As such, we propose to
-support only whether a type implements the buffer protocol at
+only support exposing whether a type implements the buffer protocol at
 all, not whether it supports more specific options such as
 writable buffers.
 

From abb36f2f2b943caa7ec344eb41ec315ff88c0eec Mon Sep 17 00:00:00 2001
From: Jelle Zijlstra
Date: Sun, 24 Apr 2022 14:56:41 -0700
Subject: [PATCH 15/18] More CAM feedback

---
 pep-0688.rst | 52 ++++++++++++++++++++++++++++------------------------
 1 file changed, 28 insertions(+), 24 deletions(-)

diff --git a/pep-0688.rst b/pep-0688.rst
index 8355f2d72d1..c985724f76d 100644
--- a/pep-0688.rst
+++ b/pep-0688.rst
@@ -12,8 +12,8 @@ Abstract
 ========
 
 This PEP proposes a mechanism for Python code to inspect whether a
-type implements the C-level buffer protocol. This allows type
-checkers to evaluate whether objects that implement the protocol.
+type supports the C-level buffer protocol. This allows type
+checkers to evaluate whether objects implement the protocol.
 
 
 Motivation
@@ -87,7 +87,7 @@ whether an object supports a writable buffer is to actually
 ask for the buffer. For some types, such as ``memoryview``,
 whether the buffer is writable depends on the instance:
 some instances are read-only and others are not. As such, we propose to
-only support exposing whether a type implements the buffer protocol at
+expose only whether a type implements the buffer protocol at
 all, not whether it supports more specific options such as
 writable buffers.
 
@@ -126,13 +126,13 @@ The new class can also be used in type annotations:
     need_buffer(b"xy") # ok
     need_buffer("xy") # rejected by static type checkers
 
-Usage in stubs
---------------
+Usage in stub files
+-------------------
 
 For static typing purposes, types defined in C extensions usually
-require stub files, as described in :pep:`484`. In stub files,
-``types.Buffer`` may be used as a base class to indicate that a
-class implements the buffer protocol.
+require stub files, as :pep:`described in PEP 484 <484#stub-files>`.
+In stub files, ``types.Buffer`` may be used as a base class to
+indicate that a class implements the buffer protocol.
 
 For example, ``bytes`` may be declared as follows in a stub:
 
@@ -149,10 +149,12 @@ Equivalent for older Python versions
 ------------------------------------
 
 New typing features are usually backported to older Python versions
-in the ``typing_extensions`` package. Because the buffer protocol
+in the `typing_extensions `_
+package. Because the buffer protocol
 is accessible only in C, ``types.Buffer`` cannot be implemented
-in a pure Python package. As a temporary workaround, a
-``typing_extensions.Buffer`` ABC will be provided on Python versions
+in a pure-Python package like ``typing_extensions``. As a temporary
+workaround, a ``typing_extensions.Buffer``
+`Abstract Base Class `__ will be provided on Python versions
 that do not have ``types.Buffer`` available. For the benefit of
 static type checkers, ``typing_extensions.Buffer`` can be used as
 a base class in stubs to mark types as supporting the buffer protocol.
@@ -170,8 +172,8 @@ for other ``ByteString`` types will be removed from the ``typing``
 documentation.
 With ``types.Buffer`` available as an alternative, there is no good
 reason to allow ``bytes`` as a shorthand.
-We suggest that type checkers that implement this behavior should deprecate and
-eventually remove it.
+We suggest that type checkers currently implementing this behavior
+should deprecate and eventually remove it.
 
 
 Backwards Compatibility
@@ -181,18 +183,18 @@ As the runtime changes in this PEP only add a new class, there are no
 backwards compatibility concerns.
 
 However, the recommendation to remove the special behavior for
-``bytes`` in type checkers does have backwards compatibility
-impact on users. An `experiment `__
-with mypy shows that several major open source projects type
-checked with mypy will see new errors if the ``bytes`` promotion
+``bytes`` in type checkers does have a backwards compatibility
+impact on their users. An `experiment `__
+with mypy shows that several major open source projects that use it
+for type checking will see new errors if the ``bytes`` promotion
 is removed. However, many of these errors can be fixed by improving
 the stubs in typeshed, as already done for the
 `builtins `__,
 `binascii `__,
 `pickle `__, and
 `re `__ modules.
-Overall, the change improves overall type safety,
-so we believe the migration cost is worth it.
+Overall, the change improves type safety and makes the type system
+more consistent, so we believe the migration cost is worth it.
 
 
 Security Implications
@@ -208,9 +210,9 @@ We will add notes pointing to ``types.Buffer`` to appropriate places in the
 documentation, such as `typing.readthedocs.io `__
 and the `mypy cheat sheet `__.
 Type checkers may provide additional pointers in their error messages. For example,
-when they encounter a place where a buffer object is passed to a function that
+when they encounter a buffer object being passed to a function that
 is annotated to only accept ``bytes``, the error message could include a note suggesting
-to use ``types.Buffer`` instead.
+the use of ``types.Buffer`` instead.
 
 
 Reference Implementation
@@ -228,14 +230,16 @@ Buffer ABC
 ----------
 
 An `earlier proposal `__ suggested
-adding a ``collections.abc.Buffer`` ABC to represent buffer objects. This idea
+adding a ``collections.abc.Buffer``
+`abstract base class `__
+to represent buffer objects. This idea
 stalled because an ABC with no methods does not fit well into the ``collections.abc``
 module. Furthermore, it required manual registration of buffer classes, including
 those in the standard library. This PEP's approach of using the ``__instancecheck__``
 hook is more natural and does not require explicit registration.
 Nevertheless, the ABC proposal has the advantage that it does not require C changes,
 and we are proposing to adopt a version of it in the third-party ``typing_extensions``
-package for the benefit of users of older versions of Python.
+package for the benefit of users of older Python versions.
 
 Keep ``bytearray`` compatible with ``bytes``
 --------------------------------------------
@@ -261,7 +265,7 @@ does not provide a way to distinguish between read-only and writable buffers.
 That's unfortunate, because some APIs require a writable buffer,
 and one of the most common buffer types (``bytes``) is always read-only.
 Should we add a new mechanism in C to declare that a type implementing the
-buffer protocol is always read-only?
+buffer protocol is potentially writable?
 Copyright

From 052c49d3d19b83229b6ac272721db2a69d53e043 Mon Sep 17 00:00:00 2001
From: Jelle Zijlstra
Date: Sun, 24 Apr 2022 14:57:30 -0700
Subject: [PATCH 16/18] missed one

---
 pep-0688.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pep-0688.rst b/pep-0688.rst
index c985724f76d..e76e1cb72f2 100644
--- a/pep-0688.rst
+++ b/pep-0688.rst
@@ -97,7 +97,7 @@ Specification
 types.Buffer
 ------------
 
-A new class ``types.Buffer`` will be added. It cannot be instantiated or
+A new class, ``types.Buffer``, will be added. It cannot be instantiated or
 subclassed, but supports the ``__instancecheck__`` and
 ``__subclasscheck__`` hooks. In CPython, these will check for the
 presence of the ``bf_getbuffer`` slot in the type object:

From d32e0c6dcf09499d68b54764ffbb8ea67dc4ba26 Mon Sep 17 00:00:00 2001
From: Jelle Zijlstra
Date: Sun, 24 Apr 2022 15:54:57 -0700
Subject: [PATCH 17/18] Feedback from Alex

---
 pep-0688.rst | 39 +++++++++++++++++++--------------------
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/pep-0688.rst b/pep-0688.rst
index e76e1cb72f2..1998f52473b 100644
--- a/pep-0688.rst
+++ b/pep-0688.rst
@@ -23,14 +23,14 @@ The CPython C API provides a versatile mechanism for accessing the
 underlying memory of an object—the buffer protocol from :pep:`3118`.
 Functions that accept binary data are usually written to handle any
 object implementing the buffer protocol. For example, at the time of writing,
-there are about 130 functions in CPython using the Argument Clinic
+there are around 130 functions in CPython using the Argument Clinic
 ``Py_buffer`` type, which accepts the buffer protocol.
 
 Currently, there is no way for Python code to inspect whether an object
 supports the buffer protocol. Moreover, the static type system
 does not provide a type annotation to represent the protocol.
 This is a `common problem `__
-when type-annotating code that accepts generic buffers.
+when writing type annotations for code that accepts generic buffers.
 Rationale
@@ -134,16 +134,15 @@ require stub files, as :pep:`described in PEP 484 <484#stub-files>`.
 In stub files, ``types.Buffer`` may be used as a base class to
 indicate that a class implements the buffer protocol.
 
-For example, ``bytes`` may be declared as follows in a stub:
+For example, ``memoryview`` may be declared as follows in a stub:
 
 .. code-block:: python
 
-    class bytes(types.Buffer, Sequence[int]):
-        def decode(self, ...): ...
+    class memoryview(types.Buffer, Sized, Sequence[int]): ...
 
 The ``types.Buffer`` class does not require any special treatment
-in type checkers.
+by type checkers.
 
 Equivalent for older Python versions
 ------------------------------------
@@ -154,14 +153,19 @@ package. Because the buffer protocol
 is accessible only in C, ``types.Buffer`` cannot be implemented
 in a pure-Python package like ``typing_extensions``. As a temporary
 workaround, a ``typing_extensions.Buffer``
-`Abstract Base Class `__ will be provided on Python versions
-that do not have ``types.Buffer`` available. For the benefit of
+`abstract base class `__ will be provided for Python versions
+that do not have ``types.Buffer`` available.
+
+For the benefit of
 static type checkers, ``typing_extensions.Buffer`` can be used as
 a base class in stubs to mark types as supporting the buffer protocol.
 For runtime uses, the ``ABC.register`` API can be used to register
 buffer classes with ``typing_extensions.Buffer``. When
 ``types.Buffer`` is available, ``typing_extensions`` should simply
-re-export it.
+re-export it. Thus, users who register their buffer class manually
+with ``typing_extensions.Buffer.register`` should use a guard to make
+sure their code continues to work once ``types.Buffer`` is in the
+standard library.
 
 
 No special meaning for ``bytes``
@@ -170,7 +174,7 @@ No special meaning for ``bytes``
 The special case stating that ``bytes`` may be used as a shorthand
 for other ``ByteString`` types will be removed from the ``typing``
 documentation.
-With ``types.Buffer`` available as an alternative, there is no good
+With ``types.Buffer`` available as an alternative, there will be no good
 reason to allow ``bytes`` as a shorthand.
 We suggest that type checkers currently implementing this behavior
 should deprecate and eventually remove it.
@@ -188,7 +192,7 @@ impact on their users. An `experiment `__,
 `binascii `__,
 `pickle `__, and
@@ -197,16 +201,10 @@ Overall, the change improves type safety and makes the type system
 more consistent, so we believe the migration cost is worth it.
 
 
-Security Implications
-=====================
-
-None.
-
-
 How to Teach This
 =================
 
-We will add notes pointing to ``types.Buffer`` to appropriate places in the
+We will add notes pointing to ``types.Buffer`` in appropriate places in the
 documentation, such as `typing.readthedocs.io `__
 and the `mypy cheat sheet `__.
 Type checkers may provide additional pointers in their error messages. For example,
@@ -237,8 +235,9 @@ stalled because an ABC with no methods does not fit well into the ``collections.
 module. Furthermore, it required manual registration of buffer classes, including
 those in the standard library. This PEP's approach of using the ``__instancecheck__``
 hook is more natural and does not require explicit registration.
-Nevertheless, the ABC proposal has the advantage that it does not require C changes,
-and we are proposing to adopt a version of it in the third-party ``typing_extensions``
+
+Nevertheless, the ABC proposal has the advantage that it does not require C changes.
+This PEP proposes to adopt a version of it in the third-party ``typing_extensions``
 package for the benefit of users of older Python versions.
 Keep ``bytearray`` compatible with ``bytes``

From 9de8b6056a94412248e79dd916711b8529896d9a Mon Sep 17 00:00:00 2001
From: Jelle Zijlstra
Date: Sun, 24 Apr 2022 20:30:40 -0700
Subject: [PATCH 18/18] More feedback

---
 pep-0688.rst | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/pep-0688.rst b/pep-0688.rst
index 1998f52473b..7544f0331a6 100644
--- a/pep-0688.rst
+++ b/pep-0688.rst
@@ -98,7 +98,7 @@ types.Buffer
 ------------
 
 A new class, ``types.Buffer``, will be added. It cannot be instantiated or
-subclassed, but supports the ``__instancecheck__`` and
+subclassed at runtime, but supports the ``__instancecheck__`` and
 ``__subclasscheck__`` hooks. In CPython, these will check for the
 presence of the ``bf_getbuffer`` slot in the type object:
 
@@ -160,8 +160,9 @@ For the benefit of
 static type checkers, ``typing_extensions.Buffer`` can be used as
 a base class in stubs to mark types as supporting the buffer protocol.
 For runtime uses, the ``ABC.register`` API can be used to register
-buffer classes with ``typing_extensions.Buffer``. When
-``types.Buffer`` is available, ``typing_extensions`` should simply
+buffer classes with ``typing_extensions.Buffer``.
+
+When ``types.Buffer`` is available, ``typing_extensions`` should simply
 re-export it. Thus, users who register their buffer class manually
 with ``typing_extensions.Buffer.register`` should use a guard to make
 sure their code continues to work once ``types.Buffer`` is in the
@@ -191,7 +192,7 @@ However, the recommendation to remove the special behavior for
 impact on their users. An `experiment `__
 with mypy shows that several major open source projects that use it
 for type checking will see new errors if the ``bytes`` promotion
-is removed. However, many of these errors can be fixed by improving
+is removed. Many of these errors can be fixed by improving
 the stubs in typeshed, as has already been done for the
 `builtins `__,
 `binascii `__,