
Commit f9c9354

gh-92536: PEP 623: Remove wstr and legacy APIs from Unicode (GH-92537)
1 parent 68fec31 commit f9c9354

35 files changed, +197 -2088 lines changed

Doc/c-api/arg.rst

+5-42
Original file line numberDiff line numberDiff line change
@@ -136,48 +136,6 @@ which disallows mutable objects such as :class:`bytearray`.
136136
attempting any conversion. Raises :exc:`TypeError` if the object is not
137137
a :class:`bytearray` object. The C variable may also be declared as :c:type:`PyObject*`.
138138

139-
``u`` (:class:`str`) [const Py_UNICODE \*]
140-
Convert a Python Unicode object to a C pointer to a NUL-terminated buffer of
141-
Unicode characters. You must pass the address of a :c:type:`Py_UNICODE`
142-
pointer variable, which will be filled with the pointer to an existing
143-
Unicode buffer. Please note that the width of a :c:type:`Py_UNICODE`
144-
character depends on compilation options (it is either 16 or 32 bits).
145-
The Python string must not contain embedded null code points; if it does,
146-
a :exc:`ValueError` exception is raised.
147-
148-
.. versionchanged:: 3.5
149-
Previously, :exc:`TypeError` was raised when embedded null code points
150-
were encountered in the Python string.
151-
152-
.. deprecated-removed:: 3.3 3.12
153-
Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
154-
:c:func:`PyUnicode_AsWideCharString`.
155-
156-
``u#`` (:class:`str`) [const Py_UNICODE \*, :c:type:`Py_ssize_t`]
157-
This variant on ``u`` stores into two C variables, the first one a pointer to a
158-
Unicode data buffer, the second one its length. This variant allows
159-
null code points.
160-
161-
.. deprecated-removed:: 3.3 3.12
162-
Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
163-
:c:func:`PyUnicode_AsWideCharString`.
164-
165-
``Z`` (:class:`str` or ``None``) [const Py_UNICODE \*]
166-
Like ``u``, but the Python object may also be ``None``, in which case the
167-
:c:type:`Py_UNICODE` pointer is set to ``NULL``.
168-
169-
.. deprecated-removed:: 3.3 3.12
170-
Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
171-
:c:func:`PyUnicode_AsWideCharString`.
172-
173-
``Z#`` (:class:`str` or ``None``) [const Py_UNICODE \*, :c:type:`Py_ssize_t`]
174-
Like ``u#``, but the Python object may also be ``None``, in which case the
175-
:c:type:`Py_UNICODE` pointer is set to ``NULL``.
176-
177-
.. deprecated-removed:: 3.3 3.12
178-
Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
179-
:c:func:`PyUnicode_AsWideCharString`.
180-
181139
``U`` (:class:`str`) [PyObject \*]
182140
Requires that the Python object is a Unicode object, without attempting
183141
any conversion. Raises :exc:`TypeError` if the object is not a Unicode
@@ -247,6 +205,11 @@ which disallows mutable objects such as :class:`bytearray`.
247205
them. Instead, the implementation assumes that the byte string object uses the
248206
encoding passed in as parameter.
249207

208+
.. versionchanged:: 3.12
209+
``u``, ``u#``, ``Z``, and ``Z#`` are removed because they used the legacy ``Py_UNICODE*``
210+
representation.
211+
212+
250213
Numbers
251214
-------
252215

Doc/c-api/unicode.rst

+21-156
@@ -17,26 +17,12 @@ of Unicode characters while staying memory efficient. There are special cases
1717
for strings where all code points are below 128, 256, or 65536; otherwise, code
1818
points must be below 1114112 (which is the full Unicode range).
1919

20-
:c:type:`Py_UNICODE*` and UTF-8 representations are created on demand and cached
21-
in the Unicode object. The :c:type:`Py_UNICODE*` representation is deprecated
22-
and inefficient.
23-
24-
Due to the transition between the old APIs and the new APIs, Unicode objects
25-
can internally be in two states depending on how they were created:
26-
27-
* "canonical" Unicode objects are all objects created by a non-deprecated
28-
Unicode API. They use the most efficient representation allowed by the
29-
implementation.
30-
31-
* "legacy" Unicode objects have been created through one of the deprecated
32-
APIs (typically :c:func:`PyUnicode_FromUnicode`) and only bear the
33-
:c:type:`Py_UNICODE*` representation; you will have to call
34-
:c:func:`PyUnicode_READY` on them before calling any other API.
20+
The UTF-8 representation is created on demand and cached in the Unicode object.
3521

3622
.. note::
37-
The "legacy" Unicode object will be removed in Python 3.12 with deprecated
38-
APIs. All Unicode objects will be "canonical" since then. See :pep:`623`
39-
for more information.
23+
The :c:type:`Py_UNICODE` representation was removed in Python 3.12,
24+
together with the deprecated APIs.
25+
See :pep:`623` for more information.
4026

4127

4228
Unicode Type
@@ -101,18 +87,12 @@ access to internal read-only data of Unicode objects:
10187
10288
.. c:function:: int PyUnicode_READY(PyObject *o)
10389
104-
Ensure the string object *o* is in the "canonical" representation. This is
105-
required before using any of the access macros described below.
106-
107-
.. XXX expand on when it is not required
108-
109-
Returns ``0`` on success and ``-1`` with an exception set on failure, which in
110-
particular happens if memory allocation fails.
90+
Returns ``0``. This API is kept only for backward compatibility.
11191
11292
.. versionadded:: 3.3
11393
114-
.. deprecated-removed:: 3.10 3.12
115-
This API will be removed with :c:func:`PyUnicode_FromUnicode`.
94+
.. deprecated:: 3.10
95+
This API does nothing since Python 3.12. Please remove code using this function.
11696
11797
11898
.. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *o)
@@ -130,23 +110,21 @@ access to internal read-only data of Unicode objects:
130110
Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4
131111
integer types for direct character access. No checks are performed if the
132112
canonical representation has the correct character size; use
133-
:c:func:`PyUnicode_KIND` to select the right function. Make sure
134-
:c:func:`PyUnicode_READY` has been called before accessing this.
113+
:c:func:`PyUnicode_KIND` to select the right function.
135114
136115
.. versionadded:: 3.3
137116
138117
139-
.. c:macro:: PyUnicode_WCHAR_KIND
140-
PyUnicode_1BYTE_KIND
118+
.. c:macro:: PyUnicode_1BYTE_KIND
141119
PyUnicode_2BYTE_KIND
142120
PyUnicode_4BYTE_KIND
143121
144122
Return values of the :c:func:`PyUnicode_KIND` macro.
145123
146124
.. versionadded:: 3.3
147125
148-
.. deprecated-removed:: 3.10 3.12
149-
``PyUnicode_WCHAR_KIND`` is deprecated.
126+
.. versionchanged:: 3.12
127+
``PyUnicode_WCHAR_KIND`` has been removed.
150128
151129
152130
.. c:function:: int PyUnicode_KIND(PyObject *o)
@@ -155,8 +133,6 @@ access to internal read-only data of Unicode objects:
155133
bytes per character this Unicode object uses to store its data. *o* has to
156134
be a Unicode object in the "canonical" representation (not checked).
157135
158-
.. XXX document "0" return value?
159-
160136
.. versionadded:: 3.3
161137
162138
@@ -208,49 +184,6 @@ access to internal read-only data of Unicode objects:
208184
.. versionadded:: 3.3
209185
210186
211-
.. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)
212-
213-
Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
214-
code units (this includes surrogate pairs as 2 units). *o* has to be a
215-
Unicode object (not checked).
216-
217-
.. deprecated-removed:: 3.3 3.12
218-
Part of the old-style Unicode API, please migrate to using
219-
:c:func:`PyUnicode_GET_LENGTH`.
220-
221-
222-
.. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)
223-
224-
Return the size of the deprecated :c:type:`Py_UNICODE` representation in
225-
bytes. *o* has to be a Unicode object (not checked).
226-
227-
.. deprecated-removed:: 3.3 3.12
228-
Part of the old-style Unicode API, please migrate to using
229-
:c:func:`PyUnicode_GET_LENGTH`.
230-
231-
232-
.. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
233-
const char* PyUnicode_AS_DATA(PyObject *o)
234-
235-
Return a pointer to a :c:type:`Py_UNICODE` representation of the object. The
236-
returned buffer is always terminated with an extra null code point. It
237-
may also contain embedded null code points, which would cause the string
238-
to be truncated when used in most C functions. The ``AS_DATA`` form
239-
casts the pointer to :c:type:`const char *`. The *o* argument has to be
240-
a Unicode object (not checked).
241-
242-
.. versionchanged:: 3.3
243-
This function is now inefficient -- because in many cases the
244-
:c:type:`Py_UNICODE` representation does not exist and needs to be created
245-
-- and can fail (return ``NULL`` with an exception set). Try to port the
246-
code to use the new :c:func:`PyUnicode_nBYTE_DATA` macros or use
247-
:c:func:`PyUnicode_WRITE` or :c:func:`PyUnicode_READ`.
248-
249-
.. deprecated-removed:: 3.3 3.12
250-
Part of the old-style Unicode API, please migrate to using the
251-
:c:func:`PyUnicode_nBYTE_DATA` family of macros.
252-
253-
254187
.. c:function:: int PyUnicode_IsIdentifier(PyObject *o)
255188
256189
Return ``1`` if the string is a valid identifier according to the language
@@ -436,12 +369,17 @@ APIs:
436369
437370
Create a Unicode object from the char buffer *u*. The bytes will be
438371
interpreted as being UTF-8 encoded. The buffer is copied into the new
439-
object. If the buffer is not ``NULL``, the return value might be a shared
440-
object, i.e. modification of the data is not allowed.
372+
object.
373+
The return value might be a shared object, i.e. modification of the data is
374+
not allowed.
441375
442-
If *u* is ``NULL``, this function behaves like :c:func:`PyUnicode_FromUnicode`
443-
with the buffer set to ``NULL``. This usage is deprecated in favor of
444-
:c:func:`PyUnicode_New`, and will be removed in Python 3.12.
376+
This function raises :exc:`SystemError` when:
377+
378+
* *size* < 0,
379+
* *u* is ``NULL`` and *size* > 0
380+
381+
.. versionchanged:: 3.12
382+
*u* == ``NULL`` with *size* > 0 is not allowed anymore.
445383
446384
447385
.. c:function:: PyObject *PyUnicode_FromString(const char *u)
@@ -680,79 +618,6 @@ APIs:
680618
.. versionadded:: 3.3
681619
682620
683-
Deprecated Py_UNICODE APIs
684-
""""""""""""""""""""""""""
685-
686-
.. deprecated-removed:: 3.3 3.12
687-
688-
These API functions are deprecated with the implementation of :pep:`393`.
689-
Extension modules can continue using them, as they will not be removed in Python
690-
3.x, but need to be aware that their use can now cause performance and memory hits.
691-
692-
693-
.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
694-
695-
Create a Unicode object from the Py_UNICODE buffer *u* of the given size. *u*
696-
may be ``NULL`` which causes the contents to be undefined. It is the user's
697-
responsibility to fill in the needed data. The buffer is copied into the new
698-
object.
699-
700-
If the buffer is not ``NULL``, the return value might be a shared object.
701-
Therefore, modification of the resulting Unicode object is only allowed when
702-
*u* is ``NULL``.
703-
704-
If the buffer is ``NULL``, :c:func:`PyUnicode_READY` must be called once the
705-
string content has been filled before using any of the access macros such as
706-
:c:func:`PyUnicode_KIND`.
707-
708-
.. deprecated-removed:: 3.3 3.12
709-
Part of the old-style Unicode API, please migrate to using
710-
:c:func:`PyUnicode_FromKindAndData`, :c:func:`PyUnicode_FromWideChar`, or
711-
:c:func:`PyUnicode_New`.
712-
713-
714-
.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
715-
716-
Return a read-only pointer to the Unicode object's internal
717-
:c:type:`Py_UNICODE` buffer, or ``NULL`` on error. This will create the
718-
:c:type:`Py_UNICODE*` representation of the object if it is not yet
719-
available. The buffer is always terminated with an extra null code point.
720-
Note that the resulting :c:type:`Py_UNICODE` string may also contain
721-
embedded null code points, which would cause the string to be truncated when
722-
used in most C functions.
723-
724-
.. deprecated-removed:: 3.3 3.12
725-
Part of the old-style Unicode API, please migrate to using
726-
:c:func:`PyUnicode_AsUCS4`, :c:func:`PyUnicode_AsWideChar`,
727-
:c:func:`PyUnicode_ReadChar` or similar new APIs.
728-
729-
730-
.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)
731-
732-
Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:func:`Py_UNICODE`
733-
array length (excluding the extra null terminator) in *size*.
734-
Note that the resulting :c:type:`Py_UNICODE*` string
735-
may contain embedded null code points, which would cause the string to be
736-
truncated when used in most C functions.
737-
738-
.. versionadded:: 3.3
739-
740-
.. deprecated-removed:: 3.3 3.12
741-
Part of the old-style Unicode API, please migrate to using
742-
:c:func:`PyUnicode_AsUCS4`, :c:func:`PyUnicode_AsWideChar`,
743-
:c:func:`PyUnicode_ReadChar` or similar new APIs.
744-
745-
746-
.. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
747-
748-
Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
749-
code units (this includes surrogate pairs as 2 units).
750-
751-
.. deprecated-removed:: 3.3 3.12
752-
Part of the old-style Unicode API, please migrate to using
753-
:c:func:`PyUnicode_GET_LENGTH`.
754-
755-
756621
.. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj)
757622
758623
Copy an instance of a Unicode subtype to a new true Unicode object if
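
The opening hunk of ``Doc/c-api/unicode.rst`` above mentions special cases for strings whose code points are all below 128, 256, or 65536, with 1114112 as the upper bound. As a rough illustration of that size selection, here is a minimal sketch in plain C. The names ``sketch_kind`` and ``sketch_kind_for_max_char`` are hypothetical and are not part of CPython. The sketch assumes that the below-128 and below-256 cases both use one byte per code point, which matches the ``PyUnicode_1BYTE_KIND``, ``PyUnicode_2BYTE_KIND`` and ``PyUnicode_4BYTE_KIND`` values documented in the same file.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not CPython definitions. */
enum sketch_kind {
    SKETCH_1BYTE = 1,   /* code points below 256 (includes ASCII, below 128) */
    SKETCH_2BYTE = 2,   /* code points below 65536 */
    SKETCH_4BYTE = 4    /* code points below 1114112 */
};

/* Return the number of bytes per code point needed to store a string
 * whose largest code point is max_char, or 0 if max_char lies outside
 * the Unicode range. */
static int
sketch_kind_for_max_char(uint32_t max_char)
{
    if (max_char < 256) {
        return SKETCH_1BYTE;
    }
    if (max_char < 65536) {
        return SKETCH_2BYTE;
    }
    if (max_char < 1114112) {
        return SKETCH_4BYTE;
    }
    return 0;  /* not a valid code point */
}
```

Under these assumptions, a string containing only ``U+00E9`` would need one byte per code point, while a single ``U+1F600`` would push the whole string to four bytes per code point.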

Doc/data/stable_abi.dat

-1
Some generated files are not rendered by default.

Doc/howto/clinic.rst

+4-4
@@ -848,15 +848,15 @@ on the right is the text you'd replace it with.
848848
``'s#'`` ``str(zeroes=True)``
849849
``'s*'`` ``Py_buffer(accept={buffer, str})``
850850
``'U'`` ``unicode``
851-
``'u'`` ``Py_UNICODE``
852-
``'u#'`` ``Py_UNICODE(zeroes=True)``
851+
``'u'`` ``wchar_t``
852+
``'u#'`` ``wchar_t(zeroes=True)``
853853
``'w*'`` ``Py_buffer(accept={rwbuffer})``
854854
``'Y'`` ``PyByteArrayObject``
855855
``'y'`` ``str(accept={bytes})``
856856
``'y#'`` ``str(accept={robuffer}, zeroes=True)``
857857
``'y*'`` ``Py_buffer``
858-
``'Z'`` ``Py_UNICODE(accept={str, NoneType})``
859-
``'Z#'`` ``Py_UNICODE(accept={str, NoneType}, zeroes=True)``
858+
``'Z'`` ``wchar_t(accept={str, NoneType})``
859+
``'Z#'`` ``wchar_t(accept={str, NoneType}, zeroes=True)``
860860
``'z'`` ``str(accept={str, NoneType})``
861861
``'z#'`` ``str(accept={str, NoneType}, zeroes=True)``
862862
``'z*'`` ``Py_buffer(accept={buffer, str, NoneType})``

Doc/whatsnew/3.12.rst

+25-1
@@ -66,6 +66,9 @@ Summary -- Release highlights
6666
6767
.. PEP-sized items next.
6868
69+
Important deprecations, removals or restrictions:
70+
71+
* :pep:`623`, Remove wstr from Unicode
6972

7073

7174
New Features
@@ -91,7 +94,9 @@ Improved Modules
9194
Optimizations
9295
=============
9396

94-
97+
* Removed ``wstr`` and ``wstr_length`` members from Unicode objects.
98+
This reduces the object size by 8 or 16 bytes on 64-bit platforms. (:pep:`623`)
99+
(Contributed by Inada Naoki in :gh:`92536`.)
95100

96101

97102
Deprecated
@@ -140,6 +145,13 @@ New Features
140145
Porting to Python 3.12
141146
----------------------
142147

148+
Legacy Unicode APIs based on the ``Py_UNICODE*`` representation have been removed.
149+
Please migrate to APIs based on UTF-8 or ``wchar_t*``.
150+
151+
Argument parsing functions like :c:func:`PyArg_ParseTuple` no longer support
152+
``Py_UNICODE*``-based formats (e.g. ``u``, ``Z``). Please migrate
153+
to other formats for Unicode like ``s``, ``z``, ``es``, and ``U``.
154+
143155
Deprecated
144156
----------
145157

@@ -150,3 +162,15 @@ Removed
150162
API. The ``token.h`` header file was only designed to be used by Python
151163
internals.
152164
(Contributed by Victor Stinner in :gh:`92651`.)
165+
166+
Legacy Unicode APIs have been removed. See :pep:`623` for details.
167+
168+
* :c:macro:`PyUnicode_WCHAR_KIND`
169+
* :c:func:`PyUnicode_AS_UNICODE`
170+
* :c:func:`PyUnicode_AsUnicode`
171+
* :c:func:`PyUnicode_AsUnicodeAndSize`
172+
* :c:func:`PyUnicode_AS_DATA`
173+
* :c:func:`PyUnicode_FromUnicode`
174+
* :c:func:`PyUnicode_GET_SIZE`
175+
* :c:func:`PyUnicode_GetSize`
176+
* :c:func:`PyUnicode_GET_DATA_SIZE`
