@sorcio
Contributor

@sorcio sorcio commented May 26, 2022

This adds a script to generate the mapping files for Traditional Chinese Big-5-based codecs, as discussed in the issue.

I initially planned to add support for later versions of HKSCS, but I decided to keep this minimal in order to close gh-84508. The topic of refreshing the mappings is split out into its own issue, gh-93271.

So this generates mappings_tw.h and mappings_hk.h files identical to the existing versions (with only one new line difference).

Notes about the mapping files:

  • I did not include the BIG5.TXT and CP950.TXT files that are available on the Unicode website. It looks like these are available for redistribution, so I can add them to the PR if needed.
  • I also did not include the hkscs-2004-big5-iso.txt file, but that's a different story. The terms of use from the source website include a clause that unilaterally binds to any update to the terms (clause 2), which I believe is incompatible with redistribution as part of CPython.
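
For readers unfamiliar with these inputs, here is a minimal sketch of how a mapping file in the BIG5.TXT / CP950.TXT layout can be read. It assumes the usual Unicode mapping-file layout (a hex byte code, a hex Unicode code point, then a `#` comment). The function name `parse_mapping` and the sample lines are illustrative only and are not taken from the script in this PR.

```python
def parse_mapping(lines):
    """Parse 'code<TAB>unicode<TAB># comment' lines into a dict of ints."""
    mapping = {}
    for line in lines:
        # Drop the trailing comment and surrounding whitespace.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank line or comment-only line
        fields = data.split()
        if len(fields) < 2:
            continue  # entry with no Unicode value (e.g. a lead-byte marker)
        mapping[int(fields[0], 16)] = int(fields[1], 16)
    return mapping


# Illustrative excerpt in the assumed layout, not the full file.
sample = [
    "# Name: example excerpt",
    "0xA140\t0x3000\t# IDEOGRAPHIC SPACE",
    "0xA141\t0xFF0C\t# FULLWIDTH COMMA",
    "",
]
table = parse_mapping(sample)
print({hex(k): hex(v) for k, v in table.items()})
```

A generator script would then turn a table like this into the C arrays stored in the mapping headers. That step is specific to the real tool and is not sketched here.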

Member

@corona10 corona10 left a comment

Super cool, thanks for your contribution.
This issue was painful for me because we could not regenerate the mapping files without this tool.

I left a small suggestion for this.

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Member

@corona10 corona10 left a comment

> I did not include the BIG5.TXT and CP950.TXT files that are available on the Unicode website. It looks like these are available for redistribution, so I can add them to the PR if needed.

Please add them.

@sorcio
Contributor Author

sorcio commented May 28, 2022

I have made the requested changes; please review again

@bedevere-bot

Thanks for making the requested changes!

@corona10: please review the changes made to this pull request.

@bedevere-bot bedevere-bot requested a review from corona10 May 28, 2022 09:46
Member

@ezio-melotti ezio-melotti left a comment

The two files are somewhat big -- @corona10 do you think it's ok to include them in the repo?
If they are added the size of the repo will increase, making downloading/cloning it slower. I'm not sure if it might affect other things too.

#
# genmap_tchinese.py: Traditional Chinese Codecs Map Generator
#
# Original Author: Hye-Shik Chang <[email protected]>
Member

FWIW @hyeshik doesn't seem to be active on this repo, but he signed the CLA in 2005 (https://bugs.python.org/user1298), so it should be ok to include it here.

If this script is based on his script, you might want to clarify that in the comment.

Contributor Author

@ezio-melotti do you have a suggestion on how to amend the comment to clarify that? The mention of "Original Author" is how it currently looks in the other genmap_* scripts, so I didn't take the liberty of straying from that style.

The original source (as for the other equivalent scripts) is https://github.com/BackupTheBerlios/cjkpython/blob/master/cjkcodecs/tools/genmap_tchinese.py which was originally under the 2-clause BSD license. I may have wrongly assumed that it was ok to include in CPython because the other scripts followed the same process.

Contributor Author

(In case it matters, the script in this PR is heavily modified, so while it probably still counts as derivative, it would be easy to do a full rewrite)

@corona10
Member

corona10 commented May 28, 2022

@ezio-melotti
Yes, it's worth adding them, for the two reasons below. Alternatively, we can move the files to a separate repo in the future and pull it in as a submodule.

  1. Mapping files already exist in the CPython repo.
    https://github.com/python/cpython/tree/main/Tools/unicode/python-mappings

  2. I noticed that the mapping files were not well maintained, even though they are supposed to come with diff files like https://github.com/python/cpython/tree/main/Tools/unicode/python-mappings/diff
    I am focusing on keeping the environment reproducible even if the external environment changes.
    No one was interested in regenerating the mapping header files until I raised the concern. This is an issue that can recur over time, and freezing the environment is important to prevent it.

Member

@corona10 corona10 left a comment

Please regenerate Modules/cjkcodecs/mappings_hk.h and Modules/cjkcodecs/mappings_tw.h through the new script.

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Member

@corona10 corona10 left a comment

LGTM, but I will send a mail to @hyeshik to discuss this issue.
I am also waiting for @ezio-melotti's opinion about storing the mapping files.
(I would prefer to store them in a dedicated mapping repo under the python org, used as a submodule, but let's handle that in a separate issue.)

@ezio-melotti
Member

The files that were there before were just a few kB (about 40kB in 9d5c071). GH-19602 introduced more files (~1.3MB, 2 years ago). GH-93309 recently added another ~1.4MB. The files included in this PR add yet another ~800kB.

The CPython source (without the Git history) is ~27MB zipped and ~92MB once extracted. With the Git history it's about 585MB (quite a bit more than I thought).

Compared to the rest, an extra 800kB is not particularly significant so it's probably ok to include them.

If you are planning to include more files, maybe it would be better to think about a different approach, depending on the goal (e.g. host copies somewhere on python.org and add a script to download them from there). Note that since the other PRs have already been merged, even if we (re)move the files from the repo we might be able to reduce the repo size by ~3MB but they will be included in the history forever.
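
The "host copies and download them with a script" idea could be sketched roughly as follows. This is only an illustration under stated assumptions: `BASE_URL` is a placeholder (no such hosting location is named in this thread), `fetch_mapping` is a hypothetical helper, and the checksum check is an added safeguard rather than something proposed above. The demonstration uses a fake opener, so nothing is downloaded.

```python
import hashlib
import io
import tempfile
import urllib.request
from pathlib import Path

# Placeholder location: an assumption for illustration, not a real host.
BASE_URL = "https://example.invalid/unicode-mappings/"


def fetch_mapping(name, sha256, cache_dir, opener=urllib.request.urlopen):
    """Return the path of a cached mapping file, downloading it if missing."""
    cache_dir = Path(cache_dir)
    target = cache_dir / name
    if not target.exists():
        with opener(BASE_URL + name) as response:
            data = response.read()
        cache_dir.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    # Verify the file so that a changed upstream copy is noticed.
    digest = hashlib.sha256(target.read_bytes()).hexdigest()
    if digest != sha256:
        raise ValueError(f"{name}: checksum mismatch (got {digest})")
    return target


# Offline demonstration with a fake opener instead of a network request.
payload = b"0xA140\t0x3000\t# IDEOGRAPHIC SPACE\n"
expected = hashlib.sha256(payload).hexdigest()
with tempfile.TemporaryDirectory() as tmp:
    path = fetch_mapping("BIG5.TXT", expected, tmp,
                         opener=lambda url: io.BytesIO(payload))
    content = path.read_bytes()
print(len(content), "bytes cached and verified")
```

Pinning a checksum next to each file name keeps regeneration reproducible even when the hosted copy changes, which is the concern raised earlier in this thread.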

Member

@corona10 corona10 left a comment

> The files that were there before were just a few kB (about 40kB in [9d5c071](https://github.com/python/cpython/tree/9d5c0710609320e51631750d1cf60c90cc618172/Tools/unicode/python-mappings)). #19602 introduced more files (~1.3MB, 2 years ago). #93309 recently added another ~1.4MB. The files included in this PR add yet another ~800kB.

Okay, I never thought that it would be a burden.

@sorcio
Can you please remove Tools/unicode/python-mappings/BIG5.TXT and Tools/unicode/python-mappings/CP950.TXT as a middle ground? (Sorry for the duplicated work.)

@ezio-melotti
I will revert #93309 too.
But could you open a discussion about storing the mapping files in a way that CPython developers can manage?
I think that you can understand what I was concerned about and what I intended to do.
It will cover all files under Tools/unicode/python-mappings/.

> (e.g. host copies somewhere on python.org and add a script to download them from there).

That looks like as good a suggestion as maintaining a submodule repository on GitHub.

Thank you for understanding.

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@sorcio
Contributor Author

sorcio commented May 29, 2022

I have made the requested changes; please review again

> Sorry for the duplicated work

No worries! It's good that we clarified that.

@bedevere-bot

Thanks for making the requested changes!

@corona10: please review the changes made to this pull request.

@bedevere-bot bedevere-bot requested a review from corona10 May 29, 2022 10:07
Co-authored-by: Ezio Melotti <[email protected]>
Member

@ezio-melotti ezio-melotti left a comment

The patch looks good to me. The script could be improved a bit (e.g. more docstrings), but I'm assuming those parts were already in the original script and you didn't touch them.

I was also looking at e.g. open_mapping_file (defined in genmap_support) and noticed that it could be improved to report a clearer message. This however is out of the scope of this PR. @corona10, are you planning to do more work related to this? Do you think it would be worth refactoring these scripts?

I also noticed that the PR doesn't include any tests. For the tool itself that is probably ok; however, the resulting mappings should be tested. Are there already tests for that? It might also be useful to add some tests for the characters that diverge from one codec variant to the other, in order to avoid regressions.
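
One way to find candidate characters for such regression tests is to compare two codec variants byte by byte. The sketch below is an illustration, not part of the PR. The helper names are made up, and the byte ranges are the usual Big5 lead and trail byte ranges, assumed here rather than taken from the thread.

```python
def decode_or_none(raw, codec):
    """Decode raw bytes, returning None when the codec rejects them."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


def diverging_sequences(codec_a, codec_b):
    """List (bytes, result_a, result_b) for two-byte sequences that differ."""
    # Assumed Big5-style ranges: lead 0x81-0xFE, trail 0x40-0x7E and 0xA1-0xFE.
    trails = list(range(0x40, 0x7F)) + list(range(0xA1, 0xFF))
    result = []
    for lead in range(0x81, 0xFF):
        for trail in trails:
            raw = bytes([lead, trail])
            a = decode_or_none(raw, codec_a)
            b = decode_or_none(raw, codec_b)
            if a != b:
                result.append((raw, a, b))
    return result


diffs = diverging_sequences("big5", "cp950")
print(len(diffs), "two-byte sequences decode differently in big5 and cp950")
```

A regression test could pin a handful of these sequences with their expected results for each codec, so that a regenerated mapping that changes them is caught.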

@sorcio
Contributor Author

sorcio commented May 30, 2022

> It might also be useful to add some tests for the characters that diverge from one codec variant to the other, in order to avoid regressions.

That's a valid point, keeping in mind that the codecs already have tests and that this PR doesn't change the behavior of the codecs in any way. However, the tests rely on test data such as https://github.com/python/pythontestdotnet/blob/master/www/unicode/BIG5HKSCS-2004.TXT and the sample text at https://github.com/python/cpython/tree/main/Lib/test/cjkencodings.

I found no way to generate the test data files and I believe the sample text is hand-picked. In particular, looking at 5a15508:

  • the BIG5.TXT and CP950.TXT files originally were downloaded from unicode.org (sorry I missed this before, there is already a Python-hosted copy of this data!) and they are used as-is in the tests, so they should be okay
  • the BIG5HKSCS-2004.TXT file, on the other hand, used to refer to http://people.freebsd.org/~perky/i18n/BIG5HKSCS-2004.TXT as a source; I don't believe @hyeshik is still maintaining that repository, and it's not documented how the file was built
  • the sample texts similarly have no documented source, and are probably hand-crafted to cover some tricky cases, and require domain knowledge to maintain

Is it within the scope of this issue/PR to add a script to rebuild a file like BIG5HKSCS-2004.TXT?

Member

@corona10 corona10 left a comment

> I was also looking at e.g. open_mapping_file (defined in genmap_support) and noticed that it could be improved to report a clearer message. This however is out of the scope of this PR. @corona10, are you planning to do more work related to this? Do you think it would be worth refactoring these scripts?

Please go ahead.

lgtm
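
As a rough idea of what a "clearer message" could look like, here is a hypothetical helper. It is not the actual open_mapping_file from genmap_support, whose signature and behavior are not shown in this thread. The name `open_mapping_file_verbose` and both parameters are assumptions made for illustration.

```python
import os


def open_mapping_file_verbose(path, source):
    """Open a mapping file, or exit with a message saying where to get it.

    Hypothetical helper: 'path' is the expected local file and 'source' is
    a URL or description of where the file can be obtained.
    """
    try:
        return open(path, encoding="utf-8")
    except FileNotFoundError:
        raise SystemExit(
            f"Mapping file {os.path.abspath(path)!r} not found.\n"
            f"Download it from {source} and place it at that path."
        ) from None
```

Reporting the absolute path and the download location in one message tells whoever regenerates the headers exactly what is missing and where to put it.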

@corona10 corona10 assigned ezio-melotti and unassigned corona10 May 30, 2022
@corona10
Member

@ezio-melotti I got a mail from @hyeshik. He is going to review this PR soon. :)

@corona10 corona10 self-assigned this May 31, 2022
@hyeshik
Contributor

hyeshik commented Jun 11, 2022

Thank you for your efforts in resolving this old problem, @sorcio, @ezio-melotti, and @corona10. All the changes look good to me. Also, I would like to put all my work included in the CJKCodecs releases into the public domain to allow more flexible adoptions in any derivative work. Please feel free to change the copyright conditions as needed.

@corona10 corona10 merged commit 733e15f into python:main Jun 11, 2022
@corona10
Member

Thank you @hyeshik!

serhiy-storchaka added a commit that referenced this pull request Jun 26, 2022
* GH-93444: remove redundant fields from basicblock: b_nofallthrough, b_exit, b_return (GH-93445)

* netrc: Remove unused "import shlex" (#93311)

* gh-92886: Fix test that fails when running with `-O` in `test_imaplib.py` (#93237)

* Fix missing word in sys.float_info docstring (GH-93489)

* [doc] Correct a grammatical error in a docstring. (GH-93441)

* gh-93442: Make C++ version of _Py_CAST work with 0/NULL. (#93500)

Add C++ overloads for _Py_CAST_impl() to handle 0/NULL.  This will allow
C++ extensions that pass 0 or NULL to macros using _Py_CAST() to
continue to compile.  Without this, you get an error like:

    invalid ‘static_cast’ from type ‘int’ to type ‘_object*’

The modern way to use a NULL value in C++ is to use nullptr.  However,
we want to not break extensions that do things the old way.

Co-authored-by: serge-sans-paille

* gh-93442: Add test for _Py_CAST(nullptr). (gh-93505)

* gh-90473: wasmtime does not support absolute symlinks (GH-93490)

* gh-89973: Fix re.error in the fnmatch module. (GH-93072)

Character ranges with upper bound less than lower bound (e.g. [c-a])
are now interpreted as empty ranges, for compatibility with other glob
pattern implementations. Previously this raised re.error.

* Document LOAD_FAST_CHECK opcode (#93498)

* gh-93247: Fix assert function in asyncio locks test (#93248)

* gh-90473: WASI requires proper open(2) flags (GH-93529)

* GH-92308 What's New: list pending removals in 3.13 and future versions (#92562)

* gh-90473: Skip POSIX tests that don't apply to WASI (GH-93536)

* asyncio.Barrier docs: Fix typo (#93371)

taks -> tasks

* gh-83728: Add hmac.new default parameter deprecation (GH-91939)

* gh-90473: Make chmod a dummy on WASI, skip chmod tests (GH-93534)

WASI does not have the ``chmod(2)`` syscall yet.

* Remove action=None kwarg from Barrier docs (GH-93538)

* [docs] fix some asyncio.Barrier.wait docs grammar (GH-93552)

* gh-93475: Expose FICLONE and FICLONERANGE constants in fcntl (#93478)

* gh-89018: Improve documentation of `sqlite3` exceptions (#27645)

- Order exceptions as in PEP 249
- Reword descriptions, so they match the current behaviour

Co-authored-by: Alex Waygood <[email protected]>

* bpo-42658: Use LCMapStringEx in ntpath.normcase to match OS behaviour for case-folding (GH-32010)

* Fix contributor name in WhatsNew 3.11 (GH-93556)

* Grammar fix to socket error string (GH-93523)

* gh-86986: bump min sphinx version to 3.2 (GH-93337)

* gh-79096: Protect cookie file created by {LWP,Mozilla}CookieJar.save() (GH-93463)

Note: This change is not effective on Microsoft Windows.

Cookies can store sensitive information and should therefore be protected
against unauthorized third parties. This is also described in issue #79096.

The filesystem permissions are currently set to 644, so everyone can read the
file. This commit changes the permissions to 600, so only the creator of the file
can read and modify it. This improves security, because it reduces the attack
surface. Now the attacker needs control of the user that created the cookie or
a way to circumvent the filesystem permissions.

This change is backwards incompatible. Systems that rely on world-readable
cookies will break. However, one could argue that those are misconfigured in
the first place.

* gh-93162: Add ability to configure QueueHandler/QueueListener together (GH-93269)

Also, provide getHandlerByName() and getHandlerNames() APIs.

Closes #93162.

* gh-57539: Increase calendar test coverage (GH-93468)

Co-authored-by: Sean Fleming
Co-authored-by: Adam Turner <[email protected]>
Co-authored-by: Łukasz Langa <[email protected]>

* gh-88831: In docs for asyncio.create_task, explain why strong references to tasks are needed (GH-93258)

Co-authored-by: Łukasz Langa <[email protected]>

* Shrink the LOAD_METHOD cache by one codeunit. (#93537)

* Fix MSVC compiler warnings in ceval.c (#93569)

* gh-93162: test_config_queue_handler requires threading (GH-93572)

* gh-84461: Emscripten's faccessat() does not accept flags (GH-92353)

* gh-92592: Allow logging filters to return a LogRecord. (GH-92591)

* Fix `PurePath.relative_to` links in the pathlib documentation. (GH-93268)

These are currently broken as they refer to :meth:`Path.relative_to` rather than :meth:`PurePath.relative_to`, and `relative_to` is a method on `PurePath`.

* GH-93481: Suppress expected deprecation warning in test_pyclbr (GH-93483)

* gh-93370: Deprecate sqlite3.version and sqlite3.version_info (#93482)

Co-authored-by: Alex Waygood <[email protected]>
Co-authored-by: Adam Turner <[email protected]>
Co-authored-by: Erlend E. Aasland <[email protected]>

* GH-93521: For dataclasses, filter out `__weakref__` slot if present in bases (GH-93535)

* gh-93421: Update sqlite3 cursor.rowcount only after SQLITE_DONE (#93526)

* gh-93584: Make all install+tests targets depends on all (GH-93589)

All install targets use the "all" target as synchronization point to
prevent race conditions with PGO builds. PGO builds use recursive make,
which can lead to two parallel `./python setup.py build` processes that
step on each other's toes.

"test" targets now correctly compile PGO build in a clean repo.

* gh-87961: Remove outdated notes from functions that aren't in the Limited API (GH-93581)

* Remove outdated notes from functions that aren't in the Limited API

Nowadays everything that *is* in the Limited API has a note added
automatically.
These notes could mislead people to think that these functions
could never be added to the limited API. Remove them.

* Also remove forgotten note on tp_vectorcall_offset not being finalized

* gh-93180: Update os.copy_file_range() documentation (#93182)

* gh-93575: Use correct way to calculate PyUnicode struct sizes (GH-93602)

* gh-93575: Use correct way to calculate PyUnicode struct sizes

* Add comment to keep test_sys and test_unicode in sync

* Fix case code < 256

* gh-90473: Define HOSTRUNNER for WASI (GH-93606)

* gh-79096: Fix/improve http cookiejar tests (GH-93614)

Fixup of GH-93463:
- remove stray print
- use proper way to check file mode
- add working chmod decorator

Co-authored-by: Łukasz Langa <[email protected]>

* gh-93616: Fix env changed issue in test_modulefinder (GH-93617)

* gh-90494: Reject 6th element of the __reduce__() tuple (GH-93609)

copy.copy() and copy.deepcopy() now always raise a TypeError if
__reduce__() returns a tuple with length 6 instead of silently ignoring
the 6th item or producing an incorrect result.

* Doc: Update references and examples of old, unsupported OSes and uarches (GH-92791)

* bpo-45383: Get metaclass from bases in PyType_From* (GH-28748)

This checks the bases of a type created using the FromSpec
API to inherit the bases' metaclasses.  The metaclass's alloc
function will be called as is done in `tp_new` for classes
created in Python.

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Erlend Egeberg Aasland <[email protected]>

* Improve logging documentation with example and additional cookbook re… (GH-93644)

* gh-90473: disable user site packages on WASI/Emscripten (GH-93633)

* gh-90473: Skip get_config_h() tests on WASI (GH-93645)

* gh-90549: Fix leak of global named resources using multiprocessing spawn (#30617)

Co-authored-by: XD Trol <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>

* gh-92434: Silence compiler warning in Modules/_sqlite/connection.c on 32-bit systems (#93090)

* gh-90763: Modernise xx template module initialisation (#93078)

Use C APIs such as PyModule_AddType instead of PyModule_AddObject.
Also remove incorrect module decrefs if module fails to initialise.

* gh-93491: Add support tier detection to configure (GH-93492)

Co-authored-by: Adam Turner <[email protected]>
Co-authored-by: Steve Dower <[email protected]>
Co-authored-by: Erlend Egeberg Aasland <[email protected]>

* gh-93466: Document PyType_Spec doesn't accept repeated slot IDs; raise where this was problematic (GH-93471)

* gh-93671: Avoid exponential backtracking in deeply nested sequence patterns in match statements (GH-93680)

Co-authored-by: Łukasz Langa <[email protected]>

* gh-81790: support "UNC" device paths in `ntpath.splitdrive()` (GH-91882)

* GH-93621: reorder code in with/async-with exception exit path to reduce the size of the exception table (GH-93622)

* gh-93461: Invalidate sys.path_importer_cache entries with relative paths (GH-93653)

* gh-91317: Document that Path does not collapse initial `//` (GH-32193)



Documentation for `pathlib` says:

> Spurious slashes and single dots are collapsed, but double dots ('..') are not, since this would change the meaning of a path in the face of symbolic links:

However, it omits that initial double slashes also aren't collapsed.

Later, in documentation of `PurePath.drive`, `PurePath.root`, and `PurePath.name` it mentions UNC but:

- this abbreviation says nothing to a person who is unaware of the existence of UNC (Wikipedia doesn't help either by [giving a disambiguation page](https://en.wikipedia.org/wiki/UNC))
- it shows up only if a person needs to use a specific property or decides to fully learn what the module provides.

For context, see the BPO entry.

* gh-92886: Fix tests that fail when running with optimizations (`-O`) in `test_zipimport.py` (GH-93236)

* gh-92930: _pickle.c: Acquire strong references before calling save() (GH-92931)

* gh-84461: Use HOSTRUNNER to run regression tests (GH-93694)

Co-authored-by: Brett Cannon <[email protected]>

* gh-90473: Skip test_queue when threading is not available (GH-93712)

* gh-90153:  whatsnew: "z" option in format spec (GH-93624)

Add what's new entry for PEP 682 in Python 3.11.

* gh-86404: [doc] A make suspicious false positive. (GH-93710)

* Change list to view object (#93661)

* gh-84508: tool to generate cjk traditional chinese mappings (gh-93272)

* Remove usage of _Py_IDENTIFIER from math module (#93739)

* gh-91162: Support splitting of unpacked arbitrary-length tuple over TypeVar and TypeVarTuple parameters (alt) (GH-93412)

For example:

  A[T, *Ts][*tuple[int, ...]] -> A[int, *tuple[int, ...]]
  A[*Ts, T][*tuple[int, ...]] -> A[*tuple[int, ...], int]

* gh-93728: fix memory leak in deepfrozen code objects (GH-93729)

* gh-93747: Fix Refleak when handling multiple Py_tp_doc slots (gh-93749)

* GH-90699: use statically allocated strings in typeobject.c (gh-93751)

* Add more FOR_ITER specialization stats (GH-32151)

* gh-89653: PEP 670: Convert PyFunction macros (#93765)

Convert PyFunction macros to static inline functions.

* Remove ANY_VARARGS() macro from the C API (#93764)

The macro was exposed by mistake.

* gh-84623: Remove unused imports in stdlib (#93773)

* gh-91731: Don't define 'static_assert' in C++11 where it is a keyword to avoid UB (GH-93700)

* gh-84623: Remove unused imports in tests (#93772)

* gh-93353: Fix importlib.resources._tempfile() finalizer (#93377)

Fix the importlib.resources.as_file() context manager to remove the
temporary file if destroyed late during Python finalization: keep a
local reference to the os.remove() function. Patch by Victor Stinner.

* gh-84461: Fix parallel testing on WebAssembly (GH-93768)

* gh-89653: PEP 670: Macros always cast arguments in cpython/ (#93766)

Header files in the Include/cpython/ are only included if
the Py_LIMITED_API macro is not defined.

* gh-93353: Add test.support.late_deletion() (#93774)

* gh-93741: Add private C API _PyImport_GetModuleAttrString() (GH-93742)

It combines PyImport_ImportModule() and PyObject_GetAttrString()
and saves 4-6 lines of code on every use.

Add also _PyImport_GetModuleAttr() which takes Python strings as arguments.

* gh-79512: Fixed names and __module__ value of weakref classes (GH-93719)

Classes ReferenceType, ProxyType and CallableProxyType now have correct
attributes __module__, __name__ and __qualname__.
It makes them (types, not instances) pickleable.

* gh-91810: Fix regression with writing an XML declaration with encoding='unicode' (GH-93426)

Suppress writing an XML declaration in open files in ElementTree.write()
with encoding='unicode' and xml_declaration=None.

If a file path is passed to ElementTree.write() with encoding='unicode',
always open a new file in UTF-8.

* gh-93761: Fix test to avoid simple delay when synchronizing. (GH-93779)

* gh-89546: Clean up PyType_FromMetaclass (GH-93686)



When changing PyType_FromMetaclass recently (GH-93012, GH-93466, GH-28748)
I found a bunch of opportunities to improve the code. Here they are.

Fixes: #89546

Automerge-Triggered-By: GH:encukou

* gh-91321: Fix compatibility with C++ older than C++11 (#93784)

Fix the compatibility of the Python C API with C++ older than C++11.

_Py_NULL is only defined as nullptr on C++11 and newer.

* GH-93662: Make sure that column offsets are correct in multi-line method calls. (GH-93673)

* GH-93516: Store offset of first traceable instruction in code object (GH-93769)

* gh-90473: Include stdlib dir in wasmtime PYTHONPATH (GH-93797)

* GH-93429: Merge `LOAD_METHOD` back into `LOAD_ATTR` (GH-93430)

* gh-93353: regrtest checks for leaked temporary files (#93776)

When running tests with -jN, create a temporary directory per process
and mark a test as "environment changed" if a test leaks a temporary
file or directory.

* gh-79579: Improve DML query detection in sqlite3 (#93623)

The fix involves using pysqlite_check_remaining_sql(), not only to check
for multiple statements, but now also to strip leading comments and
whitespace from SQL statements, so we can improve DML query detection.

pysqlite_check_remaining_sql() is renamed lstrip_sql(), to more
accurately reflect its function, and hardened to handle more SQL comment
corner cases.

* GH-93678: reduce boilerplate and code repetition in the compiler (GH-93682)

* gh-91877: Fix WriteTransport.get_write_buffer_{limits,size} docs (#92338)

- Amend docs for WriteTransport.get_write_buffer_limits
- Add docs for WriteTransport.get_write_buffer_size

* GH-93429: Document `LOAD_METHOD` removal (GH-93803)

* Include freelists in allocation total. (GH-93799)

* gh-93795: Use test.support TESTFN/unlink in sqlite3 tests (#93796)

* Remove LOAD_METHOD stats. (GH-93807)

* Rename 'LOAD_METHOD' specialization stat consts to 'ATTR'. (GH-93812)

* gh-93353: Fix regrtest for -jN with N >= 2 (GH-93813)

* [docs] Fix LOAD_ATTR version changed (GH-93816)

* gh-93814: Add infinite test for itertools.chain.from_iterable (GH-93815)



fix #93814

Automerge-Triggered-By: GH:rhettinger

* gh-93735: Split Docs CI to speed-up the build (GH-93736)

* gh-93183: Adjust wording in socket docs (#93832)

package => packet

Co-authored-by: Victor Norman

* gh-93829: In sqlite3, replace Py_BuildValue with faster APIs (#93830)

- In Modules/_sqlite/connection.c, use PyLong_FromLong
- In Modules/_sqlite/microprotocols.c, use PyTuple_Pack

* Add test.support.busy_retry() (#93770)

Add busy_retry() and sleeping_retry() functions to test.support.

* gh-87260: Update sqlite3 signature docs to reflect actual implementation (#93840)

Align the docs for the following methods with the actual implementation:

- sqlite3.complete_statement()
- sqlite3.Connection.create_function()
- sqlite3.Connection.create_aggregate()
- sqlite3.Connection.set_progress_handler()

* test_thread uses support.sleeping_retry() (#93849)

test_thread.test_count() now fails if it takes longer than
LONG_TIMEOUT seconds.

* Use support.sleeping_retry() and support.busy_retry() (#93848)

* Replace time.sleep(0.010) with sleeping_retry() to
  use an exponential sleep.
* support.wait_process(): reuse sleeping_retry().
* _test_eintr: remove unused variables.

* Update includes in call.c (GH-93786)

* gh-93857: Fix broken audit-event targets in sqlite3 docs (#93859)

Corrected targets for the following audit-events:

- sqlite3.enable_load_extension => sqlite3.Connection.enable_load_extension
- sqlite3.load_extension => sqlite3.Connection.load_extension

* GH-93850: Fix test_asyncio exception ignored tracebacks (#93854)

* gh-93824: Reenable installation of shell extension on Windows ARM64 (GH-93825)

* test_asyncio: run_until() implements exponential sleep (#93866)

run_until() of test.test_asyncio.utils now uses an exponential sleep
delay (max: 1 second), rather than a fixed delay of 1 ms. The design is
similar to the support.sleeping_retry() wait strategy, which applies
exponential backoff.

* test_asyncore: Optimize capture_server() (#93867)

Remove time.sleep(0.01) in test_asyncore capture_server(). The sleep
was redundant and inefficient, since the loop starts with
select.select() which also implements a sleep (poll for socket data
with a timeout).

* Tests call sleeping_retry() with SHORT_TIMEOUT (#93870)

Tests now call busy_retry() and sleeping_retry() with SHORT_TIMEOUT
or LONG_TIMEOUT (of test.support), rather than hardcoded constants.

Add also WAIT_ACTIVE_CHILDREN_TIMEOUT constant to
_test_multiprocessing.

* gh-84461: Document how to install SDKs manually (GH-93844)

Co-authored-by: Brett Cannon <[email protected]>

* gh-93820: Fix copy() regression in enum.Flag (GH-93876)



GH-26658 introduced a regression in copy / pickle protocol for combined
`enum.Flag`s. `copy.copy(re.A | re.I)` would fail with
`AttributeError: ASCII|IGNORECASE`.

`enum.Flag` now has a `__reduce_ex__()` method that reduces flags by
combined value, not by combined name.

* Call busy_retry() and sleeping_retry() with error=True (#93871)

Tests no longer call busy_retry() and sleeping_retry() with
error=False: raise an exception if the loop times out.

* gh-87347: Add parenthesis around PyXXX_Check() arguments (#92815)

* gh-91321: Fix test_cppext for C++03 (#93902)

Don't build _testcppext.cpp with -Wzero-as-null-pointer-constant when
testing C++03: only use this compiler flag with C++11.

* gh-91577: SharedMemory move imports out of methods (#91579)

SharedMemory.unlink() uses the unregister() function from resource_tracker. Previously it was imported in the method, but this can fail if the method is called during interpreter shutdown, for example when unlink is part of a __del__() method.

Moving the import to the top of the file means that the unregister() function is available during interpreter shutdown.

The register call in SharedMemory.__init__() can also use this imported resource_tracker.

* gh-92547: Amend What's New (#93872)

* Fix BINARY_SUBSCR_GETITEM stats (GH-93903)

* gh-93847: Fix repr of enum of generic aliases (GH-93885)

* gh-93353: regrtest supports checking tmp files with -j2 (#93909)

regrtest now also implements checking for leaked temporary files and
directories when using -jN for N >= 2. Use tempfile.mkdtemp() to
create the temporary directory. Skip this check on WASI.
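
The idea behind the check can be sketched as follows: give each run a fresh directory from `tempfile.mkdtemp()` and report whatever is still in it afterwards. This is a simplified illustration, not regrtest's code; `find_leaked_files` and `leaky_test` are hypothetical names.

```python
import os
import shutil
import tempfile


def find_leaked_files(run_test):
    """Run `run_test(tmp_dir)` and return the names it left behind."""
    tmp_dir = tempfile.mkdtemp(prefix="leak_check_")
    try:
        run_test(tmp_dir)
        return sorted(os.listdir(tmp_dir))
    finally:
        # Always clean up, even if the test body raised.
        shutil.rmtree(tmp_dir, ignore_errors=True)


def leaky_test(tmp_dir):
    # Hypothetical test body that forgets to delete its scratch file.
    with open(os.path.join(tmp_dir, "scratch.txt"), "w") as f:
        f.write("data")


leaked = find_leaked_files(leaky_test)
print(leaked)
```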

* GH-91389: Fix dis position information for CACHEs (GH-93663)

* gh-91985: Ensure in-tree builds override platstdlib_dir in every path calculation (GH-93641)

* GH-83658: make multiprocessing.Pool raise an exception if maxtasksperchild is not None or a positive int (GH-93364)



Closes #83658.
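
The rule stated in the title (None or a positive int) can be sketched as a small validation function. The function name, the exception type and the message below are assumptions for illustration and may differ from what `multiprocessing.Pool` actually does.

```python
def check_maxtasksperchild(value):
    """Accept None or a positive int; reject everything else (hypothetical helper)."""
    if value is not None:
        if not isinstance(value, int) or value <= 0:
            raise ValueError("maxtasksperchild must be a positive int or None")
    return value


for candidate in (None, 1, 100):
    check_maxtasksperchild(candidate)  # accepted silently
```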

* test_logging: Fix BytesWarning in SysLogHandlerTest (GH-93920)

* gh-91404: Revert "bpo-23689: re module, fix memory leak when a match is terminated by a signal or allocation failure (GH-32283)" (#93882)

Revert "bpo-23689: re module, fix memory leak when a match is terminated by a signal or memory allocation failure (GH-32283)"

This reverts commit 6e3eee5.

Manual fixups to increase the MAGIC number and to handle conflicts with
a couple of changes that landed after that.

Thanks for reviews by Ma Lin and Serhiy Storchaka.

* gh-89745: Avoid exact match when comparing program_name in test_embed on Windows (GH-93888)

* gh-93852: Add test.support.create_unix_domain_name() (#93914)

test_asyncio, test_logging, test_socket and test_socketserver now
create AF_UNIX domains in the current directory to no longer fail
with OSError("AF_UNIX path too long") if the temporary directory (the
TMPDIR environment variable) is too long.

Modify the following tests to use create_unix_domain_name():

* test_asyncio
* test_logging
* test_socket
* test_socketserver

test_asyncio.utils: remove unused time import.
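
A minimal sketch of the idea behind create_unix_domain_name(): generate the socket name relative to the current directory so that a long TMPDIR cannot push the path over the AF_UNIX length limit. The helper name and the prefix below are assumptions, not necessarily what `test.support` uses.

```python
import os
import tempfile


def short_unix_socket_name():
    """Return an unused, short socket path in the current directory (hypothetical helper)."""
    # mktemp() only generates a name; nothing is created on disk.
    return tempfile.mktemp(prefix="test_python_", suffix=".sock",
                           dir=os.path.curdir)


name = short_unix_socket_name()
print(name)
```

The caller remains responsible for removing the socket file after binding to it.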

* gh-77782: Py_FdIsInteractive() now uses PyConfig.interactive (#93916)

* gh-74953: Add _PyTime_FromMicrosecondsClamp() function (#93942)

* gh-74953: Fix PyThread_acquire_lock_timed() code recomputing the timeout (#93941)

Set timeout, don't create a local variable with the same name.

* gh-77782: Deprecate global configuration variable (#93943)

Deprecate global configuration variables such as
Py_IgnoreEnvironmentFlag: the Py_InitializeFromConfig() API should be
used instead.

Fix declaration of Py_GETENV(): use PyAPI_FUNC(), not PyAPI_DATA().

* gh-93911: Specialize `LOAD_ATTR_PROPERTY` (GH-93912)

* gh-92888: Fix memoryview bad `__index__` use after free (GH-92946)

Co-authored-by: chilaxan <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>

* GH-89858: Fix test_embed for out-of-tree builds (GH-93465)

* gh-92611: Add details on replacements for cgi utility funcs (GH-92792)



Per @brettcannon's [suggestions on the Discourse thread](https://discuss.python.org/t/pep-594-take-2-removing-dead-batteries-from-the-standard-library/13508/51), discussed in #92611 and as a follow-up to PR #92612, this PR adds specific per-function replacement information for the utility functions in the `cgi` module deprecated by PEP 594.

@brettcannon , should this be backported (without the `deprecated-removed` , which I would update it accordingly and re-add in my other PR adding that to the others for 3.11+), or just go in 3.11+?

* GH-77403: Fix tests which fail when PYTHONUSERBASE is not normalized (GH-93917)

* gh-91387: Strip trailing slash from tarfile longname directories (GH-32423)

Co-authored-by: Brett Cannon <[email protected]>

* Add jaraco as primary owner of importlib.metadata and importlib.resources. (#93960)

* Add jaraco as primary owner of importlib.metadata and importlib.resources.

* Align indentation.

Co-authored-by: Ezio Melotti <[email protected]>

* gh-84461: Fix circular dependency on BUILDPYTHON (GH-93977)

* gh-89828: Do not relay the __class__ attribute in GenericAlias (#93754)

list[int].__class__ returned type, and isinstance(list[int], type)
returned True. It caused numerous problems in code that checks
isinstance(x, type).
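
Code that has to behave the same on interpreters with and without this change can exclude generic aliases explicitly. The helper below is a hypothetical example, not part of the PR, and assumes Python 3.9 or later (where `types.GenericAlias` exists).

```python
import types


def is_plain_class(obj):
    """True for real classes such as list, False for aliases such as list[int]."""
    return isinstance(obj, type) and not isinstance(obj, types.GenericAlias)


print(is_plain_class(list), is_plain_class(list[int]))
```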

* gh-84461: Fix pydebug Emscripten browser builds (GH-93982)

wasm_assets script did not take the ABIFLAG flag of sysconfigdata into
account.

* gh-93955: Use unbound methods for slot `__getattr__` and `__getattribute__` (GH-93956)

* gh-91387: Fix tarfile test on WASI (GH-93984)

WASI's rmdir() syscall does not like the trailing slash.

* gh-93975: Nicer error reporting in test_venv (GH-93959)



- gh-93957: Provide nicer error reporting from subprocesses in test_venv.EnsurePipTest.test_with_pip.
- Update changelog

This change does three things:

1. Extract a function for trapping output in subprocesses.
2. Emit both stdout and stderr when encountering an error.
3. Apply the change to `ensurepip._uninstall` check.
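
The first two steps can be sketched as a small helper that captures a subprocess's output and, on failure, includes both streams in the error. This is an illustration of the approach, not the code added to test_venv; `check_output_verbose` and the use of `RuntimeError` are assumptions.

```python
import subprocess
import sys


def check_output_verbose(cmd):
    """Run `cmd`; on failure, raise with both stdout and stderr attached."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(
            f"{cmd!r} exited with {proc.returncode}\n"
            f"**stdout**\n{proc.stdout}\n"
            f"**stderr**\n{proc.stderr}"
        )
    return proc.stdout


out = check_output_verbose([sys.executable, "-c", "print('ok')"])
```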

* GH-93990: fix refcounting bug in `add_subclass` in `typeobject.c` (GH-93989)

* What's new in 3.10: fix link to issue (#93968)

* What's new in 3.10: fix link to issue

* What's new in 3.10: fix link to GH issue

Co-authored-by: Ezio Melotti <[email protected]>

* gh-93761: Fix test_logging test_config_queue_handler() race condition (#93952)

Fix a race condition in test_config_queue_handler() of test_logging.

* gh-74953: Reformat PyThread_acquire_lock_timed() (#93947)

Reformat the pthread implementation of PyThread_acquire_lock_timed()
using a mutex and a condition variable.

* Add goto to avoid multiple indentation levels and exit quickly
* Use "while(1)" and make the control flow more obvious.
* PEP 7: Add braces around if blocks.

* gh-93937, C API: Move PyFrame_GetBack() to Python.h (#93938)

Move the following functions and type from frameobject.h to pyframe.h,
so the standard <Python.h> provides frame getter functions:

* PyFrame_Check()
* PyFrame_GetBack()
* PyFrame_GetBuiltins()
* PyFrame_GetGenerator()
* PyFrame_GetGlobals()
* PyFrame_GetLasti()
* PyFrame_GetLocals()
* PyFrame_Type

Remove #include "frameobject.h" from many C files. It's no longer
needed.

* gh-93991: Use boolean instead of 0/1 for condition check (GH-93992)



* gh-84461: Fix Emscripten umask and permission issues (GH-94002)

- Emscripten's default umask is too strict, see
  emscripten-core/emscripten#17269
- getuid/getgid and geteuid/getegid are stubs that always return 0
  (root). Disable effective uid/gid syscalls and fix tests that use
  chmod() current user.
- Cannot drop X bit from directory.

* gh-84461: Skip test_unwritable_directory again on Emscripten (GH-94007)

GH-93992 removed geteuid() and enabled the test again on Emscripten.

* gh-93925: Improve clarity of sqlite3 commit/rollback, and close docs (#93926)

Co-authored-by: CAM Gerlach <[email protected]>

* gh-61162: Clarify sqlite3 connection context manager docs (GH-93890)



Explicitly note that transactions are only closed if there is an open
transaction at `__exit__`, and that transactions are not implicitly
opened during `__enter__`.
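
A short example of the documented behaviour, using an in-memory database. The table name and values are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(x)")

# Success: the transaction opened by the INSERT is committed at __exit__.
with con:
    con.execute("INSERT INTO t VALUES (1)")

# Failure: the pending INSERT is rolled back at __exit__.
try:
    with con:
        con.execute("INSERT INTO t VALUES (2)")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

rows = con.execute("SELECT x FROM t").fetchall()
print(rows)  # [(1,)]

# The context manager never closes the connection; that is still the caller's job.
con.close()
```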

Co-authored-by: CAM Gerlach <[email protected]>
Co-authored-by: Stanley <[email protected]>

Automerge-Triggered-By: GH:erlend-aasland

* gh-79009: sqlite3.iterdump now correctly handles tables with autoincrement (#9621)

Co-authored-by: Erlend E. Aasland <[email protected]>

* gh-84461: Silence some compiler warnings on WASM (GH-93978)

* GH-93897: Store frame size in code object and de-opt if insufficient space on thread frame stack. (GH-93908)

* GH-93516: Speedup line number checks when tracing. (GH-93763)

* Use a lookup table to reduce overhead of getting line numbers during tracing.

* gh-90539: doc: Expand on what should not go into CFLAGS, LDFLAGS (#92754)

* gh-87347: Add parentheses around macro arguments (#93915)

Add unit test on Py_MEMBER_SIZE() and some other macros.

* gh-93937: PyOS_StdioReadline() uses PyConfig.legacy_windows_stdio (#94024)

On Windows, PyOS_StdioReadline() now gets
PyConfig.legacy_windows_stdio from _PyOS_ReadlineTState, rather than
using the deprecated global Py_LegacyWindowsStdioFlag variable.

Also fix a compiler warning in Py_SetStandardStreamEncoding().

* GH-93249: relax overly strict assertion on bounds->ar_start (GH-93961)

* gh-94021: Address unreachable code warning in specialize code (GH-94022)

* GH-93678: refactor compiler so that optimizer does not need the assembler and compiler structs (GH-93842)

* gh-93839: Move Lib/ctypes/test/ to Lib/test/test_ctypes/ (#94041)

* Move Lib/ctypes/test/ to Lib/test/test_ctypes/
* Remove Lib/test/test_ctypes.py
* Update imports and build system.

* gh-93839: Move Lib/unittest/test/ to Lib/test/test_unittest/ (#94043)

* Move Lib/unittest/test/ to Lib/test/test_unittest/
* Remove Lib/test/test_unittest.py
* Replace unittest.test with test.test_unittest
* Remove unittest.load_tests()
* Rewrite unittest __init__.py and __main__.py
* Update build system, CODEOWNERS, and wasm_assets.py

* GH-91432: Specialize FOR_ITER (GH-91713)

* Adds FOR_ITER_LIST and FOR_ITER_RANGE specializations.

* Adds _PyLong_AssignValue() internal function to avoid temporary boxing of ints.

* gh-94028: Clear and reset sqlite3 statements properly in cursor iternext (GH-94042)

* gh-94052: Don't re-run failed tests with --python option (#94054)

* gh-93839: Use load_package_tests() for testmock (GH-94055)



Fixes failing tests on WebAssembly platforms.

Automerge-Triggered-By: GH:tiran

* gh-54781: Move Lib/lib2to3/tests/ to Lib/test/test_lib2to3/ (#94049)

* Move Lib/lib2to3/tests/ to Lib/test/test_lib2to3/.
* Remove Lib/test/test_lib2to3.py.
* Update imports.
* all_project_files(): use different paths and sort files
  to make the tests more reproducible.
* Update references to tests.

* gh-74953: _PyThread_cond_after() uses _PyTime_t (#94056)

The pthread _PyThread_cond_after() implementation now uses the _PyTime_t
type to handle overflow properly: clamp to the maximum value.

Remove MICROSECONDS_TO_TIMESPEC() function.

* GH-93841: Allow stats to be turned on and off, cleared and dumped at runtime. (GH-93843)

* gh-86986: Drop compatibility support for Sphinx 2 (GH-93737)

* Revert "bpo-42843: Keep Sphinx 1.8 and Sphinx 2 compatibility (GH-24282)"

This reverts commit 5c1f15b

* Revert "bpo-42579: Make workaround for various versions of Sphinx more robust (GH-23662)"

This reverts commit b63a620.

* gh-94068: Remove HVSOCKET_CONTAINER_PASSTHRU constant because it has been removed from Windows (GH-94069)



Fixes #94068

Automerge-Triggered-By: GH:zware

* Closes gh-94038: Update Release Schedule in README.rst from PEP 664 to PEP 693 (GH-94046)

* gh-93851: Fix all broken links in Doc/ (GH-93853)

* gh-93675: Fix typos in `Doc/` (GH-93676)

Closes #93675

* Minor optimization for Fractions.limit_denominator (GH-93730)

When we construct the upper and lower candidates in limit_denominator,
the numerator and denominator are already relatively prime (and the
denominator positive) by construction, so there's no need to go through
the usual normalisation in the constructor. This saves a couple of
potentially expensive gcd calls.

Suggested by Michael Scott Asato Cuthbert in GH-93477.
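
For context, this is the public behaviour that the optimization leaves unchanged: the result of limit_denominator() is already in lowest terms with a positive denominator, which is why re-normalising it is unnecessary.

```python
from fractions import Fraction
from math import gcd, pi

approx = Fraction(pi).limit_denominator(1000)
print(approx)  # 355/113

# The returned fraction is already normalised.
print(gcd(approx.numerator, approx.denominator), approx.denominator > 0)
```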

* gh-93240: clarify wording in IO tutorial (GH-93276)

Co-authored-by: Adam Turner <[email protected]>

* Tutorial: specify match cases don't fall through (GH-93615)

* gh-93021: Fix __text_signature__ for __get__ (GH-93023)

Because of the way wrap_descr_get is written, the second argument
to __get__ methods implemented through the wrapper is always
optional.

* gh-82927: Update files related to HTML entities. (GH-92504)

* DOC: correct bytesarray -> bytearray in comments (GH-92410)

* gh-87389: Fix an open redirection vulnerability in http.server. (#93879)

Fix an open redirection vulnerability in the `http.server` module when
a URI path starts with `//` that could produce a 301 Location header
with a misleading target.  Vulnerability discovered, and logic fix
proposed, by Hamza Avvan (@hamzaavvan).

Test and comments authored by Gregory P. Smith [Google].
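
Why `//` matters: a Location value that begins with `//host/...` is a scheme-relative reference, so a browser treats the first segment as a host name rather than as part of the path. One way to neutralise such input is to collapse the leading slashes before the path is reused. The sketch below illustrates that idea only; the function name is hypothetical and the code is not copied from `http.server`.

```python
def collapse_leading_slashes(path):
    """Reduce any run of leading slashes to a single one (hypothetical helper)."""
    if path.startswith("//"):
        return "/" + path.lstrip("/")
    return path


print(collapse_leading_slashes("//evil.example/%2e%2e"))
```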

* gh-89336: Remove configparser APIs that were deprecated for 3.12 (#92503)

https://github.com/python/cpython/issue/89336: Remove configparser 3.12 deprecations.

Co-authored-by: Hugo van Kemenade <[email protected]>

* bpo-30535: [doc] state that sys.meta_path is not empty by default (GH-94098)

Co-authored-by: Windson yang <[email protected]>

* gh-88123: Implement new Enum __contains__ (GH-93298)

Co-authored-by: Ethan Furman <[email protected]>

* Stats: Add summary of top instructions for misses and deferred specialization. (GH-94072)

* gh-74696: Do not change the current working directory in shutil.make_archive() if possible (GH-93160)

It is no longer changed when creating a zip or tar archive.

It is still changed for custom archivers registered with shutil.register_archive_format()
if root_dir is not None.
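
A small usage example of `shutil.make_archive()` with `root_dir`. It cannot show whether the working directory was changed internally, only that it is the same before and after the call; the file and archive names are made up.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as work:
    src = os.path.join(work, "src")
    os.mkdir(src)
    with open(os.path.join(src, "hello.txt"), "w") as f:
        f.write("hi")

    cwd_before = os.getcwd()
    # Archive the contents of `src`; paths inside the archive are relative to it.
    archive = shutil.make_archive(os.path.join(work, "backup"), "tar", root_dir=src)
    cwd_after = os.getcwd()
    archive_name = os.path.basename(archive)

print(archive_name, cwd_before == cwd_after)
```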

Co-authored-by: Éric <[email protected]>
Co-authored-by: Łukasz Langa <[email protected]>

* gh-94101 Disallow instantiation of SSLSession objects (GH-94102)



Fixes #94101

Automerge-Triggered-By: GH:tiran

* Fix typo in _io.TextIOWrapper Clinic input (#94037)

Co-authored-by: Łukasz Langa <[email protected]>

* gh-93951: In test_bdb.StateTestCase.test_skip, avoid including auxiliary importers. (GH-93962)

Co-authored-by: Brett Cannon <[email protected]>

* gh-91172: Create a workflow for verifying bundled pip and setuptools (GH-31885)

Co-authored-by: Hugo van Kemenade <[email protected]>
Co-authored-by: Adam Turner <[email protected]>

* gh-94114: Remove obsolete reference to python.org mirrors (GH-94115)



* gh-94114

* gh-84623: Remove unused imports (#94132)

* gh-54781: Move Lib/tkinter/test/test_ttk/ to Lib/test/test_ttk/ (#94070)

* Move Lib/tkinter/test/test_tkinter/ to Lib/test/test_tkinter/.
* Move Lib/tkinter/test/test_ttk/ to Lib/test/test_ttk/.
* Add Lib/test/test_ttk/__init__.py based on test_ttk_guionly.py.
* Add Lib/test/test_tkinter/__init__.py
* Remove old Lib/test/test_tk.py.
* Remove old Lib/test/test_ttk_guionly.py.
* Add __main__ sub-modules.
* Update imports and update references to rename files.

* gh-84623: Move imports in doctests (#94133)

Move imports in doctests to prevent false alarms in pyflakes.

* Add ABI dump Makefile target (#94136)

* gh-84623: Remove unused imports in idlelib (#94143)

Remove commented code in test_debugger_r.py.

Co-authored-by: Terry Jan Reedy <[email protected]>

* gh-85308: argparse: Use filesystem encoding for arguments file (GH-93277)

* Closes gh-94152: Update pyvideo.org URL (GH-94075)

The URL is now https://pyvideo.org, which uses HTTPS and avoids a redirect.

* gh-91456: [Enum] Deprecate default auto() behavior with mixed value types (GH-91457)

When used with plain Enum, auto() returns the last numeric value assigned, skipping any incompatible member values (such as strings); starting in 3.13 the default auto() for plain Enums will require all the values to be of compatible types, and will return a new value that is 1 higher than any existing value.

Co-authored-by: Ethan Furman <[email protected]>

* gh-84461: Fix test_sqlite for Emscripten/WASI (#94125)

* gh-86404: [doc] Fix missing backtick and double target name. (#94120)

* gh-89121: Keep the number of pending SQLite statements to a minimum (#30379)

Make sure statements that have run to completion or errored are
reset and cleared off the cursor for all paths in execute() and
executemany().

* GH-91742: Fix pdb crash after jump  (GH-94171)

* [Enum] fix typo (GH-94158)

* gh-92858: Improve error message for some suites with syntax error before ':' (#92894)

* gh-93771: Clarify how deepfreeze.py is run (#94150)

* gh-91219: Add an index_pages default list and parameter to SimpleHTTPRequestHandler (GH-31985)

* Add an index_pages default list to SimpleHTTPRequestHandler and an
optional constructor parameter that allows the default indexes pages
list to be overridden.  This makes it easy to set a new index page name
without having to override send_head.
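
A hedged sketch of how a different index page might be configured through a subclass. It uses only the class-level `index_pages` attribute named above; whether the constructor parameter is also available depends on the Python version, and on versions without this change the attribute is simply ignored. `HomePageHandler` and the file names are hypothetical.

```python
from http.server import SimpleHTTPRequestHandler


class HomePageHandler(SimpleHTTPRequestHandler):
    # Names tried, in order, when a directory is requested.
    # Assumption: honoured only by Python versions that include this change.
    index_pages = ("home.html", "index.html")


print(HomePageHandler.index_pages)
```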

* [Enum] Remove automatic docstring generation (GH-94188)

* Add ABI dump script (#94135)

* Add more tests for throwing into yield from (GH-94097)

* gh-94169: Remove deprecated io.OpenWrapper (#94170)

Remove io.OpenWrapper and _pyio.OpenWrapper, deprecated in Python
3.10: just use :func:`open` instead. The open() (io.open()) function
is a built-in function. Since Python 3.10, _pyio.open() is also a
static method.

* gh-94199: Remove ssl.RAND_pseudo_bytes() function (#94202)

Remove the ssl.RAND_pseudo_bytes() function, deprecated in Python
3.6: use os.urandom() or ssl.RAND_bytes() instead.
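
Both replacements return cryptographically strong random bytes. A minimal migration example, where the variable names and the length of 16 are arbitrary:

```python
import os
import ssl

# Either call can replace ssl.RAND_pseudo_bytes(16).
token_from_os = os.urandom(16)
token_from_ssl = ssl.RAND_bytes(16)

print(len(token_from_os), len(token_from_ssl))
```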

* gh-94196: Remove gzip.GzipFile.filename attribute (#94197)

gzip: Remove the filename attribute of gzip.GzipFile,
deprecated since Python 2.6, use the name attribute instead. In write
mode, the filename attribute added '.gz' file extension if it was not
present.
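
Code that read `GzipFile.filename` can switch to `name`, which holds the path as it was passed in. The example below writes to a temporary directory; the file name and payload are arbitrary.

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.gz")
    with gzip.GzipFile(path, "wb") as gz:
        gz.write(b"payload")
        opened_name = gz.name  # use this instead of the removed `filename`

print(os.path.basename(opened_name))
```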

* gh-93692: remove "build finished successfully" message from setup.py (#93693)

The message was only emitted when the build succeeded _and_ there were
missing modules.

* gh-84461: Fix ctypes and test_ctypes on Emscripten (#94142)

- c_longlong and c_longdouble need experimental WASM bigint.
- Skip tests that need threading
- Define ``CTYPES_MAX_ARGCOUNT`` for Emscripten. libffi-emscripten 2022-06-23 supports up to 1000 args.

* gh-94205: Ensures all required DLLs are copied on Windows for underpth tests (GH-94206)

* gh-84461: Build Emscripten with WASM BigInt support (#94219)

* gh-94172: urllib.request avoids deprecated check_hostname (#94193)

The urllib.request no longer uses the deprecated check_hostname
parameter of the http.client module.

Add private http.client._create_https_context() helper to http.client,
used by urllib.request.

Remove the now redundant check on check_hostname and verify_mode in
http.client: the SSLContext.check_hostname setter already implements
the check.

* IDLE: replace if statement with expression (#94228)

* Docs: Remove `Provides [...]` from `multiprocessing.shared_memory` description (#92761)

* gh-93382: Sync up `co_code` changes with 3.11 (GH-94227)

Sync up co_code changes with 3.11 commit 852b4d4.

* gh-94217: Skip import tests when _testcapi is a builtin (GH-94218)

* gh-85308: Add argparse tests for reading non-ASCII arguments from file (GH-94160)

* bpo-46642: Explicitly disallow subclassing of instances of TypeVar, ParamSpec, etc (GH-31148)

The existing test covering this case passed only incidentally. We
explicitly disallow doing this and add a proper error message.

Co-authored-by: Serhiy Storchaka <[email protected]>

* bpo-26253: Add compressionlevel to tarfile stream (GH-2962)

`tarfile` already accepts a compressionlevel argument for creating
files. This patch adds the same for stream-based tarfile usage.
The default is 9, the value that was previously hard-coded.

* gh-70441: Fix test_tarfile on systems w/o bz2 (gh-2962) (#94258)

* gh-94199: Remove ssl.match_hostname() function (#94224)

* gh-94207: Fix struct module leak (GH-94239)

Make _struct.Struct a GC type

This fixes a memory leak in the _struct module, where as soon
as a Struct object is stored in the cache, there's a cycle from
the _struct module to the cache to Struct objects to the Struct
type back to the module. If _struct.Struct is not gc-tracked, that
cycle is never collected.

This PR makes _struct.Struct GC-tracked, and adds a regression test.

* gh-94245: Test pickling and copying of typing.Tuple[()] (GH-94259)

* gh-77560: Report possible errors in restoring builtins at finalization (GH-94255)

It seems that in the past the copy of builtins was not made in some
scenarios, and the error was silenced. Write it to stderr now, so we
have a chance to see it.

* gh-90016: Reword sqlite3 adapter/converter docs (#93095)

Also add adapters and converter recipes.

Co-authored-by: CAM Gerlach <[email protected]>
Co-authored-by: Alex Waygood <[email protected]>

* bpo-39971: Change examples to be runnable (GH-32172)

* gh-70474: [doc] fix wording of GET_ANEXT doc (GH-94048)

* gh-93259: Validate arg to ``Distribution.from_name``. (GH-94270)

Syncs with importlib_metadata 4.12.0.

Co-authored-by: Irit Katriel <[email protected]>
Co-authored-by: Ulises Ojeda <[email protected]>
Co-authored-by: jackh-ncl <[email protected]>
Co-authored-by: Mark Dickinson <[email protected]>
Co-authored-by: Colin Delahunty <[email protected]>
Co-authored-by: Neil Schemenauer <[email protected]>
Co-authored-by: Christian Heimes <[email protected]>
Co-authored-by: Dennis Sweeney <[email protected]>
Co-authored-by: Cyker Way <[email protected]>
Co-authored-by: Hugo van Kemenade <[email protected]>
Co-authored-by: Omer Katz <[email protected]>
Co-authored-by: Stanley <[email protected]>
Co-authored-by: Thomas Grainger <[email protected]>
Co-authored-by: Illia Volochii <[email protected]>
Co-authored-by: Erlend Egeberg Aasland <[email protected]>
Co-authored-by: Alex Waygood <[email protected]>
Co-authored-by: AN Long <[email protected]>
Co-authored-by: Samodya Abeysiriwardane <[email protected]>
Co-authored-by: Evorage <[email protected]>
Co-authored-by: Davide Rizzo <[email protected]>
Co-authored-by: Pascal Wittmann <[email protected]>
Co-authored-by: Vinay Sajip <[email protected]>
Co-authored-by: Adam Turner <[email protected]>
Co-authored-by: Łukasz Langa <[email protected]>
Co-authored-by: Andreas Grommek <[email protected]>
Co-authored-by: Mark Shannon <[email protected]>
Co-authored-by: Ken Jin <[email protected]>
Co-authored-by: Adrian Garcia Badaracco <[email protected]>
Co-authored-by: jacksonriley <[email protected]>
Co-authored-by: Kalyan <[email protected]>
Co-authored-by: Bluenix <[email protected]>
Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: CAM Gerlach <[email protected]>
Co-authored-by: Sebastian Berg <[email protected]>
Co-authored-by: Leo Trol <[email protected]>
Co-authored-by: XD Trol <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Co-authored-by: neonene <[email protected]>
Co-authored-by: Steve Dower <[email protected]>
Co-authored-by: Pablo Galindo Salgado <[email protected]>
Co-authored-by: Barney Gale <[email protected]>
Co-authored-by: Oleg Iarygin <[email protected]>
Co-authored-by: Brett Cannon <[email protected]>
Co-authored-by: John Belmonte <[email protected]>
Co-authored-by: Julien Palard <[email protected]>
Co-authored-by: Pamela Fox <[email protected]>
Co-authored-by: Dong-hee Na <[email protected]>
Co-authored-by: Kumar Aditya <[email protected]>
Co-authored-by: Victor Stinner <[email protected]>
Co-authored-by: Sanket Shanbhag <[email protected]>
Co-authored-by: Jeong YunWon <[email protected]>
Co-authored-by: Steve Dower <[email protected]>
Co-authored-by: samtygier <[email protected]>
Co-authored-by: Ken Jin <[email protected]>
Co-authored-by: Brandt Bucher <[email protected]>
Co-authored-by: Gregory P. Smith <[email protected]>
Co-authored-by: chilaxan <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
Co-authored-by: Chris Fernald <[email protected]>
Co-authored-by: Jason R. Coombs <[email protected]>
Co-authored-by: Ezio Melotti <[email protected]>
Co-authored-by: Lei Zhang <[email protected]>
Co-authored-by: Erlend Egeberg Aasland <[email protected]>
Co-authored-by: itssme <[email protected]>
Co-authored-by: Matthias Köppe <[email protected]>
Co-authored-by: MilanJuhas <[email protected]>
Co-authored-by: luzpaz <[email protected]>
Co-authored-by: paulreece <[email protected]>
Co-authored-by: max <[email protected]>
Co-authored-by: Jelle Zijlstra <[email protected]>
Co-authored-by: Thomas A Caswell <[email protected]>
Co-authored-by: Windson yang <[email protected]>
Co-authored-by: Carl Bordum Hansen <[email protected]>
Co-authored-by: Ethan Furman <[email protected]>
Co-authored-by: Éric <[email protected]>
Co-authored-by: chgnrdv <[email protected]>
Co-authored-by: fikotta <[email protected]>
Co-authored-by: partev <[email protected]>
Co-authored-by: Terry Jan Reedy <[email protected]>
Co-authored-by: Inada Naoki <[email protected]>
Co-authored-by: Oscar R <[email protected]>
Co-authored-by: wookie184 <[email protected]>
Co-authored-by: Guido van Rossum <[email protected]>
Co-authored-by: Myron Walker <[email protected]>
Co-authored-by: Sam Ezeh <[email protected]>
Co-authored-by: Ken Jin <[email protected]>
Co-authored-by: Gregory Beauregard <[email protected]>
Co-authored-by: Yaron de Leeuw <[email protected]>
Co-authored-by: Mark Dickinson <[email protected]>