Unladen `error_already_set` #3964

rwgk · 2022-05-20T00:57:07Z

Description

Adopts the Boost.Python approach for error_already_set.

Motivations (see PR #1895):

Fixes undefined behavior of existing error_already_set copy constructor: it is very unlikely to trigger Python reference count corruption, but that is a possibility in the general case.
Maximizes performance, in particular avoids building what(), which is very rarely actually used.

This is a breaking change.

Based on the changes to the unit tests, I believe most required adjustments in user code are backward compatible:

The additionally needed `PyErr_Clear();` calls are no-ops with released pybind11 versions.

It seems to be very rare that what() is called. #ifdefs (in user code) will be needed to replace those calls in a backward-compatible way.
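The backward-compatibility argument for the extra `PyErr_Clear();` can be illustrated with a toy model. The sketch below uses neither pybind11 nor the Python C API. `mock_error_indicator`, `laden_error`, and `unladen_error` are hypothetical names standing in for the interpreter's error indicator, the released behavior (the exception takes the error state with it), and the behavior proposed here (the error stays set). It is an illustration under those assumptions, not the actual implementation.

```cpp
#include <cassert>
#include <exception>
#include <string>
#include <utility>

// Toy stand-in for the per-thread error indicator (NOT the Python C API).
// An empty string means "no error set".
thread_local std::string mock_error_indicator;

void mock_err_set(std::string msg) { mock_error_indicator = std::move(msg); }
void mock_err_clear() { mock_error_indicator.clear(); } // harmless if nothing is set
bool mock_err_occurred() { return !mock_error_indicator.empty(); }

// "Laden" style (hypothetical name): the exception takes the error state on
// construction, which leaves the indicator cleared.
struct laden_error : std::exception {
    std::string message;
    laden_error() : message(mock_error_indicator) { mock_err_clear(); }
    const char *what() const noexcept override { return message.c_str(); }
};

// "Unladen" style (hypothetical name): an empty marker; the error stays set.
struct unladen_error : std::exception {};

// The same handler body is used for both styles.
template <typename Error>
bool handler_leaves_indicator_clean() {
    try {
        mock_err_set("ValueError: boom");
        throw Error();
    } catch (const Error &) {
        mock_err_clear(); // needed for unladen_error, a no-op for laden_error
    }
    return !mock_err_occurred();
}
```

In this model, `handler_leaves_indicator_clean<laden_error>()` and `handler_leaves_indicator_clean<unladen_error>()` should both return `true`, because the added clear call does nothing when the indicator is already empty.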

More work is needed to be sure that there are no bigger obstacles.

Suggested changelog entry:

Skylion007 · 2022-05-20T01:06:39Z

@rwgk what() is called quite a bit in big projects like PyTorch. I strongly oppose removing the what() method.

rwgk · 2022-05-20T01:11:21Z

> @rwgk what() is called quite a bit in big projects like PyTorch. I strongly oppose removing the what() method.

Currently the price we are paying for what() is UB. I would have a lot of explaining to do if I made that part of a core technology at Google.

PyTorch is included in our global testing. I'll learn the hard way how bad it is :-)

But first thing, I have to get the CI green here.

rwgk · 2022-05-20T02:08:34Z

> Currently the price we are paying for what() is UB.

I need to correct myself: computing the what() non-lazily is fine. The UB is because we're holding py::objects that may be INCREFed without the GIL held. (And if we're trying to acquire the GIL, we're in trouble when the interpreter is finalizing.)

Still trying to get everything green. It's too early to make a judgement.

jbms · 2022-05-20T19:32:46Z

This is the behavior that one might expect based on the name error_already_set. However I think there are some drawbacks:

Most Python API functions cannot be called with an error set. Therefore, the user needs to be aware that one of these exceptions is in flight and make sure not to call any Python API functions without first clearing or saving the Python error indicator. That constraint is manageable in Python C code that manually propagates the NULL return up the stack, but I think it would not be very manageable with C++ exceptions that propagate implicitly.
Related to the point above, this approach can't support nested errors.
The user must not catch the exception without knowing what it is.
It is a major breaking change to how error handling works in pybind11, likely to break a lot of users, and it may break in ways not caught by tests since it affects only error paths.
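The first drawback can be made concrete with a toy model, sketched here without pybind11 or the Python C API. `mock_error_indicator`, `unladen_error`, and `cleanup_guard` are hypothetical names. The guard's destructor stands in for any cleanup code that might call into Python while the exception propagates.

```cpp
#include <cassert>
#include <exception>
#include <string>

// Toy stand-in for the per-thread error indicator (NOT the Python C API).
// An empty string means "no error set".
thread_local std::string mock_error_indicator;

// Hypothetical empty marker exception: the error itself stays in the indicator.
struct unladen_error : std::exception {};

// Hypothetical cleanup object. Imagine its destructor calling into the
// interpreter; here it only records whether an error was pending at that time.
struct cleanup_guard {
    bool &saw_pending_error;
    explicit cleanup_guard(bool &flag) : saw_pending_error(flag) {}
    ~cleanup_guard() { saw_pending_error = !mock_error_indicator.empty(); }
};

void failing_call() {
    mock_error_indicator = "ValueError: boom";
    throw unladen_error();
}

bool destructor_runs_with_error_pending() {
    bool saw_pending_error = false;
    try {
        cleanup_guard guard(saw_pending_error);
        failing_call(); // unwinding destroys `guard` before the handler runs
    } catch (const unladen_error &) {
        mock_error_indicator.clear(); // the handler clears the error, but too late for the guard
    }
    return saw_pending_error;
}
```

In this model the destructor runs during stack unwinding, before any handler has had a chance to clear the indicator, so every destructor on the unwinding path would need to know that an error may be pending.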

rwgk · 2022-05-23T05:40:36Z

The last commit (0a4d0c0) is what I used for exploratory global testing. Based on that, a quick estimate of what it would take to adapt to this change Google-internally:

"Everything": About 2000 files that match '#include.*/pybind11\.h' (not counting multiple copies of the pybind11 sources).
"Portion likely affected": About 100 files that match pybind11 error_already_set.
<20 files that use .restore(), .what(), .type(), .value(), .trace(). Only a handful of those are non-trivial to change. (This estimation is based on global testing, only fixing some build failures. Internal test ID OCL:450289283:BASE:450349644:1653281009482:4931dc01)

Very rough estimation: some of the 100 will probably not need to be touched, but a few changes may be needed elsewhere.

Net files that need changes will be around 100, i.e. about 5% of pybind11 user code.

rwgk · 2022-05-23T05:41:19Z

I'll close this PR for now. For future reference, the CI @ 0a4d0c0 was all green:

All checks have passed
2 skipped and 62 successful checks

This commit also passed testing with ASAN, MSAN, TSAN using the Google-internal toolchain.

* Add missing error handling to module_::def_submodule
* Add test_def_submodule_failures
* PyPy only: Skip test with trigger for PyModule_GetName() failure.
* Reapply minor fix that accidentally got lost in transfer from PR #3964

rwgk mentioned this pull request May 20, 2022

error_already_set::what() is now constructed lazily #1895

Merged

rwgk added 8 commits May 21, 2022 08:53

First version that passes all tests.

ba0c042

Remove e.restore(); (to resolve macos failure(s))

e407509

Compatibility with older compilers.

093df89

embed.h bug fix & required adjustments in test_interpreter.cpp

0d1ef98

Remove unused variable (oversight)

d85b6c9

PyErr_NormalizeException() before PyErr_WriteUnraisable() for Python < 3.8

e5dd168

Remove unused variable (oversight)

fdab517

Temporary workaround for PYBIND11_CATCH_INIT_EXCEPTIONS issue that exists already on master (see PR pybind#3965).

6010f5b

rwgk force-pushed the unladen_error_already_set branch from e13f29a to 6010f5b May 21, 2022 16:24

rwgk added 8 commits May 21, 2022 17:26

bug fix: move PyErr_NormalizeException() up, before scope.value is used.

2503684

Add more PyErr_Clear() after catch (const error_already_set &) (overlooked before).

9663112

Fix up temporary workaround: prefix error_string with `initialization failed: ` to not trigger a failure in test_embed/test_interpreter.cpp. Patch the test itself to not expect a chained error.

19797d8

fix py:: vs pybind11:: oversight

0838358

Add missing error handling to module_::def_submodule

3b1a823

Replace import numpy catch(...) with `catch (py::error_already_set &)` and add `PyErr_Clear()`

9c858cc

Very minor: Use char literal instead of const char *.

6671d23

Add pybind11::get_error_string_and_clear_error()

0a4d0c0

rwgk closed this May 23, 2022

This was referenced May 25, 2022

Move PyErr_NormalizeException() up a few lines #3971

Merged

Add missing error handling to module_::def_submodule #3973

Merged

Avoid catch (...) for expected import numpy failures #3974

Merged

rwgk added a commit to rwgk/pybind11 that referenced this pull request May 28, 2022

Reapply minor fix that accidentally got lost in transfer from PR pybind#3964

dbbe9d1

rwgk mentioned this pull request Feb 10, 2023

FWD pybind11 google/pybind11clif#3964

Closed
