Skip to content

Unladen error_already_set #3964

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 16 commits into from
Closed

Conversation

rwgk
Copy link
Collaborator

@rwgk rwgk commented May 20, 2022

Description

Adopts the Boost.Python approach for error_already_set.

Motivations (see PR #1895):

  • Fixes undefined behavior of existing error_already_set copy constructor: it is very unlikely to trigger Python reference count corruption, but that is a possibility in the general case.
  • Maximizes performance, in particular avoids building what(), which is very rarely actually used.

This is a breaking change.

Based on the changes to the unit tests, I believe most required adjustments in user code are backward compatible:

  • Additionally needed PyErr_Clear(); are no-ops with released pybind11 versions.

It seems to be very rare that what() is called. #ifdefs (in user code) will be needed to replace those calls in a backward-compatible way.

More work is needed to be sure that there are no bigger obstacles.

See also:

Suggested changelog entry:

@Skylion007
Copy link
Collaborator

Skylion007 commented May 20, 2022

@rwgk what() is called quite a bit in big projects like PyTorch. I strongly oppose removing the what() method.

@rwgk
Copy link
Collaborator Author

rwgk commented May 20, 2022

@rwgk what() is called quite a bit in big projects like PyTorch. I strongly oppose removing the what() method.

Currently the price we are paying for what() is UB. I'd have to explain a lot, making that part of a core technology at Google.

pytorch is included in our global testing. I'll learn the hard way how bad it is :-)

But first thing, I have to get the CI green here.

@rwgk
Copy link
Collaborator Author

rwgk commented May 20, 2022

Currently the price we are paying for what() is UB.

I need to correct myself: computing the what() non-lazily is fine. The UB is because we're holding py::objects that may be INCREFed without the GIL held. (And if we're trying to acquire the GIL, we're in trouble when the interpreter is finalizing.)

Still trying to get everything green. It's too early to make a judgement.

@jbms
Copy link
Contributor

jbms commented May 20, 2022

This is the behavior that one might expect based on the name error_already_set. However I think there are some drawbacks:

  • Most Python API functions cannot be called with an error set. Therefore, the user needs to be aware that one of these exceptions is in flight and make sure not to call any Python API functions without first clearing/saving the python error indicator. That constraint is manageable with Python C code that manually propagates the NULL return up the stack, but I think would not be very manageable with C++ exceptions that propagate implicitly.

  • Related to above point, can't support nested errors.

  • The user must not catch the exception without knowing what it is.

  • Major breaking change to how error handling works in pybind11, likely to break a lot of users, and may break in ways not caught by tests since it affects only error paths.

@rwgk rwgk force-pushed the unladen_error_already_set branch from e13f29a to 6010f5b Compare May 21, 2022 16:24
@rwgk
Copy link
Collaborator Author

rwgk commented May 23, 2022

The last commit (0a4d0c0) is what I used for exploratory global testing. Based on that, a quick estimate of what it would take to adapt to this change Google-internally:

  • "Everything": About 2000 files that match '#include.*/pybind11\.h' (not counting multiple copies of the pybind11 sources).
  • "Portion likely affected": About 100 files that match pybind11 error_already_set.
  • <20 files that use .restore(), .what(), .type(), .value(), .trace(). Only a handful of those are non-trivial to change. (This estimation is based on global testing, only fixing some build failures. Internal test ID OCL:450289283:BASE:450349644:1653281009482:4931dc01)

Very rough estimation: some of the 100 will probably not need to be touched, but a few changes may be needed elsewhere.

Net files that need changes will be around 100, i.e. about 5% of pybind11 user code.

@rwgk
Copy link
Collaborator Author

rwgk commented May 23, 2022

I'll close this PR for now. For future reference, the CI @ 0a4d0c0 was all green:

All checks have passed
2 skipped and 62 successful checks

This commit also passed testing with ASAN, MSAN, TSAN using the Google-internal toolchain.

@rwgk rwgk closed this May 23, 2022
rwgk added a commit to rwgk/pybind11 that referenced this pull request May 28, 2022
rwgk added a commit that referenced this pull request May 28, 2022
* Add missing error handling to module_::def_submodule

* Add test_def_submodule_failures

* PyPy only: Skip test with trigger for PyModule_GetName() failure.

* Reapply minor fix that accidentally got lost in transfer from PR #3964
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants