gh-94215: Fix reference count issue in exception_unwind #94659

tiran · 2022-07-07T15:09:31Z

Zero-cost exception handling did not take frame_setlineno() into
account. frame_setlineno() can pop entries off the frame's stack,
which can lead to a crash.

Exception unwinding now re-calculates the stack pointer.

Co-authored-by: Irit Katriel [email protected]

Issue: GC crash _PyObject_AssertFailed with pdb #94215

Misc/NEWS.d/next/Core and Builtins/2022-07-07-17-07-00.gh-issue-94215.cXltGH.rst

bedevere-bot · 2022-07-07T15:16:09Z

🤖 New build scheduled with the buildbot fleet by @tiran for commit 990279b062ac4d61361ef7587e1977879606de52 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

tiran · 2022-07-07T16:01:13Z

The fix seems to introduce a reference leak:

test_sys_settrace leaked [17, 17, 17] references, sum=51
test_sys_setprofile leaked [4, 4, 4] references, sum=12

Leaking tests:

test.test_sys_settrace.JumpTestCase.test_no_jump_between_except_blocks
test.test_sys_settrace.JumpTestCase.test_no_jump_between_except_blocks_2
test.test_sys_settrace.JumpTestCase.test_no_jump_into_bare_except_block_from_try_block
test.test_sys_settrace.JumpTestCase.test_no_jump_into_qualified_except_block_from_try_block
test.test_sys_settrace.JumpTestCase.test_no_jump_out_of_bare_except_block
test.test_sys_settrace.JumpTestCase.test_no_jump_out_of_qualified_except_block
test.test_sys_settrace.JumpTestCase.test_no_jump_to_except_1
test.test_sys_settrace.JumpTestCase.test_no_jump_to_except_2
test.test_sys_settrace.JumpTestCase.test_no_jump_to_except_3
test.test_sys_settrace.JumpTestCase.test_no_jump_to_except_4
test.test_sys_settrace.JumpTestCase.test_no_jump_within_except_block
test.test_sys_setprofile.ProfileHookTestCase.test_raise_reraise
test.test_sys_setprofile.ProfileHookTestCase.test_raise_twice

./python -m test test_sys_setprofile --list-cases | while read -r line; do ./python -m test test_sys_setprofile -R:: -m "$line" || echo "$line" >> leaks.txt; done

tiran · 2022-07-07T16:21:06Z

@encukou How did you figure out the leaking tests so quickly? I had to figure out that shell one-liner to run each test function separately.

encukou · 2022-07-07T16:25:13Z

Umm, manually: deleting chunks of the test suite & only keeping the deletion if the leaks stayed at [17, 17, 17] ;)

tiran · 2022-07-07T16:27:19Z

My approach takes longer, but is less manual work.

brandtbucher · 2022-07-07T18:41:37Z

> My approach takes longer, but is less manual work.

There's also ./python -m test.bisect_cmd -R:, which I use pretty frequently for these sorts of things. It will get it down to as few tests as possible (even just one, if it can).

ericsnowcurrently

LGTM

At the least this seems to be an effective short-term solution. It may be valid for the long-term too. I've left some thoughts on the issue.

Python/ceval.c

brandtbucher · 2022-07-07T20:22:45Z

@tiran, applying this diff to your branch fixes the leaks for me:

diff --git a/Python/ceval.c b/Python/ceval.c
index 083f881807..e23bfda725 100644
--- a/Python/ceval.c
+++ b/Python/ceval.c
@@ -5683,6 +5683,8 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
                     err = maybe_call_line_trace(tstate->c_tracefunc,
                                                 tstate->c_traceobj,
                                                 tstate, frame, instr_prev);
+                    stack_pointer = _PyFrame_GetStackPointer(frame);
+                    frame->stacktop = -1;
                     if (err) {
                         /* trace function raised an exception */
                         next_instr++;
@@ -5690,9 +5692,6 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
                     }
                     /* Reload possibly changed frame fields */
                     next_instr = frame->prev_instr;
-
-                    stack_pointer = _PyFrame_GetStackPointer(frame);
-                    frame->stacktop = -1;
                 }
             }
         }
@@ -5795,11 +5794,6 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
 
                 /* Pop remaining stack entries. */
                 PyObject **stackbase = _PyFrame_Stackbase(frame);
-                if (frame->stacktop != -1) {
-                    // frame_setlineno() may have popped off additional stacks in
-                    // frame_stack_pop(). Re-calculate stack pointer.
-                    stack_pointer = _PyFrame_GetStackPointer(frame);
-                }
                 while (stack_pointer > stackbase) {
                     PyObject *o = POP();
                     Py_XDECREF(o);

I'm not entirely sure why yet, but my understanding (gained over the last couple of hours) is that frame->stacktop is expected to always be -1 in this function, except when we're calling a trace function (since GC of frame objects needs to work correctly there). This looks like one place where we forget to set it back to -1 and reload the (possibly changed) stack pointer if an error is raised while tracing...

...maybe?
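
To make the suspected invariant concrete, here is a minimal, self-contained C sketch. It is a toy model and not CPython code: ToyFrame, toy_trace, run_traced_step, and TOY_MAX_DEPTH are hypothetical names, the stack is reduced to a depth counter, and counting a "release" stands in for Py_XDECREF. It assumes, as described above, that stacktop is -1 whenever the eval loop owns the stack pointer and is only published around the trace call.

```c
#include <assert.h>

#define TOY_MAX_DEPTH 8

/* Toy stand-in for the interpreter frame: only the published stack top. */
typedef struct {
    int stacktop;   /* -1 while the eval loop owns the stack pointer */
} ToyFrame;

/* Hypothetical trace hook. It pops up to `npop` entries through the frame
 * (as a line jump might), counts each release, and returns `err`. */
static int toy_trace(ToyFrame *frame, int npop, int err, int *released)
{
    while (npop-- > 0 && frame->stacktop > 0) {
        frame->stacktop--;
        (*released)++;
    }
    return err;
}

/* Run one traced step on a stack holding `depth` entries and return how
 * many releases happened in total. A correct run releases each entry
 * exactly once, so the result should equal `depth`. */
int run_traced_step(int depth, int npop, int err, int reload_before_check)
{
    assert(depth >= 0 && depth <= TOY_MAX_DEPTH);

    ToyFrame frame = { .stacktop = -1 };
    int released = 0;
    int stack_pointer = depth;          /* the eval loop's local copy */

    /* Publish the stack pointer before calling out to the trace hook. */
    frame.stacktop = stack_pointer;
    int failed = toy_trace(&frame, npop, err, &released);

    if (reload_before_check) {
        /* Patched order: reload and reset before looking at the error. */
        stack_pointer = frame.stacktop;
        frame.stacktop = -1;
    }
    if (!failed && !reload_before_check) {
        /* Original order: the reload only happens on the success path. */
        stack_pointer = frame.stacktop;
        frame.stacktop = -1;
    }

    /* On error this models the "Pop remaining stack entries" loop; on
     * success it simply drains the stack so both paths can be compared. */
    while (stack_pointer > 0) {
        stack_pointer--;
        released++;
    }
    return released;
}
```

In this model, run_traced_step(3, 1, 1, 0) returns 4 for a stack that held only 3 entries (one slot is released twice because the local stack pointer is stale), while run_traced_step(3, 1, 1, 1) returns 3.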

Zero-cost exception handling did not take ``frame_setlineno()`` into account. It can pop off stacks. This can lead to a crash. Exception unwinding now re-calculates the stack pointer. Co-authored-by: Irit Katriel <[email protected]>

tiran · 2022-07-08T06:33:28Z

superseded by #94681

tiran added the interpreter-core (Objects, Python, Grammar, and Parser dirs), type-crash (a hard crash of the interpreter, possibly with a core dump), and needs backport to 3.11 (only security fixes) labels Jul 7, 2022

tiran requested review from iritkatriel, markshannon and pablogsal July 7, 2022 15:09

bedevere-bot added the awaiting core review label Jul 7, 2022

iritkatriel reviewed Jul 7, 2022

tiran added the 🔨 test-with-buildbots (Test PR w/ buildbots; report in status section) label Jul 7, 2022

bedevere-bot removed the 🔨 test-with-buildbots (Test PR w/ buildbots; report in status section) label Jul 7, 2022

ericsnowcurrently approved these changes Jul 7, 2022

bedevere-bot added awaiting merge and removed awaiting core review labels Jul 7, 2022

tiran added the DO-NOT-MERGE label Jul 7, 2022

tiran and others added 2 commits July 7, 2022 22:31

Let's try Brandt's patch

cee2887

tiran force-pushed the gh-94215-jump-crash branch from 990279b to cee2887 July 7, 2022 20:31

brandtbucher mentioned this pull request Jul 8, 2022

gh-94215: Fix error handling for line-tracing events #94681

Merged

tiran closed this Jul 8, 2022
