Skip to content

gh-94215: Fix reference count issue in exception_unwind #94659

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

tiran
Copy link
Member

@tiran tiran commented Jul 7, 2022

Zero-cost exception handling did not take frame_setlineno() into
account. It can pop off stacks. This can lead to a crash.

Exception unwinding now re-calculates the stack pointer.

Co-authored-by: Irit Katriel [email protected]

@tiran tiran added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump needs backport to 3.11 only security fixes labels Jul 7, 2022
@tiran tiran added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 7, 2022
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @tiran for commit 990279b062ac4d61361ef7587e1977879606de52 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 7, 2022
@tiran
Copy link
Member Author

tiran commented Jul 7, 2022

The fix seems to introduce a reference leak:

test_sys_settrace leaked [17, 17, 17] references, sum=51
test_sys_setprofile leaked [4, 4, 4] references, sum=12

Leaking tests:

  • test.test_sys_settrace.JumpTestCase.test_no_jump_between_except_blocks
  • test.test_sys_settrace.JumpTestCase.test_no_jump_between_except_blocks_2
  • test.test_sys_settrace.JumpTestCase.test_no_jump_into_bare_except_block_from_try_block
  • test.test_sys_settrace.JumpTestCase.test_no_jump_into_qualified_except_block_from_try_block
  • test.test_sys_settrace.JumpTestCase.test_no_jump_out_of_bare_except_block
  • test.test_sys_settrace.JumpTestCase.test_no_jump_out_of_qualified_except_block
  • test.test_sys_settrace.JumpTestCase.test_no_jump_to_except_1
  • test.test_sys_settrace.JumpTestCase.test_no_jump_to_except_2
  • test.test_sys_settrace.JumpTestCase.test_no_jump_to_except_3
  • test.test_sys_settrace.JumpTestCase.test_no_jump_to_except_4
  • test.test_sys_settrace.JumpTestCase.test_no_jump_within_except_block
  • test.test_sys_setprofile.ProfileHookTestCase.test_raise_reraise
  • test.test_sys_setprofile.ProfileHookTestCase.test_raise_twice
./python -m test test_sys_setprofile --list-cases | while read line; do ./python -m test test_sys_setprofile -R:: -m $line || echo $line >> leaks.txt; done

@tiran
Copy link
Member Author

tiran commented Jul 7, 2022

@encukou How did you figure out the leaking tests so quickly? I had to figure out that shell one-liner to run each test function separately.

@encukou
Copy link
Member

encukou commented Jul 7, 2022

Umm, manually: deleting chunks of the test suite & only keeping the deletion if the leaks stayed at [17, 17, 17] ;)

@tiran
Copy link
Member Author

tiran commented Jul 7, 2022

My approach takes longer, but is less manual work.

@brandtbucher
Copy link
Member

My approach takes longer, but is less manual work.

There's also ./python -m test.bisect_cmd -R:, which I use pretty frequently for these sorts of things. It will get it down to as few tests as possible (even just one, if it can).

Copy link
Member

@ericsnowcurrently ericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

At the least this seems to be an effective short-term solution. It may be valid for the long-term too. I've left some thoughts on the issue.

@brandtbucher
Copy link
Member

@tiran, applying this diff to your branch fixes the leaks for me:

diff --git a/Python/ceval.c b/Python/ceval.c
index 083f881807..e23bfda725 100644
--- a/Python/ceval.c
+++ b/Python/ceval.c
@@ -5683,6 +5683,8 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
                     err = maybe_call_line_trace(tstate->c_tracefunc,
                                                 tstate->c_traceobj,
                                                 tstate, frame, instr_prev);
+                    stack_pointer = _PyFrame_GetStackPointer(frame);
+                    frame->stacktop = -1;
                     if (err) {
                         /* trace function raised an exception */
                         next_instr++;
@@ -5690,9 +5692,6 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
                     }
                     /* Reload possibly changed frame fields */
                     next_instr = frame->prev_instr;
-
-                    stack_pointer = _PyFrame_GetStackPointer(frame);
-                    frame->stacktop = -1;
                 }
             }
         }
@@ -5795,11 +5794,6 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
 
                 /* Pop remaining stack entries. */
                 PyObject **stackbase = _PyFrame_Stackbase(frame);
-                if (frame->stacktop != -1) {
-                    // frame_setlineno() may have popped off additional stacks in
-                    // frame_stack_pop(). Re-calculate stack pointer.
-                    stack_pointer = _PyFrame_GetStackPointer(frame);
-                }
                 while (stack_pointer > stackbase) {
                     PyObject *o = POP();
                     Py_XDECREF(o);

I'm not entirely sure why yet, but my understanding (gained over the last couple of hours) is that frame->stacktop is expected to always be -1 in this function, except when we're calling a trace function (since GC of frame objects needs to work correctly there). This looks like one place where we forget to set it back to -1 and reload the (possibly changed) stack pointer if an error is raised in while tracing...

...maybe?

tiran and others added 2 commits July 7, 2022 22:31
Zero-cost exception handling did not take ``frame_setlineno()`` into
account. It can pop off stacks. This can lead to a crash.

Exception unwinding now re-calculates the stack pointer.

Co-authored-by: Irit Katriel <[email protected]>
@tiran
Copy link
Member Author

tiran commented Jul 8, 2022

superseded by #94681

@tiran tiran closed this Jul 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting merge DO-NOT-MERGE interpreter-core (Objects, Python, Grammar, and Parser dirs) needs backport to 3.11 only security fixes type-crash A hard crash of the interpreter, possibly with a core dump
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants