Skip to content

Conversation

markshannon
Copy link
Member

@markshannon markshannon commented Mar 29, 2021

  • Changes all offsets in jumps and branches to instruction from byte offsets. This doubles the range of instruction that can be targeted by an instruction.
  • Change the meaning of the C field f_lasti to match, which saves a bit of shifting in the interpreter.

https://bugs.python.org/issue27129

@gvanrossum gvanrossum self-requested a review March 29, 2021 16:36
Copy link
Contributor

@erlend-aasland erlend-aasland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing: I'd swap the two PyCode_Addr2Line's in Python/ceval.c with PyFrame_GetLineNumber for improved readability. (Unless the extra function call impacts performance.)

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh... I didn't notice that when the wordcode work has been done. It's unfortunate that we didn't switch to instruction offset. I vaguely recall discussion about sizeof(_Py_CODEUNIT).

I support this change, here are some comments. My main worry is about PyCode_Addr2Line().

cc @serhiy-storchaka

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a strange comment :-D I read it as "oh, Python 2.7 is too young, let's wait a little bit longer before we can use this shiny new function" !?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to treat the second argument as an instruction offset, rather than a byte offset. Rather than changing PyCode_Addr2Line() calls. It sounds unfortunate that the API allows to ask the address in the middle of a word (2 bytes).

It sounds common to pass f_lasti to PyCode_Addr2Line(): PyCode_Addr2Line(code, f_lasti) would still work, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on how you think the API is defined.
If you want the the line number from an offset derived from code.co_linetable, then the API should be unchanged.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyCode_Addr2Line(code, offset) must take a byte offset.
To getf_lasti from the frame, the only API to do so is something like PyObject_GetAttrString(frame, "f_lasti"), which would return a byte offset.

Also see:
https://www.python.org/dev/peps/pep-0626/#c-api
https://www.python.org/dev/peps/pep-0626/#out-of-process-debuggers-and-profilers

@vstinner
Copy link
Member

I wanted to ask to document properly that PyCode_Addr2Line() uses a byte offset, but the function is not documented! Can you please document the function at:
https://docs.python.org/dev/c-api/code.html

Please document f_lasti unit and the change at: https://docs.python.org/dev/library/inspect.html

@markshannon
Copy link
Member Author

The comment on PyCode_Addr2Line() says
"If you just need the line number of a frame, use PyFrame_GetLineNumber() instead."
PEP 626 provides a faster way to extract line numbers: https://www.python.org/dev/peps/pep-0626/#out-of-process-debuggers-and-profilers
So I think we should keep PyCode_Addr2Line() undocumented, unless you want to document it as "don't use this."

There are no changes to the inspect module.

@vstinner
Copy link
Member

So I think we should keep PyCode_Addr2Line() undocumented, unless you want to document it as "don't use this."

It's part of the public C API, so people uses it (for good or bad reasons). A search on the top 4000 PyPI projects give me:

vstinner@apu$ ./search.sh PyCode_Addr2Line
2021-02-18/uWSGI-2.0.19.1.tar.gz
2021-02-18/rook-0.1.138.tar.gz
2021-02-18/pyuwsgi-2.0.19.1.post0.tar.gz
2021-02-18/Pygments-2.8.0.tar.gz
2021-02-18/pydevd-2.2.0.tar.gz
2021-02-18/pydevd-pycharm-211.5538.22.tar.gz
2021-02-18/pyelftools-0.27.tar.gz
2021-02-18/mod_wsgi-4.7.1.tar.gz
2021-02-18/line_profiler-3.1.0.tar.gz
2021-02-18/graphene-federation-0.1.0.tar.gz
2021-02-18/faulthandler-3.2.tar.gz
2021-02-18/Cython-0.29.21.tar.gz

Hopefully, the list is short. The documentation should describes the behavior, and it's ok to strongly advice to use another function instead ;-)

Lib/dis.py Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not go a step further and divide the labels in dis output by 2?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of reasons.

  • I wanted to keep changes to a minimum.
  • There is no guarantee that instructions will remain the same size, so using byte offsets keeps things more consistent across versions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. However now we have a confusing discrepancy between what's displayed and how we're encouraged to think about the bytecode in the C code. I guess there's no perfect answer, so status quo wins.

@markshannon
Copy link
Member Author

@vstinner I'm fine with documenting PyCode_Addr2Line() but it hasn't been changed in this PR, so that is out of scope for this PR. I'll make another PR.

@markshannon markshannon force-pushed the use-instruction-offsets branch from 79b3b3b to 325d135 Compare March 31, 2021 10:12
@markshannon markshannon closed this Apr 1, 2021
@markshannon markshannon reopened this Apr 1, 2021
@markshannon markshannon closed this Apr 1, 2021
@markshannon markshannon reopened this Apr 1, 2021
@markshannon markshannon merged commit fcb55c0 into python:master Apr 1, 2021
@vstinner
Copy link
Member

vstinner commented Apr 1, 2021

@markshannon markshannon closed this 8 hours ago
@markshannon markshannon reopened this 8 hours ago

You should be allowed to click on the CI to "Re-run all jobs" to avoid closing/reopening the PR. Sadly, it's not possible to only re-run a single job.

sweeneyde added a commit to sweeneyde/cpython that referenced this pull request Apr 4, 2021
markshannon pushed a commit that referenced this pull request Apr 4, 2021
)

* Update magic numbers and bootstrapping for GH-25069

* add blurb

Co-authored-by: Terry Jan Reedy <[email protected]>
@markshannon markshannon deleted the use-instruction-offsets branch April 14, 2021 13:26
esc added a commit to esc/numba that referenced this pull request Sep 14, 2021
Fixup Numba to address the changes introduced in cpython for Python 3.10
at python/cpython#25069 . But for Numba, we have
to alter the patch slightly as Numba rewrites the bytecode to use a
`_FIXED_OFFSET` with a value of `2`.
esc added a commit to esc/numba that referenced this pull request Nov 2, 2021
Fixup Numba to address the changes introduced in cpython for Python 3.10
at python/cpython#25069 . But for Numba, we have
to alter the patch slightly as Numba rewrites the bytecode to use a
`_FIXED_OFFSET` with a value of `2`.
esc added a commit to esc/numba that referenced this pull request Nov 12, 2021
Fixup Numba to address the changes introduced in cpython for Python 3.10
at python/cpython#25069 . But for Numba, we have
to alter the patch slightly as Numba rewrites the bytecode to use a
`_FIXED_OFFSET` with a value of `2`.
esc added a commit to esc/numba that referenced this pull request Nov 16, 2021
Fixup Numba to address the changes introduced in cpython for Python 3.10
at python/cpython#25069 . But for Numba, we have
to alter the patch slightly as Numba rewrites the bytecode to use a
`_FIXED_OFFSET` with a value of `2`.
esc added a commit to esc/numba that referenced this pull request Dec 7, 2021
Fixup Numba to address the changes introduced in cpython for Python 3.10
at python/cpython#25069 . But for Numba, we have
to alter the patch slightly as Numba rewrites the bytecode to use a
`_FIXED_OFFSET` with a value of `2`.
saulshanabrook added a commit to metadsl/metadsl that referenced this pull request Jan 5, 2022
saulshanabrook added a commit to metadsl/metadsl that referenced this pull request Jan 11, 2022
saulshanabrook added a commit to metadsl/python-code-data that referenced this pull request Jan 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants