Skip to content
Merged
Show file tree
Hide file tree
Changes from 14 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 21 additions & 9 deletions Doc/library/dis.rst
Original file line number Diff line number Diff line change
Expand Up @@ -53,8 +53,9 @@ code.
.. class:: Bytecode(x, *, first_line=None, current_offset=None)


Analyse the bytecode corresponding to a function, generator, method, string
of source code, or a code object (as returned by :func:`compile`).
Analyse the bytecode corresponding to a function, generator, asynchronous
generator, coroutine, method, string of source code, or a code object (as
returned by :func:`compile`).

This is a convenience wrapper around many of the functions listed below, most
notably :func:`get_instructions`, as iterating over a :class:`Bytecode`
Expand Down Expand Up @@ -92,6 +93,9 @@ code.
Return a formatted multi-line string with detailed information about the
code object, like :func:`code_info`.

.. versionchanged:: 3.7
This can now handle coroutine and asynchronous generator objects.

Example::

>>> bytecode = dis.Bytecode(myfunc)
Expand All @@ -114,14 +118,18 @@ operation is being performed, so the intermediate analysis object isn't useful:
.. function:: code_info(x)

Return a formatted multi-line string with detailed code object information
for the supplied function, generator, method, source code string or code object.
for the supplied function, generator, asynchronous generator, coroutine,
method, source code string or code object.

Note that the exact contents of code info strings are highly implementation
dependent and they may change arbitrarily across Python VMs or Python
releases.

.. versionadded:: 3.2

.. versionchanged:: 3.7
This can now handle coroutine and asynchronous generator objects.


.. function:: show_code(x, *, file=None)

Expand All @@ -141,12 +149,13 @@ operation is being performed, so the intermediate analysis object isn't useful:
.. function:: dis(x=None, *, file=None, depth=None)

Disassemble the *x* object. *x* can denote either a module, a class, a
method, a function, a generator, a code object, a string of source code or
a byte sequence of raw bytecode. For a module, it disassembles all functions.
For a class, it disassembles all methods (including class and static methods).
For a code object or sequence of raw bytecode, it prints one line per bytecode
instruction. It also recursively disassembles nested code objects (the code
of comprehensions, generator expressions and nested functions, and the code
method, a function, a generator, an asynchronous generator, a couroutine,
a code object, a string of source code or a byte sequence of raw bytecode.
For a module, it disassembles all functions. For a class, it disassembles
all methods (including class and static methods). For a code object or
sequence of raw bytecode, it prints one line per bytecode instruction.
It also recursively disassembles nested code objects (the code of
comprehensions, generator expressions and nested functions, and the code
used for building nested classes).
Strings are first compiled to code objects with the :func:`compile`
built-in function before being disassembled. If no object is provided, this
Expand All @@ -164,6 +173,9 @@ operation is being performed, so the intermediate analysis object isn't useful:
.. versionchanged:: 3.7
Implemented recursive disassembling and added *depth* parameter.

.. versionchanged:: 3.7
This can now handle coroutine and asynchronous generator objects.


.. function:: distb(tb=None, *, file=None)

Expand Down
44 changes: 31 additions & 13 deletions Lib/dis.py
Original file line number Diff line number Diff line change
Expand Up @@ -32,20 +32,30 @@ def _try_compile(source, name):
return c

def dis(x=None, *, file=None, depth=None):
"""Disassemble classes, methods, functions, generators, or code.
"""Disassemble classes, methods, functions, and other compiled objects.

With no argument, disassemble the last traceback.
With no argument, disassemble the last traceback.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Align the docstring at the left side of """. Seems this is never specified explicitly (likely because nobody could think of other way of formatting), but all multiline docstrings are aligned at the left side of """.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to change it, but this convention is inconsistent even within dis.py: compare e.g. _try_compile, Instruction and _get_name_info, which align to the right of """, with _disassemble, get_instructions, and get_instructions_bytes, which use the alignment you prefer. I was matching _try_compile since it's directly above.

git blame sheds no light on this that I can see, since inconsistent alignments were used even within the same patch.

Also, I just confirmed that after applying trim from PEP 257 (in 3.x you have to manually add a dummy maxint to sys), all three docstrings below are equivalent. (PEP 257 only mentions foo and bar).

def foo():
    """A multi-line
    docstring.
    """

def bar():
    """
    A multi-line
    docstring.
    """
    
def baz():
    """A multi-line
       docstring.
    """

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are sometimes inconsistent, but as @serhiy-storchaka notes, the preferred layouts are the ones that align with the opening triple-quote - the last option isn't shown in PEP 257 because we're not supposed to use it :)

The inconsistent ones just get left alone once they're in, since we don't typically do formatting-only edits - we're more likely to fix the indentation as part of a change that needed to update the affected docstring anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks! Makes sense.


Compiled objects currently include generator objects, async generator
objects, and coroutine objects, all of which store their code object
in a special attribute.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing this and thinking about how modules and classes are handled made me realise that there's now a discrepancy between the types that are accepted as arguments to dis, and those that will be disassembled when present as instance, class or module attributes.

Specifically, the filter for the __dict__ handling case is defined in terms of isinstance and the _have_code type tuple, while dis() itself works entirely by checking for particular attributes.

Resolving that is out of scope for this PR, so I've filed a new issue to cover it: https://bugs.python.org/issue31197

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, and the refactoring you propose could probably also DRY up the repetition in dis and _get_code_object as a side bonus.

"""
if x is None:
distb(file=file)
return
if hasattr(x, '__func__'): # Method
# Extract functions from methods.
if hasattr(x, '__func__'):
x = x.__func__
if hasattr(x, '__code__'): # Function
# Extract compiled code objects from...
if hasattr(x, '__code__'): # ...a function, or
x = x.__code__
if hasattr(x, 'gi_code'): # Generator
elif hasattr(x, 'gi_code'): #...a generator object, or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I don't think we should hold up this PR for it, this does make we wonder if we should just be exposing a __code__ attribute on all of these types.

Copy link
Contributor Author

@syncosmic syncosmic Aug 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This struck me as well, so I did a little digging on the history and context. It looks as though gi_code was added to generators in bpo-1473257. At this time, function bytecode was still stored in f.func_code, so gi_code was a clear analogy.

Then func_code was changed in 3.0 to __code__ as part of the f.func_X to f.__X__ renaming. The 3.0 whatsnew explained that the purpose of this change was to "free up [the f.func_X] names in the function attribute namespace for user-defined attributes". But this wasn't done for the analogous code object in generators. On a quick look, I didn't find any discussion of this at the time, but that doesn't mean there wasn't any.

It seems, then, that there are two questions now:

  1. Should the code objects in these three types also be dundered for namespace reasons?
  2. Should they share a common name with functions or is there a reason why (following bpo-1473257) they should have different names (whether or not dundered)?

1 seems easier to answer, although that kind of breaking change was probably more in the spirit of 3.0 than 3.7. I have a couple of dim thoughts on 2 but I'll save them for possible later discussion elsewhere.

This also raises the question of whether at least some of the other [gi|ag|cr]_X attributes should be dundered and/or homogenized.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least some of the others are deliberately different because they refer to slightly different things (e.g. cr_await vs gi_yieldfrom).

Anyway, I've added a note about that to the refactoring issue: https://bugs.python.org/issue31197#msg300291

x = x.gi_code
elif hasattr(x, 'ag_code'): #...an asynchronous generator object, or
x = x.ag_code
elif hasattr(x, 'cr_code'): #...a coroutine.
x = x.cr_code
# Perform the disassembly.
if hasattr(x, '__dict__'): # Class or module
items = sorted(x.__dict__.items())
for name, x1 in items:
Expand Down Expand Up @@ -107,16 +117,24 @@ def pretty_flags(flags):
return ", ".join(names)

def _get_code_object(x):
"""Helper to handle methods, functions, generators, strings and raw code objects"""
if hasattr(x, '__func__'): # Method
"""Helper to handle methods, compiled or raw code objects, and strings."""
# Extract functions from methods.
if hasattr(x, '__func__'):
x = x.__func__
if hasattr(x, '__code__'): # Function
# Extract compiled code objects from...
if hasattr(x, '__code__'): # ...a function, or
x = x.__code__
if hasattr(x, 'gi_code'): # Generator
elif hasattr(x, 'gi_code'): #...a generator object, or
x = x.gi_code
if isinstance(x, str): # Source code
elif hasattr(x, 'ag_code'): #...an asynchronous generator object, or
x = x.ag_code
elif hasattr(x, 'cr_code'): #...a coroutine.
x = x.cr_code
# Handle source code.
if isinstance(x, str):
x = _try_compile(x, "<disassembly>")
if hasattr(x, 'co_code'): # Code object
# By now, if we don't have a code object, we can't disassemble x.
if hasattr(x, 'co_code'):
return x
raise TypeError("don't know how to disassemble %s objects" %
type(x).__name__)
Expand Down Expand Up @@ -443,8 +461,8 @@ def findlinestarts(code):
class Bytecode:
"""The bytecode operations of a piece of code

Instantiate this with a function, method, string of code, or a code object
(as returned by compile()).
Instantiate this with a function, method, other compiled object, string of
code, or a code object (as returned by compile()).

Iterating over this yields the bytecode operations as Instruction instances.
"""
Expand Down
32 changes: 26 additions & 6 deletions Lib/test/test_dis.py
Original file line number Diff line number Diff line change
Expand Up @@ -331,6 +331,13 @@ def _fstring(a, b, c, d):
def _g(x):
yield x

async def _ag(x):
yield x

async def _co(x):
async for item in _ag(x):
pass

def _h(y):
def foo(x):
'''funcdoc'''
Expand Down Expand Up @@ -390,6 +397,7 @@ def foo(x):
_h.__code__.co_firstlineno + 3,
)


class DisTests(unittest.TestCase):

maxDiff = None
Expand Down Expand Up @@ -531,10 +539,22 @@ def test_disassemble_class_method(self):
self.do_disassembly_test(_C.cm, dis_c_class_method)

def test_disassemble_generator(self):
gen_func_disas = self.get_disassembly(_g) # Disassemble generator function
gen_disas = self.get_disassembly(_g(1)) # Disassemble generator itself
gen_func_disas = self.get_disassembly(_g) # Generator function
gen_disas = self.get_disassembly(_g(1)) # Generator iterator
self.assertEqual(gen_disas, gen_func_disas)

def test_disassemble_async_generator(self):
agen_func_disas = self.get_disassembly(_ag) # Async generator function
agen_disas = self.get_disassembly(_ag(1)) # Async generator iterator
self.assertEqual(agen_disas, agen_func_disas)

def test_disassemble_coroutine(self):
coro_func_disas = self.get_disassembly(_co) # Coroutine function
coro = _co(1) # Coroutine object
coro.close() # Avoid a RuntimeWarning (never awaited)
coro_disas = self.get_disassembly(coro)
self.assertEqual(coro_disas, coro_func_disas)

def test_disassemble_fstring(self):
self.do_disassembly_test(_fstring, dis_fstring)

Expand Down Expand Up @@ -1050,12 +1070,12 @@ def test_explicit_first_line(self):
self.assertEqual(list(actual), expected_opinfo_outer)

def test_source_line_in_disassembly(self):
# Use the line in the source code
actual = dis.Bytecode(simple).dis()[:3]
# Use the line in the source code (split extracts the line no)
actual = dis.Bytecode(simple).dis().split(" ")[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we only want the first entry, .partition(" ")[0] would be a better option here (it doesn't really matter all that much, given that disassembly lines are short, but we may as well avoid pointlessly splitting the rest of the line up into fields when we don't care about them).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just use [:3].

Or use [:4] and add a space to the expected value (this is more robust).

actual = dis.Bytecode(simple).dis()[:4]
expected = "{:>3} ".format(simple.__code__.co_firstlineno)

Copy link
Contributor Author

@syncosmic syncosmic Aug 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[:3] raises now (since this is the patch that puts the line over 999) and [:4] would raise if it's ever driven above 9999; that would be a complex test suite (!) but I'll go with .partition (good call on that vs. .split; thanks!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use [:len(expected)]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expected hasn't been defined yet, so we'd have to change the order (raising the point that we would make an order-independent setup order-dependent). It also breaks consistency with the second comparison; closest w/o refactoring would to be [:len("350")], repeating the magic number a third time, just as [:3] encodes the length of the magic number.

If it were up to me I'd just pick .partition(" ")[0]; in addition to the control flow simplicity and consistency I think it's clearer (we're splitting the string up and picking part of it). @serhiy-storchaka , do you feel really strongly about this? @ncoghlan , do you prefer [:len(expected)] (w/ [:len('350')] ) as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only problem is more complicated code and repeated " ". It may takes few minutes for figuring out why the code is written in this way. But this may be subjective. If this code looks good for other reviewers I have no objections.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my eye, it's clearer this way, since the focus is on what we're doing to the dis line and the verbs are all simple: strip whitespace from the left, split on whitespace and pick the first region.

The main clarity improvement would seem to be refactoring out an extract_line_num method, but per my commit comment that seems like overkill in test code, and I'm not sure what the best place to put it would be.

Others?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@syncosmic I think you can simplify the call by just stripping all surrounding whitespace rather than specifically leading spaces:

actual = dis.Bytecode(simple).dis().strip().partition(" ")[0]

The fact there's only one string literal in there then more clearly conveys "partition the text on the first internal space".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, that is better. Seeing it written out w/your explanation I get @serhiy-storchaka 's point about not doubling the literal. I'll tweak that and fix the docstring now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I did this but in two lines each. It's clearer, takes the comment better, and avoids having to do an ugly line continuation in the second case.

expected = "{:>3}".format(simple.__code__.co_firstlineno)
self.assertEqual(actual, expected)
# Use an explicit first line number
actual = dis.Bytecode(simple, first_line=350).dis()[:3]
# Use an explicit first line number (split extracts the line no)
actual = dis.Bytecode(simple, first_line=350).dis().split(" ")[0]
self.assertEqual(actual, "350")

def test_info(self):
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
`dis` now works with asynchronous generator and coroutine objects. Patch by
George Collins based on diagnosis by Luciano Ramalho.