Define opcodes in terms of smaller "micro-ops" #454

We've talked about this concept on and off for some time now, so I thought I would create this issue to collect some concrete ideas and start working toward a consensus on the best path forward. This is a big, high-risk project, so we should probably start discussing it sooner rather than later.

Currently, our instructions are pretty much defined as "whatever the corresponding blob of C code in `ceval.c` does". While several places (`mark_stacks`, `stack_effect`, `dis` and its docs, the various `haswhatever` lists in `opcode.py`) describe aspects of their behavior, they are scattered all over the place and must be kept in sync with the actual implementation by hand. In other words, they are as much a burden as they are a useful resource.

It's also not always clear, for example:

- How many stack items a given opcode expects.
- Whether it clobbers, duplicates, or swaps any of them.
- Whether it can push a `NULL` to the stack.
- Whether it can handle a `NULL` on the stack.
- Whether it checks for tracing.
- Whether it's a superinstruction.
- Whether it can pop a frame.
- Whether it can push a frame.
- Whether it uses its oparg.
- Whether it can raise.
- Whether it can check for eval breakers.
- Whether it jumps forwards, backwards, or neither.
- Whether it branches.

And potentially, in the future:

- Whether it uses or leaves unboxed values on the stack.
- Whether it uses or leaves borrowed references on the stack.

...and, of course, much more.
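To illustrate what answering these questions in one place could look like, here is a minimal sketch assuming a hypothetical `OpcodeInfo` record with one flag bit per property. None of these names exist in CPython, and the entries are simplified and illustrative rather than CPython's actual metadata. Ideally, a table like this would be generated from the opcode definitions instead of being maintained by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical property flags: one bit per question in the lists above. */
enum {
    OPFLAG_USES_OPARG    = 1 << 0,
    OPFLAG_CAN_RAISE     = 1 << 1,
    OPFLAG_EVAL_BREAKER  = 1 << 2,
    OPFLAG_JUMP_FORWARD  = 1 << 3,
    OPFLAG_JUMP_BACKWARD = 1 << 4,
    OPFLAG_BRANCHES      = 1 << 5,
    OPFLAG_PUSHES_FRAME  = 1 << 6,
    OPFLAG_POPS_FRAME    = 1 << 7,
};

typedef struct {
    const char *name;
    int popped;     /* stack items consumed */
    int pushed;     /* stack items produced */
    unsigned flags; /* OPFLAG_* bits */
} OpcodeInfo;

/* Illustrative, simplified entries; not CPython's real metadata. */
static const OpcodeInfo opcode_info[] = {
    {"NOP", 0, 0, 0},
    {"POP_TOP", 1, 0, 0},
    {"BINARY_OP", 2, 1, OPFLAG_USES_OPARG | OPFLAG_CAN_RAISE},
    {"JUMP_BACKWARD", 0, 0,
     OPFLAG_USES_OPARG | OPFLAG_JUMP_BACKWARD | OPFLAG_EVAL_BREAKER},
};

/* Look up an opcode's record by name; returns NULL if it is unknown. */
static const OpcodeInfo *find_opcode(const char *name) {
    for (size_t i = 0; i < sizeof opcode_info / sizeof opcode_info[0]; i++) {
        if (strcmp(opcode_info[i].name, name) == 0) {
            return &opcode_info[i];
        }
    }
    return NULL;
}

/* Net change in stack depth, as a stack_effect-style query would report. */
static int stack_effect_of(const OpcodeInfo *info) {
    return info->pushed - info->popped;
}

/* True if any of the bits in `flag` are set for this opcode. */
static bool has_flag(const OpcodeInfo *info, unsigned flag) {
    return (info->flags & flag) != 0;
}
```

With a single table like this, a `stack_effect`-style query or a `haswhatever`-style list becomes a one-line lookup instead of a separate source that has to be kept in sync.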

One way to improve this situation is by ditching the free-form C code and defining our opcodes in terms of structured, statically analyzable micro-ops (or "uops"). These uops represent lowered, composable units of work, such as incrementing the instruction pointer, decrementing a value's reference count, writing values to the stack, etc. Ideally, they'd represent a sort of IR that bridges the gap between Python and native code.
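A toy sketch of the idea follows. It assumes a made-up VM whose objects are plain refcounted structs; `Obj`, `Frame`, the `UOP_*` kinds, and `run_uops` are all hypothetical. Each uop does one explicit thing (write a value to the stack, adjust a refcount, advance the instruction pointer), and an opcode is just a static array of uops run in order.

```c
#include <assert.h>

/* Toy stand-in for PyObject: just a reference count and a payload. */
typedef struct {
    int refcnt;
    long value;
} Obj;

/* Toy frame with a fixed-size value stack (no overflow checks here). */
typedef struct {
    Obj *stack[16];
    int sp;       /* number of items currently on the stack */
    int ip;       /* index of the next instruction */
    Obj **consts; /* constant pool, indexed by oparg */
} Frame;

/* Each uop is one small, explicit unit of work. */
typedef enum {
    UOP_PUSH_CONST, /* write consts[oparg] to the top of the stack */
    UOP_INCREF_TOP, /* increment the top item's reference count */
    UOP_POP_DECREF, /* pop the top item and decrement its refcount */
    UOP_ADVANCE_IP, /* move the instruction pointer forward by arg */
    UOP_END         /* marks the end of a sequence */
} UopKind;

typedef struct {
    UopKind kind;
    int arg;
} Uop;

/* Opcodes are plain data (sequences of uops), not free-form C. */
static const Uop OP_LOAD_CONST[] = {
    {UOP_PUSH_CONST, 0}, {UOP_INCREF_TOP, 0}, {UOP_ADVANCE_IP, 1}, {UOP_END, 0},
};
static const Uop OP_POP_TOP[] = {
    {UOP_POP_DECREF, 0}, {UOP_ADVANCE_IP, 1}, {UOP_END, 0},
};

/* Execute one opcode by running its uops in order.
   This toy never frees objects when their refcount reaches zero. */
static void run_uops(Frame *f, const Uop *uops, int oparg) {
    for (const Uop *u = uops; u->kind != UOP_END; u++) {
        switch (u->kind) {
        case UOP_PUSH_CONST:
            f->stack[f->sp++] = f->consts[oparg];
            break;
        case UOP_INCREF_TOP:
            f->stack[f->sp - 1]->refcnt++;
            break;
        case UOP_POP_DECREF:
            f->stack[--f->sp]->refcnt--;
            break;
        case UOP_ADVANCE_IP:
            f->ip += u->arg;
            break;
        case UOP_END:
            break;
        }
    }
}
```

Because `OP_LOAD_CONST` and `OP_POP_TOP` are data rather than C code, a generator could read them to emit an eval loop, and other tools could inspect them without parsing C.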

Last week I put together a [simple proof-of-concept](https://github.com/python/cpython/compare/main...brandtbucher:cpython:uops-new) that converts much of the common logic in our instructions to a couple dozen initial uops. There is much more to be done, of course, but it does a pretty good job of communicating what this could look like, and many opcodes have been mostly or entirely converted. Note that GitHub collapses the huge `ceval.c` diff by default, but most of the interesting stuff is there.

Other random thoughts to spark discussion:

- While [our 3.12 workflow graph](https://github.com/faster-cpython/ideas/wiki/Workflow-for-3.12-cycle) has an edge leading from "Move opcode definitions to their own file and generate PyEval_EvalDefault" to "Breakup instruction definitions into 'mini-ops'", I believe that we may benefit from inverting that dependency. Defining opcodes in terms of uops gives us much more power and flexibility when generating things like eval loops, since:
  - A sequence of uops is much easier to define and maintain declaratively than free-form C code.
  - Sequences of uops are easier (and more efficient) to automatically combine into things like superinstructions.
  - We can use them to also generate things like `mark_stacks` and `stack_effect`, or eval loops with debugging features added or removed.
- Uops will likely help if/when we start recording and compiling traces, since they are simpler, more explicit operations at a finer granularity than whole opcodes. They also make things like guard elimination and refcount elimination a bit more ergonomic.
- It's not totally clear whether core devs will like using uops. They might lower the barrier to entry for people who don't know C, but uops can also be very verbose and make some common patterns a bit awkward, at least in their current form.
- We've also had issues in the past getting MSVC to inline stuff in the eval loop, so we might need to work around that somehow.
- Since uops make it trivial to determine which stack items and temporaries a given instruction uses, experimenting with architectural changes like a register VM becomes a lot more straightforward.
- In general, there are lots of cool things we could do with some basic tooling to analyze or modify sequences of uops. As a simple example, we could assert that specialized forms of instructions preserve some basic properties of their deoptimized forms.
- Finally, I wouldn't be surprised if something like this made life easier for alternative Python implementations. In addition to more strongly specifying the semantics of individual bytecode instructions, maintainers would only need to implement the much simpler uops and a way of composing them. New or changed uops are much easier to keep up with, and adding or changing an opcode's implementation is only a matter of using its new uop sequence.
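The kind of tooling described above can be sketched in a few lines, assuming hypothetical uop kinds (`U_*`) that each carry a small fixed description: how many stack items they pop and push, and whether they can raise. From those facts alone, the sketch derives an opcode's stack effect, fuses two sequences into a superinstruction, and checks that a specialized instruction keeps the stack shape of its generic form. The guard uop is modeled as popping and re-pushing its two inputs so that the analysis sees it reading them. All names here are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical uop kinds, named for illustration only. */
typedef enum {
    U_LOAD_FAST,
    U_GUARD_BOTH_INT,
    U_BINARY_ADD_INT,
    U_BINARY_OP_GENERIC,
    U_ADVANCE_IP,
    U_END
} Uop;

/* Facts written once per uop; everything below is derived from them. */
typedef struct {
    int pops;
    int pushes;
    bool can_raise;
} UopInfo;

static const UopInfo uop_info[] = {
    [U_LOAD_FAST]         = {0, 1, false},
    [U_GUARD_BOTH_INT]    = {2, 2, false}, /* a failed guard deopts, it doesn't raise */
    [U_BINARY_ADD_INT]    = {2, 1, true},  /* e.g. MemoryError */
    [U_BINARY_OP_GENERIC] = {2, 1, true},
    [U_ADVANCE_IP]        = {0, 0, false},
    [U_END]               = {0, 0, false},
};

typedef struct {
    int stack_effect; /* net change in stack depth */
    int inputs;       /* how many existing stack items the sequence reads */
    bool can_raise;
} OpInfo;

/* Walk a uop sequence and derive the opcode's metadata. */
static OpInfo analyze(const Uop *seq) {
    OpInfo info = {0, 0, false};
    int depth = 0;
    for (; *seq != U_END; seq++) {
        const UopInfo *u = &uop_info[*seq];
        depth -= u->pops;
        if (-depth > info.inputs) {
            info.inputs = -depth; /* deepest pre-existing item consumed */
        }
        depth += u->pushes;
        info.can_raise = info.can_raise || u->can_raise;
    }
    info.stack_effect = depth;
    return info;
}

/* Concatenate two sequences into one (e.g. a superinstruction).
   `out` must have room for both sequences plus the terminating U_END.
   Returns the number of uops written, not counting U_END. */
static size_t fuse(const Uop *a, const Uop *b, Uop *out) {
    size_t n = 0;
    for (; *a != U_END; a++) out[n++] = *a;
    for (; *b != U_END; b++) out[n++] = *b;
    out[n] = U_END;
    return n;
}

/* A specialized form should consume and produce the same stack items. */
static bool preserves_stack_shape(const Uop *generic, const Uop *specialized) {
    OpInfo g = analyze(generic);
    OpInfo s = analyze(specialized);
    return g.stack_effect == s.stack_effect && g.inputs == s.inputs;
}

static const Uop BINARY_ADD_GENERIC[] = {U_BINARY_OP_GENERIC, U_ADVANCE_IP, U_END};
static const Uop BINARY_ADD_INT[] = {U_GUARD_BOTH_INT, U_BINARY_ADD_INT, U_ADVANCE_IP, U_END};
static const Uop LOAD_FAST_SEQ[] = {U_LOAD_FAST, U_ADVANCE_IP, U_END};
```

For example, fusing `LOAD_FAST_SEQ` with itself yields a four-uop sequence whose derived stack effect is +2 and which cannot raise. A real fusion pass would likely also merge the adjacent `U_ADVANCE_IP` uops into one; keeping the sequences as data makes that kind of peephole rewrite straightforward.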
