-
Notifications
You must be signed in to change notification settings - Fork 26
Reduce the overhead of tracing, profiling, and quickening checks for calls #8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce the overhead of tracing, profiling, and quickening checks for calls #8
Conversation
Pulling in @gvanrossum, who asked me to create this PR to review. |
f684de5
to
1419a87
Compare
1419a87
to
669edfa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must be missing why you're adding a second RETURN code.
669edfa
to
a22fab8
Compare
/* FIXME(lpereira): The stack pointer and f_state will be set to the same | ||
* value here, and when jumping to return_value_without_tracing above. This | ||
* is redundant, but saves us from copying some code. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I would just duplicate the code. You also redundantly have an extra variable retval
, which is only used in the TRACE_FUNCTION_EXIT()
macro. Maybe something needs to be refactored again here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some refactoring and the variable that's only used when tracing is enabled is now gone. (It's now inside the macro, which now takes a parameter.)
I dislike both copying code and doing what has been done in this PR, to be honest... while I do have a slight preference to the way I've done things as it's quite a bit of code that don't need to be copied around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another possible option: split the opcode in two? So we have one opcode which does the tracing, and one which does the return.
The original code would be EXIT_FUNCTION + RETURN_VALUE
, and the quickening step would replace EXIT_FUNCTION
with NOP
(or, even better, RETURN_VALUE
).
I'm pretty sure that could work correctly here, but I might be missing some edge case where an eval break could leave us in a weird state where we trace a return that never happens (or something).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, never mind. I don't know how much that would actually clean this up.
No change in generated code, other than this is a tiny bit easier to read in the rare cases you need to.
No need to pass an "is_async" parameter if we can look at the statement we're generating code for.
a22fab8
to
65eadae
Compare
This new opcode will check for tracing/profiling and check if the function needs to be quickened. This change mostly prepares the terrain by emitting a START_FUNCTION opcode at every function prologue, that currently does as much as a NOP.
Introduce a RETURN_VALUE_QUICK opcode that replaces RETURN_VALUE on quickened functions. The implementation for this new opcode is exactly the same as the actual RETURN_VALUE opcode, with the exception that it doesn't expand the {DTRACE,TRACE}_FUNCTION_EXIT() functions.
65eadae
to
6e1698e
Compare
(I can't seem to add anybody else to review this, but @markshannon asked to be tagged here.) |
I tweaked some permissions. Can you do this now? |
Yes, just added Mark as a reviewer. Thanks. |
Any more progress on this? |
I have been looking at other things the last day or so, so no. (I don't know if you have had the time to take a look at the PR.) |
What do you want me to look at? |
I think this is going to be difficult without breaking open |
See python#30035 |
This PR is stale because it has been open for 30 days with no activity. |
Obsolete |
This is my first attempt at implementing this idea. This modifies the compiler to add a new
START_FUNCTION
instruction right before visiting all the function body nodes, and then modifies the interpreter to move the tracing/profiling/quickening checks right after thestart_frame
label to the execution of this opcode. When quickening, this instruction is transformed into aNOP
instruction, effectively removing. these checks.Code compiles, runs, and I can see the
START_FUNCTION
opcode added on functions withdis
, but I don't know yet how well this is working; especially the quickening step. Still need a bit more time to learn how to inspect these things.I know for a fact that there are some things that I didn't consider yet, and will continue to look at this -- I'm opening the draft PR here just to make it easier to review the code and know if I'm in the right direction, as I'm new to this code base. One of the things these changes are missing are uses of the
start_frame
by things other than functions; I don't understand enough of the machinery there yet to fix this but this is just the question of spending some time following the code.