Collect specialization statistics from running benchmarks #140
Conversation
vstinner
left a comment
If running benchmarks on a CPython built with the ``--enable-pystats`` flag, pyperf will automatically collect statistics about the bytecode specializer during the run of benchmark code.
I don't see any code to store statistics, so I don't get the purpose of this change. pyperf spawns many subprocesses; how are the statistics collected? Do you compute the average? I don't get it.
pyperf/__init__.py
Outdated
# Reset the stats collection if running a --enable-pystats build
try:
    sys._stats_off()
    sys._stats_clear()
Please don't execute code on "import pyperf". It should be a deliberate action to clear such a cache.
I can probably get away with putting this in the Runner constructor -- it's possible, I suppose, to have more than one of those, but in practice that doesn't happen in the pyperformance benchmarks.
And to be clear, this doesn't affect anything on disk -- it is only the in-memory statistics collected so far. (This is basically to remove the effect of the statistics collected during pyperf's startup and book-keeping).
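A minimal sketch of that idea, assuming a heavily simplified Runner (the real constructor does much more; the hasattr guard here stands in for the try/except in the hunk above):

import sys

class Runner:
    def __init__(self):
        # The sys._stats_* hooks only exist on CPython builds configured
        # with --enable-pystats, so guard the calls.
        if hasattr(sys, "_stats_clear"):
            # Drop the in-memory stats accumulated during interpreter
            # startup and pyperf's own setup so they don't pollute the
            # numbers collected for the benchmark itself.
            sys._stats_off()
            sys._stats_clear()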
pyperf/_runner.py
Outdated
if not self._check_worker_task():
    return None

func = stats_wrapper(func)
On a Python build that provides sys._stats_on(), this wrapper adds a fixed overhead. If you want to execute code before/after the benchmark, it should be done before/after the code measuring time.
See task_func() below.
In this case, the benchmarking timings are useless anyway (due to the stats overhead), but it would have a (minor) impact on the statistics themselves. I'll see if I can refactor this, though. It probably requires a copy-and-paste of task_func to really minimize the overhead -- I'm not sure it's worth it, to be honest.
Having tried this, I'm not sure how to remove the overhead without having duplicate versions of task_func, and also its async equivalent below, which could be hard to keep up-to-date down the road. I think it's better as-is, since the performance impact really doesn't matter in the stats-collecting case (and there is no performance impact in the existing non-stats-collecting case).
If we are gathering stats, the timings are meaningless.
To maximize the accuracy of the stats, we want to turn them on and off as close to the function call as possible.
This looks about as good as we can get.
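For context, roughly what such a wrapper can look like (a sketch, not necessarily the exact code in this PR; sys._stats_on() and sys._stats_off() exist only on --enable-pystats builds):

import functools
import sys

def stats_wrapper(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Turn stats collection on and off as close to the benchmarked
        # call as possible, so pyperf's own bookkeeping is not counted.
        sys._stats_on()
        try:
            return func(*args, **kwargs)
        finally:
            sys._stats_off()
    return wrapper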
The statistics are collected in the specializing optimizer in the interpreter loop in CPython itself when built with ``--enable-pystats``.
vstinner
left a comment
If we are gathering stats, the timings are meaningless. To maximize the accuracy of the stats, we want to turn them on and off as close to the function call as possible. This looks about as good as we can get.
Hum, ok. I'm not against this change in this case, but I still have some concerns about the documentation of this change.
doc/run_benchmark.rst
Outdated
Specializer statistics (`pystats`)
==================================

If running benchmarks on a CPython built with the ``--enable-pystats`` flag, pyperf will automatically collect statistics about the bytecode specializer during the run of benchmark code.
I dislike "will automatically collect statistics". pyperf usually puts everything in the JSON file, but here all it does is call sys._stats_on() before and sys._stats_off() after. It doesn't clear existing statistics in /tmp/py_stats, it doesn't store them in the JSON file, and it doesn't compute an average or anything.
Would you mind mentioning the sys._stats_on() and sys._stats_off() functions instead?
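In other words, the behaviour amounts to something like the context manager below (an illustrative sketch only; as noted above, the counters are dumped by CPython itself under /tmp/py_stats, not by pyperf):

import contextlib
import sys

@contextlib.contextmanager
def collect_pystats():
    # Bracket the measured code with the pystats hooks; nothing is written
    # to pyperf's JSON file, and existing dumps in /tmp/py_stats are left
    # untouched.
    sys._stats_on()
    try:
        yield
    finally:
        sys._stats_off()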
Can you try to add a link to https://docs.python.org/dev/using/configure.html#cmdoption-enable-pystats, which is the current official documentation for pystats?
I've updated the docs to hopefully reflect this better.
doc/run_benchmark.rst
Outdated
If running benchmarks on a CPython built with the ``--enable-pystats`` flag, pyperf will automatically collect statistics about the bytecode specializer during the run of benchmark code.

Due to the overhead of collecting the statistics, it is unlikely the timing results will be useful.
Well, Mark's word is stronger: meaningless. What do you think of saying meaningless instead?
Sure, that's fine.
Anything else you'd like to see on this, @vstinner?
I'm converting this to a draft -- on full A/A testing, it turns out the results from this are quite non-deterministic. I think the noise is coming from the warmup / calibration stages, which don't always run the same number of times. I think by excluding these runs, we'll get more stable results, but I'm checking that over here first.
This is ready for review. With this change, we get consistent results running the pyperformance benchmark suite multiple times. See this prototype.
==================================

``pyperf`` has built-in support for `specializer statistics (``pystats``) <https://docs.python.org/dev/using/configure.html#cmdoption-enable-pystats>`_.
If running benchmarks on a CPython built with the ``--enable-pystats`` flag, pyperf will automatically collect ``pystats`` on the benchmark code by calling ``sys._stats_on`` immediately before the benchmark and calling ``sys._stats_off`` immediately after.
Since you go into details, you should also mention that they start by calling _stats_clear(). Moreover, you should mention that stats are only collected "if we aren't warming up or calibrating".
I added "Stats are not collected when running pyperf's own code or when warming up or calibrating the benchmarks." I don't think mentioning _stats_clear() is necessary -- the point of that is to not include pyperf's own code.
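A hedged sketch of that gating (the function name and arguments below are illustrative, not pyperf's actual API):

import sys

def run_once(func, loops, warmup=False, calibrating=False):
    # Only toggle pystats around real measurement runs; warmup and
    # calibration runs execute without collecting stats.
    collect = hasattr(sys, "_stats_on") and not (warmup or calibrating)
    if collect:
        sys._stats_on()
    try:
        return func(loops)
    finally:
        if collect:
            sys._stats_off()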
pyperf/_collect_metadata.py
Outdated
# pystats enabled?
if hasattr(sys, "_stats_clear"):
    metadata['python_pystats'] = 'enabled'
IMO "python_" prefix is redundant with "py" prefix:
- metadata['python_pystats'] = 'enabled'
+ metadata['pystats'] = 'enabled'
Makes sense.
vstinner
left a comment
LGTM. Thanks for the multiple updates ;-)
This will collect statistics on CPython builds with ``--enable-pystats``, and only during the run of the benchmarks themselves. Cc @markshannon