Commit f49dd54

gh-96143: Add some comments and minor fixes missed in the original PR (#96433)
* gh-96132: Add some comments and minor fixes missed in the original PR
* Update Doc/using/cmdline.rst

Co-authored-by: Kumar Aditya <[email protected]>
1 parent 45fd368 commit f49dd54

4 files changed, +17 -1 lines changed


Doc/howto/perf_profiling.rst

Lines changed: 3 additions & 0 deletions
@@ -155,6 +155,9 @@ active since the start of the Python interpreter, you can use the `-Xperf` optio
 
 $ python -Xperf my_script.py
 
+You can also set the :envvar:`PYTHONPERFSUPPORT` environment variable to a nonzero value to activate perf
+profiling mode globally.
+
 There is also support for dynamically activating and deactivating the perf
 profiling mode by using the APIs in the :mod:`sys` module:
 
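The passage above names three ways to turn the mode on: the `-Xperf` option, the `PYTHONPERFSUPPORT` variable, and the `sys` APIs. As a rough illustration, the sketch below launches a child interpreter with the variable set and asks it whether the trampoline is active. The helper names `perf_env` and `trampoline_status` are made up for this example, and the `getattr` guard assumes the script may run on an interpreter older than 3.12, where `sys.is_stack_trampoline_active` does not exist.

```python
import os
import subprocess
import sys

# Child snippet: report trampoline state, or "unsupported" on interpreters
# that lack sys.is_stack_trampoline_active (added in Python 3.12).
CHILD_CODE = (
    "import sys\n"
    "check = getattr(sys, 'is_stack_trampoline_active', None)\n"
    "print(check() if check is not None else 'unsupported')\n"
)


def perf_env(base=None):
    """Return a copy of the environment with PYTHONPERFSUPPORT set to a nonzero value."""
    env = dict(os.environ if base is None else base)
    env["PYTHONPERFSUPPORT"] = "1"
    return env


def trampoline_status():
    """Run a child interpreter with perf support requested and return what it reports."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=perf_env(),
        text=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(trampoline_status())
```

On a platform with perf trampoline support this should report that the trampoline is active. Elsewhere the child may print `False` or `unsupported`, or print nothing if the interpreter refuses to start with the variable set.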

Doc/using/cmdline.rst

Lines changed: 2 additions & 0 deletions
@@ -582,6 +582,8 @@ Miscellaneous options
 .. versionadded:: 3.11
    The ``-X frozen_modules`` option.
 
+.. versionadded:: 3.12
+   The ``-X perf`` option.
 
 
 Options you shouldn't use

Lib/test/test_perf_profiler.py

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ def baz():
 script = make_script(script_dir, "perftest", code)
 with subprocess.Popen(
     [sys.executable, "-Xperf", script],
-    universal_newlines=True,
+    text=True,
     stderr=subprocess.PIPE,
     stdout=subprocess.PIPE,
 ) as process:
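The one-line change above swaps `universal_newlines=True` for `text=True`. The `subprocess` documentation describes `text` as an alias for `universal_newlines`, added in Python 3.7. A small sketch, separate from the test suite itself, that checks both spellings yield decoded `str` output (the helper name `run_echo` is hypothetical):

```python
import subprocess
import sys


def run_echo(**popen_kwargs):
    """Run a trivial child process and return its captured stdout."""
    with subprocess.Popen(
        [sys.executable, "-c", "print('hello')"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        **popen_kwargs,
    ) as process:
        stdout, _ = process.communicate()
    return stdout


if __name__ == "__main__":
    # Both keyword spellings select text mode, so stdout is str rather than bytes.
    print(repr(run_echo(text=True)))
    print(repr(run_echo(universal_newlines=True)))
```

Without either keyword the same call returns `bytes`, which is why the test passes one of them before inspecting the child's output as text.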

Objects/perf_trampoline.c

Lines changed: 11 additions & 0 deletions
@@ -284,12 +284,23 @@ new_code_arena(void)
     void *start = &_Py_trampoline_func_start;
     void *end = &_Py_trampoline_func_end;
     size_t code_size = end - start;
+    // TODO: Check the effect of alignment of the code chunks. Initial investigation
+    // showed that this has no effect on performance in x86-64 or aarch64 and the current
+    // version has the advantage that the unwinder in GDB can unwind across JIT-ed code.
+    //
+    // We should check the values in the future and see if there is a
+    // measurable performance improvement by rounding trampolines up to 32-bit
+    // or 64-bit alignment.
 
     size_t n_copies = mem_size / code_size;
     for (size_t i = 0; i < n_copies; i++) {
         memcpy(memory + i * code_size, start, code_size * sizeof(char));
     }
     // Some systems may prevent us from creating executable code on the fly.
+    // TODO: Call icache invalidation intrinsics if available:
+    // __builtin___clear_cache/__clear_cache (depending if clang/gcc). This is
+    // technically not necessary but we could be missing something so better be
+    // safe.
     int res = mprotect(memory, mem_size, PROT_READ | PROT_EXEC);
     if (res == -1) {
         PyErr_SetFromErrno(PyExc_OSError);
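The first TODO above asks whether rounding each trampoline up to an aligned size would help. A minimal sketch of that arithmetic follows; it is not the CPython implementation. `round_up` and `arena_layout` are hypothetical helpers, and the sizes used are illustrative byte counts rather than measured values.

```python
def round_up(size, alignment):
    """Round size up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (size + alignment - 1) & ~(alignment - 1)


def arena_layout(mem_size, code_size, alignment=1):
    """Return (slot_size, n_copies) for an arena of mem_size bytes.

    With alignment=1 this mirrors the unpadded `mem_size / code_size`
    computation in the diff; larger alignments pad every copy.
    """
    slot_size = round_up(code_size, alignment)
    return slot_size, mem_size // slot_size


if __name__ == "__main__":
    print(arena_layout(4096, 40))      # unpadded copies
    print(arena_layout(4096, 40, 64))  # each copy padded to 64 bytes
```

Padding trades fewer trampolines per arena for aligned entry points, which is the trade-off the comment proposes measuring.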
