gh-118518: Allow perf to work without frame pointers #112254

Merged
merged 19 commits into from
May 5, 2024

5 changes: 4 additions & 1 deletion Doc/c-api/init_config.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1251,7 +1251,10 @@ PyConfig
for more information.

Set by :option:`-X perf <-X>` command line option and by the
:envvar:`PYTHONPERFSUPPORT` environment variable.
:envvar:`PYTHONPERFSUPPORT` environment variable for perf support
with frame pointers, and by the :option:`-X perfjit <-X>` command line
option and the :envvar:`PYTHONPERFJITSUPPORT` environment variable for perf
support with DWARF JIT information.

Default: ``-1``.

33 changes: 33 additions & 0 deletions Doc/howto/perf_profiling.rst
Original file line number Diff line number Diff line change
Expand Up @@ -205,3 +205,36 @@ You can check if your system has been compiled with this flag by running::
If you don't see any output it means that your interpreter has not been compiled with
frame pointers and therefore it may not be able to show Python functions in the output
of ``perf``.


How to work without frame pointers
----------------------------------

If you are working with a Python interpreter that has been compiled without frame pointers,
you can still use the ``perf`` profiler, but the overhead will be somewhat higher because Python
needs to generate unwinding information on the fly for every Python function call. Additionally,
``perf`` will take longer to process the data because it has to unwind the stack using DWARF
debugging information, which is a slow process.

To enable this mode, you can use the environment variable :envvar:`PYTHONPERFJITSUPPORT` or the
:option:`-X perfjit <-X>` option, which will enable the JIT mode for the ``perf`` profiler.

When using the perf JIT mode, you need an extra step before you can run ``perf report``: call
the ``perf inject`` command to merge the JIT information with the recorded samples, writing the
result to a new file::

    $ perf record -F 9999 -g -k 1 --call-graph dwarf -o perf.data python -Xperfjit my_script.py
    $ perf inject -i perf.data --jit --output perf.jit.data
    $ perf report -g -i perf.jit.data

or using the environment variable::

    $ PYTHONPERFJITSUPPORT=1 perf record -F 9999 -g -k 1 --call-graph dwarf -o perf.data python my_script.py
    $ perf inject -i perf.data --jit --output perf.jit.data
    $ perf report -g -i perf.jit.data

Note that when using ``--call-graph dwarf``, the ``perf`` tool takes snapshots of the stack of
the process being profiled and saves them in the ``perf.data`` file. By default, the size of
the stack dump is 8192 bytes, but you can change it by passing the size after a comma, as in
``--call-graph dwarf,4096``. The size of the stack dump matters: if it is too small,
``perf`` will not be able to unwind the stack and the output will be incomplete.
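
The record, inject and report steps described above can be sketched as a small Python helper that only builds the command lines. This is an illustrative sketch, not part of the PR: the function name, the ``perf.jit.data`` file name and the size check are hypothetical. The ``-k 1`` flag and the explicit ``perf inject`` output file follow the ``run_perf`` helper in this PR's ``Lib/test/test_perf_profiler.py``, and using 65528 (the size that helper passes) as an upper bound is an assumption.

```python
import shlex

# Largest stack dump size accepted by this sketch. 65528 is the value the
# PR's test helper passes to perf; treating it as the limit is an assumption.
MAX_STACK_DUMP_SIZE = 65528


def build_perf_jit_commands(
    script,
    python="python",
    data_file="perf.data",
    injected_file="perf.jit.data",  # hypothetical output name
    frequency=9999,
    stack_dump_size=None,
):
    """Return the record, inject and report argv lists for perf JIT mode."""
    call_graph = "dwarf"
    if stack_dump_size is not None:
        if not 0 < stack_dump_size <= MAX_STACK_DUMP_SIZE:
            raise ValueError(f"stack dump size out of range: {stack_dump_size}")
        # The size is appended after a comma, e.g. "dwarf,4096".
        call_graph = f"dwarf,{stack_dump_size}"

    record = [
        "perf", "record", "-F", str(frequency), "-g", "-k", "1",
        "--call-graph", call_graph, "-o", data_file,
        python, "-Xperfjit", script,
    ]
    # perf inject reads the recorded data and writes a new file that
    # includes the JIT information.
    inject = ["perf", "inject", "-i", data_file, "--jit", "-o", injected_file]
    report = ["perf", "report", "-g", "-i", injected_file]
    return [record, inject, report]


if __name__ == "__main__":
    for argv in build_perf_jit_commands("my_script.py", stack_dump_size=16384):
        print("$", shlex.join(argv))
```

Keeping the commands as argument lists makes it easy to pass each one to ``subprocess.run`` in order, stopping at the first non-zero return code.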

24 changes: 24 additions & 0 deletions Doc/using/cmdline.rst
Original file line number Diff line number Diff line change
Expand Up @@ -586,6 +586,15 @@ Miscellaneous options

.. versionadded:: 3.12

* ``-X perfjit`` enables support for the Linux ``perf`` profiler with DWARF
support. When this option is provided, the ``perf`` profiler will be able
to report Python calls using DWARF information. This option is only available on
some platforms and will do nothing if it is not supported on the current
system. The default value is "off". See also :envvar:`PYTHONPERFJITSUPPORT`
and :ref:`perf_profiling`.

.. versionadded:: 3.13

* :samp:`-X cpu_count={n}` overrides :func:`os.cpu_count`,
:func:`os.process_cpu_count`, and :func:`multiprocessing.cpu_count`.
*n* must be greater than or equal to 1.
Expand Down Expand Up @@ -1127,6 +1136,21 @@ conflict.

.. versionadded:: 3.12

.. envvar:: PYTHONPERFJITSUPPORT

If this variable is set to a nonzero value, it enables support for
the Linux ``perf`` profiler so Python calls can be detected by it
using DWARF information.

If set to ``0``, disable Linux ``perf`` profiler support.

See also the :option:`-X perfjit <-X>` command-line option
and :ref:`perf_profiling`.

.. versionadded:: 3.13
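
As a minimal sketch of how the variable described above might be applied to a child interpreter, the helper below copies the environment and sets ``PYTHONPERFJITSUPPORT`` to ``1`` or ``0``, much like the ``run_perf`` test helper in this PR copies ``os.environ`` before updating it. The function names are hypothetical and exist only for illustration.

```python
import os
import subprocess
import sys


def perf_jit_env(enabled, base_env=None):
    """Return a copy of the environment with PYTHONPERFJITSUPPORT set."""
    env = dict(os.environ if base_env is None else base_env)
    # A nonzero value enables perf JIT support; "0" disables it.
    env["PYTHONPERFJITSUPPORT"] = "1" if enabled else "0"
    return env


def run_script(script, enabled=True):
    """Run *script* in a child interpreter with the variable applied."""
    return subprocess.run([sys.executable, script], env=perf_jit_env(enabled))
```

Working on a copy leaves the parent process environment untouched, so only the profiled child sees the setting.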



.. envvar:: PYTHON_CPU_COUNT

If this variable is set to a positive integer, it overrides the return
5 changes: 5 additions & 0 deletions Doc/whatsnew/3.13.rst
Original file line number Diff line number Diff line change
Expand Up @@ -231,6 +231,11 @@ Other Language Changes
equivalent of the :option:`-X frozen_modules <-X>` command-line option.
(Contributed by Yilei Yang in :gh:`111374`.)

* Add :ref:`support for the perf profiler <perf_profiling>` working without
frame pointers through the new environment variable
:envvar:`PYTHONPERFJITSUPPORT` and command-line option :option:`-X perfjit
<-X>`. (Contributed by Pablo Galindo in :gh:`118518`.)

* The new :envvar:`PYTHON_HISTORY` environment variable can be used to change
the location of a ``.python_history`` file.
(Contributed by Levi Sabah, Zackery Spytz and Hugo van Kemenade in
1 change: 1 addition & 0 deletions Include/internal/pycore_ceval.h
Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,7 @@ extern int _PyIsPerfTrampolineActive(void);
extern PyStatus _PyPerfTrampoline_AfterFork_Child(void);
#ifdef PY_HAVE_PERF_TRAMPOLINE
extern _PyPerf_Callbacks _Py_perfmap_callbacks;
extern _PyPerf_Callbacks _Py_perfmap_jit_callbacks;
#endif

static inline PyObject*
2 changes: 2 additions & 0 deletions Include/internal/pycore_ceval_state.h
Original file line number Diff line number Diff line change
Expand Up @@ -75,6 +75,7 @@ struct trampoline_api_st {
unsigned int code_size, PyCodeObject* code);
int (*free_state)(void* state);
void *state;
Py_ssize_t code_padding;
};
#endif

Expand All @@ -83,6 +84,7 @@ struct _ceval_runtime_state {
struct {
#ifdef PY_HAVE_PERF_TRAMPOLINE
perf_status_t status;
int perf_trampoline_type;
Py_ssize_t extra_code_index;
struct code_arena_st *code_arena;
struct trampoline_api_st trampoline_api;
146 changes: 114 additions & 32 deletions Lib/test/test_perf_profiler.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
import sysconfig
import os
import pathlib
import shutil
from test import support
from test.support.script_helper import (
make_script,
Expand Down Expand Up @@ -76,14 +77,27 @@ def baz():
perf_file = pathlib.Path(f"/tmp/perf-{process.pid}.map")
self.assertTrue(perf_file.exists())
perf_file_contents = perf_file.read_text()
perf_lines = perf_file_contents.splitlines();
expected_symbols = [f"py::foo:{script}", f"py::bar:{script}", f"py::baz:{script}"]
perf_lines = perf_file_contents.splitlines()
expected_symbols = [
f"py::foo:{script}",
f"py::bar:{script}",
f"py::baz:{script}",
]
for expected_symbol in expected_symbols:
perf_line = next((line for line in perf_lines if expected_symbol in line), None)
self.assertIsNotNone(perf_line, f"Could not find {expected_symbol} in perf file")
perf_line = next(
(line for line in perf_lines if expected_symbol in line), None
)
self.assertIsNotNone(
perf_line, f"Could not find {expected_symbol} in perf file"
)
perf_addr = perf_line.split(" ")[0]
self.assertFalse(perf_addr.startswith("0x"), "Address should not be prefixed with 0x")
self.assertTrue(set(perf_addr).issubset(string.hexdigits), "Address should contain only hex characters")
self.assertFalse(
perf_addr.startswith("0x"), "Address should not be prefixed with 0x"
)
self.assertTrue(
set(perf_addr).issubset(string.hexdigits),
"Address should contain only hex characters",
)

def test_trampoline_works_with_forks(self):
code = """if 1:
Expand Down Expand Up @@ -212,7 +226,7 @@ def test_sys_api_get_status(self):
assert_python_ok("-c", code)


def is_unwinding_reliable():
def is_unwinding_reliable_with_frame_pointers():
cflags = sysconfig.get_config_var("PY_CORE_CFLAGS")
if not cflags:
return False
Expand Down Expand Up @@ -259,24 +273,49 @@ def perf_command_works():
return True


def run_perf(cwd, *args, **env_vars):
def run_perf(cwd, *args, use_jit=False, **env_vars):
if env_vars:
env = os.environ.copy()
env.update(env_vars)
else:
env = None
output_file = cwd + "/perf_output.perf"
base_cmd = ("perf", "record", "-g", "--call-graph=fp", "-o", output_file, "--")
if not use_jit:
base_cmd = ("perf", "record", "-g", "--call-graph=fp", "-o", output_file, "--")
else:
base_cmd = (
"perf",
"record",
"-g",
"--call-graph=dwarf,65528",
"-F99",
"-k1",
"-o",
output_file,
"--",
)
proc = subprocess.run(
base_cmd + args,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
env=env,
)
if proc.returncode:
print(proc.stderr)
print(proc.stderr, file=sys.stderr)
raise ValueError(f"Perf failed with return code {proc.returncode}")

if use_jit:
jit_output_file = cwd + "/jit_output.dump"
command = ("perf", "inject", "-j", "-i", output_file, "-o", jit_output_file)
proc = subprocess.run(
command, stderr=subprocess.PIPE, stdout=subprocess.PIPE, env=env
)
if proc.returncode:
print(proc.stderr, file=sys.stderr)
raise ValueError(f"perf inject failed with return code {proc.returncode}")
# Move the injected output over the original output_file
os.rename(jit_output_file, output_file)

base_cmd = ("perf", "script")
proc = subprocess.run(
("perf", "script", "-i", output_file),
Expand All @@ -290,20 +329,9 @@ def run_perf(cwd, *args, **env_vars):
)


@unittest.skipUnless(perf_command_works(), "perf command doesn't work")
@unittest.skipUnless(is_unwinding_reliable(), "Unwinding is unreliable")
class TestPerfProfiler(unittest.TestCase):
def setUp(self):
super().setUp()
self.perf_files = set(pathlib.Path("/tmp/").glob("perf-*.map"))

def tearDown(self) -> None:
super().tearDown()
files_to_delete = (
set(pathlib.Path("/tmp/").glob("perf-*.map")) - self.perf_files
)
for file in files_to_delete:
file.unlink()
class TestPerfProfilerMixin:
def run_perf(self, script_dir, perf_mode, script):
raise NotImplementedError()

def test_python_calls_appear_in_the_stack_if_perf_activated(self):
with temp_dir() as script_dir:
Expand All @@ -322,14 +350,14 @@ def baz(n):
baz(10000000)
"""
script = make_script(script_dir, "perftest", code)
stdout, stderr = run_perf(script_dir, sys.executable, "-Xperf", script)
stdout, stderr = self.run_perf(script_dir, script)
self.assertEqual(stderr, "")

self.assertIn(f"py::foo:{script}", stdout)
self.assertIn(f"py::bar:{script}", stdout)
self.assertIn(f"py::baz:{script}", stdout)

def test_python_calls_do_not_appear_in_the_stack_if_perf_activated(self):
def test_python_calls_do_not_appear_in_the_stack_if_perf_deactivated(self):
with temp_dir() as script_dir:
code = """if 1:
def foo(n):
Expand All @@ -346,13 +374,38 @@ def baz(n):
baz(10000000)
"""
script = make_script(script_dir, "perftest", code)
stdout, stderr = run_perf(script_dir, sys.executable, script)
stdout, stderr = self.run_perf(
script_dir, script, activate_trampoline=False
)
self.assertEqual(stderr, "")

self.assertNotIn(f"py::foo:{script}", stdout)
self.assertNotIn(f"py::bar:{script}", stdout)
self.assertNotIn(f"py::baz:{script}", stdout)

@unittest.skipUnless(perf_command_works(), "perf command doesn't work")
@unittest.skipUnless(
is_unwinding_reliable_with_frame_pointers(),
"Unwinding is unreliable with frame pointers",
)
class TestPerfProfiler(unittest.TestCase, TestPerfProfilerMixin):
def run_perf(self, script_dir, script, activate_trampoline=True):
if activate_trampoline:
return run_perf(script_dir, sys.executable, "-Xperf", script)
return run_perf(script_dir, sys.executable, script)

def setUp(self):
super().setUp()
self.perf_files = set(pathlib.Path("/tmp/").glob("perf-*.map"))

def tearDown(self) -> None:
super().tearDown()
files_to_delete = (
set(pathlib.Path("/tmp/").glob("perf-*.map")) - self.perf_files
)
for file in files_to_delete:
file.unlink()

def test_pre_fork_compile(self):
code = """if 1:
import sys
Expand All @@ -370,7 +423,7 @@ def bar_fork():
foo_fork()

def foo():
pass
import time; time.sleep(1)

def bar():
foo()
Expand Down Expand Up @@ -423,12 +476,41 @@ def compile_trampolines_for_all_functions():
# identical in both the parent and child perf-map files.
perf_file_lines = perf_file_contents.split("\n")
for line in perf_file_lines:
if (
f"py::foo_fork:{script}" in line
or f"py::bar_fork:{script}" in line
):
if f"py::foo_fork:{script}" in line or f"py::bar_fork:{script}" in line:
self.assertIn(line, child_perf_file_contents)

def _is_kernel_version_at_least(major, minor):
try:
with open("/proc/version") as f:
version = f.readline().split()[2]
except FileNotFoundError:
return False
version = version.split(".")
return int(version[0]) > major or (int(version[0]) == major and int(version[1]) >= minor)

@unittest.skipUnless(perf_command_works(), "perf command doesn't work")
@unittest.skipUnless(_is_kernel_version_at_least(6, 6), "perf command may not work due to a perf bug")
class TestPerfProfilerWithDwarf(unittest.TestCase, TestPerfProfilerMixin):
def run_perf(self, script_dir, script, activate_trampoline=True):
if activate_trampoline:
return run_perf(
script_dir, sys.executable, "-Xperfjit", script, use_jit=True
)
return run_perf(script_dir, sys.executable, script, use_jit=True)

def setUp(self):
super().setUp()
self.perf_files = set(pathlib.Path("/tmp/").glob("jit*.dump"))
self.perf_files |= set(pathlib.Path("/tmp/").glob("jitted-*.so"))

def tearDown(self) -> None:
super().tearDown()
files_to_delete = set(pathlib.Path("/tmp/").glob("jit*.dump"))
files_to_delete |= set(pathlib.Path("/tmp/").glob("jitted-*.so"))
files_to_delete = files_to_delete - self.perf_files
for file in files_to_delete:
file.unlink()


if __name__ == "__main__":
unittest.main()
1 change: 1 addition & 0 deletions Makefile.pre.in
Original file line number Diff line number Diff line change
Expand Up @@ -488,6 +488,7 @@ PYTHON_OBJS= \
Python/fileutils.o \
Python/suggestions.o \
Python/perf_trampoline.o \
Python/perf_jit_trampoline.o \
Python/$(DYNLOADFILE) \
$(LIBOBJS) \
$(MACHDEP_OBJS) \
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
Allow the Linux perf support to work without frame pointers using perf's
advanced JIT support. The feature is activated when using the
``PYTHONPERFJITSUPPORT`` environment variable or when running Python with
``-Xperfjit``. Patch by Pablo Galindo.
1 change: 1 addition & 0 deletions PCbuild/_freeze_module.vcxproj
Original file line number Diff line number Diff line change
Expand Up @@ -240,6 +240,7 @@
<ClCompile Include="..\Python\parking_lot.c" />
<ClCompile Include="..\Python\pathconfig.c" />
<ClCompile Include="..\Python\perf_trampoline.c" />
<ClCompile Include="..\Python\perf_jit_trampoline.c" />
<ClCompile Include="..\Python\preconfig.c" />
<ClCompile Include="..\Python\pyarena.c" />
<ClCompile Include="..\Python\pyctype.c" />
3 changes: 3 additions & 0 deletions PCbuild/_freeze_module.vcxproj.filters
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,9 @@
<ClCompile Include="..\Python\perf_trampoline.c">
<Filter>Source Files</Filter>
</ClCompile>
<ClCompile Include="..\Python\perf_jit_trampoline.c">
<Filter>Source Files</Filter>
</ClCompile>
<ClCompile Include="..\Python\compile.c">
<Filter>Source Files</Filter>
</ClCompile>