devlib.collector: Add PerfettoCollector #618

mrkajetanp · 2023-02-06T18:02:50Z

Add a Collector for accessing Google's Perfetto tracing infrastructure. The Collector takes a path to an on-device config file, starts tracing in the background using the perfetto binary and then stops by killing the tracing process.

Signed-off-by: Kajetan Puchalski [email protected]

devlib/collector/perfetto.py

douglas-raillard-arm

Some drive-by comments

devlib/collector/perfetto.py

mrkajetanp · 2023-02-14T15:57:54Z

I updated the PR with Douglas' suggestions.
I also had to bring back the manual pull timeout bit. I noticed that without it I'd end up getting a pull timeout exception when trying to pull the trace from the device. No idea why but having the manual timeout fixes the problem.

douglas-raillard-arm · 2023-02-14T17:34:38Z

I noticed that without it I'd end up getting a pull timeout exception when trying to pull the trace from the device.

That's strange, from a cursory look I could not find any non-None default timeout on the pull() path, except in the transfer manager but it's set to 3600

mrkajetanp · 2023-02-15T13:59:25Z

Now that I'm thinking about it, are we sure switching to the self.target.background method is the better way? Perfetto's 'intended' way to trace in the background is to pass the --background argument which just returns the PID and then later to kill that PID whenever you're done. If we're using self.target.background we need to use Perfetto's non-background variant designed to be run in a terminal and cancelled with Ctrl-C. It seems to work either way so I'm not going to make a stand here but I'm just wondering if we might not end up interfering with how the tool is meant to operate on account of making the code more 'pythonic' or whatever.

douglas-raillard-arm · 2023-02-15T16:41:43Z

I would definitely use the background command. The previous version of the code was straightforward and yet already had issues:

1. if the process dies, we would still attempt to kill an old PID even though it might point at an entirely different task. Android can exhaust PIDs pretty quickly.
2. it allowed starting perfetto twice, which would result in failures down the line
3. less likely, but if perfetto hung at SIGINT/SIGTERM, devlib would freeze. With BackgroundCommand.cancel(), SIGTERM is sent and then SIGKILL after a timeout.
4. if devlib disconnects, perfetto will be left running. Background commands are terminated properly when the target is destroyed or disconnected.

The change to the background command API fixed 1. and made 2. obvious. On top of that, we can now capture stdout/stderr if required for debug, instead of having to go hunt for some log file somewhere. Process control is a tricky business and it's definitely the sort of stuff we don't want to duplicate. Ctrl-C in a terminal is simply SIGINT, which will likely be handled like SIGTERM so I expect perfetto to work fine with that.
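
The terminate-then-kill escalation mentioned above can be sketched with the standard library alone. This is only an illustration for a local process: `cancel_process`, `spawn_sleeper` and the `kill_timeout` parameter are hypothetical names, and this is not how devlib's BackgroundCommand.cancel() is actually implemented.

```python
import subprocess
import sys


def cancel_process(proc, kill_timeout=5.0):
    """Ask a process to stop, then force-kill it if it does not exit in time.

    Hypothetical helper for illustration; not devlib's implementation.
    """
    if proc.poll() is not None:
        # Already exited: nothing is signalled, so a recycled PID is never hit.
        return proc.returncode
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=kill_timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()


def spawn_sleeper(seconds=60):
    """Start a child that just sleeps, standing in for a long-running tracer."""
    return subprocess.Popen(
        [sys.executable, "-c", f"import time; time.sleep({seconds})"]
    )
```

Because the helper works on a process handle rather than a stored PID, calling it again after the process has exited simply returns the recorded exit code.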

douglas-raillard-arm · 2023-02-15T16:44:53Z

devlib/collector/perfetto.py

+        cmd = "cat {} | {} --txt -c - -o {}".format(
+            quote(self.config), quote(self.target_binary), quote(self.target_output_file)
+        )
+        self.start_time = time.time()


We need to figure out why the pull times out as it's not really expected to happen instead of this hack. Also time.time() is not monotonic so end - start < 0 is possible
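
The point about time.time() can be illustrated with a small sketch. `measure_elapsed` is a hypothetical helper, not part of the PR; it only shows that time.monotonic() cannot go backwards, so the computed duration is never negative even if the wall clock is adjusted in between.

```python
import time


def measure_elapsed(action):
    """Run action() and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.monotonic()
    result = action()
    elapsed = time.monotonic() - start
    return result, elapsed
```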

It's not even that it times out, more that there's an issue if we don't specify an explicit timeout?

> /home/kajpuc01/tools/lisa/external/devlib/devlib/collector/perfetto.py(69)get_data()
-> self.target.pull(self.target_output_file, self.output_path)
(Pdb) s
--Call--
> /home/kajpuc01/tools/lisa/external/devlib/devlib/utils/asyn.py(202)__get__()
-> def __get__(self, *args, **kwargs):
(Pdb) n
> /home/kajpuc01/tools/lisa/external/devlib/devlib/utils/asyn.py(203)__get__()
-> return self.__class__(
(Pdb)
> /home/kajpuc01/tools/lisa/external/devlib/devlib/utils/asyn.py(204)__get__()
-> asyn=self.asyn.__get__(*args, **kwargs),
(Pdb)
> /home/kajpuc01/tools/lisa/external/devlib/devlib/utils/asyn.py(205)__get__()
-> blocking=self.blocking.__get__(*args, **kwargs),
(Pdb)
> /home/kajpuc01/tools/lisa/external/devlib/devlib/utils/asyn.py(203)__get__()
-> return self.__class__(
(Pdb)
--Return--
> /home/kajpuc01/tools/lisa/external/devlib/devlib/utils/asyn.py(203)__get__()-><devlib.utils...x7f326301bc10>
-> return self.__class__(
(Pdb)
--Call--
> /home/kajpuc01/tools/lisa/external/devlib/devlib/utils/asyn.py(208)__call__()
-> def __call__(self, *args, **kwargs):
(Pdb)
> /home/kajpuc01/tools/lisa/external/devlib/devlib/utils/asyn.py(209)__call__()
-> return self.blocking(*args, **kwargs)
(Pdb)
WARNING Cancelling file transfer ['/sdcard/devlib-target/trace.perfetto-trace'] -> perfetto_test/wk1-speedometer-1/trace.perfetto-trace due to 'transfer inactive'
WARNING Cancelling file transfer ['/sdcard/devlib-target/trace.perfetto-trace'] -> perfetto_test/wk1-speedometer-1/trace.perfetto-trace due to 'exception during transfer'
TimeoutError: adb -s 18251FDF60083A pull /sdcard/devlib-target/trace.perfetto-trace perfetto_test/wk1-speedometer-1/trace.perfetto-trace

I think the issue might be here:

    def _push_pull(self, action, sources, dest, timeout):
        # [...]
        if timeout or not self.poll_transfers:
            adb_command(self.device, command, timeout=timeout, adb_server=self.adb_server)
        else:
            with self.transfer_mgr.manage(sources, dest, action):
                bg_cmd = adb_command_background(self.device, command, adb_server=self.adb_server)
                self.transfer_mgr.set_transfer_and_wait(bg_cmd)

Depending on whether a timeout was specified we either use adb_command or adb_command_background. The former seems to work fine, the latter not so much in this case.

Ok, so that definitely needs fixing. Can you get a backtrace of that TimeoutError, along with debug logging?

There also seems to be a leak of the bg_cmd in the transfer manager (it only gets destroyed when the next transfer happens), and a Thread that should be created with daemon=True but is not, so a code robustness review seems to be in order.

EDIT: I also found broken uses of None:

self.transfer_mgr = SSHTransferManager(self, **transfer_opts) if poll_transfers else None

and then:

with _handle_paramiko_exceptions(), self.transfer_mgr.manage(sources, dest, action, scp):

so there is definitely a bunch of smelly/brittle code around that transfer manager implementation that ought to be fixed ASAP if it is enabled by default.
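
One way to avoid calling a method on a manager that may be None is to substitute a no-op context manager. The sketch below is an assumption-laden illustration, not the fix that was applied in devlib: `RecordingTransferManager` and `transfer` are hypothetical names, and only the `manage(sources, dest, action)` call shape is borrowed from the snippets quoted above.

```python
import contextlib


class RecordingTransferManager:
    """Hypothetical stand-in for a transfer manager; records managed transfers."""

    def __init__(self):
        self.managed = []

    @contextlib.contextmanager
    def manage(self, sources, dest, action):
        self.managed.append((tuple(sources), dest, action))
        yield


def transfer(transfer_mgr, sources, dest, action, do_transfer):
    """Run do_transfer(), supervised by transfer_mgr only when one is configured."""
    if transfer_mgr is None:
        # No polling requested: use a context manager that does nothing.
        supervision = contextlib.nullcontext()
    else:
        supervision = transfer_mgr.manage(sources, dest, action)
    with supervision:
        return do_transfer()
```

With this shape, the code path taken when polling is disabled no longer dereferences None.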

mrkajetanp · 2023-02-28T12:57:01Z

I dropped the explicit timeout argument, but that requires #619, so this PR should be merged either after that one or at the same time.

devlib/collector/perfetto.py

mrkajetanp · 2023-05-12T13:16:57Z

Since the transfer manager PR has been merged we should now be able to go ahead and merge this as well?

marcbonnici

Apologies for the delay; yes, we can proceed with this PR. I just wanted to spend some time understanding this a bit and getting it running on my end, as I had found some issues during testing, but it turns out they were unrelated to this PR. (Opened a PR to address these: #628)

However I have a couple of questions for you.

IIUC running the perfetto binary requires a running service on the device to connect to, and if this is not running then currently we fail silently. On a Linux target (my test device) this is provided by the traced binary, or both can be packaged as part of the tracebox binary.

I'm wondering if you know an easy way to detect whether this service is running, or even adding some config to enable running the service as a background command as part of the collector (I'm currently testing by manually running ./tracebox traced before starting a workload).

Secondly do you know if we would be able to include a compiled perfetto binary (or tracebox) similar to what we do for trace-cmd etc to make it easier for users to get started?

devlib/collector/perfetto.py

mrkajetanp · 2023-05-23T16:29:18Z

First of all, these are not issues for Android: on Android Perfetto always comes included and is always running, so it'll just be fine. For Linux targets there are a few more steps, true.

I'm wondering if you know an easy way to detect whether this service is running

I'm not sure if there's any official or proper way to check, we could always grep the ps output or something like that if we really wanted to?

or even adding some config to enable running the service as a background command

We could just make the tool use tracebox if it can't find the running service on Linux?

Secondly do you know if we would be able to include a compiled perfetto binary (or tracebox) similar to what we do for trace-cmd etc to make it easier for users to get started?

As mentioned above, it comes bundled with Android so on Android there's no point. On Linux if the tracebox binary already works for you then I see no issues with bundling it.

https://perfetto.dev/docs/quickstart/linux-tracing

marcbonnici

I see, thanks for the information. I understand that the requirements for Android and Linux are different. One option would be to check the OS of the target and either auto-deploy the binary and start it as a background task for Linux, or assume it is present for Android?

At the minimum I think we should outline what needs to be done in the docstring for the collector. Additionally, it might be worth trying to ensure that if something fails, we can provide some error message to indicate this (my initial testing was failing silently with an empty output file, so it wasn't immediately obvious what steps needed to be performed).

devlib/collector/perfetto.py

mrkajetanp · 2023-07-04T16:35:06Z

Sorry for the delay, here are the updates. I added a check to switch Perfetto on for Android 9 & 10, as well as a check to install tracebox on Linux targets if the service is missing. The tracebox binary was bundled along with the copied Perfetto license.

I also added an "is_running" function in Target. As far as I can tell there was no existing equivalent, and it seems like a useful thing to have, so I made it a helper inside Target instead of implementing it just inside the collector.

devlib/target.py

mrkajetanp · 2023-07-05T13:56:28Z

Updated to use the f-string and a more proper implementation of is_running that takes a regexp, so that we can match the name more accurately and avoid a false positive. An added upside is that awk will just make execute return an empty string instead of raising an exception in the absent case, so we can avoid the exception handler.

devlib/target.py

mrkajetanp · 2023-07-06T13:29:39Z

I updated with just directly matching the comm to not overcomplicate things. Busybox ps solves the potential spaces problem already as far as I can tell.
busybox ps with no arguments:

PID   USER     TIME  COMMAND
    1 0         1h22 /system/bin/init second_stage
    2 0         0:03 [kthreadd]
    3 0         0:00 [rcu_gp]

busybox ps -o comm,stat

COMMAND          STAT
init             S
kthreadd         SW
rcu_gp           IW<

None of the tasks that have spaces in the default output have them in the -o variant, it abbreviates the task names to a fixed width column and gets rid of whitespace.

douglas-raillard-arm · 2023-07-06T15:10:14Z

None of the tasks that have spaces in the default output have them in the -o variant, it abbreviates the task names to a fixed width column and gets rid of whitespace.

That's good to know. Have you tried the args instead of comm? That would contain the full command line though, which is a bit different.

mrkajetanp · 2023-07-06T19:08:15Z

have you tried the args instead of comm

comm just gives us exactly what we want; args appears slightly broken in the busybox implementation. Namely, args,stat only shows the args part while stat,args shows both as expected. Still, that's not really relevant here: the use case for this function is to check whether a background daemon is present or not, and just matching comm does that more than adequately imo.

douglas-raillard-arm · 2023-07-18T14:29:13Z

comm just gives us exactly what we want

This is not what we want:

None of the tasks that have spaces in the default output have them in the -o variant, it abbreviates the task names to a fixed width column and gets rid of whitespace.

Fortunately I found that it's actually not true, the busybox binary shipped in devlib does report comm with spaces properly, e.g.

/devlib/devlib/bin/x86_64/busybox ps -A -o stat,comm | grep foo
S    foo bar.sh

for a script called "foo bar.sh"

I think the confusion comes from mixing up the full command line (args in ps parlance) and the task name (comm). comm is not a simplification of args; they are 2 separate strings:

comm is the same as in the kernel and is set using prctl(PR_SET_NAME, ...) syscall (task_rename ftrace event)
cmdline (args for busybox ps) is taken from the auxiliary vector of the process, i.e. the concatenated content of argv parameter of main() in C.
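
Since comm can contain spaces, a parser for `ps -A -o stat,comm` output has to split each line only once. The sketch below is a host-side illustration with hypothetical names (`is_running_in_ps_output`, `SAMPLE`); the sample output, including its header line, is made up for the example, and this is not the implementation merged in devlib/target.py.

```python
def is_running_in_ps_output(ps_output, comm):
    """Return True if a task whose comm equals `comm` appears in the output.

    Expects the output of `ps -A -o stat,comm`: a header line, then one
    line per task with the state first and the task name last.
    """
    lines = ps_output.splitlines()[1:]  # skip the header line
    for line in lines:
        fields = line.split(None, 1)  # split once: comm may contain spaces
        if len(fields) == 2 and fields[1].strip() == comm:
            return True
    return False


# Illustrative output only; real output depends on the target.
SAMPLE = """STAT COMMAND
S    init
SW   kthreadd
S    foo bar.sh
"""
```

Putting stat first keeps comm as the last column, so anything after the first run of whitespace is the task name. Exact comparison avoids the false positive a substring match on "foo" would give.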

devlib/target.py

devlib/collector/perfetto.py

marcbonnici · 2023-07-21T22:46:43Z

devlib/collector/perfetto.py

+            # Android 9 and 10 require traced to be enabled manually
+            if os_version == '9' or os_version == '10':
+                target.execute('setprop persist.traced.enable 1')
+        elif target.os in ['linux', 'android'] and not target.is_running('traced'):


When launching traced via tracebox, sadly the traced argument does not appear in the ps output, causing this check to fail. Perhaps we could check for traced or tracebox?

Hmm that's intentional though. We only check for traced so that we can use tracebox if it's missing. If traced is present this means that we don't need to use tracebox and should call perfetto instead.
Calling tracebox with traced running in the background would be an odd halfway setup I think. From their docs:

Due to Perfetto's service-based architecture, in order to capture a trace, the traced (session daemon) and traced_probes (probes and ftrace-interop daemon) need to be running. As per Perfetto v16, the tracebox binary bundles together all the binaries you need in a single executable (a bit like toybox or busybox).
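
The decision described in this thread can be written as a small pure function. This is one possible reading of the logic, not the code in the PR: `plan_perfetto_setup` and its return shape are hypothetical, and the assumption that Android 9 and 10 always end up using perfetto once the property is set is mine.

```python
def plan_perfetto_setup(os_name, os_version, traced_running):
    """Decide which binary to use and which setup commands to run first.

    Hypothetical helper: returns (binary, setup_commands).
    """
    setup = []
    if os_name == "android" and os_version in ("9", "10"):
        # Android 9 and 10 require traced to be enabled manually.
        setup.append("setprop persist.traced.enable 1")
        return "perfetto", setup
    if traced_running:
        # The service is already present: talk to it with perfetto.
        return "perfetto", setup
    # No traced service found: fall back to the all-in-one tracebox binary.
    return "tracebox", setup
```

Keeping the check on traced only, rather than on traced or tracebox, matches the stated intent: tracebox is used precisely when traced is missing.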

mrkajetanp · 2023-08-22T12:59:48Z

I updated the PR so that for Android targets it'll write to Android's default
perfetto trace directory instead of the devlib one since Perfetto seems to be
having permission issues on non-rooted phones. This makes the collector work
without root access on Android phones, tested on Android 13 + Pixel 6. I also
renamed the output file to 'devlib-trace.perfetto-trace' just to avoid
potential clashes with perfetto traces collected in other ways.

devlib/target.py

devlib/collector/perfetto.py

Add the "is_running" function that can be used to check if a given process is running on the target device. It will return True if a process matching the name is found and False otherwise.

Signed-off-by: Kajetan Puchalski <[email protected]>

marcbonnici reviewed Feb 10, 2023

mrkajetanp force-pushed the perfetto branch from d944a86 to 625dd8c Compare February 10, 2023 18:08

douglas-raillard-arm reviewed Feb 13, 2023

mrkajetanp force-pushed the perfetto branch from 625dd8c to a439783 Compare February 14, 2023 15:56

douglas-raillard-arm reviewed Feb 15, 2023

douglas-raillard-arm mentioned this pull request Feb 20, 2023

Cleanup transfer manager #619

Merged

mrkajetanp force-pushed the perfetto branch from a439783 to c9eb9b7 Compare February 28, 2023 12:55

marcbonnici approved these changes Mar 14, 2023

mrkajetanp force-pushed the perfetto branch 2 times, most recently from 4324b7c to 1cb6d0a Compare April 18, 2023 15:22

marcbonnici mentioned this pull request May 16, 2023

Fix bg cmds #628

Merged

marcbonnici reviewed May 16, 2023

mrkajetanp force-pushed the perfetto branch from 1cb6d0a to 7ed04e4 Compare May 23, 2023 16:21

marcbonnici reviewed May 30, 2023

mrkajetanp force-pushed the perfetto branch 2 times, most recently from 972c5b8 to bc7522f Compare July 4, 2023 16:32

douglas-raillard-arm reviewed Jul 5, 2023

mrkajetanp force-pushed the perfetto branch from bc7522f to 820e2c8 Compare July 5, 2023 13:54

douglas-raillard-arm reviewed Jul 6, 2023

mrkajetanp force-pushed the perfetto branch from 820e2c8 to 723af80 Compare July 6, 2023 13:17

mrkajetanp force-pushed the perfetto branch from 723af80 to d33ec37 Compare July 18, 2023 12:18

douglas-raillard-arm requested changes Jul 18, 2023

mrkajetanp force-pushed the perfetto branch 2 times, most recently from ff22f03 to 28e5945 Compare July 20, 2023 13:07

douglas-raillard-arm reviewed Jul 31, 2023

marcbonnici reviewed Aug 9, 2023

mrkajetanp force-pushed the perfetto branch 2 times, most recently from ba5972b to 008811e Compare August 11, 2023 16:44

mrkajetanp force-pushed the perfetto branch from 008811e to 7c7ca37 Compare August 22, 2023 12:55

marcbonnici reviewed Aug 29, 2023

target: Add is_running()

2155a0c

Add the "is_running" function that can be used to check if a given process is running on the target device. It will return True if a process matching the name is found and False otherwise.

Signed-off-by: Kajetan Puchalski <[email protected]>

mrkajetanp force-pushed the perfetto branch from 7c7ca37 to 4e6cfe9 Compare August 30, 2023 13:48

mrkajetanp force-pushed the perfetto branch from 4e6cfe9 to 3e4bc0b Compare August 31, 2023 13:43

marcbonnici approved these changes Sep 6, 2023

marcbonnici merged commit 9b15807 into ARM-software:master Sep 6, 2023
