-
Notifications
You must be signed in to change notification settings - Fork 80
devlib.collector: Add PerfettoCollector #618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some drive-by comments
I updated the PR with Douglas' suggestions. |
That's strange, from a cursory look I could not find any non-None default timeout on the pull() path, except in the transfer manager but it's set to 3600 |
Now that I'm thinking about it, are we sure switching to the |
I would definitely use background command. The previous version of the code was straightfoward and yet already had issues:
The change to the background command API fixed 1. and made 2. obvious. On top of that, we can now capture stdout/stderr if required for debug, instead of having to go hunt for some log file somewhere. Process control is a tricky business and it's definitely the sort of stuff we don't want to duplicate. Ctrl-C in a terminal is simply SIGINT, which will likely be handled like SIGTERM so I expect perfetto to work fine with that. |
devlib/collector/perfetto.py
Outdated
cmd = "cat {} | {} --txt -c - -o {}".format( | ||
quote(self.config), quote(self.target_binary), quote(self.target_output_file) | ||
) | ||
self.start_time = time.time() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to figure out why the pull times out as it's not really expected to happen instead of this hack. Also time.time() is not monotonic so end - start < 0
is possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not even that it times out, more that there's an issue if we don't specify an explicit timeout?
> /home/kajpuc01/tools/lisa/external/devlib/devlib/collector/perfetto.py(69)get_data()
-> self.target.pull(self.target_output_file, self.output_path)
(Pdb) s
--Call--
> /home/kajpuc01/tools/lisa/external/devlib/devlib/utils/asyn.py(202)__get__()
-> def __get__(self, *args, **kwargs):
(Pdb) n
> /home/kajpuc01/tools/lisa/external/devlib/devlib/utils/asyn.py(203)__get__()
-> return self.__class__(
(Pdb)
> /home/kajpuc01/tools/lisa/external/devlib/devlib/utils/asyn.py(204)__get__()
-> asyn=self.asyn.__get__(*args, **kwargs),
(Pdb)
> /home/kajpuc01/tools/lisa/external/devlib/devlib/utils/asyn.py(205)__get__()
-> blocking=self.blocking.__get__(*args, **kwargs),
(Pdb)
> /home/kajpuc01/tools/lisa/external/devlib/devlib/utils/asyn.py(203)__get__()
-> return self.__class__(
(Pdb)
--Return--
> /home/kajpuc01/tools/lisa/external/devlib/devlib/utils/asyn.py(203)__get__()-><devlib.utils...x7f326301bc10>
-> return self.__class__(
(Pdb)
--Call--
> /home/kajpuc01/tools/lisa/external/devlib/devlib/utils/asyn.py(208)__call__()
-> def __call__(self, *args, **kwargs):
(Pdb)
> /home/kajpuc01/tools/lisa/external/devlib/devlib/utils/asyn.py(209)__call__()
-> return self.blocking(*args, **kwargs)
(Pdb)
WARNING Cancelling file transfer ['/sdcard/devlib-target/trace.perfetto-trace'] -> perfetto_test/wk1-speedometer-1/trace.perfetto-trace due to 'transfer inactive'
WARNING Cancelling file transfer ['/sdcard/devlib-target/trace.perfetto-trace'] -> perfetto_test/wk1-speedometer-1/trace.perfetto-trace due to 'exception during transfer'
TimeoutError: adb -s 18251FDF60083A pull /sdcard/devlib-target/trace.perfetto-trace perfetto_test/wk1-speedometer-1/trace.perfetto-trace
I think the issue might be here:
def _push_pull(self, action, sources, dest, timeout):
# [...]
if timeout or not self.poll_transfers:
adb_command(self.device, command, timeout=timeout, adb_server=self.adb_server)
else:
with self.transfer_mgr.manage(sources, dest, action):
bg_cmd = adb_command_background(self.device, command, adb_server=self.adb_server)
self.transfer_mgr.set_transfer_and_wait(bg_cmd)
Depending on whether a timeout was specified we either use adb_command
or adb_command_background
. The former seems to work fine, the latter not so much in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so that definitely needs fixing. Can you get a backtrace of that TimeoutError and debug logging ?
There also seems to be a leak of the bg_cmd in the transfer manager (it only gets destroyed when the next transfer happens), and a Thread that should be created with daemon=True but is not so a code robustness review seems to be in order.
EDIT: I also found broken uses of None
:
self.transfer_mgr = SSHTransferManager(self, **transfer_opts) if poll_transfers else None
and then:
with _handle_paramiko_exceptions(), self.transfer_mgr.manage(sources, dest, action, scp):
so there are definitely a bunch of smelly/brittle stuff around that transfer manager implementation that ought to be fixed ASAP if that is enabled by default
I dropped the explicit timeout argument but that requires #619 so it should be either merged after the other one or at the same time |
4324b7c
to
1cb6d0a
Compare
Since the transfer manager PR has been merged we should now be able to go ahead and merge this as well? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the delay yes we can proceed with this PR. I just wanted to spend some time understand this a bit and get it running on my end as I had found some issues during testing but it turns out they were unrelated to this PR. (Opened a PR to address these #628)
However I have a couple of questions for you.
IIUC running the perfetto binary requires a running service on the device to connect to, and if this is not running then currently we fail silently. On a linux target (my test device) this is provided by traced
binary or both can packaged as part of the tracebox
binary.
I'm wondering is you know an easy way to detect whether this service is running, or even adding some config to enable running the service as a background command as part of the collector (I'm currently testing with manually running with ./tracebox traced
before starting a workload.
Secondly do you know if we would be able to include a compiled perfetto binary (or tracebox) similar to what we do for trace-cmd etc to make it easier for users to get started?
First of all, these are not issues for Android - on Android Perfetto always comes included and is always running so it'll just be fine. For Linux targets there's a bit more steps, true.
I'm not sure if there's any official or proper way to check, we could always grep the
We could just make the tool use
As mentioned above, it comes bundled with Android so on Android there's no point. On Linux if the tracebox binary already works for you then I see no issues with bundling it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for the information. I understand that the requirements for Android and Linux are different, one option would be to check for the OS of the target and either auto deploy the binary and start it as a background task for Linux or assume it is present for Android?
At the minimum I think we should outline what needs to be done in the docsctring for the collector. Additionally it might be worth trying to ensure that if something fails, that we can provide some error message to indicate this (my initial testing was failing silently with an empty output file so it wasn't immediately obvious what steps needed to be performed).
972c5b8
to
bc7522f
Compare
Sorry for the delay, here are the updates. I added a check to switch Perfetto on for Android 9 & 10, as well as a check to install tracebox on Linux targets if the service is missing. The tracebox binary was bundled along with the copied Perfetto license. I also added a "is_running" function in Target, as far as I can tell there was no already existing equivalent and it seems like a useful thing to have so I made it a helper inside Target instead of implementing it just inside the collector. |
Updated with using the fstring and a more proper implementation of is_running that takes a regexp so that we can match the name more accurately and avoid a false positive. An added upside is that awk will just make execute return an empty string instead of an exception in the absent case so we can avoid the exception handler. |
I updated with just directly matching the comm to not overcomplicate things. Busybox ps solves the potential spaces problem already as far as I can tell.
None of the tasks that have spaces in the default output have them in the |
That's good to know, have you tried the |
comm just gives us exactly what we want, args appears slightly broken in the busybox implementation. Namely |
This is not what we want:
Fortunately I found that it's actually not true, the busybox binary shipped in devlib does report comm with spaces properly, e.g.
for a script called "foo bar.sh" I think the confusion comes from the full command line (
|
ff22f03
to
28e5945
Compare
devlib/collector/perfetto.py
Outdated
# Android 9 and 10 require traced to be enabled manually | ||
if os_version == '9' or os_version == '10': | ||
target.execute('setprop persist.traced.enable 1') | ||
elif target.os in ['linux', 'android'] and not target.is_running('traced'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When launching traced via tracebox sadly the traced
argument does not appear in the ps output causing this check to fail. Perhaps we could check for traced
or tracebox
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm that's intentional though. We only check for traced
so that we can use tracebox
if it's missing. If traced
is present this means that we don't need to use tracebox
and should call perfetto
instead.
Calling tracebox
with traced
running in the background would be an odd halfway setup I think. From their docs:
Due to Perfetto's service-based architecture, in order to capture a trace, the traced (session daemon) and traced_probes (probes and ftrace-interop daemon) need to be running. As per Perfetto v16, the tracebox binary bundles together all the binaries you need in a single executable (a bit like toybox or busybox).
ba5972b
to
008811e
Compare
I updated the PR so that for Android targets it'll write to Android's default |
Add the "is_running" function that can be used to check if a given process is running on the target device. It will return True if a process matching the name is found and Falsa otherwise. Signed-off-by: Kajetan Puchalski <[email protected]>
Add a Collector for accessing Google's Perfetto tracing infrastructure. The Collector takes a path to an on-device config file, starts tracing in the background using the perfetto binary and then stops by killing the tracing process. Signed-off-by: Kajetan Puchalski <[email protected]>
Add a Collector for accessing Google's Perfetto tracing infrastructure. The Collector takes a path to an on-device config file, starts tracing in the background using the perfetto binary and then stops by killing the tracing process.
Signed-off-by: Kajetan Puchalski [email protected]