Support filtering out dependency source code from coverage counters #90767

Open
xd009642 opened this issue Nov 10, 2021 · 1 comment
Labels
A-code-coverage Area: Source-based code coverage (-Cinstrument-coverage) C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

xd009642 commented Nov 10, 2021

Related to #84943

Sometimes it’s useful to only instrument certain files or functions.

This is also true for code that lives in a dependency rather than in your crate. Some dependencies can generate a large number of coverage counters: on a closed-source project I tried, syn and serde accounted for the majority of the counters. Most people testing a crate aren't looking for coverage of their dependencies as well, so it makes sense not to instrument crates outside the project workspace. It should also lead to faster run times and faster parsing of coverage reports, though for most projects I'd expect the difference to be negligible.
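As a rough illustration of the workspace-only idea, here is a post-processing sketch that drops function records whose source file is outside the workspace root. `FunctionRecord`, its fields, and `retain_workspace` are hypothetical names made up for this example, not part of rustc or LLVM, and filtering after the fact is only a stand-in for the compiler-side filtering requested here: the dependency code would still have been instrumented.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical, simplified view of one coverage function record.
#[derive(Debug, Clone, PartialEq)]
struct FunctionRecord {
    name: String,
    source_file: PathBuf,
    counters: usize,
}

/// Keep only the records whose source file lies under `workspace_root`.
/// `Path::starts_with` compares whole path components, so `/proj-other`
/// is not treated as being inside `/proj`.
fn retain_workspace(records: Vec<FunctionRecord>, workspace_root: &Path) -> Vec<FunctionRecord> {
    records
        .into_iter()
        .filter(|r| r.source_file.starts_with(workspace_root))
        .collect()
}
```

A path-prefix check like this would wrongly keep vendored dependencies stored inside the workspace directory, so it is an approximation at best.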

The section below can be ignored: I feel that even with a half-sized hash, the chance of hash collisions on ASCII strings from mangled function names is probably negligibly low 🤔

It's also worth noting that for function hashes LLVM coverage uses a truncated MD5 hash (64 bits instead of 128 bits). These hashes are used as keys in a hash map of records and, if I recall correctly, the profdata implementation did not check for hash collisions when I last looked, so the more records there are, the greater the chance of coverage data being lost or erroneously linked to a source location because of a collision. I'm not sure of the relative risk (I'd have to sample a variety of crates to figure it out), but reducing the number of records would mitigate this potential risk as well. EDIT: I'm actually very curious about the hash collision impact myself, so I will update if I find anything out 🤔
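For a rough sense of scale, the collision risk can be estimated with the standard birthday bound. This is only a sketch: it assumes the truncated hashes behave like uniformly random 64-bit values, and `collision_probability` is a name made up for this example.

```rust
/// Approximate probability that at least two of `records` hashes collide
/// when each hash is drawn uniformly from a space of 2^`hash_bits` values.
/// Birthday bound: p ≈ 1 - exp(-n(n-1) / 2^(hash_bits + 1)).
/// Assumption: hashes are independent and uniform, which real mangled
/// names only approximate.
fn collision_probability(records: u64, hash_bits: u32) -> f64 {
    let n = records as f64;
    let space = 2f64.powi(hash_bits as i32);
    // -exp_m1(-x) evaluates 1 - e^(-x) without losing precision for tiny x.
    -(-n * (n - 1.0) / (2.0 * space)).exp_m1()
}
```

Under that assumption, `collision_probability(1_000_000, 64)` comes out at a few parts in a hundred million, and the estimate only approaches even odds once the record count nears 2^32.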

xd009642 commented

If anyone's interested, this is the debug printout of the merged profraw files, generated with https://github.com/xd009642/llvm-profparser. I wrote this tool, which implements a subset of llvm-profdata show, and I test it against the LLVM test files for that utility to ensure the output is the same, so I'm relatively confident of its correctness: function names, hashes and counter values definitely match the LLVM implementation.

cargo.txt

Just searching for the dependencies listed in cargo's Cargo.toml and counting how many times their names show up in the printout, we get:

  • cargo_util: 290 occurrences
  • anyhow: 182
  • filetime: 90
  • git2: 2
  • hex: 30
  • ignore: 6
  • jobserver: 100
  • libc: 16
  • log: 258
  • shell_escape: 8
  • tar: 8
  • tempfile: 248
  • walkdir: 158

I've excluded all dependencies with 0 mentions, as well as openssl, which is mentioned a lot in cargo's own code, so its 4112 references may come from records within the crate source. Evidently not all dependencies feature in the counters, so I guess there's some facet of the crates listed above that prevents them from being automatically excluded. Macros could play a part: log uses macros heavily, and I'm guessing that, similar to C++ templates, their code ends up in the compilation unit where they are used.
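The counting itself is just a naive substring search over the text dump. A minimal sketch of that approach follows; `count_mentions` is a name invented for this example and is not part of llvm-profparser.

```rust
use std::collections::BTreeMap;

/// Count non-overlapping occurrences of each dependency name in `dump`.
/// This is a plain substring match, so a short name such as "log" also
/// matches inside unrelated identifiers and can overcount.
fn count_mentions<'a>(dump: &str, names: &[&'a str]) -> BTreeMap<&'a str, usize> {
    names
        .iter()
        .map(|&name| (name, dump.matches(name).count()))
        .collect()
}
```

It could be used by reading the attached dump with `std::fs::read_to_string` and passing in the dependency names from Cargo.toml. The overcounting caveat is the same reason openssl was left out of the list above.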

@jyn514 jyn514 added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-code-coverage Area: Source-based code coverage (-Cinstrument-coverage) labels Apr 26, 2023