[clang] Clang unit test refactoring appears to be creating/exposing testing gaps and/or test counting madness and/or test invocation confusion #140799
Comments
These are "shards" of gtest tests. This one means that shard …

You can disable sharding via …

FWIW, I ran into this as well; I was trying to build …

CC @rnk for awareness

You can still run the …
Thanks for the suggestion, I can give that approach a shot, but I don't think that's actually an improvement over the previous experience, either. I think it's reasonable for us to try to combine unit tests for things like running them on bots, but what I'm not keen on is removing them as targets I can build myself. And the current state is confused: I have a target for …
For reference, that appears to be intentional according to commit afd738c.
Sorry about this. I hit this issue while testing my own changes, and I thought about adding some CMake logic to proactively clear out and remove the stale targets, but it seemed extremely complicated, and I talked myself out of it because these days our buildbots mostly all use clean builds and ccache instead of incremental builds.
The current targets were chosen based on build size concerns, which may make less sense if we make the dylib build the default. ASTMatchersTests is presumably extremely heavyweight, so splitting it back out would be costly. However, the cost changes if we go through with making the dylib build the default, so we could reconsider if and when that change lands. Format and Basic are small because they have few dependencies, so they are standalone. In the context of LLVM, it's helpful to have a small …

Getting to the meat of the matter, the issue here is that gtest, lit, and test sharding only really work well in practice if your tests don't leak global state updates between test fixtures. LLVM, unfortunately, has a fair amount of global state, especially around target registration. This state leads to failures that only reproduce if you run the exact sequence of tests that lit runs when it runs a single gtest shard. I have found that these failures do reproduce when you re-run the exact gtest command that lit prints out, but that's not typically the first thing folks do to reproduce failures.

It's worth mentioning that we previously ran each gtest case in an isolated process, until we switched to sharding as an optimization in 2022: https://reviews.llvm.org/D122251 / a87ba5c

In conclusion, I'm not really sure what good next steps would be at this point. Possible options are: …
Ooooof. Yeah, that's a problem worth putting up with a bit of pain for. Why don't the LLVM unit tests have the same problems? Those all still seem to be built into separate executables, so are they not doing sharding?

Is it possible that we at least get the cmake targets back (but maybe not built by default?) so folks can build and debug against one set of tests rather than having to build everything and know about command line flags? Or is that just adding cmake complexity for not enough benefit?
Yes, I just haven't gotten around to looking at LLVM yet. The Clang unit tests are bigger, since they have more dependencies.
My sense is that it's probably too much complexity. Consider that lit is going to auto-detect any extra gtest binaries you create, and then you'll effectively run the gtest suite twice. If you want to switch back, either locally or permanently (because, well, maybe we just want a dedicated ASTMatchersTests binary), it's just a matter of running …

Maybe that's the area we could improve: CMake knows where the gtest targets are, and it could be directly passing that information to lit through the generated lit config file, and then we don't have to worry as much about stale binaries.
Yeah, but who wants to do cmake hacking when trying to debug an issue with an AST matcher?
But doesn't CMake also know which gtest targets are being built? e.g., if we configure to build individual unit tests, can cmake also generate something for lit to know "don't run the extra test binaries"?
Yep, that's what I'm saying too: cmake should tell lit which gtest binaries it created, and lit should only run those test binaries, and not scan for others.
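A rough sketch of what that could look like on the lit side (lit configs are Python; the manifest file name, its format, and both helper functions here are hypothetical, not an existing LLVM mechanism):

```python
# Hypothetical scheme: CMake writes the list of gtest binaries it actually
# built into a manifest file, and the generated lit config prefers that list
# over scanning the directory (which is what picks up stale binaries today).
import os

def load_gtest_binaries(manifest_path):
    """Read the CMake-generated manifest: one binary path per line."""
    with open(manifest_path) as f:
        return [line.strip() for line in f if line.strip()]

def discover(scan_dir, manifest_path=None):
    """Prefer CMake's explicit list; fall back to scanning the directory."""
    if manifest_path and os.path.exists(manifest_path):
        return load_gtest_binaries(manifest_path)
    # Fallback approximating today's behavior: anything named like a test
    # binary gets picked up, including stale ones from old configurations.
    return sorted(
        os.path.join(scan_dir, name)
        for name in os.listdir(scan_dir)
        if name.endswith("Tests"))
```

With something like this, reconfiguring from one set of test targets to another would update the manifest, and lit would stop running binaries that CMake no longer builds.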
Grab a fresh cup of coffee, this might take a while. You might want to set aside a cup of something stronger for when you reach the end.
This all started with me investigating a downstream performance issue concerning the `ASTMatchersTests` unit test taking 30 minutes to run on Windows (serially, with `-j1`) compared to 45 seconds on Linux (serially). That is why this report focuses on the `ASTMatchers` tests. I assume the concerns presented here are applicable to other tests that have been folded into the new `AllClangUnitTests` executable, but I haven't verified that. For the record, the performance issue hasn't been found, but upstream appears to be unaffected.

Recent changes made via commit 5ffd9bd (which was merged April 1st, reverted later that day via commit 03a791f, then reapplied on April 2nd via commit e3c0565) and commit db2315a (which was merged April 29th) have merged most Clang unit tests into a single `AllClangUnitTests` executable with the intent to reduce the number of executables produced by the build. Ok, good, mostly, though I lost a lot of time debugging a stale `ASTMatchersTests` executable left in my work space before I discovered these changes. Related PRs include #133545 and #134196.

I'm now finding it challenging to convince myself that all of the tests that were previously being run via the `ASTMatchersTests` executable are still being run via the `AllClangUnitTests` executable.
executable.My Linux and Windows build environments aren't configured the same. This is at least partially why the test counts shown below don't correlate well between them. So, ignore those; the interesting data concerns the test count comparisons for the same platform. For reference, my builds are configured as follows:
Linux:

```
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=On -DLLVM_ENABLE_PROJECTS=clang -DLLVM_ENABLE_RUNTIMES="compiler-rt;libunwind;libcxx;libcxxabi;openmp" -DLLVM_BUILD_TESTS=ON -DLLVM_LIT_ARGS='-sv -o lit-results.json'
make -j 80 && make install
make -j 80 && make check-clang
```

Windows:

```
cmake -G "Visual Studio 16 2019" -A x64 -Thost=x64 -DLLVM_ENABLE_ASSERTIONS=On -DLLVM_ENABLE_PROJECTS=clang -DLLVM_BUILD_TESTS=ON -DLLVM_LIT_ARGS='-sv -o lit-results.json'
cmake --build . --config Release --target install
cmake --build . --config Release --target check-clang
```
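Since both configurations pass `-sv -o lit-results.json` via `LLVM_LIT_ARGS`, the recorded results can be searched programmatically rather than by eyeballing the log. A small sketch, assuming lit's JSON output stores a "tests" array whose entries carry a "name" field (the helper function is mine):

```python
# Count how many results recorded by lit's `-o lit-results.json` mention a
# given substring (e.g. "ASTMatchersTests"), as a cross-check on the
# test-count questions discussed below.
import json

def count_matching(results_path, needle="ASTMatchersTests"):
    with open(results_path) as f:
        results = json.load(f)
    names = [t["name"] for t in results.get("tests", [])]
    return sum(needle in n for n in names)
```

This is the programmatic equivalent of the manual log search described below, and it makes it easy to diff counts between two runs or two build configurations.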
Running the Clang unit tests via a command like the following results in `lit` reporting that all tests passed. In each case, the current working directory is the build directory.

Personally, I find it rather confusing that `lit` says it is running 372 (156) tests and then reports that 25804 (25702) passed. I acknowledge the use of "Discovered" as a presumed disambiguator that these tests are not those tests, but it still seems odd to me. But I digress.

None of the `lit` output indicates which of those "PASS" lines, if any, corresponds to the tests formerly run by `ASTMatchersTests`. Fortunately, my build configuration passes `-sv -o lit-results.json` to `lit`, so I can search.

Ok, so lots of different numbers above, but the numbers obtained from the search for "ASTMatchersTests" in the `lit` log match! They are also pretty close to numbers I had noted when running the stale `ASTMatchersTests` executables (12914 discovered tests in 140 tests for Linux, 12911 discovered tests in 65 tests for Windows; these numbers are from a downstream build with other changes and might include additional tests).

Issue 1: Lit says all is good, but `AllClangUnitTests` aborts with a failed assertion

Per the detail above, `lit` claims that all tests passed. However, if I run `AllClangUnitTests` by itself, the results are ... different; the program aborts with a failed assertion (on both Linux and Windows).

There appear to be at least three issues here:

- … `lit` to report a failure.
- … `ASTMatchersTests` were run. Is that because the program aborted before completing? Or is this yet another way in which "tests" are differently counted?

Issue 2: How do I run just the `ASTMatchersTests` now?

If there is a way to use `lit` to run just the `ASTMatchersTests` portions of the `Clang-Unit` set of tests, I'd love to know how to do that. Previously, a command like the following sufficed to run all the relevant tests via `lit`:

I am aware that the Google Test framework results in an executable that supports a lot of options for controlling how the tests are run. For example, I can do this:

Hey, look! That number matches what I got from the `lit` log! Yay!

But, as seen above, running tests via `lit` and via the standalone unit test executable doesn't seem to produce the same behavior. It is therefore important to be able to do both.

Issue 3: How is the typical developer expected to investigate a failed lit test?

Let's say that `lit` did fail one of the tests (as it seems it should have above). The `lit` output is not as helpful as it could be:

The astute developer will eventually find that the key they need to actually investigate/debug the problem is hidden in the output.
Room for improvement: getting from the failure line in the `lit` output:

```
FAIL: Clang-Unit :: ./AllClangUnitTests/16/96 (1288 of 21587)
```

to the failed test:

```
Clang-Unit :: ./AllClangUnitTests/ASTMatchersTests/ASTMatchersTest/IsExpandedFromMacro_MatchesObjectMacro/C11
```

to the command needed to run just that test:

```
/.../tools/clang/unittests/./AllClangUnitTests --gtest_filter=ASTMatchersTests/ASTMatchersTest.IsExpandedFromMacro_MatchesObjectMacro/CXX11
```

to the command needed to run just that test via `lit`:

```
<I wish I knew>
```
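For what it's worth, the name translation in that chain looks mechanical enough to script. A hedged Python sketch that converts a lit-style gtest test path into a `--gtest_filter` argument; the function and the handled shapes are my own guesses from the two spellings shown above, where the only difference is that the fixture and test name are joined with `.` instead of `/`:

```python
# Convert a lit-reported gtest test path into a --gtest_filter spelling.
# Assumption (from the example above): a 4-component name is
# Instantiation/Fixture/Test/Param, and a 2-component name is Fixture/Test.
def lit_name_to_gtest_filter(lit_name):
    parts = lit_name.split("/")
    if len(parts) == 2:      # Fixture/Test -> Fixture.Test
        return ".".join(parts)
    if len(parts) == 4:      # Instantiation/Fixture.Test/Param
        return f"{parts[0]}/{parts[1]}.{parts[2]}/{parts[3]}"
    raise ValueError(f"unrecognized lit gtest name: {lit_name}")

print(lit_name_to_gtest_filter(
    "ASTMatchersTests/ASTMatchersTest/IsExpandedFromMacro_MatchesObjectMacro/CXX11"))
# prints ASTMatchersTests/ASTMatchersTest.IsExpandedFromMacro_MatchesObjectMacro/CXX11
```

Something like this could live in a small helper script, or the mapping could simply be printed by lit itself alongside the FAIL line.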
Summary
I know the above is discussing issues that predate the recent refactoring, but I think these issues have been made worse by it. I spent a lot of time trying to wrap my head around what all was going on in my environments over the last week.