[clang] Clang unit test refactoring appears to be creating/exposing testing gaps and/or test counting madness and/or test invocation confusion #140799

Open
tahonermann opened this issue May 20, 2025 · 11 comments
Labels: clang (Clang issues not falling into any other category), test-suite

@tahonermann (Contributor)

Grab a fresh cup of coffee; this might take a while. You might want to set aside a cup of something stronger for when you reach the end.

This all started with me investigating a downstream performance issue in which the ASTMatchersTests unit test executable took 30 minutes to run serially (with -j1) on Windows compared to 45 seconds serially on Linux. That is why this report focuses on the ASTMatchers tests. I assume the concerns presented here also apply to other tests that have been folded into the new AllClangUnitTests executable, but I haven't verified that. For the record, the cause of the performance issue hasn't been found, but upstream appears to be unaffected.

Recent changes made via commit 5ffd9bd (which was merged April 1st, reverted later that day via commit 03a791f, then reapplied on April 2nd via commit e3c0565) and commit db2315a (which was merged April 29th) have merged most Clang unit tests into a single AllClangUnitTests executable with the intent to reduce the number of executables produced by the build. Ok, good, mostly, though I lost a lot of time debugging a stale ASTMatchersTests executable left in my work space before I discovered these changes. Related PRs include #133545 and #134196.

I'm now finding it challenging to convince myself that all of the tests that were previously being run via the ASTMatchersTests executable are still being run via the AllClangUnitTests executable.

My Linux and Windows build environments aren't configured identically, which is at least partially why the test counts shown below don't correlate well between them. So ignore the cross-platform differences; the interesting data is in the test count comparisons within the same platform. For reference, my builds are configured as follows:

  • Linux:
    • Built with GCC 11.2.1.
    • Built with the following commands:
      • cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=On -DLLVM_ENABLE_PROJECTS=clang -DLLVM_ENABLE_RUNTIMES="compiler-rt;libunwind;libcxx;libcxxabi;openmp" -DLLVM_BUILD_TESTS=ON -DLLVM_LIT_ARGS='-sv -o lit-results.json'
      • make -j 80 && make install
      • make -j 80 && make check-clang
  • Windows:
    • Built within an MSVS 2022 x64 developer command prompt.
    • Built with the following commands:
      • cmake -G "Visual Studio 16 2019" -A x64 -Thost=x64 -DLLVM_ENABLE_ASSERTIONS=On -DLLVM_ENABLE_PROJECTS=clang -DLLVM_BUILD_TESTS=ON -DLLVM_LIT_ARGS='-sv -o lit-results.json'
      • cmake --build . --config Release --target install
      • cmake --build . --config Release --target check-clang

Running the Clang unit tests via a command like the following results in lit reporting that all tests passed. In each case, the current working directory is the build directory.

  • Linux:
    $ ./bin/llvm-lit -vv -j 1 tools/clang/test/Unit
    -- Testing: 372 tests, 96 workers --
    PASS: Clang-Unit :: ./AllClangUnitTests/36/96 (1 of 372)
    ...
    PASS: Clang-Unit :: ./AllClangUnitTests/94/96 (97 of 372)
    PASS: Clang-Unit :: Format/./FormatTests/75/151 (98 of 372)
    ...
    PASS: Clang-Unit :: Basic/./BasicTests/22/95 (372 of 372)
    
    Testing Time: 94.48s
        
    Total Discovered Tests: 25808
      Skipped:     4 (0.02%)
      Passed : 25804 (99.98%)
    
  • Windows:
    $ python ./Release/bin/llvm-lit.py -vv -j 1 tools/clang/test/Unit
    -- Testing: 156 tests, 24 workers --
    PASS: Clang-Unit :: Release/AllClangUnitTests.exe/45/48 (1 of 156)
    ...
    PASS: Clang-Unit :: Release/AllClangUnitTests.exe/24/48 (48 of 156)
    PASS: Clang-Unit :: Format/Release/FormatTests.exe/21/38 (49 of 156)
    ...
    PASS: Clang-Unit :: Basic/Release/BasicTests.exe/7/43 (156 of 156)
    
    Testing Time: 215.54s
    
    Total Discovered Tests: 25706
      Skipped:     4 (0.02%)
      Passed : 25702 (99.98%)
    

Personally, I find it rather confusing that lit says it is running 372 (156) tests and then reports that 25804 (25702) passed. I acknowledge that "Discovered" is presumably meant to disambiguate that these tests are not those tests, but it still seems odd to me. But I digress.

None of the lit output indicates which of those "PASS" lines, if any, corresponds to the tests formerly run by ASTMatchersTests. Fortunately, my build configuration passes -sv -o lit-results.json to lit and I can search.

  • Linux:
    $ grep ASTMatchersTests ./tools/clang/test/lit-results.json | wc -l
    12558
    
  • Windows:
    $ grep ASTMatchersTests ./tools/clang/test/lit-results.json | wc -l
    12558
    

Ok, so lots of different numbers above, but the numbers obtained from the search for "ASTMatchersTests" in the lit log match! They are also pretty close to numbers I had noted when running the stale ASTMatchersTests executables (12914 discovered tests across 140 lit tests for Linux, 12911 discovered tests across 65 lit tests for Windows; these numbers are from a downstream build with other changes and might include additional tests).
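
As an aside, the same count can be pulled from the JSON report a bit more robustly than with grep. A sketch, assuming jq is available and assuming lit's JSON report stores per-test entries in a top-level "tests" array with a "name" field for each entry:

$ jq '[.tests[] | select(.name | contains("ASTMatchersTests"))] | length' ./tools/clang/test/lit-results.json

If each grep hit above corresponds to exactly one test entry, this should print the same 12558.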

Issue 1: Lit says all is good but AllClangUnitTests aborts with a failed assertion.

Per the detail above, lit claims that all tests passed. However, if I run AllClangUnitTests by itself, the results are ... different; the program aborts with a failed assertion (on both Linux and Windows).

  • Linux:
    $ ./tools/clang/unittests/AllClangUnitTests
    [==========] Running 24477 tests from 539 test suites.
    ...
    [----------] 5 tests from ASTMatchersTest
    [ RUN      ] ASTMatchersTest.NamesMember_CXXDependentScopeMemberExpr
    [       OK ] ASTMatchersTest.NamesMember_CXXDependentScopeMemberExpr (10 ms)
    [ RUN      ] ASTMatchersTest.ArgumentCountIs_CXXUnresolvedConstructExpr
    [       OK ] ASTMatchersTest.ArgumentCountIs_CXXUnresolvedConstructExpr (2 ms)
    [ RUN      ] ASTMatchersTest.HasArgument_CXXUnresolvedConstructExpr
    [       OK ] ASTMatchersTest.HasArgument_CXXUnresolvedConstructExpr (2 ms)
    [ RUN      ] ASTMatchersTest.DecompositionDecl
    [       OK ] ASTMatchersTest.DecompositionDecl (6 ms)
    [ RUN      ] ASTMatchersTest.Finder_DynamicOnlyAcceptsSomeMatchers
    [       OK ] ASTMatchersTest.Finder_DynamicOnlyAcceptsSomeMatchers (0 ms)
    [----------] 5 tests from ASTMatchersTest (21 ms total)
    ...
    [ RUN      ] CodeGenTest.CodeGenFromIRMemBuffer
    warning: overriding the module target triple with i386-unknown-linux-gnu
    AllClangUnitTests: /.../llvm/lib/CodeGen/CodeGenTargetMachineImpl.cpp:48: void llvm::CodeGenTargetMachineImpl::initAsmInfo(): Assertion `MRI && "Unable to create reg info"' failed.
    
  • Windows:
    $ ./tools/clang/unittests/Release/AllClangUnitTests.exe
    [==========] Running 24393 tests from 520 test suites.
    ...
    [----------] 5 tests from ASTMatchersTest
    [ RUN      ] ASTMatchersTest.NamesMember_CXXDependentScopeMemberExpr
    [       OK ] ASTMatchersTest.NamesMember_CXXDependentScopeMemberExpr (22 ms)
    [ RUN      ] ASTMatchersTest.ArgumentCountIs_CXXUnresolvedConstructExpr
    [       OK ] ASTMatchersTest.ArgumentCountIs_CXXUnresolvedConstructExpr (4 ms)
    [ RUN      ] ASTMatchersTest.HasArgument_CXXUnresolvedConstructExpr
    [       OK ] ASTMatchersTest.HasArgument_CXXUnresolvedConstructExpr (5 ms)
    [ RUN      ] ASTMatchersTest.DecompositionDecl
    [       OK ] ASTMatchersTest.DecompositionDecl (13 ms)
    [ RUN      ] ASTMatchersTest.Finder_DynamicOnlyAcceptsSomeMatchers
    [       OK ] ASTMatchersTest.Finder_DynamicOnlyAcceptsSomeMatchers (0 ms)
    [----------] 5 tests from ASTMatchersTest (46 ms total)
    ...
    [ RUN      ] CodeGenTest.CodeGenFromIRMemBuffer
    warning: overriding the module target triple with i386-unknown-linux-gnu
    Assertion failed: MRI && "Unable to create reg info", file D:\iusers\thonerma\llvm-project\llvm\lib\CodeGen\CodeGenTargetMachineImpl.cpp, line 48
    Exception Code: 0x80000003
    

There appear to be at least three issues here:

  1. The assertion failure.
  2. That the assertion failure doesn't cause lit to report a failure.
  3. The output suggests that only 5 tests from ASTMatchersTests were run. Is that because the program aborted before completing? Or is this yet another way in which "tests" are differently counted?
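
Regarding the counting question, one way to sanity-check the raw gtest-level count independently of lit (and of whether the run aborts) is Google Test's --gtest_list_tests option, which lists tests without running them. A rough sketch: listed test names are printed indented by two spaces under their suite, so counting indented lines approximates the per-test total (parameterized entries carry a trailing "# GetParam() = ..." comment but still start with two spaces):

$ ./tools/clang/unittests/AllClangUnitTests --gtest_list_tests | grep -c '^  '

Comparing that number against lit's "Total Discovered Tests" and against gtest's own "Running N tests" banner would at least show whether they are counting the same universe.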

Issue 2: How do I run just the ASTMatchersTests now?

If there is a way to use lit to run just the ASTMatchersTests portions of the Clang-Unit set of tests, I'd love to know how to do that. Previously, a command like the following sufficed to run all the relevant tests via lit:

$ python bin/llvm-lit.py -vv tools/clang/test/Unit/ASTMatchers 

I am aware that the Google Test framework produces an executable that supports a lot of options for controlling how the tests are run. For example, I can do this:

$ ./tools/clang/unittests/AllClangUnitTests --gtest_filter='ASTMatchersTests*'
...
[==========] 12558 tests from 1 test suite ran. (30913 ms total)
[  PASSED  ] 12558 tests.

Hey, look! That number matches what I got from the lit log! Yay!

But, as seen above, running tests via lit and via the standalone unit test executable doesn't seem to produce the same behavior. It is therefore important to be able to do both.

Issue 3: How is the typical developer expected to investigate a failed lit test?

Let's say that lit did fail one of the tests (as it seems it should have above). The lit output is not as helpful as it could be:

********************
FAIL: Clang-Unit :: ./AllClangUnitTests/16/96 (1288 of 21587)
******************** TEST 'Clang-Unit :: ./AllClangUnitTests/16/96' FAILED ********************
...
Failed Tests (26):
  Clang-Unit :: ./AllClangUnitTests/ASTMatchersTests/ASTMatchersTest/IsExpandedFromMacro_MatchesObjectMacro/C11
  ...

The astute developer will eventually find that the key they need to actually investigate/debug the problem is hidden in the output:

Script:
--
/.../tools/clang/unittests/./AllClangUnitTests --gtest_filter=ASTMatchersTests/ASTMatchersTest.IsExpandedFromMacro_MatchesObjectMacro/CXX11
--

Room for improvement:

  • Better terminology to distinguish test suites, tests, grouped tests, discovered/unit tests, etc.
  • What do those "16/96" numbers above mean? Are they useful? Can they be replaced with something more useful?
  • Better guidance to help a developer:
    • Correlate a failed test message

      FAIL: Clang-Unit :: ./AllClangUnitTests/16/96 (1288 of 21587)

      to the failed test:

      Clang-Unit :: ./AllClangUnitTests/ASTMatchersTests/ASTMatchersTest/IsExpandedFromMacro_MatchesObjectMacro/C11

      to the command needed to run just that test:

      /.../tools/clang/unittests/./AllClangUnitTests --gtest_filter=ASTMatchersTests/ASTMatchersTest.IsExpandedFromMacro_MatchesObjectMacro/CXX11

      to the command needed to run just that test via lit:

      <I wish I knew>
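
On that last point: if I understand the sharding support correctly, lit selects a shard by setting Google Test's standard sharding environment variables, so a failing shard like "16/96" can presumably be re-run by hand with something along these lines (a guess based on the gtest sharding protocol; I have not confirmed it matches lit's exact invocation):

$ GTEST_TOTAL_SHARDS=96 GTEST_SHARD_INDEX=16 ./tools/clang/unittests/AllClangUnitTests

That only narrows things down to a shard rather than a single test, and it still isn't "via lit", but it should at least reproduce the same test ordering within the shard.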

Summary

I know some of the issues discussed above predate the recent refactoring, but I think the refactoring has made them worse. I spent a lot of time over the last week trying to wrap my head around what was going on in my environments.

@llvmbot added the clang (Clang issues not falling into any other category) label on May 20, 2025
@jroelofs (Contributor) commented May 20, 2025

What do those "16/96" numbers above mean? Are they useful? Can they be replaced with something more useful?

These are "shards" of gtest tests. This one means that shard 16 out of 96 for the AllClangUnitTests binary had a failure in it:

FAIL: Clang-Unit :: ./AllClangUnitTests/16/96 (1288 of 21587)

You can disable sharding via --no-gtest-sharding. Sharding helps reduce the overhead of test setup/teardown, so we probably don't want to disable it by default.
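
For a one-off investigation, that would presumably look something like this (untested sketch, and expect it to be noticeably slower given the setup/teardown overhead mentioned above):

$ ./bin/llvm-lit -vv --no-gtest-sharding tools/clang/test/Unit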

@AaronBallman (Collaborator)

FWIW, I ran into this as well; I was trying to build ASTMatchersTests.exe so that I could do some local debugging and I couldn't find the executable any longer. I don't think lit should be the only way in which to run the unit tests; having the native executable is still quite useful. Was it intended for that functionality to be removed?

@AaronBallman (Collaborator)

CC @rnk for awareness

@jroelofs (Contributor)

I don't think lit should be the only way in which to run the unit tests, having the native executable is still quite useful

You can still run the AllClangUnitTests binary with a --gtest_filter= argument to filter down to the subset you care about. LIT is not required for that.

@AaronBallman (Collaborator)

Thanks for the suggestion; I can give that approach a shot, but I don't think that's actually an improvement over the previous experience, either. I think it's reasonable for us to try to combine unit tests for things like running them on bots, but what I'm not keen on is removing them as targets I can build myself. And the current state is confused: I have a target for BasicTests.exe and FormatTests.exe, which are Clang unit tests, but I don't have targets for other things like ASTMatchersTests.exe.

@tahonermann (Contributor, Author)

And the current state is confused: I have a target for BasicTests.exe and FormatTests.exe which are Clang unit tests, but I don't have targets for other things like ASTMatchersTests.exe

For reference, that appears to be intentional according to commit afd738c.

@rnk (Collaborator) commented May 21, 2025

Ok, good, mostly, though I lost a lot of time debugging a stale ASTMatchersTests executable left in my work space before I discovered these changes.

Sorry about this. I hit this issue while testing my own changes, and I thought about adding some CMake logic to proactively clear out and remove the stale targets, but it seemed extremely complicated, and I talked myself out of it because these days our buildbots mostly all use clean builds and ccache instead of incremental builds.

And the current state is confused: I have a target for BasicTests.exe and FormatTests.exe which are Clang unit tests, but I don't have targets for other things like ASTMatchersTests.exe

The current targets were chosen based on build size concerns, which may make less sense if we make the dylib build the default. ASTMatchersTests is presumably extremely heavyweight, so splitting it back out would be costly. However, the cost changes if we go through with making the dylib build the default, so we could reconsider if and when that change lands.

Format and Basic are small because they have few dependencies, so they are standalone. In the context of LLVM, it's helpful to have a small SupportTests binary for iterating on system library stuff, and I assume that carries over to clang Basic, and maybe Format.


Getting to the meat of the matter, the issue here is that gtest, lit, and test sharding only really work well in practice if your tests don't leak global state updates between test fixtures. LLVM, unfortunately, has a fair amount of global state, especially around target registration. This state leads to failures that only reproduce if you run the exact sequence of tests that lit runs when it runs a single gtest shard. I have found that these failures do reproduce when you re-run the exact gtest command that lit prints out, but that's not typically the first thing folks do to repro failures.

It's worth mentioning that we previously ran each gtest case in an isolated process, until we switched to sharding as an optimization in 2022: https://reviews.llvm.org/D122251 / a87ba5c

In conclusion, I'm not really sure what good next steps would be at this point. Possible options are:

  • Better docs
  • Improve lit output to make shard re-execution easier
  • Disable sharding
  • Reduce LLVM global state
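
On the shard re-execution point, lit's existing --filter option takes a regular expression over the discovered test names, so re-running just the failing shard from the example above should already be possible with something along these lines (a sketch; I haven't checked how the regex matches against the full "Clang-Unit :: ..." name):

$ ./bin/llvm-lit -vv --filter 'AllClangUnitTests/16/96' tools/clang/test/Unit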

@AaronBallman (Collaborator)

Getting to the meat of the matter, the issue here is that gtest, lit, and test sharding only really work well in practice if your tests don't leak global state updates between test fixtures. LLVM, unfortunately, has a fair amount of global state, especially around target registration. This state leads to failures that only reproduce if you run the exact sequence of tests that lit runs when it runs a single gtest shard. I have found that these failures do reproduce when you re-run the exact gtest command that lit prints out, but that's not typically the first thing folks do to repro failures.

Ooooof. Yeah, that's a problem worth putting up with a bit of pain for. Why don't the LLVM unit tests have the same problems? Those all still seem to be built into separate executables, so are they not doing sharding?

Is it possible that we at least get the cmake targets back (but maybe not built by default?) so folks can build and debug against one set of tests rather than having to build everything and know about command line flags? Or is that just adding cmake complexity for not enough benefit?

@rnk (Collaborator) commented May 22, 2025

Why don't the LLVM unit tests have the same problems? Those all still seem to be built into separate executables, so are they not doing sharding?

Yes, I just haven't gotten around to looking at LLVM yet. The Clang unit tests are bigger, since they have more dependencies.

Is it possible that we at least get the cmake targets back (but maybe not built by default?) so folks can build and debug against one set of tests rather than having to build everything and know about command line flags? Or is that just adding cmake complexity for not enough benefit?

My sense is that it's probably too much complexity. Consider that lit is going to auto-detect any extra gtest binaries you create, and then you'll run the gtest suite twice, effectively.

If you want to switch back, either locally or permanently (because, well, maybe we just want a dedicated ASTMatchersTests binary), it's just a matter of running s/add_clang_unittest/add_distinct_clang_unittest/ on the cmake. But consider that it will leave another stale binary in your build directory.
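
Concretely, that would be something like the following from the source tree root (path assumed, untested):

$ sed -i 's/add_clang_unittest/add_distinct_clang_unittest/' clang/unittests/ASTMatchers/CMakeLists.txt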

Maybe that's the area we could improve: CMake knows where the gtest targets are, and it could be directly passing that information to lit through the generated lit config file, and then we don't have to worry as much about stale binaries.

@AaronBallman (Collaborator)

If you want to switch back, either locally or permanently because, well, maybe we just want a dedicated ASTMatcherTests binary, it's just a matter of running s/add_clang_unittest/add_distinct_clang_unittest/ on the cmake. But, consider that it will leave another stale binary in your build directory.

Yeah, but who wants to do cmake hacking when trying to debug an issue with an AST matcher?

CMake knows where the gtest targets are, and it could be directly passing that information to lit through the generated lit config file, and then we don't have to worry as much about stale binaries.

But doesn't CMake also know which gtest targets are being built? e.g., if we configure to build individual unit tests, can cmake also generate something for lit to know "don't run the extra test binaries"?

@rnk (Collaborator) commented May 27, 2025

But doesn't CMake also know which gtest targets are being built? e.g., if we configure to build individual unit tests, can cmake also generate something for lit to know "don't run the extra test binaries"?

Yep, that's what I'm saying too: cmake should tell lit which gtest binaries it created, and lit should only run those test binaries, and not scan for others.
