Skip to content

Builders flang-aarch64-dylib flaky build #124485

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
shafik opened this issue Jan 26, 2025 · 9 comments
Closed

Builders flang-aarch64-dylib flaky build #124485

shafik opened this issue Jan 26, 2025 · 9 comments

Comments

@shafik
Copy link
Collaborator

shafik commented Jan 26, 2025

This failure happened to me here:

https://lab.llvm.org/buildbot/#/builders/50/builds/9430

FAILED: tools/mlir/test/lib/IR/CMakeFiles/MLIRTestIR.dir/TestVisitorsGeneric.cpp.o 
/usr/local/bin/c++ -DGTEST_HAS_RTTI=0 -DMLIR_INCLUDE_TESTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/tools/mlir/test/lib/IR -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/tools/mlir/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/llvm/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR/../Dialect/Test -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/tools/mlir/test/lib/IR/../Dialect/Test -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wundef -Werror=mismatched-tags -O3 -DNDEBUG -std=c++17  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT tools/mlir/test/lib/IR/CMakeFiles/MLIRTestIR.dir/TestVisitorsGeneric.cpp.o -MF tools/mlir/test/lib/IR/CMakeFiles/MLIRTestIR.dir/TestVisitorsGeneric.cpp.o.d -o tools/mlir/test/lib/IR/CMakeFiles/MLIRTestIR.dir/TestVisitorsGeneric.cpp.o -c /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR/TestVisitorsGeneric.cpp
In file included from /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR/TestVisitorsGeneric.cpp:9:
/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR/../Dialect/Test/TestOps.h:148:10: fatal error: 'TestOps.h.inc' file not found
  148 | #include "TestOps.h.inc"
      |          ^~~~~~~~~~~~~~~
1 error generated.

and from looking at the builds from the past few days it is happening on and off every day.

@shafik shafik added the infrastructure Bugs about LLVM infrastructure label Jan 26, 2025
@shafik
Copy link
Collaborator Author

shafik commented Jan 26, 2025

Apologies, not sure where to target this, please feel free to adjust tags as needed.

@shafik shafik changed the title Buildersflang-aarch64-dylib flaky build Builders flang-aarch64-dylib flaky build Jan 26, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 26, 2025

@llvm/issue-subscribers-infrastructure

Author: Shafik Yaghmour (shafik)

This failure happened to me here:

https://lab.llvm.org/buildbot/#/builders/50/builds/9430

FAILED: tools/mlir/test/lib/IR/CMakeFiles/MLIRTestIR.dir/TestVisitorsGeneric.cpp.o 
/usr/local/bin/c++ -DGTEST_HAS_RTTI=0 -DMLIR_INCLUDE_TESTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/tools/mlir/test/lib/IR -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/tools/mlir/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/llvm/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR/../Dialect/Test -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/tools/mlir/test/lib/IR/../Dialect/Test -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wundef -Werror=mismatched-tags -O3 -DNDEBUG -std=c++17  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT tools/mlir/test/lib/IR/CMakeFiles/MLIRTestIR.dir/TestVisitorsGeneric.cpp.o -MF tools/mlir/test/lib/IR/CMakeFiles/MLIRTestIR.dir/TestVisitorsGeneric.cpp.o.d -o tools/mlir/test/lib/IR/CMakeFiles/MLIRTestIR.dir/TestVisitorsGeneric.cpp.o -c /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR/TestVisitorsGeneric.cpp
In file included from /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR/TestVisitorsGeneric.cpp:9:
/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR/../Dialect/Test/TestOps.h:148:10: fatal error: 'TestOps.h.inc' file not found
  148 | #include "TestOps.h.inc"
      |          ^~~~~~~~~~~~~~~
1 error generated.

and from looking at the builds from the past few days it is happening on and off every day.

@tstellar
Copy link
Collaborator

This looks like it might be a race condition in the build system. If you run ninja again does it build?

@llvmbot
Copy link
Member

llvmbot commented Jan 28, 2025

@llvm/issue-subscribers-mlir

Author: Shafik Yaghmour (shafik)

This failure happened to me here:

https://lab.llvm.org/buildbot/#/builders/50/builds/9430

FAILED: tools/mlir/test/lib/IR/CMakeFiles/MLIRTestIR.dir/TestVisitorsGeneric.cpp.o 
/usr/local/bin/c++ -DGTEST_HAS_RTTI=0 -DMLIR_INCLUDE_TESTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/tools/mlir/test/lib/IR -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/tools/mlir/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/llvm/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR/../Dialect/Test -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/tools/mlir/test/lib/IR/../Dialect/Test -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wundef -Werror=mismatched-tags -O3 -DNDEBUG -std=c++17  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT tools/mlir/test/lib/IR/CMakeFiles/MLIRTestIR.dir/TestVisitorsGeneric.cpp.o -MF tools/mlir/test/lib/IR/CMakeFiles/MLIRTestIR.dir/TestVisitorsGeneric.cpp.o.d -o tools/mlir/test/lib/IR/CMakeFiles/MLIRTestIR.dir/TestVisitorsGeneric.cpp.o -c /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR/TestVisitorsGeneric.cpp
In file included from /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR/TestVisitorsGeneric.cpp:9:
/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR/../Dialect/Test/TestOps.h:148:10: fatal error: 'TestOps.h.inc' file not found
  148 | #include "TestOps.h.inc"
      |          ^~~~~~~~~~~~~~~
1 error generated.

and from looking at the builds from the past few days it is happening on and off every day.

@EugeneZelenko EugeneZelenko added infrastructure Bugs about LLVM infrastructure and removed flang Flang issues not falling into any other category labels Jan 28, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 28, 2025

@llvm/issue-subscribers-infrastructure

Author: Shafik Yaghmour (shafik)

This failure happened to me here:

https://lab.llvm.org/buildbot/#/builders/50/builds/9430

FAILED: tools/mlir/test/lib/IR/CMakeFiles/MLIRTestIR.dir/TestVisitorsGeneric.cpp.o 
/usr/local/bin/c++ -DGTEST_HAS_RTTI=0 -DMLIR_INCLUDE_TESTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/tools/mlir/test/lib/IR -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/tools/mlir/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/llvm/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR/../Dialect/Test -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/tools/mlir/test/lib/IR/../Dialect/Test -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wundef -Werror=mismatched-tags -O3 -DNDEBUG -std=c++17  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT tools/mlir/test/lib/IR/CMakeFiles/MLIRTestIR.dir/TestVisitorsGeneric.cpp.o -MF tools/mlir/test/lib/IR/CMakeFiles/MLIRTestIR.dir/TestVisitorsGeneric.cpp.o.d -o tools/mlir/test/lib/IR/CMakeFiles/MLIRTestIR.dir/TestVisitorsGeneric.cpp.o -c /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR/TestVisitorsGeneric.cpp
In file included from /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR/TestVisitorsGeneric.cpp:9:
/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/IR/../Dialect/Test/TestOps.h:148:10: fatal error: 'TestOps.h.inc' file not found
  148 | #include "TestOps.h.inc"
      |          ^~~~~~~~~~~~~~~
1 error generated.

and from looking at the builds from the past few days it is happening on and off every day.

@DavidSpickett
Copy link
Collaborator

TestOps.h.inc is included in TestOps.h. Which itself is included in many places in mlir/test and mlir/unittest. mlir/test/lib/Dialect/Test/CMakeLists.txt calls add_sharded_ops(TestOps 20) to generate this file using the *.td that are in that same folder.

My best guess to fix this is to hoist this up into mlir/ and make sure its a dependency of all the existing tests that include it. I suppose this worked for so long because the build tool(s) just happened to order things in a way that worked.

DavidSpickett added a commit that referenced this issue Feb 3, 2025
That Linaro has been seeing on our dylib bot:
https://lab.llvm.org/staging/#/builders/171/builds/79

In file included from /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/Pass/TestPassManager.cpp:10:
/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/Pass/../Dialect/Test/TestOps.h:148:10: fatal error: 'TestOps.h.inc' file not found
  148 | #include "TestOps.h.inc"
      |          ^~~~~~~~~~~~~~~

Reported in #124335 and
#124485.

I think this was caused by #123910,
and so I've followed what #125004 did,
which seems to be a follow up to that. Same kind of error.

I was not able to reproduce the failure locally, but dylib and
normal builds are ok with this change so I will push it and monitor
the bot's results for a few days.
@DavidSpickett
Copy link
Collaborator

Turns out I was wrong. Everything did have a dependency but in one case not in the right place:
d906da5

Will monitor the bot to confirm.

047e8e4 seems to be the start of these issues and that made it into 20 therefore if the fix works we need to backport the fix and this other fix 72b73c9.

github-actions bot pushed a commit to arm/arm-toolchain that referenced this issue Feb 3, 2025
That Linaro has been seeing on our dylib bot:
https://lab.llvm.org/staging/#/builders/171/builds/79

In file included from /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/Pass/TestPassManager.cpp:10:
/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/Pass/../Dialect/Test/TestOps.h:148:10: fatal error: 'TestOps.h.inc' file not found
  148 | #include "TestOps.h.inc"
      |          ^~~~~~~~~~~~~~~

Reported in llvm/llvm-project#124335 and
llvm/llvm-project#124485.

I think this was caused by llvm/llvm-project#123910,
and so I've followed what llvm/llvm-project#125004 did,
which seems to be a follow up to that. Same kind of error.

I was not able to reproduce the failure locally, but dylib and
normal builds are ok with this change so I will push it and monitor
the bot's results for a few days.
DavidSpickett pushed a commit to DavidSpickett/llvm-project that referenced this issue Feb 5, 2025
This cherry picks
[mlir][cmake] Add missing MLIRTestDialect dependency (llvm#125004) (72b73c9)
and
[mlir] Fix build race condition in Pass Manager tests (d906da5)
to the 20.x release branch.

Both of these are addressing issues that started with
llvm#123910, which is already on the 20.x branch.

In each case the cherry-picked changes correct problems that would otherwise be
a race condition in certain builds. Linaro noticed this on our flang dylib (shared library)
build bot.

Failures look like:
In file included from /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/Pass/TestPassManager.cpp:10:
/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/Pass/../Dialect/Test/TestOps.h:148:10: fatal error: 'TestOps.h.inc' file not found
  148 | #include "TestOps.h.inc"
      |          ^~~~~~~~~~~~~~~

We have tested these changes on the buildbot for the last 2 days and had no problems.
Whereas before it was failing maybe 1 in 10 builds, enough that multiple people
in the community noticed it.

Reported in llvm#124485 and
llvm#124335.

Co-authored-by: Diego Caballero <[email protected]>
@DavidSpickett
Copy link
Collaborator

Backports - #125834

DavidSpickett pushed a commit to DavidSpickett/llvm-project that referenced this issue Feb 5, 2025
This cherry picks
[mlir] Fix build race condition in Pass Manager tests (d906da5)
to the 20.x release branch.

This addresses issues that started with
llvm#123910, which is already on the 20.x branch.

Linaro noticed this on our flang dylib (shared library) build bot.

In file included from /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/Pass/TestPassManager.cpp:10:
/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/Pass/../Dialect/Test/TestOps.h:148:10: fatal error: 'TestOps.h.inc' file not found
  148 | #include "TestOps.h.inc"
      |          ^~~~~~~~~~~~~~~

We have tested these changes on the buildbot for the last 2 days and had no problems.
Whereas before it was failing maybe 1 in 10 builds, enough that multiple people
in the community noticed it.

Reported in llvm#124485.
@EugeneZelenko EugeneZelenko removed the infrastructure Bugs about LLVM infrastructure label Feb 5, 2025
@DavidSpickett
Copy link
Collaborator

After a few more fixes, I believe from the buildbot results that we have fixed all the instances of this problem. All of the fixes will go into 20.x as well.

swift-ci pushed a commit to swiftlang/llvm-project that referenced this issue Feb 10, 2025
This cherry picks
[mlir] Fix build race condition in Pass Manager tests (d906da5)
to the 20.x release branch.

This addresses issues that started with
llvm#123910, which is already on the 20.x branch.

Linaro noticed this on our flang dylib (shared library) build bot.

In file included from /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/Pass/TestPassManager.cpp:10:
/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/Pass/../Dialect/Test/TestOps.h:148:10: fatal error: 'TestOps.h.inc' file not found
  148 | #include "TestOps.h.inc"
      |          ^~~~~~~~~~~~~~~

We have tested these changes on the buildbot for the last 2 days and had no problems.
Whereas before it was failing maybe 1 in 10 builds, enough that multiple people
in the community noticed it.

Reported in llvm#124485.
github-actions bot pushed a commit to arm/arm-toolchain that referenced this issue Feb 10, 2025
This cherry picks
[mlir] Fix build race condition in Pass Manager tests (d906da5)
to the 20.x release branch.

This addresses issues that started with
llvm/llvm-project#123910, which is already on the 20.x branch.

Linaro noticed this on our flang dylib (shared library) build bot.

In file included from /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/Pass/TestPassManager.cpp:10:
/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/Pass/../Dialect/Test/TestOps.h:148:10: fatal error: 'TestOps.h.inc' file not found
  148 | #include "TestOps.h.inc"
      |          ^~~~~~~~~~~~~~~

We have tested these changes on the buildbot for the last 2 days and had no problems.
Whereas before it was failing maybe 1 in 10 builds, enough that multiple people
in the community noticed it.

Reported in llvm/llvm-project#124485.
Icohedron pushed a commit to Icohedron/llvm-project that referenced this issue Feb 11, 2025
That Linaro has been seeing on our dylib bot:
https://lab.llvm.org/staging/#/builders/171/builds/79

In file included from /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/Pass/TestPassManager.cpp:10:
/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/Pass/../Dialect/Test/TestOps.h:148:10: fatal error: 'TestOps.h.inc' file not found
  148 | #include "TestOps.h.inc"
      |          ^~~~~~~~~~~~~~~

Reported in llvm#124335 and
llvm#124485.

I think this was caused by llvm#123910,
and so I've followed what llvm#125004 did,
which seems to be a follow up to that. Same kind of error.

I was not able to reproduce the failure locally, but dylib and
normal builds are ok with this change so I will push it and monitor
the bot's results for a few days.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants