@markoburcul
Contributor

@markoburcul markoburcul commented Oct 2, 2025

We are in the process of moving pipelines across the organization to Linux Docker containers. For the Nimbus pipelines we have only two relevant ones:

  • main
  • nimv2_2

The problem we noticed came up when running tests within the Nix shell in the container: the GCC there doesn't come with LTO support, so we had to disable LTO for tests. The new successful builds don't show any degradation in pipeline duration.

Referenced issue:

@markoburcul markoburcul force-pushed the move-linux-builds-to-the-container branch from 1c6883c to 886ba93 Compare October 2, 2025 08:26
@github-actions

github-actions bot commented Oct 2, 2025

Unit Test Results

    15 files   ±0     3 030 suites ±0    1h 45m 19s ⏱️ +11m 44s
12 061 tests   ±0    11 491 ✔️ ±0    570 💤 ±0    0 ❌ ±0
76 481 runs    ±0    75 629 ✔️ ±0    852 💤 ±0    0 ❌ ±0

Results for commit a8200c8. ± Comparison against base commit dbc7781.

♻️ This comment has been updated with latest results.

@markoburcul markoburcul force-pushed the move-linux-builds-to-the-container branch 2 times, most recently from 844ca33 to 6a599a7 Compare October 7, 2025 15:09
@jakubgs
Member

jakubgs commented Oct 7, 2025

I looked into these errors from make test DISABLE_TEST_FIXTURES_SCRIPT=1:

09:36:31  /nix/store/hqvni28zpibl6jsqqimcvng6h6qm58xy-binutils-2.41/bin/ld: /tmp/nix-shell.DI1pTZ/ccKHH2R1.ltrans0.ltrans.o: in function `visualizeHeader':
09:36:31  <artificial>:(.text+0x3ec): undefined reference to `ETHLightClientHeaderCopyBeaconRoot'
09:36:31  /nix/store/hqvni28zpibl6jsqqimcvng6h6qm58xy-binutils-2.41/bin/ld: <artificial>:(.text+0x426): undefined reference to `ETHRootDestroy'
09:36:31  /nix/store/hqvni28zpibl6jsqqimcvng6h6qm58xy-binutils-2.41/bin/ld: <artificial>:(.text+0x42e): undefined reference to `ETHLightClientHeaderGetBeacon'
09:36:31  /nix/store/hqvni28zpibl6jsqqimcvng6h6qm58xy-binutils-2.41/bin/ld: <artificial>:(.text+0x439): undefined reference to `ETHBeaconBlockHeaderGetSlot'
...

I tried using the target with -j1 and V=2 as well as NIM_PARAMS='--passL:"-Wl,--verbose"' and found this GCC command:

nimbus-eth2/Makefile

Lines 717 to 725 in 5c454ec

gcc -D__DIR__="\"beacon_chain/libnimbus_lc\"" \
--std=c17 -flto \
-pedantic -pedantic-errors \
-Wall -Wextra -Werror -Wno-maybe-uninitialized \
-Wno-unsafe-buffer-usage -Wno-unknown-warning-option \
-o build/test_libnimbus_lc \
beacon_chain/libnimbus_lc/test_libnimbus_lc.c \
build/libnimbus_lc.a \
"$${EXTRA_FLAGS[@]}"; \

And indeed I could reproduce the issue using it:

[jakubgs@caspair:~/work/nimbus-eth2]$ gcc -D__DIR__="\"beacon_chain/libnimbus_lc\"" --std=c17 -flto -pedantic -pedantic-errors -Wall -Wextra -Werror -Wno-maybe-uninitialized -Wno-unsafe-buffer-usage -Wno-unknown-warning-option -o build/test_libnimbus_lc beacon_chain/libnimbus_lc/test_libnimbus_lc.c build/libnimbus_lc.a
/nix/store/hqvni28zpibl6jsqqimcvng6h6qm58xy-binutils-2.41/bin/ld: /tmp/nix-shell.fK6O0E/ccolI9dG.ltrans0.ltrans.o: in function `visualizeHeader':
<artificial>:(.text+0x3cc): undefined reference to `ETHLightClientHeaderCopyBeaconRoot'
/nix/store/hqvni28zpibl6jsqqimcvng6h6qm58xy-binutils-2.41/bin/ld: <artificial>:(.text+0x406): undefined reference to `ETHRootDestroy'
/nix/store/hqvni28zpibl6jsqqimcvng6h6qm58xy-binutils-2.41/bin/ld: <artificial>:(.text+0x40e): undefined reference to `ETHLightClientHeaderGetBeacon'
...
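
To compare the linker errors against what `nm` reports for the archive, it can help to reduce the `ld` output to a plain list of missing symbols. The following is a minimal sketch, not part of the PR: `undefined_symbols` is a hypothetical helper name, and it assumes the error lines follow the `undefined reference to` format shown above.

```shell
# Hypothetical helper (not from the repository): read linker output on stdin
# and print each symbol named in an "undefined reference to" error once.
undefined_symbols() {
  awk -F'undefined reference to ' '
    NF > 1 {
      # Strip the quoting around the name; C identifiers are [A-Za-z0-9_].
      gsub(/[^A-Za-z0-9_]/, "", $2)
      print $2
    }' | LC_ALL=C sort -u
}
```

It could be used as `make test DISABLE_TEST_FIXTURES_SCRIPT=1 2>&1 | undefined_symbols`, and the resulting names looked up in the archive with `nm`.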

@jakubgs
Member

jakubgs commented Oct 7, 2025

We attempted to compare the build/libnimbus_lc.a built via Nix tooling with the one built via Ubuntu tooling.

What we found was that nm could not print the symbols of build/libnimbus_lc.a on NixOS:

[jakubgs@caspair:~/work/nimbus-eth2]$ nm -g --defined-only ~/libnimbus_lc.a 2>&1 | grep NimMain

[jakubgs@caspair:~/work/nimbus-eth2]$ 

While it could on our CI host:

[email protected]:~ % nm -g --defined-only libnimbus_lc.a 2>&1 | grep NimMain
00000000 T NimMain
00000000 T NimMainInner
00000000 T NimMainModule

And the error was:

@mlibnimbus_lc.nim.c.o:
nm: @mlibnimbus_lc.nim.c.o: plugin needed to handle lto object
0000000000000000 W g_mlibnimbus_lc.nim.c.2c72c415
0000000000000001 C __gnu_lto_slim

Which indicates that the nm on NixOS lacks LTO support.
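
The two markers in that output, the `plugin needed to handle lto object` diagnostic and the `__gnu_lto_slim` symbol, can be checked for mechanically. A minimal sketch, assuming `nm` output in the format shown above; `has_slim_lto_objects` is a hypothetical helper name.

```shell
# Hypothetical helper: read nm output (stdout and stderr) on stdin and succeed
# if it suggests the archive holds slim LTO objects that nm could not read.
has_slim_lto_objects() {
  grep -Eq 'plugin needed to handle lto object|__gnu_lto_slim'
}
```

For example, `nm -g build/libnimbus_lc.a 2>&1 | has_slim_lto_objects && echo "LTO plugin required"` would flag an archive like the one above.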

@jakubgs
Member

jakubgs commented Oct 7, 2025

Then we found out that the symbols are listed correctly when gcc-nm from gcc-unwrapped is used instead of nm from the gcc wrapper:

[jakubgs@caspair:~/work/nimbus-eth2]$ /nix/store/lc7vdrd7l2apdjy9gzbljn6fgzj5nyz3-gcc-wrapper-11.4.0/bin/nm -Ag --defined-only build/libnimbus_lc.a 2>&1 | grep ETHRootDestroy

[jakubgs@caspair:~/work/nimbus-eth2]$ /nix/store/g3xq3b8b8gwiw7j68v5wh3lvgc4fr3yz-gcc-11.4.0/bin/gcc-nm -Ag --defined-only build/libnimbus_lc.a 2>&1 | grep ETHRootDestroy
build/libnimbus_lc.a:@mlibnimbus_lc.nim.c.o:00000000 T ETHRootDestroy

Which means that the Nix wrapping of GCC is causing issues with the LTO plugin.

When the unwrapped GCC is used, it fails on missing Glibc instead:

[jakubgs@caspair:~/work/nimbus-eth2]$ /nix/store/g3xq3b8b8gwiw7j68v5wh3lvgc4fr3yz-gcc-11.4.0/bin/gcc build/libnimbus_lc.a beacon_chain/libnimbus_lc/test_libnimbus_lc.c -o build/test_libnimbus_lc
/nix/store/hqvni28zpibl6jsqqimcvng6h6qm58xy-binutils-2.41/bin/ld: cannot find crt1.o: No such file or directory
/nix/store/hqvni28zpibl6jsqqimcvng6h6qm58xy-binutils-2.41/bin/ld: cannot find crti.o: No such file or directory
/nix/store/hqvni28zpibl6jsqqimcvng6h6qm58xy-binutils-2.41/bin/ld: cannot find -lgcc_s: No such file or directory
collect2: error: ld returned 1 exit status
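
Since `gcc-nm` listed the symbols while the wrapped `nm` did not in the runs above, a script that inspects the archive could prefer `gcc-nm` when it is available. A minimal sketch under that assumption; `pick_nm` is a hypothetical helper and only looks at `PATH`.

```shell
# Hypothetical helper: print the name of an nm binary, preferring gcc-nm
# (which passes the GCC LTO plugin to nm) and falling back to plain nm.
pick_nm() {
  if command -v gcc-nm >/dev/null 2>&1; then
    echo gcc-nm
  else
    echo nm
  fi
}
```

Usage would look like `"$(pick_nm)" -Ag --defined-only build/libnimbus_lc.a | grep ETHRootDestroy`.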

@jakubgs
Member

jakubgs commented Oct 7, 2025

If we set export NIX_DEBUG=1 we can make the wrapped GCC print debug logs which show the flags:

original flags to /nix/store/hqvni28zpibl6jsqqimcvng6h6qm58xy-binutils-2.41/bin/ld:
  -plugin
  /nix/store/g3xq3b8b8gwiw7j68v5wh3lvgc4fr3yz-gcc-11.4.0/libexec/gcc/x86_64-unknown-linux-gnu/11.4.0/liblto_plugin.so
  -plugin-opt=/nix/store/g3xq3b8b8gwiw7j68v5wh3lvgc4fr3yz-gcc-11.4.0/libexec/gcc/x86_64-unknown-linux-gnu/11.4.0/lto-wrapper
extra flags after to /nix/store/hqvni28zpibl6jsqqimcvng6h6qm58xy-binutils-2.41/bin/ld:
  -rpath
  /nix/store/ddwyrxif62r8n6xclvskjyy6szdhvj60-glibc-2.39-5/lib
  -rpath
  /nix/store/p5spyl0dhqwrs98kjbhca964rcyl4pj2-gcc-11.4.0-lib/lib
/nix/store/hqvni28zpibl6jsqqimcvng6h6qm58xy-binutils-2.41/bin/ld: /tmp/nix-shell.a7vrBu/ccR3W27J.ltrans0.ltrans.o: in function `visualizeHeader':
<artificial>:(.text+0x374): undefined reference to `ETHLightClientHeaderCopyBeaconRoot'

Which shows that the original ld flags included the LTO plugin, while the modified flags do not.
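
To see which section of the `NIX_DEBUG=1` dump mentions the plugin, the log can be split by its section headers. A minimal sketch, assuming the header text and two-space indentation shown above; `flags_in_section` is a hypothetical helper name.

```shell
# Hypothetical helper: read a NIX_DEBUG=1 log on stdin and print the flags
# listed under the section whose header starts with the given prefix,
# e.g. "original flags to" or "extra flags after to".
flags_in_section() {
  awk -v prefix="$1" '
    index($0, prefix) == 1 { on = 1; next }          # section header found
    on && /^  / { sub(/^  /, ""); print; next }      # indented flag line
    { on = 0 }                                       # anything else ends it
  '
}
```

Piping a saved log through `flags_in_section 'extra flags after to' | grep liblto_plugin` would then show whether the plugin appears among the flags the wrapper appended.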

@jakubgs
Member

jakubgs commented Oct 7, 2025

I have managed to make it work by passing NIM_PARAMS='--passC:"-fno-lto" --passL:"-fno-lto"':

[jakubgs@caspair:~/work/nimbus-eth2]$ make test DISABLE_TEST_FIXTURES_SCRIPT=1 NIM_PARAMS='--passC:"-fno-lto" --passL:"-fno-lto"'
...
Build completed successfully: build/process_state
Build completed successfully: build/libnimbus_lc.a
Build completed successfully: build/test_libnimbus_lc
Build completed successfully: build/nimbus_signing_node
Build completed successfully: build/block_sim
Build completed successfully: build/consensus_spec_tests_minimal
Build completed successfully: build/consensus_spec_tests_mainnet
Build completed successfully: build/all_tests

Running consensus_spec_tests_minimal --xml:build/consensus_spec_tests_minimal.xml --console

Which confirms the issue is with wrapped GCC and LTO support when loading symbols from build/libnimbus_lc.a.
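
Because the failure was only reproduced with the Nix-wrapped toolchain, the flags could be limited to that environment. A minimal sketch, assuming `IN_NIX_SHELL` is set in that environment, as `nix-shell` does; `lto_nim_params` is a hypothetical helper, not something this PR adds.

```shell
# Hypothetical helper: print the NIM_PARAMS value that disables LTO when
# running inside a Nix shell, and nothing otherwise.
lto_nim_params() {
  if [ -n "${IN_NIX_SHELL:-}" ]; then
    printf '%s\n' '--passC:"-fno-lto" --passL:"-fno-lto"'
  fi
}
```

It would be used as `make test DISABLE_TEST_FIXTURES_SCRIPT=1 NIM_PARAMS="$(lto_nim_params)"`, which leaves LTO enabled outside of Nix.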

@markoburcul markoburcul force-pushed the move-linux-builds-to-the-container branch 3 times, most recently from d4abbc1 to 6f93b3f Compare October 8, 2025 10:12
@markoburcul
Contributor Author

Posts with the same issue:
https://discourse.nixos.org/t/anyone-was-successful-with-setting-up-lto-pgo-with-gcc/63905
NixOS/nixpkgs#399656
https://discourse.nixos.org/t/how-to-correctly-use-ld-nm-ar-ranlib-with-lto-in-nix-develop-shell/33220

What I've found is this line in the ld-wrapper.sh from nixpkgs which is potentially causing the issue:
https://github.com/NixOS/nixpkgs/blob/20c4598c84a671783f741e02bf05cbfaf4907cff/pkgs/build-support/bintools-wrapper/ld-wrapper.sh#L144C1-L146C19

LTO performs its optimizations at link time to improve run-time performance, but we are using it only in the tests and not when building the node itself. I will run the tests with and without LTO enabled to see whether it significantly impacts performance. If not, I think it's safe to disable it.

@markoburcul markoburcul force-pushed the move-linux-builds-to-the-container branch from d0cc741 to 6f93b3f Compare October 8, 2025 13:11
@markoburcul
Contributor Author

In the normal build without Nix, disabling LTO in the tests step increases the duration by 5 min.

The Nix build failed even though I've set NIM_PARAMS in the shell hook to disable LTO; maybe the outer quotes got stripped. I've rerun the build, setting the variable in the test step like this:

nix.develop('''make DISABLE_TEST_FIXTURES_SCRIPT=1 NIM_PARAMS='--passC:"-fno-lto" --passL:"-fno-lto"' test''', pure: false)

It passed the step; now I'm waiting for the total time.
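
The suspicion about stripped outer quotes can be checked in isolation: if the value reaches the shell unquoted, it is split at the space and `make` receives two arguments instead of one `NIM_PARAMS=...` assignment. A minimal sketch for a POSIX shell; `count_args` is a hypothetical helper used only for illustration.

```shell
# Hypothetical helper: print how many arguments it received.
count_args() {
  echo "$#"
}

params='--passC:"-fno-lto" --passL:"-fno-lto"'

# Quoted expansion: the whole assignment stays one argument.
count_args "NIM_PARAMS=$params"   # prints 1

# Unquoted expansion: word splitting breaks it at the space.
count_args NIM_PARAMS=$params     # prints 2
```

In the second case `make` would see `--passL:"-fno-lto"` as a separate argument rather than as part of `NIM_PARAMS`.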

@markoburcul markoburcul force-pushed the move-linux-builds-to-the-container branch from 6f93b3f to f002c34 Compare October 9, 2025 08:35
@markoburcul markoburcul requested review from a team and tersec October 9, 2025 12:11
@markoburcul markoburcul changed the title WIP ci: Move linux builds to the container ci: Move linux builds to the container Oct 9, 2025
Member

@jakubgs jakubgs left a comment

Why is this a separate Jenkinsfile from ci/Jenkinsfile?

@markoburcul
Contributor Author

Why is this a separate Jenkinsfile from ci/Jenkinsfile?

Because we are using a Docker container, and the same Jenkinsfile is also used for macOS builds, where this would fail.

@markoburcul markoburcul force-pushed the move-linux-builds-to-the-container branch 2 times, most recently from e675e2c to bf34d0f Compare October 10, 2025 10:25
@markoburcul markoburcul requested a review from a team October 10, 2025 12:12
@tersec
Contributor

tersec commented Oct 13, 2025

I have managed to make it work by passing NIM_PARAMS='--passC:"-fno-lto" --passL:"-fno-lto"':

[jakubgs@caspair:~/work/nimbus-eth2]$ make test DISABLE_TEST_FIXTURES_SCRIPT=1 NIM_PARAMS='--passC:"-fno-lto" --passL:"-fno-lto"'
...
Build completed successfully: build/process_state
Build completed successfully: build/libnimbus_lc.a
Build completed successfully: build/test_libnimbus_lc
Build completed successfully: build/nimbus_signing_node
Build completed successfully: build/block_sim
Build completed successfully: build/consensus_spec_tests_minimal
Build completed successfully: build/consensus_spec_tests_mainnet
Build completed successfully: build/all_tests

Running consensus_spec_tests_minimal --xml:build/consensus_spec_tests_minimal.xml --console

Which confirms the issue is with wrapped GCC and LTO support when loading symbols from build/libnimbus_lc.a.

In general, Nimbus ships an LTO build, so it'd be better to test with it too.

@markoburcul markoburcul force-pushed the move-linux-builds-to-the-container branch 2 times, most recently from fb5c58b to df55374 Compare October 14, 2025 07:53
@markoburcul markoburcul force-pushed the move-linux-builds-to-the-container branch 5 times, most recently from 798e3a1 to 9a11c8d Compare October 16, 2025 12:19
Member

@jakubgs jakubgs left a comment

It's really really really dumb how we can't reuse the same Jenkinsfile because of the dockerfile block being in a static agent block parsed early on.

But what we could do is define stages in a common.groovy file to not repeat the same thing twice. But that can be tried separately.

@markoburcul markoburcul force-pushed the move-linux-builds-to-the-container branch from 9a11c8d to 3b8de86 Compare October 16, 2025 12:47
@markoburcul markoburcul force-pushed the move-linux-builds-to-the-container branch from 3b8de86 to 98136b8 Compare October 16, 2025 13:28