-
-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Description
Describe the bug
I used nixpkgs-review
to build a PR that affect many packages (#216403).
After 15 hours, the CPU was still at 100% load, but the temperature low, as if it is not doing anything.
[root@gaming:~]# top
top - 13:02:36 up 13 days, 20:17, 1 user, load average: 32.16, 32.26, 32.18
Tasks: 354 total, 5 running, 349 sleeping, 0 stopped, 0 zombie
%Cpu(s): 95.2 us, 4.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 32027.9 total, 24085.9 free, 3354.6 used, 4587.4 buff/cache
MiB Swap: 32768.0 total, 26143.9 free, 6624.0 used. 28058.6 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1501010 nixbld14 20 0 5432076 88508 17160 R 733.3 0.3 4247:32 python3.11
1533870 nixbld23 20 0 6134444 75340 15304 R 730.3 0.2 4236:56 python3.10
1622385 nixbld18 20 0 3866172 427812 33696 R 72.7 1.3 344:42.42 python3.11
1622408 nixbld20 20 0 3936212 707792 34208 R 48.7 2.2 339:52.25 python3.10
There are 4 python package builds that seem to be stuck.
They are running pytest
.
root 1272 0.0 0.0 445100 616 ? Ssl Feb01 0:00 nix-daemon --daemon
root 1362239 1.1 0.1 2331936 61520 ? Ssl 01:25 7:38 \_ nix-daemon 1362236
nixbld14 1480363 0.0 0.0 7544 1320 ? Ss 01:52 0:00 \_ bash -e /nix/store/6xg259477c90a229xwmb53pdfkn6ig3g-default-builder.sh
nixbld14 1501010 639 0.2 5432076 88508 ? Rl 01:58 4203:02 | \_ /nix/store/r85cm22i6058g4zj76w05rzvqwil6d2g-python3-3.11.1/bin/python3.11 -m pytest -k not test_tv_inpa
nixbld23 1505345 0.0 0.0 7536 1320 ? Ss 01:59 0:00 \_ bash -e /nix/store/6xg259477c90a229xwmb53pdfkn6ig3g-default-builder.sh
nixbld23 1533870 642 0.2 6134444 75340 ? Rl 02:04 4192:28 | \_ /nix/store/0pyymzxf7n0fzpaqnvwv92ab72v3jq8d-python3-3.10.9/bin/python3.10 -m pytest -k not test_tv_inpa
nixbld20 1569160 0.0 0.0 7880 1328 ? Ss 02:12 0:00 \_ bash -e /nix/store/6xg259477c90a229xwmb53pdfkn6ig3g-default-builder.sh
nixbld20 1622408 53.8 2.1 3936212 707792 ? R 02:31 336:25 | \_ /nix/store/0pyymzxf7n0fzpaqnvwv92ab72v3jq8d-python3-3.10.9/bin/python3.10 -m pytest
nixbld18 1571036 0.0 0.0 7880 1328 ? Ss 02:14 0:00 \_ bash -e /nix/store/6xg259477c90a229xwmb53pdfkn6ig3g-default-builder.sh
nixbld18 1622385 54.6 1.3 3866172 427812 ? R 02:31 341:11 \_ /nix/store/r85cm22i6058g4zj76w05rzvqwil6d2g-python3-3.11.1/bin/python3.11 -m pytest
but the build is stuck at the same test for hours...
[4/69/77 built (12 failed), 1989 copied (17709.2/17716.9 MiB), 4213.5 MiB DL] building python3.10-cvxpy-1.2.3 (pytestCheckPhase): .........
2 processes only do futex
calls and the other 2 sched_yield
:: (nixbld13) → /tmp/nix-build-python3.10-OpenSfM-unstable-2022-03-10.drv-0
UID PID PPID STIME TIME COMMAND
30013 3067079 3055397 20:48 00:00:00 bash -e /nix/store/6xg259477c90a229xwmb53pdfkn6ig3g-default-builder.sh
30013 3101891 3067079 20:53 01:32:21 /nix/store/0pyymzxf7n0fzpaqnvwv92ab72v3jq8d-python3-3.10.9/bin/python3.10 -m pytest
strace -ff -p 3101891
...
[pid 3101891] sched_yield( <unfinished ...>
[pid 3102726] <... sched_yield resumed>) = 0
[pid 3102725] sched_yield( <unfinished ...>
[pid 3101891] <... sched_yield resumed>) = 0
^C
[root@gaming:~]# strace -f -c -p 3101891
strace: Process 3101891 attached with 3 threads
...
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 20.935565 8 2546114 sched_yield
------ ----------- ----------- --------- --------- ----------------
100.00 20.935565 8 2546114 total
:: (nixbld14) → /tmp/nix-build-python3.10-qutip-4.7.1.drv-0
UID PID PPID STIME TIME COMMAND
30014 3067093 3055397 20:48 00:00:00 bash -e /nix/store/6xg259477c90a229xwmb53pdfkn6ig3g-default-builder.sh
30014 3139076 3067093 20:56 04:31:17 /nix/store/0pyymzxf7n0fzpaqnvwv92ab72v3jq8d-python3-3.10.9/bin/python3.10 -c import qutip.testing; qutip.testing.run()
strace -ff -p 3139076
...
[pid 3139643] futex(0x1c7fd44, FUTEX_WAIT_PRIVATE, 595632, NULL <unfinished ...>
[pid 3139645] futex(0x1c7fd44, FUTEX_WAIT_PRIVATE, 595632, NULL <unfinished ...>
[pid 3139642] futex(0x1c7fd44, FUTEX_WAIT_PRIVATE, 595632, NULL <unfinished ...>
[pid 3139646] futex(0x1c7fd44, FUTEX_WAIT_PRIVATE, 595632, NULL <unfinished ...>
[pid 3139647] futex(0x1c7fd44, FUTEX_WAKE_PRIVATE, 2147483647) = 4
[pid 3139645] <... futex resumed>) = 0
[pid 3139643] <... futex resumed>) = 0
[pid 3139642] <... futex resumed>) = 0
[pid 3139646] <... futex resumed>) = 0
[pid 3139646] futex(0x121a2e4, FUTEX_WAIT_PRIVATE, 1955808, NULL <unfinished ...>
[pid 3139642] futex(0x121a2e4, FUTEX_WAIT_PRIVATE, 1955808, NULL <unfinished ...>
[pid 3139645] futex(0x121a2e4, FUTEX_WAIT_PRIVATE, 1955808, NULL <unfinished ...>
[pid 3139643] futex(0x121a2e4, FUTEX_WAIT_PRIVATE, 1955808, NULL <unfinished ...>
[pid 3139652] futex(0x121a2e4, FUTEX_WAKE_PRIVATE, 2147483647) = 2
[pid 3139642] <... futex resumed>) = 0
[pid 3139646] <... futex resumed>) = 0
[pid 3139645] <... futex resumed>) = -1 EAGAIN (Resource temporarily unavailable)
[pid 3139643] <... futex resumed>) = -1 EAGAIN (Resource temporarily unavailable)
[pid 3139641] futex(0x1c7fd44, FUTEX_WAIT_PRIVATE, 595640, NULL^Cstrace: Process 3139076 detached
[root@gaming:~]# strace -f -c -p 3139076
strace: Process 3139076 attached with 16 threads
...
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.286391 66 4322 1070 futex
------ ----------- ----------- --------- --------- ----------------
100.00 0.286391 66 4322 1070 total
:: (nixbld16) → /tmp/nix-build-python3.11-OpenSfM-unstable-2022-03-10.drv-0
UID PID PPID STIME TIME COMMAND
30016 3067100 3055397 20:48 00:00:00 bash -e /nix/store/6xg259477c90a229xwmb53pdfkn6ig3g-default-builder.sh
30016 3100028 3067100 20:53 01:34:09 /nix/store/r85cm22i6058g4zj76w05rzvqwil6d2g-python3-3.11.1/bin/python3.11 -m pytest
strace -ff -p 3100028
...
[pid 3100638] sched_yield( <unfinished ...>
[pid 3100637] <... sched_yield resumed>) = 0
[pid 3100638] <... sched_yield resumed>) = 0
[pid 3100028] sched_yield( <unfinished ...>
[pid 3100638] sched_yield( <unfinished ...>
[pid 3100637] sched_yield( <unfinished ...>
[pid 3100638] <... sched_yield resumed>) = 0
[pid 3100028] <... sched_y^C
[root@gaming:~]# strace -f -c -p 3100028
strace: Process 3100028 attached with 3 threads
...
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 17.474801 8 2085520 sched_yield
------ ----------- ----------- --------- --------- ----------------
100.00 17.474801 8 2085520 total
:: (nixbld17) → /tmp/nix-build-python3.11-cvxpy-1.2.3.drv-0
UID PID PPID STIME TIME COMMAND
30017 3067103 3055397 20:48 00:00:00 bash -e /nix/store/6xg259477c90a229xwmb53pdfkn6ig3g-default-builder.sh
30017 3086650 3067103 20:51 05:15:56 /nix/store/r85cm22i6058g4zj76w05rzvqwil6d2g-python3-3.11.1/bin/python3.11 -m pytest -k not test_tv_inpainting and not test_diffcp_sdp_example and not test_huber and not test_partial_problem ./cvxpy
strace -ff -p 3086650
...
[pid 3088301] futex(0x1e54ff4, FUTEX_WAIT_PRIVATE, 715744, NULL <unfinished ...>
[pid 3088293] futex(0x1e54ff4, FUTEX_WAKE_PRIVATE, 2147483647) = 1
[pid 3088301] <... futex resumed>) = 0
[pid 3088301] futex(0x1e4b784, FUTEX_WAIT_PRIVATE, 715744, NULL <unfinished ...>
[pid 3088292] futex(0x1e4b784, FUTEX_WAIT_PRIVATE, 715744, NULL <unfinished ...>
[pid 3088298] futex(0x1e4b784, FUTEX_WAKE_PRIVATE, 2147483647) = 2
[pid 3088301] <... futex resumed>) = 0
[pid 3088292] <... futex resumed>) = 0
[pid 3088301] futex(0x1e54ff4, FUTEX_WAIT_PRIVATE, 715752, NULL <unfinished ...>
[pid 3088305] futex(0x1e54ff4, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid 3088301] <... futex resumed>) = -1 EAGAIN (Resource temporarily unavailable)
[pid 3088303] futex(0x1e4b784, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid 3088297] futex(0x1e54ff4, FUTEX_WAKE_PRIVATE, 2147483647) = 0
^C
[root@gaming:~]# strace -f -c -p 3086650
strace: Process 3086650 attached with 23 threads
...
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.464360 71 6463 1839 futex
------ ----------- ----------- --------- --------- ----------------
100.00 0.464360 71 6463 1839 total
Maybe this is a pytest issue, but Nix should also do something to unblock such situations!
Workaround
To unblock the situation and let Nix build the rest of the packages, you have to kill the deadlocked python test processes.
error: builder for '/nix/store/r9msjyv7p81fl9llcz6hzp8czr5ml83w-python3.11-OpenSfM-unstable-2022-03-10.drv' failed with exit code 143;
last 10 log lines:
> opensfm/test/test_dense.py .. [ 55%]
> opensfm/test/test_geo.py ..... [ 57%]
> opensfm/test/test_geometry.py .... [ 59%]
> opensfm/test/test_io.py ......... [ 62%]
> opensfm/test/test_matching.py ...... [ 65%]
> opensfm/test/test_multiview.py ......... [ 68%]
> opensfm/test/test_pairs_selection.py ......... [ 72%]
> opensfm/test/test_reconstruction_alignment.py ........ [ 75%]
> opensfm/test/test_reconstruction_incremental.py Terminated
> /nix/store/b09v23lirgvci3wzszh22mbkdfj0h0yq-stdenv-linux/setup: line 1582: pop_var_context: head of shell_variables not a function context
The "Terminated" appears.
Also the strange stdenv-linux/setup
error.
It's interesting that even when i killed 3 of 4, the last one is not unblocked. So they seem not to get blocked on each other. But the issue also don't appear when building the package alone!
Steps To Reproduce
nixpkgs-review pr 215689
(Builds 138 packages which need 17 GB!)
Not reproducible with e.g.:
nix-build . -A python310Packages.cvxpy -A python310Packages.qutip -A python310Packages.opensfm -A python311Packages.opensfm
Expected behavior
Nix should be able to build 100+ packages without errors.
nix-env --version
output nix-env (Nix) 2.11.1
Additional context
Packages that get stuck in this way:
python3.11-OpenSfM-unstable-2022-03-10
python3.10-OpenSfM-unstable-2022-03-10
python3.10-qutip-4.7.1
python3.11-cvxpy-1.2.3
python3.10-cvxpy-1.2.3
Priorities
Add 👍 to issues you find important.