Tests show as both FAILED and PASSED after node crash #932

hb2638 opened this issue Jul 17, 2023 · 3 comments

hb2638 commented Jul 17, 2023

Hi,
I'm noticing that when a worker crashes (which started happening frequently last week), the test it was running appears as both FAILED and PASSED.

E.g.: Below is a snippet of the logs for tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21. It first ran on worker #7 (gw7), which crashed, and then ran again on worker #8 (gw8), where it PASSED.

	Line 4026: tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21 
	Line 4037: [gw7] node down: Not properly terminated
	Line 4038: [gw7] [ 97%] FAILED tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21

replacing crashed worker gw7

	Line 4047: tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21 
	Line 4070: [gw8] [ 98%] PASSED tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21 
	Line 7994: worker 'gw7' crashed while running 'tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21'
	Line 8220: =========================== short test summary info ============================
	Line 8227: FAILED tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21
	Line 9397: PASSED tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21
	Line 9420: = 6 failed, 1137 passed, 252 skipped, 96 warnings, 15 rerun in 6617.65s (1:50:17) =

package versions:
pytest-7.4.0
pytest_cov-4.1.0
pytest_xdist-3.3.1
coverage-7.2.7
pytest_rerunfailures-12.0
psutil-5.9.5

command line:
pytest --log-format="%Y-%m-%dT%H:%M:%S.%f%z" --log-date-format="%Y-%m-%d %H:%M:%S" --log-format "%(asctime)s %(levelname)-8s [%(name)s|%(process)d|%(thread)d|%(threadName)s] [%(pathname)s:%(funcName)s:%(lineno)d] %(message)s" --max-worker-restart 5 -n 16 --dist loadgroup -rfEsxXp --reruns 2 --reruns-delay 30 -v --tb=long -o faulthandler_timeout=3600 --durations=20 --durations-min=60 --cov=src/ tests

We're running about 1000 tests using 16 workers.
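To find every test affected by this in a long CI log, a small script can scan the output for test IDs that were reported with both outcomes. The following is a minimal sketch, not part of pytest or pytest-xdist; the function name `conflicting_outcomes` and the regular expression are assumptions based only on the line shapes shown in the log excerpt above.

```python
import re
from collections import defaultdict

# Assumed line shapes, taken from the excerpt in this issue:
#   "[gw7] [ 97%] FAILED tests/a.py::test_x"
#   "PASSED tests/a.py::test_x"
# The lowercase words in the final summary line ("6 failed, 1137 passed") do not match.
RESULT_RE = re.compile(r"\b(PASSED|FAILED)\s+(\S+)")


def conflicting_outcomes(lines):
    """Return sorted test IDs that were reported as both PASSED and FAILED."""
    outcomes = defaultdict(set)
    for line in lines:
        match = RESULT_RE.search(line)
        if match:
            outcome, test_id = match.groups()
            outcomes[test_id].add(outcome)
    return sorted(
        test_id
        for test_id, seen in outcomes.items()
        if seen == {"PASSED", "FAILED"}
    )
```

It could be fed the lines of a saved log file, for example `conflicting_outcomes(open("pytest.log"))`, where `pytest.log` is a hypothetical file name.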

nicoddemus (Member) commented

Perhaps that's due to --reruns 2 on the command line?

hb2638 (Author) commented Jul 17, 2023

Regarding reruns: I'm not sure. I do see RERUN entries in the log, for example:

	Line 1278: plugins: rerunfailures-12.0, cov-
	Line 1300: [gw6] [  0%] RERUN tests/aws/test
	Line 1302: [gw2] [  0%] RERUN tests/aws/test
	Line 1304: [gw4] [  0%] RERUN tests/aws/test
	Line 1654: [gw2] [ 12%] RERUN tests/src/test
	Line 1664: [gw2] [ 12%] RERUN tests/src/test
	Line 1676: [gw2] [ 13%] RERUN tests/src/test
	Line 1694: [gw2] [ 13%] RERUN tests/src/test
	Line 1744: [gw13] [ 15%] RERUN tests/src/tes
	Line 1754: [gw13] [ 15%] RERUN tests/src/tes
	Line 3294: [gw13] [ 70%] RERUN tests/src/scr
	Line 3322: [gw13] [ 71%] RERUN tests/src/scr
	Line 3364: [gw13] [ 73%] RERUN tests/src/scr
	Line 3558: [gw13] [ 80%] RERUN tests/src/scr
	Line 3632: [gw10] [ 82%] RERUN tests/cimdb/l
	Line 4075: [gw5] [ 98%] RERUN tests/src/scri

but I'm not seeing a RERUN entry for tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21.

Most of the tests hit the DB and sometimes fail intermittently because of a SQL deadlock, so we want to retry a test a few times before marking it as failed.
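The retry-on-deadlock idea described above can also be expressed inside the test code itself. The following is a minimal sketch using only the standard library; `DeadlockError` and `retry_on_deadlock` are hypothetical names, and a real suite would catch whatever exception its database driver raises for a deadlock. It is not how pytest-rerunfailures is implemented.

```python
import functools
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the driver-specific deadlock exception."""


def retry_on_deadlock(attempts=3, delay=0.0):
    """Retry the wrapped callable when it raises DeadlockError.

    Any other exception propagates immediately, so genuine failures
    are not hidden behind retries.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except DeadlockError:
                    if attempt == attempts:
                        raise  # out of attempts: surface the deadlock
                    time.sleep(delay)
        return wrapper
    return decorator
```

Compared with rerunning every failure, this narrows the retry to the one error that is known to be transient.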

nicoddemus (Member) commented

RERUN only applies when a test fails with an error or exception. It does not apply to a hard crash of the worker process (indicated by the message "replacing crashed worker gw7").
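The distinction can be illustrated with a small standalone sketch. It does not use pytest, pytest-xdist, or pytest-rerunfailures; it only shows, under simplified assumptions, why retry logic that lives inside a process can react to an exception but never gets to run when the process itself dies. The crash is simulated with `os._exit`, whereas a real worker crash is more likely a segfault or the process being killed. The names `CHILD` and `run_child` are made up for this example.

```python
import subprocess
import sys
import textwrap

# Child program: a tiny in-process "rerun" loop around a flaky function.
CHILD = textwrap.dedent("""
    import os
    import sys

    def flaky(mode, attempt):
        if attempt == 0:
            if mode == "exception":
                raise RuntimeError("simulated test failure")
            os._exit(70)  # simulated hard crash: the process ends, nothing is raised
        return "ok"

    mode = sys.argv[1]
    for attempt in range(2):
        try:
            print(flaky(mode, attempt), flush=True)
            break
        except Exception:
            print("rerun", flush=True)
""")


def run_child(mode):
    """Run the child in a separate process; return (exit code, printed words)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.split()
```

In `"exception"` mode the in-process loop catches the failure and retries; in `"crash"` mode the process exits before the `except` branch can run, so only an outside observer (here the parent process) can notice it and react.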
