Description
Regression testcases that hit the execution timeout:
./tests/00-geo-rep/00-georep-verify-non-root-setup.t - 900 second
./tests/00-geo-rep/bug-1600145.t - 600 second
./tests/00-geo-rep/georep-stderr-hang.t - 500 second
./tests/00-geo-rep/georep-basic-tarssh-ec.t - 500 second
./tests/00-geo-rep/georep-basic-rsync-ec.t - 500 second
./tests/00-geo-rep/georep-basic-dr-tarssh.t - 500 second
./tests/00-geo-rep/georep-basic-dr-tarssh-arbiter.t - 500 second
./tests/00-geo-rep/georep-basic-dr-rsync.t - 500 second
./tests/00-geo-rep/georep-basic-dr-rsync-arbiter.t - 500 second
./tests/00-geo-rep/00-georep-verify-setup.t - 400 second
./tests/00-geo-rep/georep-config-upgrade.t - 300 second
./tests/00-geo-rep/01-georep-glusterd-tests.t - 300 second
./tests/bugs/snapshot/bug-1399598-uss-with-ssl.t - 200 second
./tests/bugs/rpc/bug-921072.t - 200 second
./tests/bugs/replicate/bug-830665.t - 200 second
./tests/bugs/nfs/bug-974972.t - 200 second
./tests/bugs/cli/bug-1320388.t - 200 second
./tests/basic/namespace.t - 200 second
./tests/basic/gfapi/gfapi-ssl-test.t - 200 second
./tests/basic/gfapi/gfapi-ssl-load-volfile-test.t - 200 second
./tests/basic/fencing/afr-lock-heal-basic.t - 200 second
./tests/basic/fencing/afr-lock-heal-advanced.t - 200 second
./tests/000-flaky/basic_mount-nfs-auth.t - 200 second
Steps to reproduce
- Run one of the tests above.
- The chosen test eventually times out. Afterwards, a residual glusterfs process is still running and the volume used for the test has not been deleted.
- Run a new testcase.
Phenomenon
The new testcase gets stuck as follows:
... GlusterFS Test Framework ...
/home/shard/glusterfs /home/shard/glusterfs
/home/shard/glusterfs
testing 'timeout' command
============================ (784 / 840) ============================
[10:48:54] Running tests in file ./tests/bugs/transport/bug-873367.t
./tests/bugs/transport/bug-873367.t ..
A process in 'D' state (uninterruptible sleep) also appears:
root 293910 0.0 0.0 222756 1824 pts/2 D+ 18:48 0:00 mkdir -p /d/backends /mnt/glusterfs/0 /mnt/glusterfs/1 /mnt/glusterfs/2 /mnt/glusterfs/3 /mnt/nfs/0 /mnt/nfs/1 /d/dev
Analysis
When the new testcase sources include.rc and initializes its environment, it executes WORKDIRS="$B0 $M0 $M1 $M2 $M3 $N0 $N1 $DEVDIR"; mkdir -p $WORKDIRS. Since $M0 is still occupied by the residual glusterfs process, mkdir gets stuck. The new testcase stops there and eventually times out as well.
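Before starting a new test, leftovers from a previous run can be checked for without touching the mount points themselves (touching them is what blocks mkdir). This is only a minimal sketch: `list_gluster_mounts` is a hypothetical helper, not part of the test framework, and it assumes the leftover mounts appear in /proc/mounts with the `fuse.glusterfs` filesystem type.

```shell
#!/bin/bash
# Hypothetical helper (not part of include.rc): print glusterfs FUSE mount
# points found in a mounts table. Reading the table does not access the
# mount points, so it should not block on a stale mount.
# $1: mounts file to inspect (defaults to /proc/mounts)
list_gluster_mounts() {
    awk '$3 == "fuse.glusterfs" { print $2 }' "${1:-/proc/mounts}"
}

# Show leftover mounts and any glusterfs-related processes still running.
list_gluster_mounts
pgrep -a glusterfs || echo "no glusterfs processes found"
```

If the first command prints /mnt/glusterfs/0 (that is, $M0) before any test has started, the next testcase is likely to hang in mkdir as described above.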
A trap in include.rc is registered for every testcase, so theoretically, the cleanup procedure will be called when a testcase times out and exits.
function force_terminate () {
        local ret=$?;
        1>&2 echo -e "\nreceived external"\
                "signal --`kill -l $ret`--, calling 'cleanup' ...\n";
        cleanup;
        exit $ret;
}
trap force_terminate INT TERM HUP
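The difference between a trappable signal and SIGKILL can be seen in isolation with a small experiment. This is a sketch only: `run_child` and the marker files are made up for the demonstration, and the handler just creates a file instead of calling cleanup.

```shell
#!/bin/bash
# Demo: a TERM trap runs its handler, while SIGKILL bypasses it.
# run_child is a hypothetical helper written for this demonstration.
# $1: signal to send (TERM or KILL); $2: marker file the handler creates
run_child() {
    local sig=$1 marker=$2
    rm -f "$marker"
    # The child installs a TERM handler (standing in for force_terminate),
    # then idles until it is signaled.
    bash -c 'trap "touch \"$0\"; exit 1" TERM; while :; do sleep 0.1; done' "$marker" &
    local pid=$!
    sleep 0.5                          # give the child time to install its trap
    kill -s "$sig" "$pid"
    wait "$pid" 2>/dev/null || true    # reap the child, ignore its exit status
    if [ -e "$marker" ]; then
        echo "$sig: handler ran"
    else
        echo "$sig: handler skipped"
    fi
    rm -f "$marker"
}

run_child TERM "${TMPDIR:-/tmp}/trap-demo.$$"
run_child KILL "${TMPDIR:-/tmp}/trap-demo.$$"
```

A test that only ever receives SIGKILL therefore never reaches cleanup, regardless of what the trap registers.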
In run-tests.sh, each test is executed under the timeout command. In theory, timeout sends at most two signals during execution:
- When the command times out, SIGTERM is sent (with --foreground, it is delivered only to the command that timeout started directly, here prove, so the test script that prove runs does not receive it).
- If the command is still running, SIGKILL (which cannot be trapped) is sent after <--kill-after> seconds.
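The effect of --foreground on a nested process can be reproduced outside the test framework. The sketch below is an assumption-laden stand-in, not the real setup: an outer bash plays the role of prove, an inner bash plays the role of the test script, and `probe`, `MARKER` and `INNER` are names invented for the demonstration. It relies on the documented GNU timeout behaviour that, in --foreground mode, children of the command are not timed out.

```shell
#!/bin/bash
# Inner "test script": installs a TERM trap, then idles for at most ~3 seconds
# so that it cannot outlive the experiment for long.
inner='trap "touch \"$MARKER\"; exit 1" TERM; for i in $(seq 30); do sleep 0.1; done'

# probe is a hypothetical helper for this demonstration.
# $1: marker file; remaining arguments: extra options for timeout (may be none)
probe() {
    local marker=$1; shift
    rm -f "$marker"
    # The outer bash stands in for prove: it starts the inner script and waits.
    MARKER=$marker INNER=$inner timeout "$@" -k 2 1 \
        bash -c 'bash -c "$INNER" & wait' >/dev/null 2>&1 || true
    sleep 0.5                          # let a pending handler finish
    if [ -e "$marker" ]; then
        echo "inner trap ran"
    else
        echo "inner trap did not run"
    fi
    rm -f "$marker"
}

probe "${TMPDIR:-/tmp}/fg-demo.$$"                # default mode
probe "${TMPDIR:-/tmp}/fg-demo.$$" --foreground   # mode used by run-tests.sh
```

In default mode timeout signals its whole process group, so the inner script sees SIGTERM; with --foreground only the outer process is signaled.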
However, so that running tests can be stopped with Ctrl-C, the '--foreground' option is used for the timeout command (commit 1f03309).
After this patch, the test script no longer receives SIGTERM when it times out. Each test that times out is eventually terminated by SIGKILL, and hence force_terminate is not called as expected.
Meanwhile, terminate_pids may fail to terminate glusterfs-related processes even if force_terminate is called, because the following command redirects stdout to stderr and does not trigger a buffer flush:
1>&2 echo -e "\nreceived external"\
"signal --`kill -l $ret`--, calling 'cleanup' ...\n";
In this case, subsequent stdout output is also redirected to stderr until the buffer fills up, so the function calls in cleanup cannot return the expected information.
Temporary solution
I removed '--foreground' from the timeout invocation in run-tests.sh (shown below) and the '>&2 echo' in include.rc; with both changes, cleanup works correctly when tests time out.
timeout --foreground -k ${kill_after_time} ${cmd_timeout} prove -vmfe '/bin/bash' ${t}
So, should we consider making --foreground an optional setting instead of using it by default? Once it is specified, cleanup is not called even if tests time out, and residual processes and the volume used for tests are left behind. That seems unreasonable.
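One possible shape for such an option is sketched below. It is not the project's code: `build_timeout_cmd` and the `TIMEOUT_FOREGROUND` environment variable are hypothetical names, and the function only prints the command line it would run so the two variants can be compared.

```shell
#!/bin/bash
# Hypothetical sketch: add --foreground only when the caller asks for it.
# TIMEOUT_FOREGROUND is an assumed variable name, not an existing setting.
# $1: kill-after time; $2: command timeout; $3: test file
build_timeout_cmd() {
    local kill_after_time=$1 cmd_timeout=$2 t=$3
    local cmd=(timeout)
    if [ "${TIMEOUT_FOREGROUND:-0}" = "1" ]; then
        cmd+=(--foreground)            # opt-in: keeps the Ctrl-C behaviour
    fi
    cmd+=(-k "$kill_after_time" "$cmd_timeout" prove -vmfe /bin/bash "$t")
    echo "${cmd[*]}"
}

build_timeout_cmd 5 200 ./tests/basic/namespace.t
TIMEOUT_FOREGROUND=1 build_timeout_cmd 5 200 ./tests/basic/namespace.t
```

With this layout, unattended runs would get the default signal delivery and hence the cleanup trap, while an interactive user could still opt in to --foreground.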