tt start/stop commands do not always work #325

Closed
psergee opened this issue Jan 30, 2023 · 1 comment · Fixed by #327
Labels: bug (Something isn't working), release blocker

psergee commented Jan 30, 2023

This looks like a race in the tt start command. Under high load it does not start all instances:

$ ../../tt/tt start myapp
   • Starting an instance [myapp:s1-master]...
   • Starting an instance [myapp:s1-replica]...
   • Starting an instance [myapp:s2-master]...
   • Starting an instance [myapp:s2-replica]...
   • Starting an instance [myapp:stateboard]...
   • Starting an instance [myapp:router]...
$ ../../tt/tt status myapp
   • myapp:s1-replica: ERROR. The process is dead
   • myapp:s2-master: RUNNING. PID: 48573.
   • myapp:s2-replica: RUNNING. PID: 48581.
   • myapp:stateboard: RUNNING. PID: 48585.
   • myapp:router: RUNNING. PID: 48595.
   • myapp:s1-master: ERROR. The process is dead

$ ../../tt/tt start myapp
   • Starting an instance [myapp:router]...
   • Starting an instance [myapp:s1-master]...
   • Starting an instance [myapp:s1-replica]...
   • Starting an instance [myapp:s2-master]...
   • Starting an instance [myapp:s2-replica]...
   • Starting an instance [myapp:stateboard]...
$ ../../tt/tt status myapp
   • myapp:s1-replica: RUNNING. PID: 48696.
   • myapp:s2-master: RUNNING. PID: 48700.
   • myapp:s2-replica: RUNNING. PID: 48701.
   • myapp:stateboard: ERROR. The process is dead
   • myapp:router: ERROR. The process is dead
   • myapp:s1-master: RUNNING. PID: 48691.

tt stop does not stop all instances either; a stateboard process is still running afterwards:

$ ../../tt/tt stop myapp
   • the process is already dead. Error: "no such process"
   • The Instance myapp:s1-master (PID = 48691) has been terminated.
   • The Instance myapp:s1-replica (PID = 48696) has been terminated.
   • The Instance myapp:s2-master (PID = 48700) has been terminated.
   • The Instance myapp:s2-replica (PID = 48701) has been terminated.
   • the process is already dead. Error: "no such process"
$ ps awux | grep taran
user    48737  0.2  0.1 775644 25524 pts/4    Sl   16:02   0:00 tarantool stateboard.init.lua <running>

This issue reproduces on my system after the following commit:
4b0d12d tt: fix a race condition between tt start and tt stop

psergee added the bug and teamE labels Jan 30, 2023

psergee commented Jan 31, 2023

Log messages for dead processes:
stateboard:

2023-01-31 10:26:07.757 [10752] main I> entering the event loop
2023-01-31 10:28:22.772 [10752] main/114/iproto.shutdown I> tx_binary: stopped
2023-01-31 10:28:22.773 [10752] main/102/on_shutdown utils.c:489 E> LuajitError: stdin:290: variable 'close_sock_tr' is not declared
2023/01/31 10:28:22 Watchdog(INFO): the Instance has shutdown.

router:

2023-01-31 10:28:54.445 [10943] main I> entering the event loop
2023-01-31 10:29:40.556 [10943] main/102/on_shutdown utils.c:489 E> LuajitError: stdin:290: variable 'close_sock_tr' is not declared
2023/01/31 10:29:40 Watchdog(INFO): the Instance has shutdown.

vr009 added a commit that referenced this issue Feb 2, 2023
vr009 added a commit that referenced this issue Feb 3, 2023
This patch fixes a problem that occurred when the watchdog
process received a SIGURG signal from the Go runtime[1][2] and passed it
to the forked process before the exec call. Fixed by adding a call to
the Ignore function. Receiving this signal is unexpected, because tt
doesn't work with sockets at all.

[1] - https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
[2] - golang/go#37942

Closes #325
vr009 added a commit that referenced this issue Feb 8, 2023
This patch fixes the signal-handling strategy. Now there is only
one handling loop for all signals. Added two fields for controlling
the state of the instance being watched and a mutex for synchronizing
the goroutines that change them.

Part of #325
vr009 added a commit that referenced this issue Feb 8, 2023
This patch fixes the signal-handling strategy. Now there is only
one handling loop for all signals. Added a field for controlling
the state of the watchdog and a mutex for synchronizing the goroutines
that change this field.

Part of #325
LeonidVas pushed a commit that referenced this issue Feb 10, 2023
This patch fixes the signal-handling strategy. Now there is only
one handling loop for all signals. Added a field for controlling
the state of the watchdog and a mutex for synchronizing the goroutines
that change this field.

Part of #325
LeonidVas pushed a commit that referenced this issue Feb 10, 2023
This patch fixes a problem that occurred when the watchdog
process received a SIGURG signal from the Go runtime[1][2] and passed it
to the forked process before the exec call. Fixed by adding a call to
the Ignore function. Receiving this signal is unexpected, because tt
doesn't work with sockets at all.

[1] - https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
[2] - golang/go#37942

Closes #325