test demonstrating orphaned processes are not killed with their parent #20264
This test fails and demonstrates that when Gitea kills one of its children (for instance, when mirroring a repository times out), the grandchildren are not killed: they become orphans that linger and will eventually become zombies. This is explained in detail in these blog posts:

* https://hostea.org/blog/zombies/
* https://hostea.org/blog/zombies-part-2/

I'd be happy to work on implementing a bug fix for Gitea.

Signed-off-by: Loïc Dachary <[email protected]>
What is the real meaning of this test? The process manager is being called to execute a git command, and then we see that the process is in the process list (which is expected?). I don't see that we're killing the process.
```go
// the git clone process forks a grandchild, git-remote-https; wait for it
pattern := "git-remote-https origin https://4.4.4.4"
ps := func() string {
	cmd := exec.Command("ps", "-x", "-o", "pid,ppid,pgid,args")
	// ...
```
This is a UNIX-only command; tests can also be run on Windows machines.
Maybe add a build tag to avoid running on Windows.
@Gusted, thanks for your review.
To verify that no zombie processes linger when Gitea's children are killed.
The kill occurs when the command context is canceled. It is not implemented by Gitea; it is implemented by CommandContext in the os/exec package: "The provided context is used to kill the process (by calling os.Process.Kill) if the context becomes done before the command completes on its own."
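The quoted behavior can be checked with a minimal, Unix-only sketch. It assumes `sh` and `sleep` are on the PATH and uses a hypothetical function name: `sh` stands in for the git child and the backgrounded `sleep` for the `git-remote-https` grandchild. Both inherit the same pipe as stdout, so after cancellation the read end only reports EOF once every process holding the write end has exited; a read timeout therefore means the grandchild is still running.

```go
//go:build !windows

package main

import (
	"bufio"
	"context"
	"errors"
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
	"syscall"
	"time"
)

// grandchildOutlivesCancel (hypothetical name) starts sh as the child; sh
// backgrounds sleep (the grandchild), prints its PID and waits. The context
// is then canceled, so exec.CommandContext kills sh only.
func grandchildOutlivesCancel() (bool, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return false, err
	}
	defer r.Close()

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	cmd := exec.CommandContext(ctx, "sh", "-c", "sleep 60 & echo $!; wait")
	cmd.Stdout = w // inherited by sh and by the backgrounded sleep
	if err := cmd.Start(); err != nil {
		w.Close()
		return false, err
	}
	w.Close() // only sh and sleep hold the write end now

	br := bufio.NewReader(r)
	line, err := br.ReadString('\n') // sleep has been forked once this arrives
	if err != nil {
		return false, err
	}
	grandchild, err := strconv.Atoi(strings.TrimSpace(line))
	if err != nil {
		return false, err
	}

	cancel()       // default cancellation: os.Process.Kill, i.e. SIGKILL to sh alone
	_ = cmd.Wait() // returns once sh is dead ("signal: killed")

	// EOF would mean every writer exited; hitting the deadline means sleep still runs.
	_ = r.SetReadDeadline(time.Now().Add(time.Second))
	_, err = br.ReadByte()
	survived := errors.Is(err, os.ErrDeadlineExceeded)
	if survived {
		_ = syscall.Kill(grandchild, syscall.SIGKILL) // clean up the orphan
	}
	return survived, nil
}

func main() {
	survived, err := grandchildOutlivesCancel()
	if err != nil {
		panic(err)
	}
	fmt.Println("grandchild survived cancellation:", survived)
}
```

On Linux this is expected to report `true`: canceling the context sent SIGKILL to `sh` only, and `sleep` was reparented and kept running until the cleanup kill.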
The question is whether the zombie process will or will not be reaped. If it won't be reaped, then we will need to do something; the simplest thing is to add the equivalent of kill -PID, I guess. If it will be reaped, then we shouldn't care: zombie processes that need to be reaped should be a NORMAL thing.
It depends on the implementation of the process that adopts the zombie. The only way to be sure the zombie does not linger is to kill -PID the process group.
Have you been able to see zombie processes that don't get reaped in Gitea running as PID 1 following the patch?
@zeripath, yes, this is precisely what the test in this pull request demonstrates. You can also reproduce it manually by following the example in this blog post, which I verified today is still relevant with 1.17.1. Setting a process to be a process group leader is essentially a no-op unless the group is then killed through its negative PID.
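A minimal sketch of that claim, assuming Go 1.20 or later (for the `exec.Cmd.Cancel` hook) and Unix `sh`/`sleep`; the function name is hypothetical and this is not Gitea's code. Both runs use `Setpgid`, which makes `sh` the leader of a new process group that the backgrounded `sleep` joins. Only when `Cancel` is overridden to signal the negative PID, the equivalent of `kill -9 -PGID`, does the grandchild die along with its parent.

```go
//go:build !windows

package main

import (
	"bufio"
	"context"
	"errors"
	"fmt"
	"os"
	"os/exec"
	"syscall"
	"time"
)

// grandchildSurvives (hypothetical name) runs sh in a new process group; sh
// backgrounds sleep, which stays in that group, then the context is canceled.
// With killGroup the cancellation signals the whole group instead of sh alone.
func grandchildSurvives(killGroup bool) (bool, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return false, err
	}
	defer r.Close()

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	cmd := exec.CommandContext(ctx, "sh", "-c", "sleep 60 & echo started; wait")
	cmd.Stdout = w // inherited by sh and by the backgrounded sleep
	// Setpgid makes sh the leader of a new group whose PGID is sh's PID.
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if killGroup {
		// Go 1.20+: a negative PID addresses the whole process group.
		cmd.Cancel = func() error {
			return syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
		}
	}
	if err := cmd.Start(); err != nil {
		w.Close()
		return false, err
	}
	w.Close() // only sh and sleep hold the write end now
	pgid := cmd.Process.Pid

	br := bufio.NewReader(r)
	if _, err := br.ReadString('\n'); err != nil { // sleep has been forked
		return false, err
	}
	cancel()
	_ = cmd.Wait()

	// EOF means every writer exited; hitting the deadline means sleep still runs.
	_ = r.SetReadDeadline(time.Now().Add(time.Second))
	_, err = br.ReadByte()
	survived := errors.Is(err, os.ErrDeadlineExceeded)
	if survived {
		_ = syscall.Kill(-pgid, syscall.SIGKILL) // clean up the orphaned group
	}
	return survived, nil
}

func main() {
	for _, killGroup := range []bool{false, true} {
		survived, err := grandchildSurvives(killGroup)
		if err != nil {
			panic(err)
		}
		fmt.Printf("killGroup=%v grandchild survived=%v\n", killGroup, survived)
	}
}
```

With `killGroup=false` the grandchild is expected to survive even though it sits in a dedicated group; with `killGroup=true` it is expected to be gone. Overriding `Cancel` leaves the rest of the `exec.CommandContext` flow unchanged, which makes it one possible shape for the "equivalent of kill -PID" raised earlier in the thread.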
OK, I'm not sure that proves that processes are not being reaped eventually, and you've not shown it happening in real-world usage in Gitea, but I suspect that you're right. Now whilst … So I think if we are to go down this route, we are instead going to have to go down the full init and cgroup technique (see https://dev.to/__mrvik__/following-slippery-processes-6).

Doing this would also require upgrading and improving modules/process to also track OS processes in addition to … This would have some benefits for people who aren't running Gitea as PID 1, as it would allow for properly managing all of the child OS processes created by Gitea (including …). So that does make it potentially worth doing.

The trouble will be working out whether this could be done on Windows or the BSDs, but I guess some Linux-only functionality is fine.
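The cgroup side of that technique needs privileges and is not sketched here. A related Linux building block for init-like behavior, which is an assumption on my part rather than something taken from the linked article, is `PR_SET_CHILD_SUBREAPER`: a process that sets it adopts orphaned descendants instead of PID 1 and can then reap them itself. The minimal sketch below is Linux-only, calls `prctl` through `syscall.Syscall` with the constant value spelled out, and uses a hypothetical function name.

```go
//go:build linux

package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"syscall"
)

// prSetChildSubreaper is PR_SET_CHILD_SUBREAPER from <linux/prctl.h> (Linux 3.4+),
// spelled out rather than relying on the frozen syscall package to export it.
const prSetChildSubreaper = 36

// adoptAndReapOrphan (hypothetical name) marks this process as a child
// subreaper, then runs a shell that backgrounds `sleep 1` and exits at once.
// The orphaned sleep is reparented to us rather than to PID 1, so Wait4 can
// reap it; without the subreaper flag Wait4 would fail with ECHILD.
func adoptAndReapOrphan() (orphan, reaped int, err error) {
	if _, _, errno := syscall.Syscall(syscall.SYS_PRCTL, prSetChildSubreaper, 1, 0); errno != 0 {
		return 0, 0, errno
	}
	// sleep writes to /dev/null so Output returns as soon as sh exits.
	out, err := exec.Command("sh", "-c", "sleep 1 >/dev/null & echo $!").Output()
	if err != nil {
		return 0, 0, err
	}
	if orphan, err = strconv.Atoi(strings.TrimSpace(string(out))); err != nil {
		return 0, 0, err
	}
	var status syscall.WaitStatus
	for {
		// Blocks until the adopted sleep exits, then reaps it.
		reaped, err = syscall.Wait4(orphan, &status, 0, nil)
		if err != syscall.EINTR {
			break
		}
	}
	return orphan, reaped, err
}

func main() {
	orphan, reaped, err := adoptAndReapOrphan()
	if err != nil {
		panic(err)
	}
	fmt.Printf("orphan %d reaped by us: %v\n", orphan, orphan == reaped)
}
```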
I may have missed something, and I'd be grateful if you could tell me what it is, so that I can provide a reproducer that unambiguously demonstrates the issue.
I don't think Gitea launches such processes. But even if it did, the first thing such a process would do is set its own process group, so it would not be impacted by kill -N. This is what daemons do: fork(), setsid(), exec(), and live their life. This dates back decades before cgroups existed: it is simple and effective. This is what...
I'd be happy to provide more if this is not convincing. But maybe daemons are not what you have in mind when writing "fork off and do things"?
This is a very interesting project and would fit well with the Gitea process manager, cron jobs, etc. But the magnitude of this effort is quite different from a rather simple patch to fix a reproducible bug.
@zeripath, for the record, there are new reports of zombies. I'm still willing to work on a fix, but before I do, we should get on the same page regarding how to approach the problem. Would you be so kind as to answer the questions above? It would help me figure out what to do about this :-)
This has been stale for a long time; I guess neither of you is still interested in this problem. Feel free to reopen it if this demo is still needed.
This PR is from @dachary of the forgefriends project. Please see here for origin.