Skip to content

test demonstrating orphaned process are not killed with their parent #20264

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

realaravinth
Copy link
Contributor

@realaravinth realaravinth commented Jul 6, 2022

This test fails and demonstrates that when Gitea kills one of its children (for instance when mirroring a repository timesout), the grand children are not killed and become orphaned that linger and will eventually become zombies.

This is explained in detail in these blog posts:

I'd be happy to work on implementing a bug fix for Gitea.

...
[unit-test:115]         	            	  16511       1   16494 /usr/libexec/git-core/git remote-https origin https://4.4.4.4
[unit-test:116]         	            	  16513   16511   16494 /usr/libexec/git-core/git-remote-https origin https://4.4.4.4
...
[unit-test:120]         	            	  17165   16486       1 ps -x -o pid,ppid,pgid,args
[unit-test:121]         	            	Contains git-remote-https origin https://4.4.4.4
[unit-test:122]         	Test:       	TestManagerKillGrandChildren
[unit-test:123] FAIL
[unit-test:124] coverage: 35.5% of statements
[unit-test:125] FAIL	code.gitea.io/gitea/modules/process	6.518s

This test fails and demonstrates that when...


This PR is from @dachary of the forgefriends project. Please see here for origin.

This test fails and demonstrates that when Gitea kills one of its
children (for instance when mirroring a repository timesout), the grand
children are not killed and become orphaned that linger and will
eventually become zombies.

This is explained in detail in these blog posts:

* https://hostea.org/blog/zombies/
* https://hostea.org/blog/zombies-part-2/

I'd be happy to work on implementing a bug fix for Gitea.

Signed-off-by: Loïc Dachary <[email protected]>
@realaravinth realaravinth marked this pull request as draft July 6, 2022 07:30
@realaravinth realaravinth marked this pull request as ready for review July 7, 2022 04:50
Copy link
Contributor

@Gusted Gusted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the real meaning of this test? The process manager is being called to execute a git command and then we see that the process is in the process list(which is expected?) I don't see that we're killing the process.

// the git clone process forks a grand child git-remote-https, wait for it
pattern := "git-remote-https origin https://4.4.4.4"
ps := func() string {
cmd := exec.Command("ps", "-x", "-o", "pid,ppid,pgid,args")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UNIX-only command, tests can also be run on windows machine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The implementation will need to be different and I'll address that if @zeripath thinks this is a problem that deserves fixing since he recently added a feature to set Gitea children process to be process group leaders.


source

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a build tag to avoid running in windows.

@GiteaBot GiteaBot added lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Jul 10, 2022
@realaravinth
Copy link
Contributor Author

@Gusted thanks for your review.

What is the real meaning of this test?

To verify no zombie process linger when Gitea children are killed.

The process manager is being called to execute a git command and then we see that the process is in the process list(which is expected?) I don't see that we're killing the process.

The kill occurs when the command context is canceled. It is not implemented by Gitea, it is implemented by the CommandContext of the os package "The provided context is used to kill the process (by calling os.Process.Kill) if the context becomes done before the command completes on its own. ".

@realaravinth
Copy link
Contributor Author

@zeripath would you like me to work on fixing this bug? Or do you think it is not worth the effort?


src

@zeripath
Copy link
Contributor

zeripath commented Jul 26, 2022

The question is if the zombie process will or will not be reaped.

If it won't be reaped then we will need to do something. Simplest thing is to add the equivalent of kill -PID I guess.

If it will be reaped then we shouldn't care.

Zombie processes that need to be reaped should be a NORMAL thing.

@realaravinth
Copy link
Contributor Author

It depends on the implementation of the process that adopts the zombie. The only way to be sure the zombie does not linger is to kill -PID the process group.


src

@realaravinth
Copy link
Contributor Author

@zeripath with that in mind, would you like me to work on a fix that ensures zombies are always reaped?


src

@realaravinth
Copy link
Contributor Author

@zeripath gentle ping?


src

@zeripath
Copy link
Contributor

Have you been able to see zombie processes that don't get reaped in Gitea running as PID1 following the patch?

@realaravinth
Copy link
Contributor Author

@zeripath, yes, this is precisely what the test in this pull request demonstrates. You can also repeat that manually following the example in this blog post, which I have verified today to still be relevant with 1.17.1.

Setting a process to be a process group leader is essentially a noop if it is not killed with a negative signal.


src

@zeripath
Copy link
Contributor

zeripath commented Sep 3, 2022

OK I'm not sure that proves that processes are not being reaped eventually and you've not shown it happening in real world usage in Gitea, but I suspect that you're right.

Now whilst kill -PID willl kill and terminate all child processes - My significant caution here is that I worry that it may be normal for processes to fork off and do things and I don't want to stop normal behaviour.

So I think if we are to go down this route we are instead going to have to go down the full init and cgroup technique see:

https://dev.to/__mrvik__/following-slippery-processes-6

Doing this would also require upgrading and improving modules/process to also track OS processes in addition to goroutines/threads/requests


This would have some benefits for people who aren't putting Gtiea as PID 1 as it would allow for proper managing all of the child OS processes created by Gitea. (including kill -9 functionality.)

So that does make it potentially worth doing.

The trouble will be working out if this could be done on Windows or the BSDs - but I guess some linux only functionality is fine.

@realaravinth
Copy link
Contributor Author

...you've not shown it happening in real world usage in Gitea,...

  • There is an automated test in this PRs
  • There are manual steps in the blog post to reproduce the problem with all existing releases
  • Since setting the process group of a process and not using kill(-signal) is a noop in this context, the previous issues reporting zombies are not resolved

I may have missed something and I'd be grateful if you could tell me what it is so I can provide a reproducer that is exactly right to unambiguously reproduce the issue.

My significant caution here is that I worry that it may be normal for processes to fork off and do things and I don't want to stop normal behaviour.

I don't think Gitea launches such processes. But even if it did, the first thing it would do is to set its own process group and would not be impacted by kill -N. This is what daemons do: fork(), setsid(), exec() and live their life. This dates back decades before cgroup existed: it is simple and effective. This is what...

I'd be happy to provide more if this is not convincing.

But maybe daemons are not what you have in mind when writing "fork off and do things"?

...go down the full init and cgroup technique...

This is a very interesting project and would fit well with the Gitea process manager, cron jobs, etc. But the magnitude of this effort is quite different from a rather simple patch to fix a reproducible bug.


src

@realaravinth
Copy link
Contributor Author

@zeripath For the record there are new reports of zombies. I'm still willing to work on a fix, but before I do we should get on the same page regarding how to approach the problem. Would you be so kind as to answer the questions above? It would help me figure out what to do about this :-)


src

@realaravinth
Copy link
Contributor Author

Please note that #20264 has a tests that demonstrated the root problem. Using dumb-init is a fine workaround until it is fixed, IMHO.


src

@wxiaoguang
Copy link
Contributor

Stale for long time, I guess neither of you is still interested in this problem.

Feel free to reopen it this demo is still needed.

@wxiaoguang wxiaoguang closed this Aug 12, 2023
@go-gitea go-gitea locked as resolved and limited conversation to collaborators Nov 10, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants