Agent silently hangs forever if hook script doesn't close STDOUT and STDERR #118
Comments
This is definitely annoying. Here's hoping for a fix, or more prominent documentation. I found this on the troubleshooting page, not in the intro or walkthrough.
Thank you for this!!
I think this issue should be labeled as a bug, not a question.
@sharuzzaman thanks for pointing it out |
#119.
Fix for #118 (agent hangs if STDOUT/STDERR left open by hook)
I don't think this has been fixed; I ran into this very issue today.
I also encountered this today, using agent version 1.1597.
This seems to have been fixed: you have to redirect stdout, stderr, and stdin to /dev/null. Add the string below before the &.
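The exact string was lost from the comment above; a common form of this redirection (an assumption, not quoted from the original comment) looks like this, with an inline sleep standing in for the hypothetical run.sh:

```shell
#!/bin/sh
# Hedged sketch of the workaround: detach STDIN, STDOUT, and STDERR from the
# hook's streams before backgrounding, so the CodeDeploy agent sees EOF on the
# hook's output as soon as this script exits. "sleep 5" stands in for the real
# long-running application (a hypothetical run.sh).
sh -c 'sleep 5' > /dev/null 2> /dev/null < /dev/null &
echo "hook script finished; background process pid: $!"
```

Because the background process no longer holds the hook's output streams open, the hook script's exit closes them and the agent can move on.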
Here is the detailed description from the AWS docs:
Thank you. I also had a similar issue and got it fixed.
Does anyone have an example "fix" for Windows? I say "fix" because this is clearly a deficiency in the behavior of the agent.
Let's suppose you have an ApplicationStart hook in your appspec.yml file that runs a start.sh script, and start.sh is extremely basic: it backgrounds a call to run.sh. The call to run.sh could be replaced with any application: a shell script, a Ruby program, a Java program, a native executable, etc. When you deploy the application, the deployment hangs forever, despite the fact that you've been sure to background run.sh in the start hook (using &). Upon examining the logs and state of the instance where the deployment is hung, you see:

- No errors in /var/log/aws/codedeploy-agent/codedeploy-agent.log, nor any log entries following the start of your script.
- No errors in /opt/codedeploy-agent/deployment-root/deployment-logs/codedeploy-agent-deployments.log, nor any log entries following the start of your script.
- run.sh (or the other application) is happily running in the background.
- start.sh has completed (it is not present in the process list).

The deployment will hang forever. It will not time out after 15 seconds, as configured in appspec.yml. It will not time out after the default hook timeout of 3600 seconds (one hour). The status in the AWS Console will show the hook event as "Pending" for a considerable time, then eventually "Failed"; however, no logs will be available for the failed event. No errors will appear in the deployment log or the agent log on the instance. Any further deployments to the instance will fail, as the agent is completely hung.

The problem is that the STDOUT and STDERR streams from start.sh have been left open, and the CodeDeploy agent starts a separate thread to watch each of those streams; the threads pump the output into the deployment log. After the hook script completes, the agent joins those threads with no timeout, so it will wait forever for STDOUT and STDERR to close (and no one will ever close them). This does not just affect ApplicationStart hooks, but any hook. See hook_executor.rb.

This is not expected behavior. A silent hang is one of the least desirable outcomes of any program. Interestingly, it appears the stream-pump thread code in hook_executor.rb has no unit tests around it (if you comment out all four lines dealing with the threading, rake test still succeeds).

I suggest that the CodeDeploy agent should:
Notes:
In this example, the streams are left open because although run.sh is launched in the background, it inherits STDOUT and STDERR from its parent, start.sh. This can be avoided by redirecting the run.sh streams (for example, to /dev/null) or in any number of other ways; STDOUT and STDERR will then be closed when start.sh exits. Although the example given was greatly simplified, this is a real-world issue that caused significant problems, simply because no one expected CodeDeploy to hang forever over something like this. Seeing an error message in the log or the AWS Console would have allowed us to find and fix the problem easily.
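The mechanism described in the notes can be reproduced without CodeDeploy at all: command substitution, like the agent's stream-pump threads, reads the hook's STDOUT until EOF, and a backgrounded child that inherits that stream keeps it open. A minimal sketch, where the inline scripts stand in for the start.sh and run.sh of the example:

```shell
#!/bin/sh
# The reader blocks until every writer of the pipe has closed it. Here the
# backgrounded sleep (standing in for run.sh) inherits the hook's STDOUT,
# so the capture waits ~5 seconds even though the "hook" printed and exited
# immediately.
out=$(sh -c 'sleep 5 & echo "hook done"')
echo "captured: $out"        # appears only after the sleep exits

# Redirecting the background process's streams closes the pipe as soon as
# the "hook" exits, so the same capture returns immediately.
out=$(sh -c 'sleep 5 > /dev/null 2>&1 < /dev/null & echo "hook done"')
echo "captured: $out"        # appears at once
```

The same EOF rule is why the agent's pump threads, joined with no timeout, never return in the scenario reported above.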