Agent silently hangs forever if hook script doesn't close STDOUT and STDERR #118

Closed
jcttrll opened this issue Jun 21, 2017 · 11 comments

@jcttrll
Contributor

jcttrll commented Jun 21, 2017

Let's suppose you have an ApplicationStart hook like this in your appspec.yml file:

hooks:
  ApplicationStart:
    - location: start.sh
      timeout: 15

start.sh is extremely basic:

#!/bin/bash -e

run.sh &

The call to run.sh could be replaced with any application--a shell script, a Ruby program, a Java program, a native executable, etc. When you deploy the application, the deployment hangs forever, despite the fact you've been sure to background run.sh in the start hook (using &). Upon examining the logs and state of the instance where the deployment is hung, you see:

  1. No errors in /var/log/aws/codedeploy-agent/codedeploy-agent.log, nor any log entries following the start of your script.
  2. No errors in /opt/codedeploy-agent/deployment-root/deployment-logs/codedeploy-agent-deployments.log, nor any log entries following the start of your script.
  3. run.sh (or other application) is happily running in the background.
  4. start.sh has completed (it is not present in the process list).

The deployment hangs indefinitely on the instance. It does not time out after 15 seconds, as configured in appspec.yml, nor after the default hook timeout of 3600 seconds (one hour). The AWS Console shows the hook event as "Pending" for a considerable time, then eventually "Failed", but no logs are available for the failed event. No errors appear in the deployment log or the agent log on the instance. Any further deployments to the instance fail, because the agent is completely hung.

The problem is that the STDOUT and STDERR streams from start.sh have been left open, and the CodeDeploy agent starts a separate thread to watch each of those streams; the threads pump the outputs into the deployment log. After the hook script completes, the agent joins those threads with no timeout, so it will wait forever for STDOUT and STDERR to close (and no one will ever close them). This does not just affect ApplicationStart hooks, but any hook. See hook_executor.rb.
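The effect can be reproduced without the agent. Below is a minimal sketch in which `cat` reading from a pipe stands in for the agent's stream-pump thread. The function names and the 2-second sleep are made up for illustration, and this is not the agent's actual code.

```shell
#!/bin/bash
# Stand-ins for a hook script. Both return immediately; they differ only in
# whether the background child inherits the hook's stdout.
hook_inheriting()  { sleep 2 & }
hook_redirecting() { sleep 2 >/dev/null 2>&1 & }

# Run a hook with its stdout connected to a pipe and print how many whole
# seconds the reader (cat) waits for end-of-file on that pipe.
elapsed() {
  local start=$SECONDS
  "$1" | cat
  echo $(( SECONDS - start ))
}

echo "inheriting:  $(elapsed hook_inheriting)s"
echo "redirecting: $(elapsed hook_redirecting)s"
```

In the first case the reader should only see end-of-file once the background `sleep` exits, even though the hook function itself returned right away. With a long-running service in place of `sleep`, that moment never comes.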

This is not expected behavior. A silent hang is one of the least desirable outcomes of any program. Interestingly, it appears the stream-pump thread code in hook_executor.rb has no unit tests around it (if you comment out all four lines dealing with the threading, rake test still succeeds).

I suggest that the CodeDeploy agent should:

  1. Honor the timeout configured (or the default of 3600 seconds).
  2. Log an error to the agent log if STDOUT or STDERR is left open after the timeout expires, and throw an exception.
  3. If possible, trigger the event viewer in the AWS Console to show a link to documentation explaining the problem and how to avoid it.
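Suggestions 1 and 2 could be approximated like this in shell. This only illustrates bounding the wait on an output stream and is not a patch for hook_executor.rb. `run_hook_with_timeout` and `leaky_hook` are hypothetical names, only stdout is watched, and GNU coreutils `timeout` is assumed to be installed.

```shell
#!/bin/bash
# Run a hook, but wait at most LIMIT seconds for its stdout to close.
# Usage: run_hook_with_timeout LIMIT COMMAND [ARGS...]
run_hook_with_timeout() {
  local limit=$1 reader_status
  shift
  # timeout(1) kills the reader (cat) after $limit seconds and exits with 124.
  "$@" | timeout "$limit" cat
  reader_status=${PIPESTATUS[1]}
  if [ "$reader_status" -eq 124 ]; then
    echo "ERROR: hook left its output stream open after ${limit}s" >&2
    return 1
  fi
  return 0
}

# Hypothetical hook that backgrounds a child without redirecting its stdout.
leaky_hook() { sleep 3 & }

if ! run_hook_with_timeout 1 leaky_hook; then
  echo "hook failed instead of hanging"
fi
```

The point of the sketch is the failure mode: a visible error and a non-zero status after the limit, rather than an unbounded wait.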

Notes:

In this example, the streams are left open because although run.sh is launched in the background, it inherits STDOUT and STDERR from its parent, start.sh. This can be avoided by redirecting the run.sh streams, like:

run.sh >run.out 2>run.err &

...or:

run.sh >/dev/null 2>&1 &

...or any number of other ways. STDOUT and STDERR will then be closed when start.sh exits. Although the example given was simplified greatly, it's a real-world issue that caused significant problems simply because no one expected CodeDeploy to hang forever from something like this. Seeing an error message in the log or the AWS Console would have allowed us to easily find and fix the problem.
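To confirm that a running process is the one holding the streams open, its file descriptors can be inspected. A rough sketch, assuming Linux with /proc mounted and assuming the agent captures hook output through pipes (consistent with the behavior described above, but not verified here). `check_streams` is a hypothetical helper.

```shell
#!/bin/bash
# Report whether process PID still has stdout (fd 1) or stderr (fd 2)
# attached to a pipe. Returns 1 if either is, 0 otherwise.
check_streams() {
  local pid=$1 fd target status=0
  for fd in 1 2; do
    # Skip descriptors that cannot be read (process gone, no permission).
    target=$(readlink "/proc/$pid/fd/$fd") || continue
    case $target in
      pipe:*)
        echo "pid $pid: fd $fd -> $target (inherited pipe)"
        status=1
        ;;
    esac
  done
  return "$status"
}
```

Calling `check_streams` with the PID of run.sh on a hung instance would be expected to flag fd 1 and fd 2 if the streams were inherited, and to print nothing once they are redirected to a file or /dev/null.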

@hughesjj

hughesjj commented Apr 9, 2018

This is definitely annoying. Here's hoping for a fix, or more prominent documentation. I found this on the troubleshooting page, not in the intro or walkthrough.

@yamasaki760

Thank you for this!!

nckackerman added a commit to nckackerman/spring-hello-world that referenced this issue Apr 29, 2018
@sharuzzaman

I think this issue should be labeled as a bug, not a question.

rohkat-aws added bug and removed question labels Jun 12, 2018
@rohkat-aws
Contributor

@sharuzzaman thanks for pointing it out

@rohkat-aws
Contributor

Pull request #119 has been approved and will be included soon, hence closing this.

rohkat-aws added a commit that referenced this issue Jun 20, 2018
Fix for #118 (agent hangs if STDOUT/STDERR left open by hook)
@adamsipos

I don't think this has been fixed, I ran into this very issue today.

@jamescarignan

I also encountered this today, using agent version 1.1597.

@AgarwalMilan

This seems to have been fixed; you have to redirect stdout, stderr, and stdin to /dev/null.

Add the string below before the &:

> /dev/null 2> /dev/null < /dev/null

Here is the detailed description from AWS docs:
https://docs.aws.amazon.com/codedeploy/latest/userguide/troubleshooting-deployments.html#troubleshooting-long-running-processes

@abdulhub

Thank you. I had a similar issue, and this fixed it.

@moloch--

Does anyone have an example "fix" for Windows? I say "fix" because this is clearly a deficiency in the behavior of the agent.

@philstrong
Contributor
