Agent silently hangs forever if hook script doesn't close STDOUT and STDERR #118

Closed
@jcttrll

Let's suppose you have an ApplicationStart hook like this in your appspec.yml file:

hooks:
  ApplicationStart:
    - location: start.sh
      timeout: 15

start.sh is extremely basic:

#!/bin/bash -e

run.sh &

The call to run.sh could be replaced with any application: a shell script, a Ruby program, a Java program, a native executable, etc. When you deploy the application, the deployment hangs forever, even though start.sh launches run.sh in the background (using &). Examining the logs and the state of the instance where the deployment is hung shows:

  1. No errors in /var/log/aws/codedeploy-agent/codedeploy-agent.log, nor any log entries following the start of your script.
  2. No errors in /opt/codedeploy-agent/deployment-root/deployment-logs/codedeploy-agent-deployments.log, nor any log entries following the start of your script.
  3. run.sh (or other application) is happily running in the background.
  4. start.sh has completed (it is not present in the process list).

The agent will hang forever. It will not time out after 15 seconds, as configured in appspec.yml, nor after the default hook timeout of 3600 seconds (one hour). The AWS Console will show the hook event as "Pending" for a considerable time, then eventually "Failed"; however, no logs will be available for the failed event, and no errors will appear in the deployment log or the agent log on the instance. Any further deployments to the instance will fail, as the agent is completely hung.

The problem is that the background process still holds open the STDOUT and STDERR streams it inherited from start.sh. The CodeDeploy agent starts a separate thread to watch each of those streams; the threads pump the output into the deployment log. After the hook script completes, the agent joins those threads with no timeout, so it waits for STDOUT and STDERR to close, which never happens as long as the background process keeps running. This affects not just ApplicationStart hooks, but any hook. See hook_executor.rb.
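The underlying pipe behavior can be reproduced without the agent. The following is a minimal sketch, not the agent's code: `cat` stands in for the agent's stream-pump thread, `sleep 2` stands in for run.sh, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Stand-in for a hook script: it backgrounds a child and exits immediately.
# `sleep 2` plays the role of run.sh; `cat` plays the role of the reader.

# Case 1: the child inherits the write end of the pipe, so the reader
# only sees end-of-file once the child itself exits.
start=$(date +%s)
sh -c 'sleep 2 &' | cat
inherited_elapsed=$(( $(date +%s) - start ))

# Case 2: the child's streams are redirected, so the pipe is closed as
# soon as the hook script exits and the reader returns right away.
start=$(date +%s)
sh -c 'sleep 2 >/dev/null 2>&1 &' | cat
redirected_elapsed=$(( $(date +%s) - start ))

echo "inherited streams:  reader blocked for about ${inherited_elapsed}s"
echo "redirected streams: reader blocked for about ${redirected_elapsed}s"
```

In the first case the reader is held for the lifetime of the background process, which for a long-running service means indefinitely.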

This is not expected behavior. A silent hang is one of the least desirable outcomes of any program. Interestingly, it appears the stream-pump thread code in hook_executor.rb has no unit tests around it (if you comment out all four lines dealing with the threading, rake test still succeeds).

I suggest that the CodeDeploy agent should:

  1. Honor the timeout configured (or the default of 3600 seconds).
  2. Log an error to the agent log if STDOUT or STDERR is left open after the timeout expires, and throw an exception.
  3. If possible, trigger the event viewer in the AWS Console to show a link to documentation explaining the problem and how to avoid it.
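The first two suggestions can be illustrated in shell. This is only a sketch of the idea, not a patch for the agent (which is written in Ruby): it assumes GNU coreutils `timeout` is installed, and `hook_timeout`, `reader_status`, and the error text are hypothetical.

```shell
#!/bin/sh
# Hypothetical timeout in seconds, standing in for the hook's configured timeout.
hook_timeout=1

# `sleep 3` plays the role of run.sh and inherits the pipe as its STDOUT.
# (Its STDERR is discarded only to keep this demo to a single stream.)
# GNU `timeout` kills the reader after $hook_timeout seconds and then
# exits with status 124, instead of waiting for end-of-file forever.
reader_status=0
sh -c 'sleep 3 2>/dev/null &' | timeout "$hook_timeout" cat || reader_status=$?

if [ "$reader_status" -eq 124 ]; then
    message="ERROR: hook output still open after ${hook_timeout}s; a background process probably inherited STDOUT or STDERR"
else
    message="hook output closed normally"
fi
echo "$message"
```

The point is that a bounded wait turns a silent hang into a visible, explainable failure.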

Notes:

In this example, the streams are left open because although run.sh is launched in the background, it inherits STDOUT and STDERR from its parent, start.sh. This can be avoided by redirecting the run.sh streams, like:

run.sh >run.out 2>run.err &

...or:

run.sh >/dev/null 2>&1 &

...or any number of other ways. STDOUT and STDERR will then be closed when start.sh exits. Although the example given was simplified greatly, it's a real-world issue that caused significant problems simply because no one expected CodeDeploy to hang forever from something like this. Seeing an error message in the log or the AWS Console would have allowed us to easily find and fix the problem.
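One of those other ways is to wrap the redirections in a small helper so every background launch in a hook script gets them. The sketch below is an assumption for illustration, not something CodeDeploy provides: `start_detached` is a hypothetical function, the log path is a placeholder, and `sleep 2` stands in for run.sh.

```shell
#!/bin/sh
# Hypothetical helper: start a command in the background with all three
# standard streams pointed away from the caller's.
start_detached() {
    log_file=$1
    shift
    # STDIN comes from /dev/null; STDOUT and STDERR are appended to the
    # log file, so the child holds none of the streams the caller was given.
    "$@" </dev/null >>"$log_file" 2>&1 &
}

# Demo: the subshell plays the role of start.sh with its output piped to
# a reader (cat). The reader returns at once even though the child is
# still running.
log=$(mktemp)
start=$(date +%s)
( start_detached "$log" sh -c 'echo started; sleep 2' ) | cat
detach_elapsed=$(( $(date +%s) - start ))
echo "reader released after about ${detach_elapsed}s"
```

In a real start.sh the call would look like `start_detached run.out run.sh`, with the child's output ending up in run.out.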
