Add a CircularBuffer in AgentSink #36

Open
@jaredcnance

Description

Currently, if the agent is down or has not yet started, metrics can be dropped, and it is up to the caller of logger.flush to handle retries. There are two options:

  1. Apply backpressure to the caller of logger.flush. This could negatively impact request latencies.
  2. On error, enqueue the failed payload to a circular buffer. The catch is that we would need to retry this queue on an interval, which changes the model from async/await (the caller awaits delivery) to a purely asynchronous one. This is a departure from the current design and will need to be enabled via a feature flag.
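Option 2 needs a bounded queue so that a long agent outage cannot grow memory without limit. The following is a minimal TypeScript sketch of a fixed-capacity circular buffer. The drop-oldest eviction policy and the push/shift/size API are assumptions for illustration; the issue does not specify them.

```typescript
// Minimal fixed-capacity circular (ring) buffer sketch. When full, push()
// overwrites the oldest entry (assumed drop-oldest policy), so memory stays
// bounded no matter how long the agent is unreachable.
class CircularBuffer<T> {
  private readonly items: (T | undefined)[];
  private readonly capacity: number;
  private head = 0; // index of the oldest entry
  private count = 0; // number of stored entries

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.items = new Array<T | undefined>(capacity);
  }

  get size(): number {
    return this.count;
  }

  // Appends an item; returns the evicted (oldest) item if the buffer was full.
  push(item: T): T | undefined {
    if (this.count === this.capacity) {
      const evicted = this.items[this.head];
      this.items[this.head] = item; // when full, the tail slot is the head slot
      this.head = (this.head + 1) % this.capacity;
      return evicted;
    }
    this.items[(this.head + this.count) % this.capacity] = item;
    this.count++;
    return undefined;
  }

  // Removes and returns the oldest item, or undefined if the buffer is empty.
  shift(): T | undefined {
    if (this.count === 0) return undefined;
    const item = this.items[this.head];
    this.items[this.head] = undefined; // release the reference
    this.head = (this.head + 1) % this.capacity;
    this.count--;
    return item;
  }
}
```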

The symptoms of this are:

  1. Metrics emitted during application initialization may not appear.
  2. The following error will appear in your application logs:
(node:1) UnhandledPromiseRejectionWarning: Error: connect ECONNREFUSED 172.17.0.2:25888
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1106:14)
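Until retries are built into the sink, the caller has to handle them. A minimal sketch of a caller-side wrapper, assuming only that logger.flush() returns a Promise; flushWithRetry, the Flushable interface, and the attempt/delay defaults are hypothetical.

```typescript
// Hypothetical minimal interface: only a Promise-returning flush() is assumed.
interface Flushable {
  flush(): Promise<void>;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries flush() up to `attempts` times with a fixed delay between tries and
// rethrows the last error, so a persistent failure is never silently dropped.
// The default attempt count and delay are illustrative only.
async function flushWithRetry(logger: Flushable, attempts = 3, delayMs = 500): Promise<void> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      await logger.flush();
      return; // delivered
    } catch (err) {
      lastError = err;
      if (attempt < attempts) await sleep(delayMs);
    }
  }
  throw lastError;
}
```

For example, flushWithRetry(logger).catch((err) => console.error(err)) ensures a persistent failure is logged by the application instead of surfacing as an unhandled rejection like the ECONNREFUSED warning above.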

Tasks

  • Add type AgentSinkOptions with
    • RetryStrategy parameter, defaulting to None for backwards compatibility. There will be a single option to start with: ExponentialBackoffRetryStrategy (see also: https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/)
    • AsyncBehavior parameter that controls whether the call blocks. If it blocks, we keep the current behavior; if not, we return immediately and enqueue to the retry buffer on failure.
  • Change AgentSink's constructor to constructor(options: AgentSinkOptions, serializer: ISerializer).
  • Add RetryStrategies that the AgentSink uses based on its configuration. NoRetry propagates errors back to the caller of flush, preserving today's behavior. ExponentialRetry (configurable by the application) will block flush on the first attempt and, on failure, enqueue to a CircularBuffer (whose size is also configurable).
  • On startup, register a setInterval timer that checks whether the CircularBuffer is non-empty and retries failed requests asynchronously.
  • Add a shutdown method that shuts down gracefully, blocking until any outstanding requests complete.
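The retry strategies and options from the task list could be modeled roughly as follows. This sketch assumes a strategy interface that returns the next delay (or undefined to stop) and uses the "full jitter" variant from the linked AWS post, delay = random(0, min(cap, base * 2^attempt)). The field names, the string encoding of AsyncBehavior, and all default values are hypothetical.

```typescript
// Hypothetical strategy interface: returns the delay (ms) before retry number
// `attempt` (0-based), or undefined to stop retrying.
interface RetryStrategy {
  nextDelay(attempt: number): number | undefined;
}

// Never retries: errors propagate to the caller of flush, as today.
const NoRetry: RetryStrategy = { nextDelay: () => undefined };

// Exponential backoff with "full jitter":
// delay = random(0, min(capMs, baseMs * 2^attempt)).
class ExponentialBackoffRetryStrategy implements RetryStrategy {
  private readonly baseMs: number;
  private readonly capMs: number;
  private readonly maxAttempts: number;
  private readonly random: () => number;

  constructor(baseMs = 50, capMs = 10000, maxAttempts = 5, random: () => number = Math.random) {
    this.baseMs = baseMs;
    this.capMs = capMs;
    this.maxAttempts = maxAttempts;
    this.random = random; // injectable so tests can be deterministic
  }

  nextDelay(attempt: number): number | undefined {
    if (attempt >= this.maxAttempts) return undefined; // give up
    const ceiling = Math.min(this.capMs, this.baseMs * 2 ** attempt);
    return Math.floor(this.random() * ceiling);
  }
}

// Hypothetical options shape; names and defaults are illustrative.
type AsyncBehavior = "block" | "enqueue";

interface AgentSinkOptions {
  retryStrategy: RetryStrategy;
  asyncBehavior: AsyncBehavior;
  bufferSize: number; // capacity of the retry CircularBuffer
}

// Defaults preserve current behavior: no retries, flush blocks.
const defaultAgentSinkOptions: AgentSinkOptions = {
  retryStrategy: NoRetry,
  asyncBehavior: "block",
  bufferSize: 100,
};
```

The cap keeps later retries from waiting unboundedly long, and the jitter spreads out retries from many processes that lost the agent at the same moment.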

Example Usage

AWS_EMF_AGENT_RETRY_STRATEGY="ExponentialBackoff"
// or
Configuration.agentRetryStrategy = RetryStrategy.ExponentialBackoff;
// or
Configuration.agentRetryStrategy = (...) => customRetryStrategy();

// ...
await logger.flush();
// control returns once the logs have been successfully flushed or enqueued for retry
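The comment in the example describes flush semantics where one send attempt is awaited and a failed payload is enqueued instead of rejected. A minimal sketch of that behavior, assuming a transport function that returns a Promise; EnqueueOnFailureSink and Send are hypothetical names, and a capped array stands in for the CircularBuffer.

```typescript
// Hypothetical transport: in a real sink this would write to the agent's
// socket; here it is any function returning a Promise.
type Send = (payload: string) => Promise<void>;

// flush() awaits one send attempt. On failure, the payload is enqueued for a
// later background retry and the returned Promise still resolves, so the
// caller never sees the error. The array is capped at `capacity`, dropping
// the oldest entry on overflow (a stand-in for the CircularBuffer).
class EnqueueOnFailureSink {
  readonly retryQueue: string[] = [];
  private readonly send: Send;
  private readonly capacity: number;

  constructor(send: Send, capacity: number) {
    this.send = send;
    this.capacity = capacity;
  }

  async flush(payload: string): Promise<void> {
    try {
      await this.send(payload); // first attempt blocks the caller
    } catch {
      if (this.retryQueue.length >= this.capacity) this.retryQueue.shift(); // drop oldest
      this.retryQueue.push(payload);
    }
  }
}
```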

Open Question

  • Should we change logger.flush() to enqueue and return immediately? This would allow us to make flush() a synchronous operation in all cases.
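If flush() only enqueued, it could be synchronous, with the setInterval drain and the shutdown method from the task list doing all the network work. A minimal sketch under those assumptions; BackgroundSink, drain, pending, and the default capacity and interval are hypothetical.

```typescript
type Send = (payload: string) => Promise<void>;

// flush() only enqueues (synchronous); a setInterval timer drains the queue
// in the background; shutdown() stops the timer and waits for a final drain.
class BackgroundSink {
  private readonly queue: string[] = [];
  private readonly send: Send;
  private readonly capacity: number;
  private readonly timer: ReturnType<typeof setInterval>;
  private inFlight?: Promise<void>;

  constructor(send: Send, capacity = 100, intervalMs = 1000) {
    this.send = send;
    this.capacity = capacity;
    this.timer = setInterval(() => { void this.drain(); }, intervalMs);
  }

  get pending(): number {
    return this.queue.length;
  }

  // Synchronous: never touches the network and never throws on agent errors.
  flush(payload: string): void {
    if (this.queue.length >= this.capacity) this.queue.shift(); // drop oldest
    this.queue.push(payload);
  }

  // Starts a drain, or returns the one already running so timer ticks and
  // shutdown() never send concurrently.
  drain(): Promise<void> {
    if (!this.inFlight) {
      this.inFlight = this.drainOnce().finally(() => { this.inFlight = undefined; });
    }
    return this.inFlight;
  }

  // Stops the timer and waits for one last attempt to send what is queued.
  async shutdown(): Promise<void> {
    clearInterval(this.timer);
    await this.drain();
  }

  // Sends payloads oldest-first; stops at the first failure and retries on
  // the next tick.
  private async drainOnce(): Promise<void> {
    while (this.queue.length > 0) {
      const payload = this.queue.shift() as string;
      try {
        await this.send(payload);
      } catch {
        // Put it back at the front unless newer payloads filled the buffer
        // meanwhile; then it is the oldest entry and is dropped.
        if (this.queue.length < this.capacity) this.queue.unshift(payload);
        return;
      }
    }
  }
}
```

The tradeoff is that callers no longer observe delivery failures at all: a payload is lost when it is evicted from a full buffer, when the final drain in shutdown() fails, or when the process exits without calling shutdown().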

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request), in progress (Someone is actively working on this issue)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
