Description
Currently, if the agent is down or has not started, metrics can be dropped. It is up to the caller of logger.flush to handle retries. There are 2 options:
- Backpressure the caller of logger.flush. This could negatively impact request latencies.
- On error, enqueue to a circular buffer. The trick here is that we will need to retry this queue on an interval, which changes the model from async/await to a purely async one. This is a departure from the current design and will need to be turned on via a feature flag.
The symptoms of this are:
- The first metrics during initialization of the app may not appear
- The following error message will be in your app logs:
(node:1) UnhandledPromiseRejectionWarning: Error: connect ECONNREFUSED 172.17.0.2:25888
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1106:14)
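The circular buffer from the second option above could look roughly like the sketch below. It assumes that the oldest entry is overwritten when the buffer is full; the proposal does not state an eviction policy, so treat that as one possible choice. The class name CircularBuffer comes from this issue, but the methods (enqueue, dequeue, size) are hypothetical.

```typescript
// Hypothetical sketch: fixed-capacity FIFO buffer that overwrites the oldest entry when full.
class CircularBuffer<T> {
  private readonly capacity: number;
  private readonly items: (T | undefined)[];
  private head = 0; // index of the oldest item
  private count = 0; // number of items currently stored

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.items = new Array<T | undefined>(capacity);
  }

  get size(): number {
    return this.count;
  }

  // Adds an item. If the buffer is full, the oldest item is evicted and returned.
  enqueue(item: T): T | undefined {
    let dropped: T | undefined;
    if (this.count === this.capacity) {
      dropped = this.items[this.head];
      this.head = (this.head + 1) % this.capacity;
      this.count--;
    }
    const tail = (this.head + this.count) % this.capacity;
    this.items[tail] = item;
    this.count++;
    return dropped;
  }

  // Removes and returns the oldest item, or undefined when empty.
  dequeue(): T | undefined {
    if (this.count === 0) {
      return undefined;
    }
    const item = this.items[this.head];
    this.items[this.head] = undefined;
    this.head = (this.head + 1) % this.capacity;
    this.count--;
    return item;
  }
}
```

Returning the evicted item from enqueue lets the caller count or log dropped metrics instead of losing them silently.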
Tasks
- Add type AgentSinkOptions with:
  - RetryStrategy parameter where the default value is None for backwards compatibility, with a single option to start with: ExponentialBackoffRetryStrategy (see also: https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/)
  - AsyncBehavior parameter that controls whether the call should block or not. In the former case we keep the current behavior and in the latter we return immediately, enqueuing to the retry buffer on failure.
- Change AgentSink's constructor to constructor(options: AgentSinkOptions, serializer: ISerializer).
- Add RetryStrategies which the AgentSink uses based on its configuration.
  - NoRetry propagates errors back to the caller of flush, which maintains the current behavior.
  - ExponentialRetry (which can be configured by the application) will block flush on the first attempt, enqueuing to a CircularBuffer (whose size is also configurable) on failures.
- On startup, setInterval will be set to check the size of the CircularBuffer and retry failed requests asynchronously.
- Add shutdown method to gracefully shut down and block on any outstanding requests.
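As a rough illustration of the options type and the backoff calculation, here is a sketch under stated assumptions. The field names (retryStrategy, blocking, retryBufferSize, baseDelayMs, maxDelayMs) and the function backoffDelayMs are hypothetical and not part of the library. The delay uses the "full jitter" variant described in the linked AWS blog post: a random value between 0 and min(cap, base * 2^attempt).

```typescript
// Hypothetical shapes for the proposed options; names are illustrative only.
type RetryStrategyName = "None" | "ExponentialBackoff";

interface AgentSinkOptions {
  retryStrategy: RetryStrategyName; // "None" keeps today's behavior
  blocking: boolean; // stands in for the proposed AsyncBehavior parameter
  retryBufferSize: number; // capacity of the retry buffer
}

const defaultAgentSinkOptions: AgentSinkOptions = {
  retryStrategy: "None",
  blocking: true,
  retryBufferSize: 100, // arbitrary placeholder value
};

interface ExponentialBackoffOptions {
  baseDelayMs: number; // delay ceiling for the first retry (attempt 0)
  maxDelayMs: number; // upper bound on the ceiling
}

// Full jitter: pick a random delay in [0, min(maxDelayMs, baseDelayMs * 2^attempt)).
// `random` is injectable so the calculation can be tested deterministically.
function backoffDelayMs(
  attempt: number,
  options: ExponentialBackoffOptions,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(options.maxDelayMs, options.baseDelayMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

Jitter spreads retries out so that many application instances recovering from the same agent outage do not all reconnect at the same moment.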
Example Usage
AWS_EMF_AGENT_RETRY_STRATEGY="ExponentialBackoff"
// or
Configuration.agentRetryStrategy = RetryStrategy.ExponentialBackoff;
// or
Configuration.agentRetryStrategy = (...) => customRetryStrategy();
// ...
await logger.flush();
// execution control is returned when logs have been successfully flushed or enqueued for retry
Open Question
- Should we change logger.flush() to enqueue and return immediately? This would allow us to make flush() a synchronous operation in all cases.
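To make the trade-off in this question concrete, here is one possible shape for an enqueue-and-return design. It is only a sketch: BackgroundFlusher, Send, start, drain and maxPending are hypothetical names, and send stands in for whatever actually writes to the agent. It also assumes the oldest payload is dropped when the queue is full.

```typescript
// Hypothetical stand-in for the function that writes one payload to the agent.
type Send = (payload: string) => Promise<void>;

class BackgroundFlusher {
  private readonly pending: string[] = [];
  private readonly send: Send;
  private readonly maxPending: number;
  private timer: ReturnType<typeof setInterval> | undefined;
  private draining: Promise<void> | undefined;

  constructor(send: Send, maxPending: number) {
    this.send = send;
    this.maxPending = maxPending;
  }

  get size(): number {
    return this.pending.length;
  }

  // Synchronous: only enqueues, so it never waits on the network and never rejects.
  flush(payload: string): void {
    if (this.pending.length >= this.maxPending) {
      this.pending.shift(); // assumption: drop the oldest payload when full
    }
    this.pending.push(payload);
  }

  // Starts the periodic retry loop; calling it twice has no extra effect.
  start(intervalMs: number): void {
    if (this.timer !== undefined) {
      return;
    }
    this.timer = setInterval(() => {
      void this.drain();
    }, intervalMs);
  }

  // Sends queued payloads in order. Concurrent callers share one in-flight drain.
  drain(): Promise<void> {
    if (this.draining === undefined) {
      this.draining = this.drainOnce().finally(() => {
        this.draining = undefined;
      });
    }
    return this.draining;
  }

  private async drainOnce(): Promise<void> {
    while (this.pending.length > 0) {
      const payload = this.pending.shift() as string;
      try {
        await this.send(payload);
      } catch {
        // Agent still unavailable: put the payload back if there is room, then stop.
        if (this.pending.length < this.maxPending) {
          this.pending.unshift(payload);
        }
        return;
      }
    }
  }

  // Stops the timer and makes one final attempt to send what is queued.
  async shutdown(): Promise<void> {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
    await this.drain();
  }
}
```

With this shape the caller can no longer observe delivery failures through flush, so errors would have to surface some other way, for example through a callback or a counter of dropped payloads.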