fix: prevent sub-batch 413's from blocking whole batch #972

Merged
merged 3 commits into logstash-plugins:master from sub-batch-failures on Apr 6, 2021

Conversation

yaauie
Contributor

@yaauie yaauie commented Oct 6, 2020

The Http Client breaks batches of actions into sub-batches that are up to 20MB
in size, sending larger actions as batches-of-one, and zips the responses
together to emulate a single batch response from the Elasticsearch API.

When an individual HTTP request is rejected by Elasticsearch (or by an
intermediate proxy or load-balancer) with an HTTP/1.1 413, we can emulate
the error response instead of blowing an exception through to the whole batch.
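To make the mechanism concrete, here is a minimal sketch of the idea, not the plugin's actual implementation (the helper names, the `[op, params, source]` action shape, and the exact fields of the per-action error entry are assumptions):

```ruby
require 'json'

# Build a bulk-style response that marks every action in the rejected
# sub-batch with status 413, so it can be zipped together with the responses
# of the other sub-batches as if Elasticsearch itself had returned it.
def emulate_413_response(actions)
  items = actions.map do |op, _params, _source|
    { op.to_s => { "status" => 413,
                   "error"  => { "message" => "Payload Too Large (rejected before reaching the bulk API)" } } }
  end
  { "errors" => true, "items" => items }
end

# Send one sub-batch; on a 413 rejection, return the emulated response instead
# of raising, so only these actions are routed into the retry logic.
def send_sub_batch(http_client, body, actions)
  response = http_client.post("_bulk", body)
  return emulate_413_response(actions) if response.code == 413
  JSON.parse(response.body)
end
```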

Contributor

@andsel andsel left a comment

I was wondering if it's feasible to add a unit test that verifies the new behavior introduced here: overriding TARGET_BULK_BYTES and doubling (stubbing) the LogStash::Outputs::ElasticSearch::HttpClient::Pool client to return a 413 on the post request.
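For illustration, a rough sketch of the kind of spec being suggested might look something like this (the constant's namespace, the response double, and the assertion wording are assumptions, not the spec that was eventually added):

```ruby
describe "bulk sub-batching when a request is rejected with HTTP 413" do
  let(:pool) { instance_double(LogStash::Outputs::ElasticSearch::HttpClient::Pool) }
  let(:rejecting_response) { double("response", :code => 413, :body => "") }

  before do
    # shrink the sub-batch threshold so a handful of events spans several sub-batches
    stub_const("LogStash::Outputs::ElasticSearch::HttpClient::TARGET_BULK_BYTES", 1_024)
    allow(pool).to receive(:post).and_return(rejecting_response)
  end

  it "reports the rejected actions as failed instead of raising" do
    # exercise the client's bulk path with the doubled pool, then assert that
    # the combined response marks the affected actions with status 413
  end
end
```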

@yaauie yaauie force-pushed the sub-batch-failures branch 3 times, most recently from c6a1006 to 2cd4843 on October 30, 2020 15:25
@yaauie
Contributor Author

yaauie commented Oct 30, 2020

@andsel I've added specs and improved logging some more.

@yaauie yaauie force-pushed the sub-batch-failures branch 2 times, most recently from 6475020 to 0f601cd on November 6, 2020 21:13
Contributor

@andsel andsel left a comment

I've noticed only a minor repetition in the Changelog.
The changes seem OK to me, but I don't understand why it fails on Travis.

@yaauie yaauie force-pushed the sub-batch-failures branch from 0f601cd to 9a43f31 on November 11, 2020 15:51
@andsel andsel self-requested a review November 12, 2020 08:19
Contributor

@andsel andsel left a comment

LGTM

@andsel andsel assigned yaauie and unassigned andsel Jan 15, 2021
yaauie added 2 commits April 2, 2021 15:08
The Http Client breaks batches of actions into sub-batches that are up to 20MB
in size, sending larger actions as batches-of-one, and zips the responses
together to emulate a single batch response from the Elasticsearch API.

When an individual HTTP request is rejected by Elasticsearch (or by an
intermediate proxy or load-balancer) with an HTTP/1.1 413, we can emulate
the error response instead of blowing an exception through to the whole batch.
This allows only the offending events/actions to be subject to retry logic.

Along the way, we improve logging at the `debug` level for sub-batches, and
emit clear `warn`-level logs with payload sizes when we hit HTTP 413 rejections.
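As a rough illustration of that last point, the warn-level log might carry the payload size alongside the rejection; the message text and field names below are illustrative, not the plugin's actual log keys:

```ruby
@logger.warn("Bulk request rejected with HTTP 413 Payload Too Large; emulating per-action failures",
             :action_count => actions.size,
             :payload_size_bytes => body.bytesize)
```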
@yaauie yaauie force-pushed the sub-batch-failures branch from d970253 to b2ef074 on April 2, 2021 17:24
@yaauie
Contributor Author

yaauie commented Apr 2, 2021

Reviving, and expanding the scope to address a situation where we could send a <20MB payload that expanded beyond Elasticsearch's 100MB buffers (#823). Bulk grouping is now entirely determined by decompressed size.
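A minimal sketch of what grouping by decompressed size means in practice (the threshold constant and helper name are assumptions; compression, where enabled, would only be applied to each sub-batch after the grouping decision has been made):

```ruby
TARGET_BULK_BYTES = 20 * 1024 * 1024 # illustrative threshold

# Group serialized bulk lines into sub-batches whose *uncompressed* size stays
# under the target, so a highly compressible payload can no longer slip past
# the size check and then expand beyond Elasticsearch's buffers.
def group_into_sub_batches(serialized_actions)
  batches, current, current_bytes = [], [], 0
  serialized_actions.each do |line|
    if current_bytes + line.bytesize > TARGET_BULK_BYTES && !current.empty?
      batches << current
      current, current_bytes = [], 0
    end
    current << line
    current_bytes += line.bytesize
  end
  batches << current unless current.empty?
  batches
end
```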

@yaauie yaauie requested a review from andsel April 2, 2021 17:26
@yaauie
Contributor Author

yaauie commented Apr 3, 2021

Failing tests on 8.0-SNAPSHOT are unrelated -- it looks like the integration tests clean up ILM policies without first cleaning up the indices governed by those policies, which will be an error in 8.x. I'll see whether the fix is trivial; otherwise I'll file a separate issue to chase that down outside this PR.

Contributor

@andsel andsel left a comment

LGTM

The default value of Elasticsearch's `action.destructive_requires_name` has
changed to true in elastic/elasticsearch#66908, which causes our integration
tests' wildcard deletes to fail. By specifying this config explicitly, we
ensure the desired behaviour is selected.
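For reference, pinning the pre-8.x behaviour explicitly amounts to a one-line setting; exactly where it lives in this plugin's integration-test environment isn't shown in this thread, so the placement below is an assumption:

```yaml
# elasticsearch.yml (or the equivalent environment override in the test setup):
# restore the pre-8.x default so wildcard index deletes keep working.
action.destructive_requires_name: false
```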
@yaauie
Contributor Author

yaauie commented Apr 6, 2021

Theoretically we have a fix in place for the failing integration tests against the ES 8 snapshot -- they recently changed the default value of action.destructive_requires_name to true in elastic/elasticsearch#66908, which causes our attempt to wildcard-delete the indices prior to removing the ILM policies to fail.

Once CI goes green, I plan to merge.

@yaauie yaauie merged commit b9c66b5 into logstash-plugins:master Apr 6, 2021
@yaauie yaauie deleted the sub-batch-failures branch April 6, 2021 23:09