log unsuccessful shards in failed scrolls #1261
Comments
@tommyzli Unfortunately I could not find information about which shards/nodes were unsuccessful in a scroll API response. I may be forgetting something, but only the successful/total shard counts are presented in the raw response. The only case where I could see the scroll API return information about failed nodes is when the initial _scroll_id request succeeds on all shards (for example 3/3) and then, while the scroll is being consumed, shards become unavailable because a node is unreachable (2/3 primary shards available in the example below). Is that what you are referring to?
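For reference, when a node does drop out mid-scroll, the failure details do appear in the raw response under `_shards.failures`. A rough illustration of that section as the Python client returns it (a plain dict); the index name, node id, and failure reason here are invented example values:

```python
# Illustrative _shards section of a scroll response after a node became
# unreachable mid-scroll; all concrete values below are made-up examples.
shards = {
    "total": 3,
    "successful": 2,
    "skipped": 0,
    "failed": 1,
    "failures": [
        {
            "shard": 1,
            "index": "my-index",  # hypothetical index name
            "node": "aB3xYz0QT2mKq1w",  # id of the unreachable node
            "reason": {
                "type": "search_context_missing_exception",
                "reason": "No search context found for id [42]",
            },
        }
    ],
}
```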
@bartier yeah, the case I saw was that a shard failed after I had already scrolled through a few pages. I'm thinking the code should check whether error messages were included in the response and log them if so.
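Something along these lines inside the scroll loop would cover that case. This is only a sketch against the response shape shown above, not the helper's actual code, and the logger name is an assumption:

```python
import logging

logger = logging.getLogger("elasticsearch.helpers")  # assumed logger name


def log_shard_failures(resp):
    """Log any per-shard failures carried in a scroll response."""
    for failure in resp.get("_shards", {}).get("failures", []):
        reason = failure.get("reason", {})
        logger.warning(
            "Shard %s of index %s failed on node %s: %s (%s)",
            failure.get("shard"),
            failure.get("index"),
            failure.get("node"),
            reason.get("reason"),
            reason.get("type"),
        )
```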
Did this ever get resolved? I'm running into the same issue.
OK, after debugging this issue for a few days, splitting shards, and adding nodes, we found out that the main issue was the JVM heap size. It was using the default of 1 GB instead of 32 GB like the rest of the nodes. When we first saw it: cluster version 7.8.0. Solution: configured the heap size in jvm.options to 32 GB of RAM and reloaded the Elasticsearch service.
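For anyone hitting the same thing: the heap is normally pinned via the Xms/Xmx flags in Elasticsearch's JVM options, roughly like the snippet below (file placement per the 7.x conventions; restart or reload the node afterwards):

```
# config/jvm.options (or a file under config/jvm.options.d/ on 7.7+)
-Xms32g
-Xmx32g
```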
Elasticsearch version (`bin/elasticsearch --version`): 7.6.1
`elasticsearch-py` version (`elasticsearch.__versionstr__`): 7.5.1

Description of the problem including expected versus actual behavior:
The scan() helper function only logs the number of successful vs failed shards. It would be helpful to also log the shards that failed, so I can quickly jump onto the node and grab the appropriate server logs. That data is a part of the response, but gets thrown away by the client.
Steps to reproduce:
A call to `scan(client, query, raise_on_error=True)` fails and throws `ScanError("Scroll request has only succeeded on 9 (+0 skiped) shards out of 10.")`.
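For completeness, a minimal way to exercise that code path (the cluster URL and index name are placeholders; the error only surfaces if a shard actually fails mid-scroll):

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan, ScanError

client = Elasticsearch("http://localhost:9200")  # placeholder cluster URL

try:
    # raise_on_error (the default) turns shard failures into a ScanError.
    for hit in scan(
        client,
        query={"query": {"match_all": {}}},
        index="my-index",  # hypothetical index
        raise_on_error=True,
    ):
        pass
except ScanError as exc:
    print(exc)  # today this carries only the shard counts
```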
Proposed error:
`ScanError("Scroll request has only succeeded on 9 (+0 skipped) shards out of 10. First failure: node 'foo', shard 'bar', reason 'reason'")`
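A rough sketch of how the helper could assemble that message from the response it already holds; `resp`, `scroll_id`, and the `ScanError(scroll_id, message)` call are assumed to roughly match the helper's internals rather than copied from them:

```python
shards = resp["_shards"]
successful = shards["successful"]
skipped = shards.get("skipped", 0)
total = shards["total"]

if successful + skipped < total:
    message = (
        "Scroll request has only succeeded on %d (+%d skipped) shards out of %d."
        % (successful, skipped, total)
    )
    # Surface the first failure instead of discarding the details.
    failures = shards.get("failures") or []
    if failures:
        first = failures[0]
        message += " First failure: node %r, shard %r, reason %r" % (
            first.get("node"),
            first.get("shard"),
            first.get("reason", {}).get("reason"),
        )
    raise ScanError(scroll_id, message)
```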