-
Notifications
You must be signed in to change notification settings - Fork 832
Added shuffle sharding support to generate a subring #3090
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added shuffle sharding support to generate a subring #3090
Conversation
/cc @thorfour (original author of #1947) and @harry671003, who's been poking around in this area. |
pkg/ring/ring.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to check: in the case where we're not using zones, zone = ingesterID. Will this algorithm end distributing all users to all ingesters in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the case where we're not using zones, zone = ingesterID
It was in the initial implementation but was soon fixed in #2404. When not using zones, the zone
is an empty string, which works fine here (all ingesters belong to the same zone).
However, I'm wondering if we should add a boolean flag to the ring to explicitly enable zone awareness, disabled by default instead of just relying on the fact zones are set (think about a rolling update adding zones to ingesters, you may want to rollout all ingesters first and then enable it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My 2 cents is if it doesn't add too much complexity, it's probably better to favor rollout safety for changing the flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree @ranton256. Will do it in a separate PR (since it's unrelated from the work done in this PR).
I guess I don't understand this comment. Can you elaborate on what this problem is? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!
pkg/ring/ring.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by "probabilistic hashing"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uniqueness is not guaranteed, but it offers a low collision probability. What we're using here is a probabilistic data structure.
If we pick nodes following the "next" tokens in the ring (like |
So is the value of this for really low tenant counts to make it more likely to evenly distribute load? Because the example of 2 tenants overlapping will always be true at large enough tenant counts. |
No. It works very well with a large number of tenants too (assuming a reasonably large number of nodes too, but order of magnitude smaller then tenants).
It depends on the cluster size. If you have a cluster with 100 ingesters and you randomly pick 6 nodes for each tenant, you end up with 75M possible combinations, and the chances to have overlapping nodes (with shuffle sharding) are:
I created this spreadsheet to easily compute it. For more information, you could take a look at this and the reference implementation linked in the PR description. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pracucci thanks for the explanation! LGTM
0629048
to
ea4c258
Compare
# in the per-tenant overrides, a value of 0 disables shuffle sharding for the | ||
# tenant. | ||
# CLI flag: -distribution.ingestion-tenant-shard-size | ||
[ingestion_tenant_shard_size: <int> | default = 0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take it based on the similar "store_gateway_tenant_shard_size" flag from https://github.com/cortexproject/cortex/pull/3069/files you are planning to name all the similar flags "_tenant_shard_size"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, are you planning to take/keep the -experimental off this when you are ready to submit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take it based on the similar "store_gateway_tenant_shard_size" flag from https://github.com/cortexproject/cortex/pull/3069/files you are planning to name all the similar flags "_tenant_shard_size"?
Yes, I do. The reason is that overrides are specified within the same YAML node, so we need a way to clearly differentiate them. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, are you planning to take/keep the -experimental off this when you are ready to submit?
I would plan to remove the experimental flag once ready to submit, unless you have any concerns. Generally speaking, I think originally adding the experimental prefix to CLI flags was a mistake and shouldn't be done anymore. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you mean document as experimental but not name the flag experimental? That sounds fine to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes Richard, you're correct (sorry for being unclear). This way, once we're all comfortable marking it stable, it will just be a doc change instead of a config change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There is additional work (caching of user rings) available in https://github.com/pstibrany/cortex/tree/fix-shuffle-sharding branch, not yet tested. |
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
3783b39
to
7131ffc
Compare
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job!
Signed-off-by: Marco Pracucci <[email protected]>
…ortex#3090) * Added shuffle sharding support to generate a subring Signed-off-by: Marco Pracucci <[email protected]> * Solved all TODOs Signed-off-by: Marco Pracucci <[email protected]> * Simplified unlock Signed-off-by: Marco Pracucci <[email protected]> * Fixed linter Signed-off-by: Marco Pracucci <[email protected]> * Added benchmark Signed-off-by: Marco Pracucci <[email protected]> * Replaced Subring() with ShuffleShard() Signed-off-by: Marco Pracucci <[email protected]> * Small improvements Signed-off-by: Marco Pracucci <[email protected]> * Shortened CHANGELOG entry Signed-off-by: Marco Pracucci <[email protected]> * Make golang doc happy Signed-off-by: Marco Pracucci <[email protected]> * Fixed flag name and added integration test Signed-off-by: Marco Pracucci <[email protected]>
What this PR does:
The current
Subring()
implementation was conceived as an experimental shuffle sharding support in Cortex. It works fine in regards to build a subring of N nodes and guaranteeing stability and consistency but unfortunately suffers two issues:In this PR I proposed an alternative
ShuffleShard()
implementation which is inspired by AWS Route53 infima library and guarantee stability and consistency, while offering nodes shuffling and zone awareness.Will be done in subsequent PRs:
Which issue(s) this PR fixes:
N/A
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]