Shuffle-sharding of queriers in the query-frontend #3113
Conversation
this could roll the same querier multiple times?
Yes it can. An alternative would be to make a copy of the queriers, shuffle them, and use the first maxQueriers of them. The current version avoids that extra allocation at the cost of extra iterations. I'm not sure which one is better (time for a benchmark?).
I've tried using shuffling instead of rolling random numbers. The updated benchmark shows slightly less time needed for BenchmarkQueueRequest (well, within ±7%, so maybe not), but also slightly more memory allocations.
name              old time/op    new time/op    delta
GetNextRequest-4  59.7µs ± 4%    59.7µs ± 1%    ~       (p=0.645 n=10+9)
QueueRequest-4    626µs ± 7%     609µs ± 2%     ~       (p=0.105 n=10+10)

name              old alloc/op   new alloc/op   delta
GetNextRequest-4  1.60kB ± 0%    1.60kB ± 0%    ~       (all equal)
QueueRequest-4    322kB ± 0%     326kB ± 0%     +1.24%  (p=0.000 n=10+10)

name              old allocs/op  new allocs/op  delta
GetNextRequest-4  100 ± 0%       100 ± 0%       ~       (all equal)
QueueRequest-4    1.07k ± 0%     1.12k ± 0%     +4.68%  (p=0.000 n=10+10)
I'm not sure I understand this logic. Don't we want to shuffle-shard each user onto n distinct queriers? Does this function guarantee n distinct queriers?
"Distinctness" is guaranteed by 1) using a map, 2) condition in the for loop. This loop doesn't end until there are maxQueriers
distinct queriers in the map. In addition to that, thanks to len(allSortedQueriers) > maxQueriers
precondition checked at the beginning of this method, we know that loop will eventually finish.
Doing this via repeated random selections, versus shuffling the array, seems like a probably unnecessary optimization at the expense of a more deterministic outcome in terms of iterations required. I'm not sure it will matter practically, but predictable seems better, meaning: shuffle a copy of the array.
You don't have to shuffle the whole array: make a copy, then for each counter index cnt over the first N positions, generate a random index between cnt+1 and N-1 inclusive and swap the value at the counter index with the value at the random index. That way you can never select a duplicate, and you don't have to swap more values than the number you need to select. This is basically https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#The_modern_algorithm except I described it from 0 up and they describe it from N-1 down.
I've changed the implementation to use the suggested algorithm. (Btw, it's the same as what rand.Shuffle() does, but in our case we can stop early and set elements in the returned map without another iteration.)
I'm not sure how to quantify the effect, but this could unfairly skip users who don't happen to be assigned to the querier currently asking for a query.
I don’t quite follow here... skipping users who use different queriers is the goal of this PR.
# are connected to all frontends). Note that this only works with queriers
# connecting to the query-frontend, not when using downstream URL.
# CLI flag: -frontend.max-queriers-per-user
[max_queriers_per_user: <int> | default = 0]
In the PR for store-gateway shuffle sharding, a config option called sharding_strategy was introduced. Is that something to be added here as well?
Store-gateways can use different sharding strategies. In the case of queriers, no sharding was done before (which corresponds to max_queriers_per_user: 0), and only shuffle-sharding is available (when max_queriers_per_user is greater than 0). We could introduce a sharding_strategy option too, but personally I don't see a need for it here.
Food for thought (not having a strong opinion): I agree it's not required here, but could help with config consistency.
> Food for thought (not having a strong opinion): I agree it's not required here, but could help with config consistency.

If we want to be consistent with other config options, we should also support the "default" (non-shuffle-sharding) strategy with a non-zero shard size.
Let's keep it as is for now. We marked this feature as experimental, which will allow us to eventually fine-tune the config before marking it stable.
Changes look good to me. Thanks for the array shuffling algorithm change.
Fantastic job 👏 I also appreciated the tests a lot! I left minor comments, but overall LGTM!
I would mention that this is experimental in the "v1 guarantees" doc (but not add the experimental CLI flag prefix, which we've already seen is a pain), in order to be able to make any breaking change until we get enough confidence running it in production.
I think we also mentioned we want aggressive gRPC keepalive settings for querier->query-frontend, in order to quickly detect "dead" queriers. Could you check the current settings (if any) and eventually fine-tune them, please?
In a separate PR, I would also work on some docs.
pkg/querier/frontend/frontend.go
getOrAddQueue() can potentially return nil. I would check it here.
I'll add a check with a comment that it can only happen if user is "".
Signed-off-by: Peter Štibraný <[email protected]>
Modified worker to respond to new GET_ID request type. Signed-off-by: Peter Štibraný <[email protected]>
…riers. Signed-off-by: Peter Štibraný <[email protected]>
This is similar to rand.Shuffle(), but stops early after selecting enough queriers. Signed-off-by: Peter Štibraný <[email protected]>
…from supplied input. Signed-off-by: Peter Štibraný <[email protected]>
We need to configure these settings on the query-frontend. The default values that Cortex uses are:
which means that after 2 hours of no activity on the connection, the server will ping the client and wait 20 seconds for a response. If there is no reply, the connection is closed. We can include this information in the docs for people to tune (e.g. time=1m/timeout=20s, or whatever makes sense for their setup).
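For reference, a sketch of what tightening these settings on the query-frontend's gRPC server could look like, using grpc-go's keepalive package. The 1m/20s values come from the tuning example above; this is an illustrative configuration fragment, not Cortex's shipped defaults:

```go
package main

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

// Illustrative only: with these parameters a dead querier is detected in
// roughly a minute instead of the 2h default ping interval.
var kasp = keepalive.ServerParameters{
	Time:    time.Minute,      // ping the client after 1m of inactivity
	Timeout: 20 * time.Second, // close the connection if no ack within 20s
}

// The keepalive parameters are applied when constructing the gRPC server.
var server = grpc.NewServer(grpc.KeepaliveParams(kasp))
```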
👍
@pracucci I've addressed your feedback, please take a look again when time permits. Thanks!
LGTM. Thanks to you! 🙏
What this PR does: This PR implements shuffle-sharding of queriers in the query-frontend.
Since each query-frontend knows about all queriers connecting to it, the query-frontend can select N queriers that will handle requests from specific users.
Implementation details:
The query-frontend needs to know which queriers are connected, and group multiple connections. The protocol between the query-frontend and queriers has been extended to support passing a querierID. Old queriers will use an empty string as ID. If a new querier connects to an old query-frontend, it will not send its ID at all.
If the same set of queriers is connected to all frontends, these frontends will all select the same subset of queriers for the given user.
Each user has a different subset of queriers.
When searching for the next user's queue to handle a request from, each querier will iterate fairly between its set of users, in round-robin fashion, as before.
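The selection properties above can be sketched as a deterministic, per-user computation: seed a PRNG from the user ID and partially shuffle the sorted querier IDs, so that every frontend seeing the same queriers picks the same subset. This is a sketch under stated assumptions; queriersForUser and its details are illustrative, not the PR's actual code:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
	"sort"
)

// queriersForUser selects a per-user subset of queriers. Sorting first
// removes any dependence on the order queriers happened to connect in,
// and seeding the PRNG from the user ID makes the result identical on
// every frontend, while differing between users.
func queriersForUser(userID string, queriers []string, maxQueriers int) []string {
	sorted := make([]string, len(queriers))
	copy(sorted, queriers)
	sort.Strings(sorted)

	if maxQueriers <= 0 || maxQueriers >= len(sorted) {
		return sorted // sharding disabled, or the shard covers all queriers
	}

	h := fnv.New64a()
	h.Write([]byte(userID))
	rnd := rand.New(rand.NewSource(int64(h.Sum64())))

	// Partial Fisher–Yates: only the first maxQueriers positions matter.
	for i := 0; i < maxQueriers; i++ {
		j := i + rnd.Intn(len(sorted)-i)
		sorted[i], sorted[j] = sorted[j], sorted[i]
	}
	return sorted[:maxQueriers]
}

func main() {
	qs := []string{"querier-1", "querier-2", "querier-3", "querier-4"}
	fmt.Println(queriersForUser("tenant-a", qs, 2))
}
```

Calling this with the same user ID and the same querier set, in any order, yields the same subset; a different user ID yields an independently shuffled subset.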
Shuffle-sharding is enabled by setting either the default (-frontend.max-queriers-per-user) or a per-user limit (max_queriers_per_user).

Tested with:
- integration test
- testing in dev cluster

Checklist:
- CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]