
Conversation

Contributor

@pstibrany pstibrany commented Sep 1, 2020

What this PR does: This PR implements shuffle-sharding of queriers in the query-frontend.

Since each query-frontend knows about all queriers connected to it, it can select N queriers that will handle requests from a specific user.

Implementation details:

  • The query-frontend needs to know which queriers are connected, and to group multiple connections from the same querier. The protocol between the query-frontend and the querier has been extended to pass a querier ID. Old queriers will use an empty string as their ID. If a new querier connects to an old query-frontend, it will not send its ID at all.

  • If the same set of queriers is connected to all frontends, these frontends will all select the same subset of queriers for a given user (see the sketch after this list).

  • Each user has a different subset of queriers.

  • When searching for the next user queue to handle a request from, each querier fairly iterates over its set of users in round-robin fashion, as before.

  • Shuffle-sharding is enabled by setting either the default limit (-frontend.max-queriers-per-user) or a per-user limit ("max_queriers_per_user").

  • integration test

  • testing in dev cluster

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@pstibrany pstibrany changed the title from "Shuffle-sharding of queriers" to "Shuffle-sharding of queriers in the query-frontend" on Sep 1, 2020
@pstibrany pstibrany requested a review from pracucci September 1, 2020 08:43
Contributor

this could roll the same querier multiple times?

Contributor Author
@pstibrany pstibrany Sep 1, 2020

Yes, it can. An alternative would be to make a copy of the queriers, shuffle them, and use the first maxQueriers of them. The current version avoids that extra allocation at the cost of extra iterations. Not sure which one is better (time for a benchmark?).

Contributor Author

I've tried using shuffling instead of drawing random numbers. The updated benchmark shows slightly less time needed for BenchmarkQueueRequest (well, with ±7% noise, maybe not), but also slightly more memory allocations.

name              old time/op    new time/op    delta
GetNextRequest-4    59.7µs ± 4%    59.7µs ± 1%    ~     (p=0.645 n=10+9)
QueueRequest-4       626µs ± 7%     609µs ± 2%    ~     (p=0.105 n=10+10)

name              old alloc/op   new alloc/op   delta
GetNextRequest-4    1.60kB ± 0%    1.60kB ± 0%    ~     (all equal)
QueueRequest-4       322kB ± 0%     326kB ± 0%  +1.24%  (p=0.000 n=10+10)

name              old allocs/op  new allocs/op  delta
GetNextRequest-4       100 ± 0%       100 ± 0%    ~     (all equal)
QueueRequest-4       1.07k ± 0%     1.12k ± 0%  +4.68%  (p=0.000 n=10+10)
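
(For reference, comparisons like the one above are typically produced with benchstat; a hypothetical invocation, with the package path as an assumption:)

```sh
go test -run=NONE -bench='GetNextRequest|QueueRequest' -benchmem -count=10 ./pkg/querier/... > old.txt
# apply the change, write new.txt the same way, then:
benchstat old.txt new.txt
```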

Contributor

I'm not sure if I understand this logic. Don't we want to shuffle-shard each user onto n distinct queriers? Does this function guarantee n distinct queriers?

Contributor Author

"Distinctness" is guaranteed by 1) using a map, 2) condition in the for loop. This loop doesn't end until there are maxQueriers distinct queriers in the map. In addition to that, thanks to len(allSortedQueriers) > maxQueriers precondition checked at the beginning of this method, we know that loop will eventually finish.

Contributor

Doing this via repeated random selections versus shuffling the array seems like a probably unnecessary optimization, at the expense of having a more deterministic outcome in terms of iterations required. I'm not sure it will matter practically, but predictable seems better, meaning: shuffle a copy of the array.

You don't have to shuffle the whole array: you can make a copy, then for each counter index cnt over the first N indices, generate a random index between cnt+1 and N-1 inclusive and swap the value at the counter index with the value at the random index. That way you can never select a duplicate, and you don't have to swap more values than the number you need to select. This is basically https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#The_modern_algorithm, except I described it from 0 up and they describe it from N-1 down.

Contributor Author

I've changed the implementation to use the suggested algorithm. (Btw, it's the same as what rand.Shuffle() does, but in our case we can stop early, and we can set elements in the returned map without another iteration.)
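
In sketch form (hypothetical names, assumes math/rand; this unbiased variant draws j from [i, len), so an element may stay in place, and the caller ensures maxQueriers <= len(queriers)):

```go
// pickQueriersShuffle runs a partial Fisher–Yates shuffle: like
// rand.Shuffle(), but it stops after maxQueriers swaps and fills the
// result map as it goes, so no second iteration is needed.
func pickQueriersShuffle(rnd *rand.Rand, queriers []string, maxQueriers int) map[string]struct{} {
	scratch := append([]string(nil), queriers...) // copy; don't mutate the input
	picked := make(map[string]struct{}, maxQueriers)
	for i := 0; i < maxQueriers; i++ {
		j := i + rnd.Intn(len(scratch)-i) // random index in [i, len)
		scratch[i], scratch[j] = scratch[j], scratch[i]
		picked[scratch[i]] = struct{}{}
	}
	return picked
}
```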

Contributor

I'm not sure how to quantify the effect, but this could unfairly skip users who don't happen to be assigned to the querier currently asking for a query.

Contributor Author

I don’t quite follow here... skipping users who use different queriers is the goal of this PR.

# are connected to all frontends). Note that this only works with queriers
# connecting to the query-frontend, not when using downstream URL.
# CLI flag: -frontend.max-queriers-per-user
[max_queriers_per_user: <int> | default = 0]
Contributor

In the PR for store-gateway shuffle sharding, a config called sharding_strategy was introduced. Is that something to be added here as well?

Contributor Author

Store-gateways can use different sharding strategies. In the case of queriers, no sharding was done before (which corresponds to max_queriers_per_user: 0), and only shuffle-sharding is available (when max_queriers_per_user is greater than 0). We could introduce a sharding_strategy option too, but personally I don't see a need for it here.
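
To make the semantics concrete, a hedged config sketch (assuming the standard Cortex limits block and runtime overrides file layout; the tenant name is hypothetical):

```yaml
# Global default (CLI flag: -frontend.max-queriers-per-user):
limits:
  max_queriers_per_user: 10   # 0 (the default) disables shuffle-sharding

# Per-tenant override in the runtime overrides file:
overrides:
  tenant-a:
    max_queriers_per_user: 5
```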

Contributor
@pracucci pracucci Sep 16, 2020

Food for thought (not having a strong opinion): I agree it's not required here, but could help with config consistency.

Contributor Author

Food for thought (not having a strong opinion): I agree it's not required here, but could help with config consistency.

If we want to be consistent with other config options, we should also support a "default" (non-shuffle-sharding) strategy with a non-zero shard size.

Contributor

Let's keep it as is for now. We marked this feature as experimental, which will allow us to eventually fine-tune the config before marking it stable.

Contributor
@ranton256 ranton256 left a comment

Changes look good to me. Thanks for the array shuffling algorithm change.

Contributor
@pracucci pracucci left a comment

Fantastic job 👏 I also really appreciated the tests! I left minor comments, but overall LGTM!

I would mention this is experimental in the "v1 guarantees" doc (but not add the experimental CLI flag prefix, which we've already seen is a pain) in order to be able to make any breaking change until we've got enough confidence running it in production.

I think we also mentioned we want aggressive gRPC keepalive settings for querier->query-frontend, in order to quickly detect "dead" queriers. Could you check the current settings (if any) and fine-tune them if needed, please?

In a separate PR, I would also work on some doc.

Contributor

getOrAddQueue() can potentially return nil. I would check it here.

Contributor Author

I'll add a check, with a comment that it can only happen if user is "".
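
Something along these lines (a sketch, not the exact diff; the error handling is illustrative):

```go
queue := getOrAddQueue(userID, maxQueriers)
if queue == nil {
	// This can only happen if userID is "" (invalid).
	return fmt.Errorf("no queue found for user %q", userID)
}
```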

Commits (signed off by Peter Štibraný <[email protected]>):

  • Modified worker to respond to new GET_ID request type.
  • This is similar to rnd.Shuffle(), but stops early after selecting enough queriers.
  • …from supplied input.
@pstibrany
Contributor Author

I would mention this is experimental in the "v1 guarantees" doc (but not add the experimental CLI flag prefix, which we've already seen is a pain) in order to be able to make any breaking change until we've got enough confidence running it in production.

I think we also mentioned we want aggressive gRPC keepalive settings for querier->query-frontend, in order to quickly detect "dead" queriers. Could you check the current settings (if any) and fine-tune them if needed, please?

We need to configure these settings on the query-frontend. The default values that Cortex uses are:

  • -server.grpc.keepalive.time=2h
  • -server.grpc.keepalive.timeout=20s

which means that after 2 hours of no activity on the connection, the server pings the client and waits 20 seconds for a response. If there is no reply, the connection is closed. We can include this information in the docs so people can tune it (e.g. time=1m / timeout=20s, or whatever makes sense for their setup).
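
For example, a more aggressive configuration on the query-frontend might look like this (illustrative values, taken from the suggestion above):

```
-server.grpc.keepalive.time=1m
-server.grpc.keepalive.timeout=20s
```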

In a separate PR, I would also work on some doc.

👍

@pstibrany
Contributor Author

@pracucci I've addressed your feedback, please take a look again when time permits. Thanks!

@pracucci
Contributor

I've addressed your feedback, please take a look again when time permits. Thanks!

LGTM. Thanks to you! 🙏

@pstibrany pstibrany merged commit b1ee0aa into cortexproject:master Sep 17, 2020