Skip to content

Conversation

pracucci
Copy link
Contributor

@pracucci pracucci commented Aug 21, 2020

What this PR does:
In this PR I'm introducing shuffle sharding support to the store-gateway blocks sharding. The idea is that, when sharding is enabled, you can switch between the default sharding or shuffle sharding. When using shuffle sharding, each store-gateway is eligible to sync/load blocks for a subset of tenants.

Notes:

  • I preferred to allow to configure sharding_strategy as string instead of a boolean, to keep the config more generic and open to future additional strategies
  • The implementation in the store-gateway is slightly more tricky than initially planned because we have to make sure that, when a tenant "leaves" a store-gateway shard, blocks are unloaded and metrics correctly updated

Out of the scope of this PR:

  • Fix subring implementation to do real "shuffling" (will be done in a dedicated PR)
  • Trigger a re-sync for an user each time its shard size change (will be done in a dedicated PR because need to introduce events on overrides)

Which issue(s) this PR fixes:
N/A

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@pracucci pracucci requested a review from pstibrany August 21, 2020 11:22
@pracucci pracucci force-pushed the add-shuffle-sharding-to-store-gateway branch 3 times, most recently from 544d490 to 8d9fde7 Compare August 24, 2020 12:22
@pracucci pracucci marked this pull request as ready for review August 24, 2020 12:23
Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Took me a while to understand why we're faking Iter response to be empty, if user is not included, but I believe it is to make BucketStore to unload blocks for such user. I wonder if we should just drop such BucketStore completely (after discovering that it has no blocks anymore)? Or would that complicate it even more (logic here is already quite complex).

@pracucci pracucci force-pushed the add-shuffle-sharding-to-store-gateway branch from 8d9fde7 to e55371c Compare August 26, 2020 15:51
@pracucci
Copy link
Contributor Author

Took me a while to understand why we're faking Iter response to be empty, if user is not included, but I believe it is to make BucketStore to unload blocks for such user.

Correct. There's no other way to unload blocks from BucketStore as of today (we could do it, but see below).

I wonder if we should just drop such BucketStore completely (after discovering that it has no blocks anymore)? Or would that complicate it even more (logic here is already quite complex).

Let's say we add a way to BucketStore to "close it" unloading all blocks (eg. deleting them from disk). Even if we have it, it's tricky to deal with BucketStore and MetadataFetcher metrics. An option could be creating metrics structs that we pass to BucketStore and MetadataFetcher constructors, so that we retain them for each tenant even after unloading and re-use in case the same tenant will be re-added to the shard.

Technically it's doable, but I'm just wondering if it's worth the effort (will the final code be cleaner?).

Copy link
Contributor

@ranton256 ranton256 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall just a nit on what looks like a comment typo.

Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
@pracucci pracucci force-pushed the add-shuffle-sharding-to-store-gateway branch from e55371c to 2585102 Compare August 27, 2020 07:12
@pracucci pracucci merged commit f1b95c6 into cortexproject:master Aug 27, 2020
@pracucci pracucci deleted the add-shuffle-sharding-to-store-gateway branch August 27, 2020 07:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants