Add topology aware read #3414
Conversation
Force-pushed from 9deb837 to 0bf804a
Good job @MichelHollands! I think we're on the right track. I left many comments, but most are small; `GetAll()` will need a bit more work.
That being said, I would be glad if you could add one integration test. I would test this scenario:
- Start a cortex cluster with 6 ingesters and zone-awareness enabled
- Push 100 series
- Query back 100 series > all good
- SIGKILL 1 ingester in 1 zone
- Query back 100 series > all good
- SIGKILL 1 more ingester in the same zone
- Query back 100 series > all good
- SIGKILL 1 more ingester in a different zone
- Query back 100 series > fail
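The failure tolerance this scenario exercises can be sketched with a small simulation. This is a hedged illustration, not the Cortex integration-test framework: `querySucceeds`, `zonesWithFailures`, and the instance-to-zone layout are hypothetical names for the example, assuming RF=3 across 3 zones with `maxUnavailableZones=1`:

```go
package main

import "fmt"

// zonesWithFailures returns the set of zones containing at least one
// failed instance. Under zone-aware reads, a query succeeds as long as
// the number of such zones does not exceed maxUnavailableZones.
func zonesWithFailures(failed map[string]string) map[string]bool {
	zones := map[string]bool{}
	for _, zone := range failed {
		zones[zone] = true
	}
	return zones
}

func querySucceeds(failed map[string]string, maxUnavailableZones int) bool {
	return len(zonesWithFailures(failed)) <= maxUnavailableZones
}

func main() {
	const maxUnavailableZones = 1 // RF=3 replicated across 3 zones

	failed := map[string]string{} // instance -> zone of killed ingesters
	fmt.Println(querySucceeds(failed, maxUnavailableZones)) // true: no failures

	failed["ingester-1"] = "zone-a"
	fmt.Println(querySucceeds(failed, maxUnavailableZones)) // true: one failing zone is tolerated

	failed["ingester-2"] = "zone-a"
	fmt.Println(querySucceeds(failed, maxUnavailableZones)) // true: still only zone-a failing

	failed["ingester-3"] = "zone-b"
	fmt.Println(querySucceeds(failed, maxUnavailableZones)) // false: two zones failing
}
```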
Force-pushed from ebf35f9 to 6cb4418
Thanks @MichelHollands for addressing my feedback! I left a few more comments. Please make sure that any fix you do is also covered by a unit test to avoid regressions (either in this PR or future changes).
Force-pushed from 56f66cb to 05dad8c
I just realised that there are places where triggering the zone-awareness logic in `GetAll()` is not correct, because it will filter out even healthy instances (e.g. `NewRingServiceDiscovery()`, `distributor.GetAll()`, etc.). Let's talk offline about this.
Force-pushed from 51e087e to fed940f
First review pass. I need to get a better understanding of the tests.
Force-pushed from 5ddb6d6 to fe09cca
Some initial comments. I still need to review `TestReplicationSet_Do` and the integration test.
pkg/ring/ring.go (outdated)
If RF=2 and there are two zones, then `minSuccessZones` will end up being 2. I would expect that we can actually tolerate one failing zone in that case.
Added a special case for RF=2 and number of zones = 2.
I'm actually not sure anymore if my previous statement is correct. The reason is that during writes, we would accept a single OK. But during reads, we don't know which single zone has received the sample, so we need to ask both.
I'd check with @pracucci to be 100% sure. But now I think my previous comment here was incorrect.
Changed
What's the current status of this conversation?
`DefaultReplicationStrategy.Filter()` calculates `minSuccess := (replicationFactor / 2) + 1`. The `minSuccess` is the write quorum. If RF=2, then we need to write to both ingesters in order to succeed. Because of this, it's safe to have `maxUnavailableZones=1` when RF=2.
I suggest changing the logic as follows:
`maxUnavailableZones = minSuccessZones - 1`
Tests should be updated accordingly.
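The suggested logic can be sketched as follows. This is an illustrative sketch, not the actual ring code; the clamp to the replication factor is an assumption for rings that contain more zones than RF (mirroring the "Handle nr of zones > replication factor" commit):

```go
package main

import "fmt"

// minInt avoids depending on the Go 1.21 built-in min.
func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// maxUnavailableZones follows the suggestion above: data is replicated to
// at most min(#zones, RF) zones, a majority of those zones is the zone
// quorum (minSuccessZones), and reads can therefore lose
// minSuccessZones - 1 whole zones.
func maxUnavailableZones(numZones, replicationFactor int) int {
	numReplicatedZones := minInt(numZones, replicationFactor)
	minSuccessZones := numReplicatedZones/2 + 1
	return minSuccessZones - 1
}

func main() {
	fmt.Println(maxUnavailableZones(3, 3)) // 1: a whole zone can fail with RF=3
	fmt.Println(maxUnavailableZones(2, 2)) // 1: writes reach both zones, so reads need only one
	fmt.Println(maxUnavailableZones(5, 3)) // 1: replication spans only 3 of the 5 zones
}
```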
What's the current status of this conversation?
I convinced myself that with RF=2, it is enough to write to a single ingester. But that's not a majority 🤦
Force-pushed from 9d50812 to 75a5dc6
Thank you!
Thanks @MichelHollands for patiently addressing my feedback. Well done, LGTM! 👏
```diff
 logger.Log("Killing", s.name)

-if out, err := RunCommandAndGetOutput("docker", "stop", "--time=0", s.containerName()); err != nil {
+if out, err := RunCommandAndGetOutput("docker", "kill", s.containerName()); err != nil {
```
Why is this changing from stop to kill?
The `Kill()` function, used by end-to-end tests, is expected to send a `SIGKILL`. We were using `docker stop --time=0`, which is expected to send a SIGTERM and then a SIGKILL after time 0, but we wondered if there could be some timing issues and a graceful shutdown of the stopped process could actually happen (or at least start). To make it more obvious that we're going to send a `SIGKILL`, we decided to just use `docker kill`.
```go
// 1 failing ingester. Due to how replication works when zone-awareness is
// enabled (data is replicated to RF different zones), there's no benefit in
// querying healthy instances from "failing zones". A zone is considered
// failed if there is a single error.
```
I don't follow the logic behind this.
If I have 1 failed replica in zone A, 1 failed in zone B, and none failed in zone C, won't this cause an outage for some timeseries when there would have been none in some cases (i.e. when the instances in A and B didn't host replicas of any of the same series)? With this logic of removing everything from a zone if there is a failure, won't this make it more likely to have an outage in some cases?
It depends.
Assuming RF=3, if `-distributor.shard-by-all-labels` is disabled, then this function is not used and we only query the 3 ingesters holding all the series for the queried metric (and we just need 2 successes out of 3 queried ingesters).
However, if `-distributor.shard-by-all-labels` is enabled, we don't know which ingesters hold the series for a given metric name, so we need to query all of them. When zone-awareness is disabled, we need all ingesters minus 1 to succeed, while when zone-awareness is enabled we need all ingesters in all zones minus 1 zone to succeed (thus tolerating all ingesters in 1 zone failing).
About shuffle sharding, this function is called after the subring has been selected.
Am I missing anything?
Okay, I follow now, and am convinced the logic is correct. I'm maybe less enthusiastic about some of the implications, but since the set of all ingesters (when `-distributor.shard-by-all-labels` is enabled) is effectively the subring, it seems more reasonable.
Squashed commits:
* Add unit tests
* Add lock to avoid race condition in test
* Add zone aware to GetAll
* Remove debug print statements
* Add changelog entry
* Fix comment
* Address replication set review comments
* Reword and change changelog entry to enhancement
* Address review comments in ring code
* Do not return early and add more test cases
* Rename ingesters to instances
* Add one more test
* Update pkg/ring/replication_set.go
* Update pkg/ring/replication_set.go: add sign off
* Add integration test
* Fix imports as per goimports
* Address review comments and add extra tests
* Fix rebase
* Fix rebase in test
* Add lock around mockIngester call
* Add lock around mockIngester call at correct place
* Handle nr of zones > replication factor
* Use util.Min instead of if statement
* Update pkg/ring/replication_set_test.go
* Use atomic and sets
* Fixed integration test and ReplicationSet.Do()
* Added tracker unit tests
* Fixed TestReplicationSet_Do
* Commented ReplicationSet max errors and max unavailable zones
* Fixed GetReplicationSetForOperation() logic and improved unit tests
* Improved tests
* Fixed tests flakiness
* Fixed test
* Update documentation
* Add note about reads from zone aware clusters
* Remove extra space
* Address some of Peter's review comments
* Add special case for rf=2
* Address review comments
* Fix comment
* Update docs with review comments
* Set maxUnavailableZones and change tests

Signed-off-by: Michel Hollands <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>
Co-authored-by: Peter Štibraný <[email protected]>
What this PR does:
Add zone awareness to the read path. A `MaxUnavailableZones` field is added to the `ReplicationSet` struct. The value of that field is derived from the number of zones. In `ReplicationSet.Do()` the zones of the failing requests are taken into account for zone-aware requests. If too many zones have failures then the request fails. Unit tests for the non-zone-aware invocations of `ReplicationSet.Do()` are added as well.
Which issue(s) this PR fixes:
Fixes #
Checklist
- `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`