@MichelHollands MichelHollands commented Oct 28, 2020

What this PR does:
Add zone awareness to the read path. A MaxUnavailableZones field is added to the ReplicationSet struct. The value of that field is derived from the number of zones.

In ReplicationSet.Do(), the zones of the failing requests are taken into account for zone-aware requests. If too many zones have failures, the request fails.

Unit tests for the non-zone aware invocations of ReplicationSet.Do() are added as well.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@MichelHollands MichelHollands force-pushed the add_topology_aware_read branch from 9deb837 to 0bf804a on October 29, 2020 09:28
@MichelHollands MichelHollands marked this pull request as ready for review October 29, 2020 10:11
@pracucci pracucci left a comment

Good job @MichelHollands! I think we're on the right track. I left you many comments, but most are small, while GetAll() will need a bit more work.

That being said, I would be glad if you could add 1 integration test. I would test this scenario:

  • Start a cortex cluster with 6 ingesters and zone-awareness enabled
  • Push 100 series
  • Query back 100 series > all good
  • SIGKILL 1 ingester in 1 zone
  • Query back 100 series > all good
  • SIGKILL 1 more ingester in the same zone
  • Query back 100 series > all good
  • SIGKILL 1 more ingester in a different zone
  • Query back 100 series > fail
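The scenario above can be modelled in miniature. The following is a toy simulation, not the real e2e framework: with 6 ingesters spread across 3 zones and maxUnavailableZones=1, a query keeps succeeding as long as at most one distinct zone contains a killed ingester (the helper name and zone layout are assumptions for illustration).

```go
package main

import "fmt"

// querySucceeds reports whether a zone-aware query succeeds, given how
// many ingesters are dead in each zone: the query survives while the
// number of zones with at least one dead ingester stays within
// maxUnavailableZones.
func querySucceeds(deadByZone map[string]int, maxUnavailableZones int) bool {
	failedZones := 0
	for _, dead := range deadByZone {
		if dead > 0 {
			failedZones++
		}
	}
	return failedZones <= maxUnavailableZones
}

func main() {
	// 6 ingesters: 2 per zone, all healthy to start.
	dead := map[string]int{"zone-a": 0, "zone-b": 0, "zone-c": 0}
	fmt.Println(querySucceeds(dead, 1)) // all good

	dead["zone-a"]++ // SIGKILL 1 ingester in zone-a
	fmt.Println(querySucceeds(dead, 1)) // all good

	dead["zone-a"]++ // SIGKILL the other ingester in zone-a
	fmt.Println(querySucceeds(dead, 1)) // all good: still only 1 failed zone

	dead["zone-b"]++ // SIGKILL 1 ingester in a different zone
	fmt.Println(querySucceeds(dead, 1)) // fail: 2 failed zones
}
```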

@pracucci pracucci left a comment

Thanks @MichelHollands for addressing my feedback! I left a few more comments. Please make sure that any fix you make is also covered by a unit test to avoid regressions (either in this PR or in future changes).

@MichelHollands MichelHollands force-pushed the add_topology_aware_read branch 6 times, most recently from 56f66cb to 05dad8c, on November 4, 2020 09:26
@pracucci pracucci left a comment

I just realised that there are places where triggering the zone-awareness logic in GetAll() is not correct, because it will filter out even healthy instances (e.g. NewRingServiceDiscovery(), distributor.GetAll(), etc.). Let's talk offline about this.

@pstibrany pstibrany left a comment

First review pass. I need to get a better understanding of the tests.

@pstibrany pstibrany left a comment

Some initial comments. I still need to review TestReplicationSet_Do and integration test.

pkg/ring/ring.go Outdated
Reviewer:

If RF=2 and there are two zones, then minSuccessZones will end up being 2. I would expect that we can actually tolerate one failing zone in such case.

MichelHollands (author):

Added a special case for RF=2 and number of zones = 2.

Reviewer:

I'm actually not sure anymore if my previous statement is correct. The reason is that during a write, we would accept it if we get a single OK. But during a read, we don't know which single zone has received the sample, so we need to ask both.

I'd check with @pracucci to be 100% sure. But now I think my previous comment here was incorrect.

MichelHollands (author):

Changed

@pracucci pracucci (Nov 10, 2020):

What's the current status of this conversation?

DefaultReplicationStrategy.Filter() calculates minSuccess := (replicationFactor / 2) + 1. minSuccess is the write quorum. If RF=2, then we need to write to both ingesters in order to succeed. Because of this, it's safe to have maxUnavailableZones=1 when RF=2.

I suggest changing the logic as follows:

maxUnavailableZones = minSuccessZones - 1

Tests should be updated accordingly.
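The arithmetic in the comment above works out as follows. This is an illustrative helper, not the Cortex source, and it assumes one replica per zone so that minSuccessZones equals the write quorum minSuccess.

```go
package main

import "fmt"

// quorum returns the write quorum and the suggested zone tolerance
// (maxUnavailableZones = minSuccessZones - 1) for a given replication
// factor, assuming one replica per zone.
func quorum(replicationFactor int) (minSuccess, maxUnavailableZones int) {
	minSuccess = replicationFactor/2 + 1 // write quorum
	maxUnavailableZones = minSuccess - 1
	return minSuccess, maxUnavailableZones
}

func main() {
	for _, rf := range []int{2, 3, 5} {
		ms, mz := quorum(rf)
		fmt.Printf("RF=%d: minSuccess=%d maxUnavailableZones=%d\n", rf, ms, mz)
	}
}
```

In particular, RF=2 gives minSuccess=2 (every write must reach both zones), so reads can safely tolerate one unavailable zone, matching the argument above.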

Reviewer:

What's the current status of this conversation?

I convinced myself that with RF=2, it is enough to write to a single ingester. But that's not a majority 🤦

MichelHollands and others added 13 commits November 9, 2020 09:01
@MichelHollands MichelHollands force-pushed the add_topology_aware_read branch from 9d50812 to 75a5dc6 on November 9, 2020 09:02
@pstibrany pstibrany left a comment

Thank you!


@pracucci pracucci left a comment

Thanks @MichelHollands for patiently addressing my feedback. Well done, LGTM! 👏

logger.Log("Killing", s.name)

-	if out, err := RunCommandAndGetOutput("docker", "stop", "--time=0", s.containerName()); err != nil {
+	if out, err := RunCommandAndGetOutput("docker", "kill", s.containerName()); err != nil {
Reviewer:

Why is this changing from stop to kill?

Reply:

The Kill() function, used by end-to-end tests, is expected to send a SIGKILL. We were using docker stop --time=0, which is expected to send a SIGTERM and then a SIGKILL after time 0, but we wondered if there could be some timing issues and a graceful shutdown of the stopped process could actually happen (or at least start). To make it more obvious we're going to send a SIGKILL, we decided to just use docker kill.

// 1 failing ingester. Due to how replication works when zone-awareness is
// enabled (data is replicated to RF different zones), there's no benefit in
// querying healthy instances from "failing zones". A zone is considered
// failed if there is a single error.
Reviewer:

I don't follow the logic behind this.
If I have 1 failed replica in zone A, 1 failed in zone B, and none failed in zone C, won't this cause an outage for some timeseries when there would have been none in some cases (i.e. the instance in A and the instance in B didn't host replicas of any of the same series)? With this logic of removing everything from a zone if there is a failure, won't this make an outage more likely in some cases?

Reply:

It depends.

Assuming RF=3, if -distributor.shard-by-all-labels is disabled, then this function is not used and we only query the 3 ingesters holding all the series for the queried metric (and we just need 2 successes out of the 3 queried ingesters).

However, if -distributor.shard-by-all-labels is enabled, we don't know which ingesters hold the series for a given metric name, so we need to query all of them. When zone-awareness is disabled, we need to query all ingesters minus 1, while when zone-awareness is enabled we need to query all ingesters in all zones minus 1 zone (thus tolerating all ingesters in 1 zone failing).

About shuffle sharding, this function is called after the subring has been selected.

Am I missing anything?
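The tolerance rule described in that reply can be sketched numerically. This helper is illustrative (not the Cortex source) and assumes -distributor.shard-by-all-labels is enabled so every instance in the (sub)ring is queried.

```go
package main

import "fmt"

// maxToleratedFailures returns how many instance failures a query can
// absorb: without zone-awareness, a single instance ("all ingesters
// minus 1"); with zone-awareness, every instance in up to
// maxUnavailableZones zones ("all zones minus 1" when that is 1).
func maxToleratedFailures(instancesPerZone int, zoneAware bool, maxUnavailableZones int) int {
	if !zoneAware {
		return 1
	}
	return instancesPerZone * maxUnavailableZones
}

func main() {
	fmt.Println(maxToleratedFailures(2, false, 0)) // 1: any single ingester
	fmt.Println(maxToleratedFailures(2, true, 1))  // 2: a whole zone of 2 ingesters
}
```

Note the trade-off raised in the review: with zone-awareness, two failures spread across two different zones exceed the tolerance even though two failures within one zone would not.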

Reviewer:

Okay, I follow now and am convinced the logic is correct. I'm somewhat less enthusiastic about some of the implications, but since, when -distributor.shard-by-all-labels is enabled, the set of all ingesters effectively is the subring, this seems reasonable.

@pracucci pracucci merged commit a04b547 into cortexproject:master Nov 18, 2020
simonswine pushed a commit to grafana/e2e that referenced this pull request Jan 13, 2022
The squashed commit carried the following changes:

* Add unit tests
* Add lock to avoid race condition in test
* Add zone aware to GetAll
* Remove debug print statements
* Add changelog entry
* Fix comment
* Address replication set review comments
* Reword and change changelog entry to enhancement
* Address review comments in ring code
* Do not return early and add more test cases
* Rename ingesters to instances
* Add one more test
* Update pkg/ring/replication_set.go
* Update pkg/ring/replication_set.go: add sign off
* Add integration test
* Fix imports as per goimports
* Address review comments and add extra tests
* Fix rebase
* Fix rebase in test
* Add lock around mockIngester call
* Add lock around mockIngester call at correct place
* Handle nr of zones > replication factor
* Use util.Min instead of if statement
* Update pkg/ring/replication_set_test.go
* Use atomic and sets
* Fixed integration test and ReplicationSet.Do()
* Added tracker unit tests
* Fixed TestReplicationSet_Do
* Commented ReplicationSet max errors and max unavailable zones
* Fixed GetReplicationSetForOperation() logic and improved unit tests
* Improved tests
* Fixed tests flakiness
* Fixed test
* Update documentation
* Add note about reads from zone aware clusters
* Remove extra space
* Address some of Peter's review comments
* Add special case for rf=2
* Address review comments
* Fix comment
* Update docs with review comments
* Set maxUnavailableZones and change tests

Signed-off-by: Michel Hollands <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>
Co-authored-by: Peter Štibraný <[email protected]>