Zone-aware ring checks can fail when an entire zone is unavailable #4291

@jtlisi

Describe the bug

If all the instances in a zone are in the LEAVING state but all of the other zones are fully available, ring checks will return errors.

To Reproduce

This can be reproduced by updating the following test:

diff --git a/pkg/ring/ring_test.go b/pkg/ring/ring_test.go
index f1d038d29..8341b5096 100644
--- a/pkg/ring/ring_test.go
+++ b/pkg/ring/ring_test.go
@@ -161,7 +161,7 @@ func TestRing_Get_ZoneAwarenessWithIngesterLeaving(t *testing.T) {
                                "instance-3": {Addr: "127.0.0.3", Zone: "zone-b", State: ACTIVE},
                                "instance-4": {Addr: "127.0.0.4", Zone: "zone-b", State: ACTIVE},
                                "instance-5": {Addr: "127.0.0.5", Zone: "zone-c", State: LEAVING},
-                               "instance-6": {Addr: "127.0.0.6", Zone: "zone-c", State: ACTIVE},
+                               "instance-6": {Addr: "127.0.0.6", Zone: "zone-c", State: LEAVING},
                        }
                        var prevTokens []uint32
                        for id, instance := range instances {

Expected behavior

Cortex should be resilient to unavailable zones as long as more than RF / 2 zones (RF being the replication factor) remain available.
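
To make the expected rule concrete, here is a minimal Go sketch of a zone-level quorum check. It is not the Cortex ring code: `Instance`, `healthyZones` and `hasZoneQuorum` are hypothetical names invented for illustration, and the sketch assumes that a zone counts as unavailable as soon as any of its instances is not ACTIVE. The two zone-a instances in `main` are also assumed, since the diff above only shows zone-b and zone-c.

```go
package main

import "fmt"

// State is a simplified stand-in for the ring instance state (hypothetical type).
type State int

const (
	Active State = iota
	Leaving
)

// Instance is a hypothetical, reduced view of a ring instance.
type Instance struct {
	Zone  string
	State State
}

// healthyZones counts zones in which every instance is Active.
// Assumption: a single non-Active instance marks its whole zone as unavailable.
func healthyZones(instances []Instance) int {
	seen := map[string]bool{}
	unhealthy := map[string]bool{}
	for _, inst := range instances {
		seen[inst.Zone] = true
		if inst.State != Active {
			unhealthy[inst.Zone] = true
		}
	}
	return len(seen) - len(unhealthy)
}

// hasZoneQuorum reports whether more than rf/2 zones are available.
// Integer division gives the same threshold as the real-valued rule "> RF / 2".
func hasZoneQuorum(instances []Instance, rf int) bool {
	return healthyZones(instances) > rf/2
}

func main() {
	// Scenario from this issue: every instance in zone-c is LEAVING.
	instances := []Instance{
		{Zone: "zone-a", State: Active}, // assumed, not shown in the diff
		{Zone: "zone-a", State: Active}, // assumed, not shown in the diff
		{Zone: "zone-b", State: Active},
		{Zone: "zone-b", State: Active},
		{Zone: "zone-c", State: Leaving},
		{Zone: "zone-c", State: Leaving},
	}
	fmt.Println(hasZoneQuorum(instances, 3))
}
```

With RF = 3 and two of three zones fully ACTIVE, the check in this sketch passes (2 > 1), which is the outcome the issue expects from the ring.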
