-
Notifications
You must be signed in to change notification settings - Fork 816
disable rule groups #5521
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
disable rule groups #5521
Conversation
pkg/ruler/ruler.go
Outdated
@@ -187,6 +197,8 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet) { | |||
f.BoolVar(&cfg.EnableQueryStats, "ruler.query-stats-enabled", false, "Report the wall time for ruler queries to complete as a per user metric and as an info level log message.") | |||
f.BoolVar(&cfg.DisableRuleGroupLabel, "ruler.disable-rule-group-label", false, "Disable the rule_group label on exported metrics") | |||
|
|||
f.Var(&cfg.DisabledRuleGroups, "ruler.disabled-rule-groups", "Comma separated list of keys of rule groups this ruler cannot evaluate. If specified, a ruler that would normally pick the specified rule groups for processing will ignore them instead.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we keep the disabled rule groups list in runtime config? It is more flexible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
46e7185
to
fdbb9cb
Compare
b3f858b
to
7075adc
Compare
2c6a3e4
to
be3d241
Compare
91651bc
to
6864bc9
Compare
e4eba59
to
80add1d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice pr. I think what I want to see is one more integration test for this feature
pkg/ruler/ruler.go
Outdated
if !ruleGroupDisabled(group, disabledRuleGroupsForUser) { | ||
filteredGroupsForUser = append(filteredGroupsForUser, group) | ||
} else { | ||
level.Info(r.logger).Log("msg", "rule group disabled", "rule group name", group.Name, "namespace", group.Namespace, "user", group.User) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we put this as debug level? Wondering if it is really necessary to make info.
Can simplify rule group name
to just name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was thinking that we should do info though, we should have the log to see if we are indeed skip the rule evaluation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see the log with debug level as well as long as you change log level.
Can we tell some rule group is disabled by metrics? We dont have to use logs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I am good. Let's keep this as info for now…
continue | ||
switch e := err.(type) { | ||
case *DisabledRuleGroupErr: | ||
level.Info(log).Log("msg", e.Message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it too verbose to log it every time during filtering? And this doesn't feel like an error to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To give visibility, I think it needs to be either Info or Warn
return disabledRuleGroupsForUser | ||
} | ||
} | ||
return DisabledRuleGroups{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return the default limits here? I think someone may wanna set this on the config (instead runtimeconfig) as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went back and forth on this one. Since user
would be required to disable the rule groups to avoid accidentally disabling rule groups from other users that happen to have the same name and namespace, I thought it might be better to allow this config only from runtime config. If we want to allow config, then we need to have user
field exposed here so it can be set. If we did that, then someone can configure user
under runtime config which is already a map of [user:overrides] which can be confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok! +1
if len(disabledRuleGroupsForUser) == 0 { | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do this check right in the beginning of the method and return the original allRuleGroups
value if the len is 0, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to loop thru and check if there are overrides for each user, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh my bad... this is for all users! ok
if len(disabledRuleGroupsForUser) == 0 { | ||
continue | ||
} | ||
filteredGroupsForUser := rulespb.RuleGroupList{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: should we use make
here to since we have an idea how many elements there will be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We won't know how many rule groups are disabled until the loop is complete
continue | ||
switch e := err.(type) { | ||
case *DisabledRuleGroupErr: | ||
level.Info(log).Log("msg", e.Message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, all the instance will log this right even if that instance doesn't own the group? I think this will produce too much logs for large clusters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't seems to be an error to me +1
@@ -3093,6 +3093,9 @@ The `limits_config` configures default and per-tenant limits imposed by Cortex s | |||
# alerts will fail with a log message and metric increment. 0 = no limit. | |||
# CLI flag: -alertmanager.max-alerts-size-bytes | |||
[alertmanager_max_alerts_size_bytes: <int> | default = 0] | |||
|
|||
# list of rule groups to disable | |||
[disabled_rule_groups: <list of rule groups to disable> | default = ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should default be []?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we document this format somehow?
We have others "complex configs" like https://cortexmetrics.io/docs/configuration/configuration-file/#memcached_config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will submit another PR for this. Existing doc generator does not handle slices of structs well
e327544
to
47cc7c4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work!
47cc7c4
to
622c96d
Compare
return rlrs.Instances[0].Addr == instanceAddr, nil | ||
ownsRuleGroup := rlrs.Instances[0].Addr == instanceAddr | ||
if ownsRuleGroup && ruleGroupDisabled(g, disabledRuleGroups) { | ||
return false, &DisabledRuleGroupErr{Message: fmt.Sprintf("rule group %s, namespace %s, user %s is disabled", g.Name, g.Namespace, g.User)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
Should this return an error? i see why u did that (u wanna catch on the caller layer) but is weird to return an error on something expected.
But im ok with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree this is a little bit weird... But seems like a workaround here as Go doesn't have a good way to capture such things...
622c96d
to
15dbc70
Compare
Signed-off-by: Anand Rajagopal <[email protected]>
15dbc70
to
6fa6f92
Compare
What this PR does:
Which issue(s) this PR fixes:
Fixes #
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]