ClusterCondition::last_update_time is updated on no-ops, causing infinite reconciles (in the worst case) #1032

@nightkr

Affected version

Yes. (Still an issue on trunk, introduced in #571, rolled out around SDP 23.4.)

Current and expected behavior

Reconciling a cluster where nothing has changed should be a no-op.

ClusterCondition::last_update_time breaks this expectation because it is unconditionally set to the current time, rounded to the second:

    if old_condition.status == new_condition.status {
        ClusterCondition {
            last_update_time: Some(now),
            last_transition_time: old_condition.last_transition_time,
            ..new_condition
        }

This registers as another object modification whenever the new reconcile does not fall within the same wall-clock second as the previous one. Depending on how long one reconcile takes, this can cause a re-reconciliation loop (infinite, in the worst case) while the object is trying to settle down (which is likely an indication that the cluster is struggling to begin with!).
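To make the effect concrete, here is a minimal, self-contained sketch. The `ClusterCondition` struct below is a simplified stand-in (plain `u64` seconds instead of the real timestamp type, and only a few fields), `update_condition_unconditional` is a hypothetical name, and the `else` branch is an assumption, since the excerpt above only shows the unchanged-status branch.

```rust
// Simplified, hypothetical stand-in for the real ClusterCondition type.
// Timestamps are whole seconds, mirroring the "rounded to the second" behaviour.
#[derive(Clone, Debug, PartialEq)]
struct ClusterCondition {
    status: String,
    message: String,
    last_update_time: Option<u64>,
    last_transition_time: Option<u64>,
}

// Mirrors the quoted branch: even when the status is unchanged,
// last_update_time is overwritten with the current time.
fn update_condition_unconditional(
    old: &ClusterCondition,
    new: ClusterCondition,
    now: u64,
) -> ClusterCondition {
    if old.status == new.status {
        ClusterCondition {
            last_update_time: Some(now),
            last_transition_time: old.last_transition_time,
            ..new
        }
    } else {
        // Assumed behaviour for a status change: both timestamps move.
        ClusterCondition {
            last_update_time: Some(now),
            last_transition_time: Some(now),
            ..new
        }
    }
}
```

Calling this twice with identical input but with `now` one second apart yields two conditions that compare unequal, so the status written back to the object differs even though nothing about the cluster changed.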

Possible solution

  1. Drop last_update_time completely (for compat: either stub it out or make it equivalent to last_transition_time)
  2. Take the value from whenever the data source for the condition was updated, rather than the current wall time (if it makes sense/is possible for that condition)
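The two options could look roughly like the sketch below. This is not the actual fix: `ClusterCondition` is again a simplified stand-in with `u64` second timestamps, and `update_condition_mirror`, `update_condition_from_source`, and the `source_updated_at` parameter are hypothetical names introduced only for illustration.

```rust
// Simplified, hypothetical stand-in for the real ClusterCondition type.
#[derive(Clone, Debug, PartialEq)]
struct ClusterCondition {
    status: String,
    message: String,
    last_update_time: Option<u64>,
    last_transition_time: Option<u64>,
}

// Option 1 (compat variant): last_update_time simply mirrors
// last_transition_time, so it only moves when the status changes.
fn update_condition_mirror(
    old: &ClusterCondition,
    new: ClusterCondition,
    now: u64,
) -> ClusterCondition {
    let transition = if old.status == new.status {
        old.last_transition_time
    } else {
        Some(now)
    };
    ClusterCondition {
        last_update_time: transition,
        last_transition_time: transition,
        ..new
    }
}

// Option 2: last_update_time comes from the data source behind the
// condition (assumed to be available as `source_updated_at`), not from
// the wall clock at reconcile time.
fn update_condition_from_source(
    old: &ClusterCondition,
    new: ClusterCondition,
    source_updated_at: u64,
    now: u64,
) -> ClusterCondition {
    let transition = if old.status == new.status {
        old.last_transition_time
    } else {
        Some(now)
    };
    ClusterCondition {
        last_update_time: Some(source_updated_at),
        last_transition_time: transition,
        ..new
    }
}
```

In both variants, repeating a reconcile at a later wall-clock time with unchanged inputs produces an identical condition, so no new object modification is registered.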

Additional context

Discovered by @siegfriedweber, discussed at https://stackable-workspace.slack.com/archives/C02FZ581UCD/p1747230004370629

Environment

No response

Would you like to work on fixing this bug?

None

Metadata

Labels

release-note: Denotes a PR that will be considered when it comes time to generate release notes.
release-note/action-required: Denotes a PR that introduces potentially breaking changes that require user action.
release/25.7.0
type/bug

Projects

Status

Done
