
[usage] Ensure controller ticks are not concurrent #10995

Merged 1 commit into main from mp/usage-ensure-singleton-runs on Jun 29, 2022

Conversation

easyCZ (Member) commented Jun 29, 2022

Description

Ensures that a single instance of the usage component does not invoke multiple usage controller ticks concurrently. This is done by consuming ticks from a channel that is processed sequentially by a single consumer. Additionally, we skip enqueuing a run when one is already queued, which prevents a backlog from building up.
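
Below is a minimal Go sketch of that pattern, for illustration only (the `Controller`, `New`, `Start`, and `Stop` names here are hypothetical, not the actual usage component code): ticks are sent into a buffered channel of size 1 with a non-blocking send, and a single consumer goroutine executes them one at a time.

```go
package controller

import "time"

// Controller is a hypothetical sketch of the pattern described above:
// ticks are queued into a buffered channel of size 1 and consumed by a
// single goroutine, so runs never overlap. When a run is already queued,
// the non-blocking send drops the tick instead of building up a backlog.
type Controller struct {
	runs chan struct{}
	stop chan struct{}
}

func New() *Controller {
	return &Controller{
		runs: make(chan struct{}, 1),
		stop: make(chan struct{}),
	}
}

// Start schedules a tick on every interval and consumes ticks sequentially.
func (c *Controller) Start(interval time.Duration, run func()) {
	// Single consumer: runs execute one after another, never concurrently.
	go func() {
		for range c.runs {
			run()
		}
	}()

	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				select {
				case c.runs <- struct{}{}:
					// tick queued for execution
				default:
					// a run is already queued; skip this tick to prevent build-up
				}
			case <-c.stop:
				close(c.runs)
				return
			}
		}
	}()
}

func (c *Controller) Stop() {
	close(c.stop)
}
```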

Related Issue(s)

Fixes #

How to test

Unit tests

Release Notes

NONE

Documentation

NONE

Werft options:

  • /werft with-preview

@easyCZ requested a review from a team June 29, 2022 07:19
@github-actions bot added the team: webapp label Jun 29, 2022
@easyCZ force-pushed the mp/usage-ensure-singleton-runs branch from 6054bc3 to e659561 June 29, 2022 07:22
@easyCZ force-pushed the mp/usage-ensure-singleton-runs branch from e659561 to 9ef6bdd June 29, 2022 07:26
@jankeromnes self-assigned this Jun 29, 2022
jankeromnes (Contributor) commented Jun 29, 2022

Many thanks! Taking a look 👀

jankeromnes (Contributor) left a comment

Code looks good to me, and the unit test behaves as expected! (I also tried editing the values to make it fail -- works well).

However, I also ran a build with-preview, in order to observe skips on the real usage component.

  • I set the schedule to 1ms
  • But I observed no skips at all 🤔 I guess the loop is pretty fast when there are few workspaces, but still, I've seen it run only once per second (not every millisecond, as specified). Do we somehow cap the minimum schedule to one second? Or is 1ms not the right format?

Still, this shouldn't be a blocker, as we can safely test it in production (worst case, production behavior is unchanged; best case, parallel runs are prevented).

Another open question -- I remember @andrew-farries suggesting we use a Kubernetes cron job, in order not to re-implement a cron and a "don't run the next loop if the previous hasn't finished" mechanism. Do you have a strong preference for re-implementing this ourselves in Go as opposed to using an available Kubernetes feature?

/hold

easyCZ (Member, Author) commented Jun 29, 2022

@jankeromnes The cron library we use has its lowest resolution set to 1s, so anything below gets truncated to 1s. Indeed, I wouldn't expect this to manifest in preview or even staging environments, but in prod it could. One way we could orchestrate it is to manually create many WSI records in preview, which would push the runtime above 1s. My hope is that the test demonstrates it would behave as expected, despite the preview/staging environments not having enough data to trigger this scenario.
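
For illustration, a hypothetical test along these lines (reusing the sketch above, not the PR's actual unit test) would make each run slower than the schedule and assert that runs never overlap and that some ticks are skipped:

```go
package controller

import (
	"sync/atomic"
	"testing"
	"time"
)

// Hypothetical test sketch: each run takes longer than the schedule
// interval, so ticks must be skipped and runs must never overlap.
func TestRunsDoNotOverlap(t *testing.T) {
	var running, overlaps, executed int32

	c := New()
	c.Start(10*time.Millisecond, func() {
		if atomic.AddInt32(&running, 1) > 1 {
			atomic.AddInt32(&overlaps, 1)
		}
		time.Sleep(50 * time.Millisecond) // slower than the schedule
		atomic.AddInt32(&running, -1)
		atomic.AddInt32(&executed, 1)
	})

	time.Sleep(300 * time.Millisecond)
	c.Stop()

	if n := atomic.LoadInt32(&overlaps); n != 0 {
		t.Fatalf("expected no concurrent runs, got %d overlaps", n)
	}
	// With a 10ms schedule and 50ms runs, far fewer than 30 runs should execute.
	if n := atomic.LoadInt32(&executed); n >= 30 {
		t.Fatalf("expected ticks to be skipped, but %d runs executed", n)
	}
}
```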

easyCZ (Member, Author) commented Jun 29, 2022

Another open question -- I remember @andrew-farries suggesting we use a Kubernetes cron job, in order not to re-implement a cron and a "don't run the next loop if the previous hasn't finished" mechanism. Do you have a strong preference for re-implementing this ourselves in Go as opposed to using an available Kubernetes feature?

I would like this too, but it's a much bigger change. We'd need to give the service a k8s service account and make it interact with Kubernetes. Gero wasn't convinced this is something we should do now (and I tend to agree). A native k8s cron would make this a bit easier, but it would also increase the complexity of the system (more moving parts).

easyCZ (Member, Author) commented Jun 29, 2022

I'm going to unhold this to get it into staging so we can do a test deploy this afternoon. Happy to follow up on any of these in more detail, or fix things up in other PRs.

/unhold

@roboquat merged commit 3cd0cf2 into main Jun 29, 2022
@roboquat deleted the mp/usage-ensure-singleton-runs branch June 29, 2022 11:14
@easyCZ mentioned this pull request Jun 29, 2022
@roboquat added the deployed: webapp and deployed labels Jun 30, 2022
Labels
deployed: webapp · deployed · release-note-none · size/M · team: webapp