Skip to content

Conversation

@isoos
Copy link
Collaborator

@isoos isoos commented Dec 10, 2025

  • Moving the state transition methods into the model class.
  • Retaining the docs and pana flags after scheduling (otherwise there is a time window where a new schedule has been started but not completed, where a previous run's results are not available).
  • Added the previousScheduled field, but we could skip it and just use initialTimestamp when restoring the previous state.
  • Simpler updatePackageStateWithPendingVersions and restorePreviousVersionsState since they no longer need to keep all the previous version state information.

@isoos isoos requested a review from jonasfj December 10, 2025 16:01
@isoos isoos force-pushed the task-version-state branch from b9c6f9e to 01d362e Compare December 16, 2025 10:44
'attempts: $attempts',
'zone: $zone',
'instance: $instance',
'secretToken: $secretToken',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add new properties :D

].join(', ') +
')';

// Remove instanceName, zone, secretToken, and set attempts = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Remove instanceName, zone, secretToken, and set attempts = 0
/// Remove instanceName, zone, secretToken, and set attempts = 0

finished: true,
zone: null,
instance: null, // version is no-longer running on this instance
secretToken: null, // TODO: Consider retaining this for idempotency
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't oppose the idea of retaining this for idempotency, that's a great idea.

s.versions!.addAll({
for (final v in pendingVersions.map((v) => v.toString()))
v: PackageVersionStateInfo(
scheduled: now,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're using a different now after this change, is that a good idea?

// suppose to run on the instance we just failed to create.
// If this doesn't work, we'll eventually retry. Hence, correctness
// does not hinge on this transaction being successful.
await db.tasks.restorePreviousVersionsState(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure we shouldn't have a method for this, should be inline logic.

There is never a reason to call restorePreviousVersionsState from anywhere else is there?

Also you cannot look at restorePreviousVersionsState and not understand the context within which it belongs.

if we want it as a method, then it should be a helper method in this file, like updatePackageStateWithPendingVersions (which IMO could also be inline); but I do think updatePackageStateWithPendingVersions is a bit more specific in what it does and worth documenting separately. Also just to keep the code short.

Why can't we simply store previousScheduled as a local variable in scope of scheduleInstance?

PackageVersionStateInfo scheduleNew({
required String zone,
required String instanceName,
}) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't dislike these methods. It moves a lot of invariants into PackageVersionStateInfo -- and ensures that we're maintaining the invariants here.

But let's expand the documentation. You contract with the world is no longer than the world outside must understand all the invariants of this class, but that the world can only use method in this class.

Suggested change
}) {
/// Create updated [PackageVersionStateInfo] for when a new instance have been
/// scheduled.
///
/// You must supply:
/// * [scheduled] when the instance was scheduled.
/// * [zone] within which the instance was scheduled.
/// * [instanceName] as name of the of the isntance scheduled.
/// * [secretToken] passed to the instance for authentication when
/// the instance wants to callback.
PackageVersionStateInfo scheduleNew({
required DateTime scheduled,
required String zone,
required String instanceName,
required String secretToken,
}) {

I think we should leave deciding values to the scheduler. It is the schedulers responsibility to pick the values. This class mostly manages the invariants (it becomes a lot less pure if we ask this class to generate random tokens).

Yes, adding more parameters is just a nit 🙈

But expanding the documentation is important.

Comment on lines +406 to +407
/// Reverts the status of the last scheduling attempt, which has presumably failed.
PackageVersionStateInfo resetAfterFailedAttempt() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Reverts the status of the last scheduling attempt, which has presumably failed.
PackageVersionStateInfo resetAfterFailedAttempt() {
/// Reverts the status of the last scheduling attempt, when it is known that scheduling failed.
///
/// > [!WARNING]
/// > This state transition **may only** be used if it's
/// > **known with certainty** that scheduling failed.
/// >
/// > If an instance _may_ have been scheduled, but we suspect
/// > scheduling failed, we have to wait for a retry.
/// > As we otherwise risk leaving an instance unable to call back,
/// > which will leave the instance logging errors that indicate
/// > internal errors in our system.
PackageVersionStateInfo resetAfterFailedAttempt({required DateTime previousScheduled}) {

We only do this is if we know for sure that scheduling the instance failed.

We shouldn't do this if the scheduling request failed with a IOException or other transport error. Only if we get an error from the GCE API, then we know for sure that the instance wasn't scheduled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we might actually do this on an IOException, we probably shouldn't but I'm also okay with that minor issue.

I'd just prefer that we clearly document what the correct behavior is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants