Skip to content

Conversation

@david-crespo
Copy link
Contributor

@david-crespo david-crespo commented Aug 22, 2025

Closes #8869

  • Add update_status endpoint at /v1/system/update/status that includes:
    • Current target release (same as existing target_release_view)
    • time_last_progress, representing the time_made_target of the latest bp_target — meant to indicate the last time the update system Did Something
    • components_by_release_version, a map where the keys are to_string'd TufRepoVersions and the values are counts of components on that version
  • Remove target_release_view endpoint, which is fully redundant with update status
    • Add a line to the doc comment on target_release_update that you can use update status to check the current target release
  • Rework the TargetRelease struct that was previously being returned from target_release_view and is now part of the update status response
    • Remove generation from TargetRelease because it is not meaningful to the end user
    • Other changes that are easier to explain in the inline comments

Listing blockers or problems preventing further update progress is mentioned as a goal of #8869, but they are currently stored as JSON blobs in the DB for debug purposes. They will soon be stored in a more regimented way that should make it easy to stick a list of blockers in the response body. If this PR gets merged without that, I'll make a dedicated issue for it.

Example status response

{
  "target_release": {
    "time_requested": "2025-09-24T22:37:43.266338Z",
    "version": "2.0.0"
  },
  "components_by_release_version": {
    "install dataset": 7,
    "unknown": 15
  },
  "time_last_step_planned": "2025-09-24T22:37:41.556717Z",
  "suspended": false
}

Base automatically changed from iliana/not-experimental to main August 26, 2025 18:24

let counts = status.components_by_release_version;
assert_eq!(counts.get("install dataset").unwrap(), &7);
assert_eq!(counts.get("unknown").unwrap(), &15);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will probably want to change the Display impl to make these snake case. I could just write my own helper that does that and leave the Display impl alone, but it's probably bad to have to two ways of doing this.

impl Display for TufRepoVersion {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
TufRepoVersion::Unknown => write!(f, "unknown"),
TufRepoVersion::InstallDataset => {
write!(f, "install dataset")
}
TufRepoVersion::Version(version) => {
write!(f, "{}", version)
}
TufRepoVersion::Error(s) => {
write!(f, "{}", s)
}
}
}
}

/// rack should eventually correspond to the release described here.
///
/// Will only be null if a target release has never been set.
pub target_release: Option<TargetRelease>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've collapsed the TargetReleaseSource enum with its Unspecified and SystemVersion arms into Option<TargetRelease>, where TargetRelease is always a system version. This is based on my understanding that it's basically never going to be Unspecified after the first few weeks of update being released. By handling that with target_release: null, we ensure future values will be nice ones like target_release: { version: "2.0.0", time_requested: "..." } that don't distract the user with the possibility of other shapes. In the previous TargetReleaseSource enum, this was the distinction between { "type": "unspecified" } vs. { "type": "system_version", ... }. The option approach lets us make the unspecified arm nearly invisible.

/// The source of the target release.
pub release_source: TargetReleaseSource,
/// The specified release of the rack's system software
pub version: Version,
Copy link
Contributor Author

@david-crespo david-crespo Sep 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the word "source" here very confusing. Now there is no source. There either is no target release or there is one, and when there is one, it always has a system version.

I also thought it was a little weird that you could have a target release like TargetRelease { time_requested: "...", release_source: Unspecified }, namely unspecified but still with a non-null time_requested. Based on the migration adding the unspecified value, time_requested is just whenever the migration ran, which to me is not very meaningful to the user. In this PR, time_requested is still inside TargetRelease, but since TargetRelease is now optional, you only have time_requested in the response when you have a release set.

-- System software is by default from the `install` dataset.
INSERT INTO omicron.public.target_release (
generation,
time_requested,
release_source,
tuf_repo_id
) VALUES (
1,
NOW(),
'unspecified',
NULL
) ON CONFLICT DO NOTHING;

#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize, JsonSchema)]
pub struct TargetRelease {
/// The target-release generation number.
pub generation: i64,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been removed. I don't think it's meaningful to the user.

@david-crespo david-crespo marked this pull request as ready for review September 24, 2025 22:57
.chain(rot_version)
.chain(std::iter::once(bootloader_version))
.chain(host_version)
});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to do this by hand here, but "make a flat list of all the versions in the system" feels like a good operation to have a canonical version of that lives somewhere else.

Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not had a chance to look carefully at the changes to nexus/src/app/update.rs or the updates.rs integration test. The API changes like fine to me. We'll need to coordinate an update to the CLI as people are testing this.

Comment on lines +233 to +237
self.datastore()
.target_release_get_generation(opctx, Generation(prev_gen))
.await
.internal_context("fetching previous target release")?
.and_then(|r| r.tuf_repo_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a little sketchy? I'm not sure.

Example: after a MUPdate, you might set the target release to the same value it was before. Then the previous row will not be the previous release.

Today, the PlanningInput has this information. Maybe PlanningInputFromDb has a better way to do this?

Copy link
Contributor Author

@david-crespo david-crespo Oct 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, that is where I copied this logic from. The comment suggests that #8056 helps but I am not sure it does.

let tuf_repo = TufRepoPolicy {
target_release_generation: target_release.generation.0,
description: target_release_desc,
};
// NOTE: We currently assume that only two generations are in play: the
// target release generation and its previous one. This depends on us
// not setting a new target release in the middle of an update: see
// https://github.com/oxidecomputer/omicron/issues/8056.
//
// We may need to revisit this decision in the future. See that issue
// for some discussion.
let old_repo = if let Some(prev) = target_release.generation.prev() {
let prev_release = datastore
.target_release_get_generation(opctx, Generation(prev))
.await
.internal_context("fetching previous target release")?;
let description = if let Some(prev_release) = prev_release {
if let Some(repo_id) = prev_release.tuf_repo_id {
TargetReleaseDescription::TufRepo(
datastore
.tuf_repo_get_by_id(opctx, repo_id.into())
.await
.internal_context(
"fetching previous target release repo",
)?
.into_external(),
)
} else {
TargetReleaseDescription::Initial
}
} else {
TargetReleaseDescription::Initial
};
TufRepoPolicy { target_release_generation: prev, description }
} else {
TufRepoPolicy::initial()
};

I certainly see your point, though, and it is a problem. In the internal API logic I'm borrowing, it seems like we will only turn up component versions that match either the old or the new one passed in, because we're matching against artifact hashes. In the MUPdate case where current and prev release are the same, components that are on the the MUPdate's version will all show up as TufRepoVersion::Unknown.

fn zone_image_source_to_version(
old: &TargetReleaseDescription,
new: &TargetReleaseDescription,
source: &OmicronZoneImageSource,
res: &ConfigReconcilerInventoryResult,
) -> TufRepoVersion {
if let ConfigReconcilerInventoryResult::Err { message } = res {
return TufRepoVersion::Error(message.clone());
}
let &OmicronZoneImageSource::Artifact { hash } = source else {
return TufRepoVersion::InstallDataset;
};
if let Some(new) = new.tuf_repo() {
if new.artifacts.iter().any(|meta| meta.hash == hash) {
return TufRepoVersion::Version(
new.repo.system_version.clone(),
);
}
}
if let Some(old) = old.tuf_repo() {
if old.artifacts.iter().any(|meta| meta.hash == hash) {
return TufRepoVersion::Version(
old.repo.system_version.clone(),
);
}
}
TufRepoVersion::Unknown
}

Copy link
Contributor Author

@david-crespo david-crespo Oct 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jgallagher says to use this instead, and sort by generation number to figure out what are old and new releases (not sure I actually need old and new anyway, I just need all the versions on hand).

https://github.com/oxidecomputer/omicron/blame/4d5bdc6d348b27761348d763c4085f060bcefc18/nexus/db-queries/src/db/datastore/target_release.rs#L227-L302

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you decide not to do this now?

Copy link
Contributor Author

@david-crespo david-crespo Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think the upshot of the discussion in chat was that since the internal API logic doesn't do any better, it makes sense to do it this way for now and improve both later. We could use that linked method to fetch more releases, but the code I'm hooking into to match up versions with hashes on the components only takes old and new.

Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking pretty good. Just some minor suggestions here (one of which, the <= thing, is probably important).

Comment on lines +233 to +237
self.datastore()
.target_release_get_generation(opctx, Generation(prev_gen))
.await
.internal_context("fetching previous target release")?
.and_then(|r| r.tuf_repo_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you decide not to do this now?

@david-crespo david-crespo merged commit f41bd73 into main Oct 9, 2025
17 checks passed
@david-crespo david-crespo deleted the update-status-api branch October 9, 2025 20:17
david-crespo added a commit that referenced this pull request Oct 9, 2025
…#9106)

Built on top of #8897

These are breaking API changes and I won't merge until I have a PR up on
the CLI side that makes this all work.

### Add list endpoint and fix other ones

Made the endpoints match the other ones. Not married to the verb
`upload`, but `system_update_repository_update` felt a little silly.

```diff
-system_update_get_repository        GET      /v1/system/update/repository/{system_version}
-system_update_put_repository        PUT      /v1/system/update/repository
+system_update_repository_list       GET      /v1/system/update/repositories
+system_update_repository_upload     PUT      /v1/system/update/repositories
+system_update_repository_view       GET      /v1/system/update/repositories/{system_version}
```

### Standardize and simplify response types

What's responsible for this PR being so big is that I reworked these
endpoints to return a new `TufRepo` struct that is only the metadata and
no artifacts. The main reason for this was that the list endpoint would
have to pull artifacts for every repo in the list. The list is likely to
be small, so it's probably not a big deal, but it also seems that the
artifacts are not useful to the end user. Once the list endpoint was
simplified, it makes sense to return the same thing from the view and
update endpoints. The upload endpoint returns a `TufRepoUpload` which
has the `TufRepo` together with an indicator saying whether the repo
contents were new repo or already existed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

upgrade status in external API

5 participants