Add cosmo support to reconfigurator #9134

labbott · 2025-10-02T20:13:51Z

No description provided.

nexus/reconfigurator/planning/src/system.rs

karencfv

Thanks for addressing my comment! I have a few questions.

Has this been tested on a real cosmo yet? Is it even possible at this point?

Do the SP, RoT, RoT Bootloader, Host OS take roughly the same time to reset on a Cosmo as they do on a Gimlet? We have several timeouts scattered throughout the reconfigurator code based on reset times on a gimlet.

Here we give each component a very generous amount to finish its post update actions. In the simplest of cases like the SP, we just reset the device, but others like the RoT Bootloader and the Host OS have several steps.

omicron/nexus/mgs-updates/src/driver_update.rs

Lines 645 to 682 in 81d8226

    
           // Timeouts, timeouts: always wrong! 
        
           // 
        
           // We have to pick some maximum time we're willing to wait for the `post_update` 
        
           // hook to complete. In general this hook is responsible for resetting the 
        
           // updated target to cause it to boot into its new version, but the details vary 
        
           // wildly by device type (e.g., post-update RoT bootloader actions require 
        
           // multiple RoT resets) and the expected amount of time also varies wildly 
        
           // (e.g., resetting a gimlet SP takes a few seconds, resetting a sidecar SP can 
        
           // take 10s of seconds, resetting a sled after a host OS update takes minutes). 
        
           fn post_update_timeout(update: &PendingMgsUpdate) -> Duration { 
        
               match &update.details { 
        
                   PendingMgsUpdateDetails::Sp { .. } => { 
        
                       // We're resetting an SP; use a generous timeout for sleds and power 
        
                       // shelf controllers (which should take a few seconds) and an even 
        
                       // more generous timeout for switches (which we've seen take 10-20 
        
                       // seconds in practice). 
        
                       match update.sp_type { 
        
                           SpType::Sled | SpType::Power => Duration::from_secs(60), 
        
                           SpType::Switch => Duration::from_secs(120), 
        
                       } 
        
                   } 
        
                   PendingMgsUpdateDetails::Rot { .. } => { 
        
                       // Resetting the RoT should be quick (a few seconds), but we wait 
        
                       // for boot info after the reset. 
        
                       Duration::from_secs(90) 
        
                   } 
        
                   PendingMgsUpdateDetails::RotBootloader { .. } => { 
        
                       // Resetting the bootloader requires multiple RoT resets; give this 
        
                       // a longer timeout. 
        
                       Duration::from_secs(210) 
        
                   } 
        
                   PendingMgsUpdateDetails::HostPhase1(..) => { 
        
                       // Resetting a sled takes several minutes (mostly DRAM training); 
        
                       // give something very generous here to wait for it to come back. 
        
                       Duration::from_secs(30 * 60) 
        
                   } 
        
               } 
        
           }

Here we poll the device every 10 seconds to see if it has been reset.

omicron/nexus/mgs-updates/src/driver_update.rs

Lines 415 to 443 in 81d8226

    
           if try_reset { 
        
               // We retry this until the component update has been successfully 
        
               // updated, or we get some error *other* than a communication error or 
        
               // some other transient error.  There is intentionally no timeout here. 
        
               // If we've staged an update but not managed to reset the device, 
        
               // there's no point where we'd want to stop trying to do so. 
        
               while let Err(error) = 
        
                   update_helper.post_update(log, &mut mgs_clients, update).await 
        
               { 
        
                   if error.is_fatal() { 
        
                       let error = InlineErrorChain::new(&error); 
        
                       error!(log, "post_update failed"; &error); 
        
                       return Err(ApplyUpdateError::SpComponentResetFailed( 
        
                           error.to_string(), 
        
                       )); 
        
                   } 
        
                   // We only care whether the update has completed. We ignore all 
        
                   // pre-check errors because they could all be transient if a reset 
        
                   // is in the process of happening. 
        
                   if let Ok(PrecheckStatus::UpdateComplete) = 
        
                       update_helper.precheck(log, &mut mgs_clients, update).await 
        
                   { 
        
                       break; 
        
                   } 
        
                   tokio::time::sleep(RESET_DELAY_INTERVAL).await; 
        
               } 
        
           }

For the RoT and RoT bootloader we wait a specific amount of time for boot info to show up

omicron/nexus/mgs-updates/src/rot_updater.rs

Lines 30 to 32 in 81d8226

    
           pub const WAIT_FOR_BOOT_INFO_TIMEOUT: Duration = Duration::from_secs(120); 
        
           const WAIT_FOR_BOOT_INFO_INTERVAL: Duration = Duration::from_secs(10);

omicron/nexus/mgs-updates/src/rot_updater.rs

Lines 268 to 344 in 81d8226

    
           /// Poll the RoT asking for its boot information. This confirms that the RoT has 
        
           /// been succesfully reset 
        
           pub async fn wait_for_boot_info( 
        
               log: &Logger, 
        
               mgs_clients: &mut MgsClients, 
        
               sp_type: SpType, 
        
               sp_slot: u16, 
        
               timeout: Duration, 
        
           ) -> Result<RotState, PostUpdateError> { 
        
               let before = Instant::now(); 
        
               loop { 
        
                   debug!(log, "waiting for boot info to confirm a successful reset"); 
        
                   match mgs_clients 
        
                       .try_all_serially(log, |mgs_client| async move { 
        
                           mgs_client 
        
                               .sp_rot_boot_info( 
        
                                   &sp_type, 
        
                                   sp_slot, 
        
                                   SpComponent::ROT.const_as_str(), 
        
                                   &GetRotBootInfoParams { 
        
                                       version: RotBootInfo::HIGHEST_KNOWN_VERSION, 
        
                                   }, 
        
                               ) 
        
                               .await 
        
                       }) 
        
                       .await 
        
                   { 
        
                       Ok(state) => match state.clone() { 
        
                           RotState::V2 { .. } | RotState::V3 { .. } => { 
        
                               debug!(log, "successfuly retrieved boot info"); 
        
                               return Ok(state.into_inner()); 
        
                           } 
        
                           // The RoT is probably still booting 
        
                           RotState::CommunicationFailed { message } => { 
        
                               if before.elapsed() >= timeout { 
        
                                   error!( 
        
                                       log, 
        
                                       "failed to get RoT boot info"; 
        
                                       "error" => %message 
        
                                   ); 
        
                                   return Err(PostUpdateError::FatalError { 
        
                                       error: message, 
        
                                   }); 
        
                               } 
        
                               info!( 
        
                                   log, 
        
                                   "failed getting RoT boot info (will retry)"; 
        
                                   "error" => %message, 
        
                               ); 
        
                               tokio::time::sleep(WAIT_FOR_BOOT_INFO_INTERVAL).await; 
        
                           } 
        
                       }, 
        
                       // The RoT might still be booting 
        
                       Err(error) => { 
        
                           let e = InlineErrorChain::new(&error); 
        
                           if before.elapsed() >= timeout { 
        
                               error!( 
        
                                   log, 
        
                                   "failed to get RoT boot info"; 
        
                                   &e, 
        
                               ); 
        
                               return Err(PostUpdateError::FatalError { 
        
                                   error: e.to_string(), 
        
                               }); 
        
                           } 
        
                           info!( 
        
                               log, 
        
                               "failed getting RoT boot info (will retry)"; 
        
                               e, 
        
                           ); 
        
                           tokio::time::sleep(WAIT_FOR_BOOT_INFO_INTERVAL).await; 
        
                       } 
        
                   } 
        
               } 
        
           }

omicron/nexus/mgs-updates/src/rot_bootloader_updater.rs

Lines 284 to 327 in 81d8226

    
           /// Poll the RoT asking for its boot information. This is used to check 
        
           /// the state for RoT bootloader image errors after RoT is reset 
        
           async fn wait_for_stage0_next_image_check( 
        
               log: &Logger, 
        
               mgs_clients: &mut MgsClients, 
        
               sp_type: SpType, 
        
               sp_slot: u16, 
        
               timeout: Duration, 
        
           ) -> Result<Option<RotImageError>, PostUpdateError> { 
        
               debug!(log, "attempting to verify image validity"); 
        
               match wait_for_boot_info(log, mgs_clients, sp_type, sp_slot, timeout).await 
        
               { 
        
                   Ok(state) => match state { 
        
                       // The minimum we will ever return is v3. 
        
                       // Additionally, V2 does not report image errors, so we cannot 
        
                       // know with certainty if a signature check came back with errors 
        
                       RotState::V2 { .. } => { 
        
                           let error = "unexpected RoT version: 2".to_string(); 
        
                           error!( 
        
                               log, 
        
                               "failed to get RoT boot info"; 
        
                               "error" => &error 
        
                           ); 
        
                           return Err(PostUpdateError::FatalError { error }); 
        
                       } 
        
                       RotState::V3 { stage0next_error, .. } => { 
        
                           debug!(log, "successfully completed an image signature check"); 
        
                           return Ok(stage0next_error); 
        
                       } 
        
                       // This is unreachable because wait_for_boot_info loops for some 
        
                       // time if it encounters `CommunicationFailed`, and if it hits the 
        
                       // timeout, it will return an error. 
        
                       RotState::CommunicationFailed { message } => { 
        
                           error!( 
        
                               log, 
        
                               "failed to get RoT boot info"; 
        
                               "error" => %message 
        
                           ); 
        
                           return Err(PostUpdateError::FatalError { error: message }); 
        
                       } 
        
                   }, 
        
                   Err(error) => return Err(error), 
        
               } 
        
           }

omicron/nexus/mgs-updates/src/rot_bootloader_updater.rs

Lines 269 to 277 in 81d8226

    
           // We wait for boot info to ensure a successful reset 
        
           wait_for_boot_info( 
        
               log, 
        
               mgs_clients, 
        
               update.sp_type, 
        
               update.slot_id, 
        
               WAIT_FOR_BOOT_INFO_TIMEOUT, 
        
           ) 
        
           .await?;

Sorry for the late review btw! I was out for a couple of weeks

karencfv · 2025-11-04T04:16:48Z

nexus/reconfigurator/planning/src/mgs_updates/test_helpers.rs

+                // We assume a valid model ID for sleds
+                model: match sp_id.type_ {
+                    SpType::Sled => GIMLET_SLED_MODEL.to_string(),
+                    _ => format!("dummy_{}", sp_id.type_),
+                },


If it's not too complex, it would be really nice to have some tests using a cosmo (unit or reconfigurator-cli ones).

I updated the simulated tests to add a single cosmo sled

karencfv · 2025-11-04T04:19:24Z

sled-hardware/types/src/lib.rs

-            "913-0000023" => Some(Self::Cosmo),
-            "913-0000019" | "913-0000006" => Some(Self::Gimlet),
+            COSMO_SLED_MODEL => Some(Self::Cosmo),
+            GIMLET_SLED_MODEL | "913-0000006" => Some(Self::Gimlet),


I realise this was already here, but is 913-0000006 just a Gimlet ID that is used for test or legacy devices? Or is this used elsewhere? Do you mind adding a comment or so we know where this ID comes from, and where it is used?

hmmm, I actually think that's incorrect now that I'm double checking it.

@dancrossnyc When you were working on #9016 where did you grab "913-0000006" from? At least according to pilot that's a sidecar serial https://github.com/oxidecomputer/pilot/blob/main/common/src/sp.rs#L135

(@labbott and I discussed this in chat; just recapping here.)

I cannot recall; I thought I saw it in an RFD, but that must be mistaken. My sleuthing uncovered the Duro/Terra is the source of truth here, and it's not registered in OANA. I must have cribbed it from somewhere, and I believe Pilot over my own recollection, so this should change.

labbott · 2025-11-04T16:05:50Z

Thanks for addressing my comment! I have a few questions.

Has this been tested on a real cosmo yet? Is it even possible at this point?

Unfortunately we've been blocked on cosmo device ID. Without a device ID, we can't connect via sprockets which means we can't run RSS which means we can't run any actual services to run reconfigurator. This is also why I've led this PR sit here.

Do the SP, RoT, RoT Bootloader, Host OS take roughly the same time to reset on a Cosmo as they do on a Gimlet? We have several timeouts scattered throughout the reconfigurator code based on reset times on a gimlet.

This is a slightly tricky question to answer. We know that we've seen Cosmo take more time than Gimlet for some things but we also haven't had a chance to finish profiling to see if this is expected or not. I think this will be something that will become more obvious as we actually start testing on actual hardware (thank you for the pointers on where to check!)

jclulow · 2025-11-06T03:14:33Z

Unfortunately we've been blocked on cosmo device ID. Without a device ID, we can't connect via sprockets which means we can't run RSS which means we can't run any actual services to run reconfigurator. This is also why I've led this PR sit here.

NB: I have just completed signing on the two Cosmos in question. They're now pending reinstallation into london; see oxidecomputer/meta#776!

labbott · 2025-11-10T16:50:46Z

   15 2025-11-10T15:47:34.879Z b379914c-3d5d-4ecb-8c0b-d7ccabdc473c  enabled: update Sled 15 (BRM13250012) host phase 1 to 18.0.0-0.ci+gitca25fe74cf3
   16 2025-11-10T15:52:49.391Z d5ca6377-bbfc-45e0-9c04-a9a535ca63bf  enabled: update Sled 15 (BRM13250012) host phase 1 to 18.0.0-0.ci+gitca25fe74cf3
   17 2025-11-10T16:15:07.444Z 2754d860-0664-4ae2-ac52-d85a806759b0  enabled: 
   18 2025-11-10T16:15:59.222Z ee854b4e-c2cb-4868-9855-35453ad2f401  enabled: update Sled 17 (BRM22250001) host phase 1 to 18.0.0-0.ci+gitca25fe74cf3
   19 2025-11-10T16:21:00.640Z dccf1c5f-8fd5-47f1-96c5-6434de78e951  enabled: update Sled 17 (BRM22250001) host phase 1 to 18.0.0-0.ci+gitca25fe74cf3

Attempt on a live system seems to work! Going to see about cleaning up the tests a bit more.

karencfv

So excited to see this is working on a real system! I have a few questions regarding the tests, but otherwise this is looking great, thanks.

nexus/reconfigurator/planning/src/mgs_updates/test_helpers.rs

karencfv · 2025-11-10T21:42:45Z

nexus/reconfigurator/planning/src/mgs_updates/mod.rs

+            .gimlet_host_phase_1_artifacts(
+                ARTIFACT_HASH_HOST_PHASE_1_V1,
+                ARTIFACT_HASH_HOST_PHASE_1_V1,
+            )
+            .cosmo_host_phase_1_artifacts(
                ARTIFACT_HASH_HOST_PHASE_1_V1,


Same question about these artifact hashes being the same

This is one where I think it's okay to be the same because these are used as an intermediate step to indicate an upgrade needs to be happen.

/// Hash of a fake "version 1" artifact for host OS phase 1 /// /// This can be used to produce an inventory collection for a host slot that /// needs an update.

karencfv · 2025-11-10T21:47:28Z

nexus/reconfigurator/planning/src/mgs_updates/test_helpers.rs

+pub(super) const ARTIFACT_HASH_GIMLET_HOST_PHASE_1: ArtifactHash =
    ArtifactHash([29; 32]);
 /// Hash of fake artifact for host OS phase 1 (for a fake version 1.5)
 pub(super) const ARTIFACT_HASH_HOST_PHASE_1_V1_5: ArtifactHash =


Nit: For consistency, it'd be nice to have this and any other hash that is gimlet-specific, say it in the name like the other constants that were changed.

karencfv · 2025-11-10T21:50:24Z

nexus/reconfigurator/planning/src/mgs_updates/test_helpers.rs

+    } else if kind == ArtifactKind::GIMLET_ROT_IMAGE_A
+        && sled_type == Some(OxideSled::Cosmo)
+    {
+        ARTIFACT_HASH_ROT_COSMO_A
+    } else if kind == ArtifactKind::GIMLET_ROT_IMAGE_B
+        && sled_type == Some(OxideSled::Cosmo)


Should these not be ArtifactKind::COSMO_ROT_IMAGE_*?

This is where some of the weirdness with cosmo comes in. We are treating cosmos as just another subset of gimlet as far as the SP/RoT are concerned so using AritfactKind::GIMLET_ROT_A is correct

Ah! That's interesting. Do you mind writing a comment explaining that?

nexus/reconfigurator/planning/src/mgs_updates/test_helpers.rs

jgallagher

Looks great, just a couple minor style nits

jgallagher · 2025-11-11T15:08:42Z

nexus/reconfigurator/planning/src/mgs_updates/host_phase_1.rs

-        assert_eq!(first_update.artifact_hash, ARTIFACT_HASH_HOST_PHASE_1);
+        assert_eq!(
+            first_update.artifact_hash,
+            ARTIFACT_HASH_COSMO_HOST_PHASE_1


~~Why does the second update in this test pick up the Cosmo hash instead of Gimlet?~~

Nevermind - found the change to TestBoards that makes the second sled a Cosmo

nexus/reconfigurator/planning/src/mgs_updates/mod.rs

jgallagher · 2025-11-11T15:11:57Z

nexus/reconfigurator/planning/src/system.rs

        use typed_rng::TypedUuidRng;
        let unique = unique.unwrap_or_else(|| hardware_slot.to_string());
-        let model = format!("model{}", unique);
+        let model = GIMLET_SLED_MODEL.to_string();


It's probably worth extending reconfigurator-cli to be able to specify the model of a sled when adding it, right? (Definitely not as part of this PR and not particularly urgent - happy to file an issue for someone to pick up at some point.)

#9390 filed this

jgallagher · 2025-11-11T15:16:04Z

nexus/reconfigurator/planning/src/mgs_updates/test_helpers.rs

 }

 impl<'a> TestBoardCollectionBuilder<'a> {
+    #[allow(clippy::too_many_arguments)]


Minor style nit - I think I'm inclined to go with clippy on this function, particularly since most of the arguments are the same type (so the caller has to just know the correct order). Maybe this warrants something like a

fn new(boards: &'a TestBoards, description: TestBoardCollectionBuilderDescription)

where description contains named fields for all these Artifact* arguments?

I went back and forth. This new is a private function so I figured it was okay to only have to do it a limited number of places (exactly one in the current code). If it's bothersome I'll do a refactor.

Fix a semantic merge conflict between #9134 and #9372.

karencfv reviewed Oct 8, 2025

View reviewed changes

nexus/reconfigurator/planning/src/system.rs Outdated Show resolved Hide resolved

morlandi7 linked an issue Oct 13, 2025 that may be closed by this pull request

Successfully update a Cosmo via reconfigurator #8777

Closed

labbott force-pushed the reconfigurator_cosmo branch from 164893c to ee4ae51 Compare October 15, 2025 16:48

karencfv reviewed Nov 4, 2025

View reviewed changes

labbott force-pushed the reconfigurator_cosmo branch from 9c98062 to a3eef8a Compare November 7, 2025 15:48

karencfv reviewed Nov 10, 2025

View reviewed changes

jgallagher approved these changes Nov 11, 2025

View reviewed changes

Add cosmo support to reconfigurator

1986a96

labbott force-pushed the reconfigurator_cosmo branch from 8dfdf58 to 1986a96 Compare November 12, 2025 15:15

labbott merged commit e02810b into main Nov 12, 2025
18 checks passed

labbott deleted the reconfigurator_cosmo branch November 12, 2025 20:17

sunshowers mentioned this pull request Nov 12, 2025

[reconfigurator-cli] update snapshots #9392

Merged

sunshowers added a commit that referenced this pull request Nov 13, 2025

[reconfigurator-cli] update snapshots (#9392)

4d65868

Fix a semantic merge conflict between #9134 and #9372.

	// Timeouts, timeouts: always wrong!
	//
	// We have to pick some maximum time we're willing to wait for the `post_update`
	// hook to complete. In general this hook is responsible for resetting the
	// updated target to cause it to boot into its new version, but the details vary
	// wildly by device type (e.g., post-update RoT bootloader actions require
	// multiple RoT resets) and the expected amount of time also varies wildly
	// (e.g., resetting a gimlet SP takes a few seconds, resetting a sidecar SP can
	// take 10s of seconds, resetting a sled after a host OS update takes minutes).
	fn post_update_timeout(update: &PendingMgsUpdate) -> Duration {
	match &update.details {
	PendingMgsUpdateDetails::Sp { .. } => {
	// We're resetting an SP; use a generous timeout for sleds and power
	// shelf controllers (which should take a few seconds) and an even
	// more generous timeout for switches (which we've seen take 10-20
	// seconds in practice).
	match update.sp_type {
	SpType::Sled \| SpType::Power => Duration::from_secs(60),
	SpType::Switch => Duration::from_secs(120),
	}
	}
	PendingMgsUpdateDetails::Rot { .. } => {
	// Resetting the RoT should be quick (a few seconds), but we wait
	// for boot info after the reset.
	Duration::from_secs(90)
	}
	PendingMgsUpdateDetails::RotBootloader { .. } => {
	// Resetting the bootloader requires multiple RoT resets; give this
	// a longer timeout.
	Duration::from_secs(210)
	}
	PendingMgsUpdateDetails::HostPhase1(..) => {
	// Resetting a sled takes several minutes (mostly DRAM training);
	// give something very generous here to wait for it to come back.
	Duration::from_secs(30 * 60)
	}
	}
	}

	if try_reset {
	// We retry this until the component update has been successfully
	// updated, or we get some error other than a communication error or
	// some other transient error. There is intentionally no timeout here.
	// If we've staged an update but not managed to reset the device,
	// there's no point where we'd want to stop trying to do so.
	while let Err(error) =
	update_helper.post_update(log, &mut mgs_clients, update).await
	{
	if error.is_fatal() {
	let error = InlineErrorChain::new(&error);
	error!(log, "post_update failed"; &error);
	return Err(ApplyUpdateError::SpComponentResetFailed(
	error.to_string(),
	));
	}

	// We only care whether the update has completed. We ignore all
	// pre-check errors because they could all be transient if a reset
	// is in the process of happening.
	if let Ok(PrecheckStatus::UpdateComplete) =
	update_helper.precheck(log, &mut mgs_clients, update).await
	{
	break;
	}

	tokio::time::sleep(RESET_DELAY_INTERVAL).await;
	}
	}

	pub const WAIT_FOR_BOOT_INFO_TIMEOUT: Duration = Duration::from_secs(120);

	const WAIT_FOR_BOOT_INFO_INTERVAL: Duration = Duration::from_secs(10);

	/// Poll the RoT asking for its boot information. This confirms that the RoT has
	/// been succesfully reset
	pub async fn wait_for_boot_info(
	log: &Logger,
	mgs_clients: &mut MgsClients,
	sp_type: SpType,
	sp_slot: u16,
	timeout: Duration,
	) -> Result<RotState, PostUpdateError> {
	let before = Instant::now();
	loop {
	debug!(log, "waiting for boot info to confirm a successful reset");
	match mgs_clients
	.try_all_serially(log, \|mgs_client\| async move {
	mgs_client
	.sp_rot_boot_info(
	&sp_type,
	sp_slot,
	SpComponent::ROT.const_as_str(),
	&GetRotBootInfoParams {
	version: RotBootInfo::HIGHEST_KNOWN_VERSION,
	},
	)
	.await
	})
	.await
	{
	Ok(state) => match state.clone() {
	RotState::V2 { .. } \| RotState::V3 { .. } => {
	debug!(log, "successfuly retrieved boot info");
	return Ok(state.into_inner());
	}
	// The RoT is probably still booting
	RotState::CommunicationFailed { message } => {
	if before.elapsed() >= timeout {
	error!(
	log,
	"failed to get RoT boot info";
	"error" => %message
	);
	return Err(PostUpdateError::FatalError {
	error: message,
	});
	}

	info!(
	log,
	"failed getting RoT boot info (will retry)";
	"error" => %message,
	);
	tokio::time::sleep(WAIT_FOR_BOOT_INFO_INTERVAL).await;
	}
	},
	// The RoT might still be booting
	Err(error) => {
	let e = InlineErrorChain::new(&error);
	if before.elapsed() >= timeout {
	error!(
	log,
	"failed to get RoT boot info";
	&e,
	);
	return Err(PostUpdateError::FatalError {
	error: e.to_string(),
	});
	}

	info!(
	log,
	"failed getting RoT boot info (will retry)";
	e,
	);
	tokio::time::sleep(WAIT_FOR_BOOT_INFO_INTERVAL).await;
	}
	}
	}
	}

	/// Poll the RoT asking for its boot information. This is used to check
	/// the state for RoT bootloader image errors after RoT is reset
	async fn wait_for_stage0_next_image_check(
	log: &Logger,
	mgs_clients: &mut MgsClients,
	sp_type: SpType,
	sp_slot: u16,
	timeout: Duration,
	) -> Result<Option<RotImageError>, PostUpdateError> {
	debug!(log, "attempting to verify image validity");
	match wait_for_boot_info(log, mgs_clients, sp_type, sp_slot, timeout).await
	{
	Ok(state) => match state {
	// The minimum we will ever return is v3.
	// Additionally, V2 does not report image errors, so we cannot
	// know with certainty if a signature check came back with errors
	RotState::V2 { .. } => {
	let error = "unexpected RoT version: 2".to_string();
	error!(
	log,
	"failed to get RoT boot info";
	"error" => &error
	);
	return Err(PostUpdateError::FatalError { error });
	}
	RotState::V3 { stage0next_error, .. } => {
	debug!(log, "successfully completed an image signature check");
	return Ok(stage0next_error);
	}
	// This is unreachable because wait_for_boot_info loops for some
	// time if it encounters `CommunicationFailed`, and if it hits the
	// timeout, it will return an error.
	RotState::CommunicationFailed { message } => {
	error!(
	log,
	"failed to get RoT boot info";
	"error" => %message
	);
	return Err(PostUpdateError::FatalError { error: message });
	}
	},
	Err(error) => return Err(error),
	}
	}

	// We wait for boot info to ensure a successful reset
	wait_for_boot_info(
	log,
	mgs_clients,
	update.sp_type,
	update.slot_id,
	WAIT_FOR_BOOT_INFO_TIMEOUT,
	)
	.await?;

Add cosmo support to reconfigurator #9134

Add cosmo support to reconfigurator #9134

Uh oh!

Conversation

labbott commented Oct 2, 2025

Uh oh!

Uh oh!

karencfv left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

labbott commented Nov 4, 2025

Uh oh!

jclulow commented Nov 6, 2025

Uh oh!

labbott commented Nov 10, 2025

Uh oh!

karencfv left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jgallagher left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

karencfv left a comment •

edited

Loading