Cleanup Crucible resources with volume delete saga #1706
Add a volume delete saga, and call it whenever a volume needs to be deleted. Crucible resources now have accounting for their use: when the number of references reaches zero, cleanup occurs.

Adds a table to store created region snapshots as part of this accounting, and makes a record whenever a Crucible snapshot is taken of a region. Region snapshots are uniquely identified by (dataset_id, region_id, snapshot_id), and additionally contain a snapshot address for identification as part of a volume construction request.

All DELETE calls to the Crucible agent(s) are now performed only by the volume delete saga.

Add tests verifying that Nexus accounts for snapshots properly: it should not delete a disk's corresponding region when the disk is deleted while a snapshot of it exists, or while multiple disks reference a running snapshot.

A small change to the simulated Crucible agent is also included in this commit: logic was moved out of the HTTP endpoint function into the Crucible struct. Logic doesn't belong in HTTP endpoint functions :)

Closes oxidecomputer#1632
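The reference-counting scheme described above can be sketched in plain Rust. This is a minimal, hypothetical illustration (the names `CrucibleAccounting`, `add_reference`, and `remove_reference` are illustrative, not Nexus's actual types, and the real implementation keeps these counts in the database):

```rust
use std::collections::HashMap;

// Region snapshots are keyed by (dataset_id, region_id, snapshot_id),
// matching the unique identifier described in this PR. Strings stand in
// for the UUIDs used by the real code.
type SnapshotKey = (String, String, String);

struct CrucibleAccounting {
    references: HashMap<SnapshotKey, usize>,
}

impl CrucibleAccounting {
    fn new() -> Self {
        CrucibleAccounting { references: HashMap::new() }
    }

    // A volume starts referencing a region snapshot: bump its count.
    fn add_reference(&mut self, key: SnapshotKey) {
        *self.references.entry(key).or_insert(0) += 1;
    }

    // A referencing volume is deleted: drop one reference. Returns true
    // when the count reached zero, i.e. the volume delete saga may now
    // issue the DELETE to the Crucible agent.
    fn remove_reference(&mut self, key: &SnapshotKey) -> bool {
        match self.references.get_mut(key) {
            Some(count) if *count > 1 => {
                *count -= 1;
                false
            }
            Some(_) => {
                self.references.remove(key);
                true // last reference gone: safe to clean up
            }
            None => false,
        }
    }
}
```

Two disks referencing the same running snapshot would each hold a reference, so deleting one disk decrements the count without triggering cleanup; only deleting the last one does.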
```diff
  let now = Utc::now();
- diesel::update(dsl::volume)
+ diesel::delete(dsl::volume)
```
If we are no longer soft-deleting, is it even possible to have a Not Found error returned? I see we're handling it below, but wouldn't that just look like "no rows updated" now?
We're doing both - volumes are either soft deleted (if there are associated regions) or hard deleted (when those regions are cleaned up). I don't think I understand this comment though - volume_hard_delete doesn't filter on anything except the id, so it should find it every time.
I think my only beef here was with the error handling, where we handle the NotFoundByLookup case below. If we issue this delete request when the volume doesn't exist, will a NotFound error actually be returned?
https://docs.diesel.rs/master/diesel/result/enum.Error.html#variant.NotFound mentions this:
> No rows were returned by a query expected to return at least one row.
>
> This variant is only returned by `get_result` and `first`. `load` does not treat 0 rows as an error. If you would like to allow either 0 or 1 rows, call `.optional()` on the result.
I'm not sure what happens when we make this call with execute - it doesn't seem like an error for no rows to be updated.
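For reference, the distinction can be sketched as follows. This is not runnable as-is (it assumes an established Diesel connection, the `volume` table from this PR, and a `Volume` model; depending on the Diesel version the connection may need to be `&mut`):

```rust
// `execute` returns the number of rows affected; deleting a nonexistent
// volume yields Ok(0), not Err(Error::NotFound).
let rows_deleted: usize = diesel::delete(
    dsl::volume.filter(dsl::id.eq(volume_id)),
)
.execute(&conn)?;

// `get_result` expects at least one row back (here via Postgres
// RETURNING), so the same delete yields Err(diesel::result::Error::NotFound)
// when no row matches. `.optional()` converts that case to Ok(None).
let deleted: Option<Volume> = diesel::delete(
    dsl::volume.filter(dsl::id.eq(volume_id)),
)
.get_result(&conn)
.optional()?;
```

So with `execute`, zero rows deleted is not an error, which supports the point above that the NotFound handling may be unreachable on this path.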
I believe I've addressed all the comments, let me know.
Ship it! Thanks for the patches.