Snapshot deletion returns 500 error and subsequent disk deletion also results in 500 error afterwards #3866

@askfongjojo

After the fixes related to #824 and #3698, deleting a snapshot removes the snapshot but returns a 500 error, and the subsequent disk deletion still returns errors like these:

root@oxz_nexus_1d4b924f-3725-4636-92fe-c5412426018b:~# grep 'InternalError' `svcs -L nexus` | looker
16:46:41.376Z INFO 1d4b924f-3725-4636-92fe-c5412426018b (dropshot_external): request completed
    error_message_external = Internal Server Error
    error_message_internal = saga ACTION error at node "no_result_2": deserialize failed: unknown variant `failed to delete_crucible_running_snapshots: InvalidRequest { message: "Not Found" }`, expected one of `ObjectNotFound`, `ObjectAlreadyExists`, `InvalidRequest`, `Unauthenticated`, `InvalidValue`, `Forbidden`, `InternalError`, `ServiceUnavailable`, `MethodNotAllowed`, `TypeVersionMismatch`, `Conflict`
    file = /home/build/.cargo/git/checkouts/dropshot-a4a923d29dccc492/7622dc0/dropshot/src/server.rs:839
    local_addr = 172.30.2.5:443
    method = DELETE
    remote_addr = 172.20.16.218:38978
    req_id = 8372cb4d-ae55-430d-8165-c63e82bc8f77
    response_code = 500
    uri = /v1/snapshots/db76c742-c2e9-45ce-b72c-5c49c3153fcf
16:46:47.938Z INFO 1d4b924f-3725-4636-92fe-c5412426018b (dropshot_external): request completed
    error_message_external = Internal Server Error
    error_message_internal = saga ACTION error at node "no_result_1": deserialize failed: unknown variant `failed to delete_crucible_regions: InvalidRequest { message: "must delete snapshots first!" }`, expected one of `ObjectNotFound`, `ObjectAlreadyExists`, `InvalidRequest`, `Unauthenticated`, `InvalidValue`, `Forbidden`, `InternalError`, `ServiceUnavailable`, `MethodNotAllowed`, `TypeVersionMismatch`, `Conflict`
    file = /home/build/.cargo/git/checkouts/dropshot-a4a923d29dccc492/7622dc0/dropshot/src/server.rs:839
    local_addr = 172.30.2.5:443
    method = DELETE
    remote_addr = 172.20.16.218:38990
    req_id = 0a966210-729d-442c-a1f2-c8ad18a8d5cc
    response_code = 500
    uri = /v1/disks/20edc1c4-d4de-499a-b508-80cb0edb5b2a
17:15:45.059Z INFO 1d4b924f-3725-4636-92fe-c5412426018b (dropshot_external): request completed
    error_message_external = Internal Server Error
    error_message_internal = saga ACTION error at node "no_result_2": deserialize failed: unknown variant `failed to delete_crucible_running_snapshots: InvalidRequest { message: "Not Found" }`, expected one of `ObjectNotFound`, `ObjectAlreadyExists`, `InvalidRequest`, `Unauthenticated`, `InvalidValue`, `Forbidden`, `InternalError`, `ServiceUnavailable`, `MethodNotAllowed`, `TypeVersionMismatch`, `Conflict`
    file = /home/build/.cargo/git/checkouts/dropshot-a4a923d29dccc492/7622dc0/dropshot/src/server.rs:839
    local_addr = 172.30.2.5:443
    method = DELETE
    remote_addr = 172.20.16.218:55358
    req_id = 9de10d6a-9986-44a4-b290-4caad0342256
    response_code = 500
    uri = https://mercury.sys.rack2.eng.oxide.computer/v1/disks/test-disk-del-500?project=test-project
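
For triage, the three entries above differ only in the saga node, the failing step, and the inner message. Below is a minimal sketch for pulling those fields out of `error_message_internal`, assuming the message keeps the shape shown in this log. The `parse_saga_error` helper and its regular expression are hypothetical and not part of Nexus or looker.

```python
import re

# Hypothetical pattern for the error_message_internal shape seen in this log.
SAGA_ERROR = re.compile(
    r'saga ACTION error at node "(?P<node>[^"]+)": deserialize failed: '
    r'unknown variant `failed to (?P<step>\w+): '
    r'InvalidRequest \{ message: "(?P<message>[^"]*)" \}`'
)


def parse_saga_error(text):
    """Return (node, step, message) if the text matches, otherwise None."""
    match = SAGA_ERROR.search(text)
    if match is None:
        return None
    return match.group("node"), match.group("step"), match.group("message")


# Messages copied from the log entries above, cut off after the closing backtick.
snapshot_delete = (
    'saga ACTION error at node "no_result_2": deserialize failed: unknown variant '
    '`failed to delete_crucible_running_snapshots: '
    'InvalidRequest { message: "Not Found" }`'
)
disk_delete = (
    'saga ACTION error at node "no_result_1": deserialize failed: unknown variant '
    '`failed to delete_crucible_regions: '
    'InvalidRequest { message: "must delete snapshots first!" }`'
)

print(parse_saga_error(snapshot_delete))
print(parse_saga_error(disk_delete))
```

Grouping by the `step` field separates the snapshot-side failure (`delete_crucible_running_snapshots`) from the disk-side failure (`delete_crucible_regions`).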

Despite the errors, the disks were eventually deleted (at least from the user's perspective), but @augustuswm observed that the virtual disk bytes provisioned were not updated afterwards, e.g.:

// Pre    => c67808de-deee-4b96-8aef-a461c759f098 | 2023-08-10 16:46:47.914542+00 | Project         |                   128849018880 |                0 |               0
// Create => c67808de-deee-4b96-8aef-a461c759f098 | 2023-08-10 17:08:46.548842+00 | Project         |                   139586437120 |                0 |               0
// Delete => c67808de-deee-4b96-8aef-a461c759f098 | 2023-08-10 17:15:45.038063+00 | Project         |                   139586437120 |                0 |               0
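
Reading the fourth column of those rows as the project's virtual disk bytes provisioned (an assumption, since the column headers are not shown here), the arithmetic can be checked with a short sketch. It only compares the three values above and does not query any database.

```python
GIB = 1024 ** 3

# Values copied from the Pre / Create / Delete rows above (assumed to be bytes).
pre = 128849018880
after_create = 139586437120
after_delete = 139586437120

created = after_create - pre   # bytes added by the disk create
leaked = after_delete - pre    # bytes still counted after the disk delete

print(f"added by create: {created // GIB} GiB")
print(f"still provisioned after delete: {leaked // GIB} GiB")
```

Both differences come to 10 GiB, so the delete left the counter exactly where the create put it.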

Labels

known issue: To include in customer documentation and training
