Skip to content

Allow snapshot tap changes #4731

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,8 @@ and this project adheres to

- [#5065](https://github.com/firecracker-microvm/firecracker/pull/5065) Added
support for Intel AMX (Advanced Matrix Extensions).
- [#4731](https://github.com/firecracker-microvm/firecracker/pull/4731): Added
support for modifying the host TAP device name during snapshot restore.

### Changed

Expand Down
57 changes: 57 additions & 0 deletions docs/snapshotting/network-for-clones.md
Original file line number Diff line number Diff line change
Expand Up @@ -142,6 +142,63 @@ Otherwise, packets originating from the guest might be using old Link Layer
Address for up to arp cache timeout seconds. After said timeout period,
connectivity will work both ways even without an explicit flush.

### Renaming host device names

In some environments where the jailer is not being used, restoring a snapshot
may be tricky because the tap device on the host will not be the same as the tap
device that the original VM was mapped to when it was snapshotted, for example
when the tap device comes from a pool of such devices.

In this case you can use the `network_overrides` parameter of the snapshot
restore API to specify which guest network device maps to which host tap device.

For example, if we have a network interface named `eth0` in the snapshotted
microVM, we can override it to point to the host device `vmtap01` during
snapshot resume, like this:

```bash
curl --unix-socket /tmp/firecracker.socket -i \
-X PUT 'http://localhost/snapshot/load' \
-H 'Accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"snapshot_path": "./snapshot_file",
"mem_backend": {
"backend_path": "./mem_file",
"backend_type": "File"
},
"network_overrides": [
{
"iface_id": "eth0",
"host_dev_name": "vmtap01"
}
]
}'
```

This may require reconfiguration of the networking inside the VM so that it is
still routable externally.
[network setup documentation](../network-setup.md#in-the-guest) describes what
the typical setup is. If you are not using network namespaces or the jailer,
then the guest will have to be made aware (via vsock or other channel) that it
needs to reconfigure its network to match the network configured on the tap
device.

If the new TAP device, say `vmtap3` has been configured to use a guest address
of `172.16.3.2` then after snapshot restore you would run something like:

```bash
# In the guest

# Clear out the previous addr and route
ip addr flush dev eth0
ip route flush dev eth0

# Configure the new address
ip addr add 172.16.3.2/30 dev eth0
ip route add default via 172.16.3.1/30 dev eth0
```

# Ingress connectivity

The above setup only provides egress connectivity. If in addition we also want
Expand Down
46 changes: 45 additions & 1 deletion src/firecracker/src/api_server/request/snapshot.rs
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,7 @@ fn parse_put_snapshot_load(body: &Body) -> Result<ParsedRequest, RequestError> {
mem_backend,
enable_diff_snapshots: snapshot_config.enable_diff_snapshots,
resume_vm: snapshot_config.resume_vm,
network_overrides: snapshot_config.network_overrides,
};

// Construct the `ParsedRequest` object.
Expand All @@ -120,7 +121,7 @@ fn parse_put_snapshot_load(body: &Body) -> Result<ParsedRequest, RequestError> {

#[cfg(test)]
mod tests {
use vmm::vmm_config::snapshot::{MemBackendConfig, MemBackendType};
use vmm::vmm_config::snapshot::{MemBackendConfig, MemBackendType, NetworkOverride};

use super::*;
use crate::api_server::parsed_request::tests::{depr_action_from_req, vmm_action_from_request};
Expand Down Expand Up @@ -181,6 +182,7 @@ mod tests {
},
enable_diff_snapshots: false,
resume_vm: false,
network_overrides: vec![],
};
let mut parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
assert!(
Expand Down Expand Up @@ -210,6 +212,7 @@ mod tests {
},
enable_diff_snapshots: true,
resume_vm: false,
network_overrides: vec![],
};
let mut parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
assert!(
Expand Down Expand Up @@ -239,6 +242,46 @@ mod tests {
},
enable_diff_snapshots: false,
resume_vm: true,
network_overrides: vec![],
};
let mut parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
assert!(
parsed_request
.parsing_info()
.take_deprecation_message()
.is_none()
);
assert_eq!(
vmm_action_from_request(parsed_request),
VmmAction::LoadSnapshot(expected_config)
);

let body = r#"{
"snapshot_path": "foo",
"mem_backend": {
"backend_path": "bar",
"backend_type": "Uffd"
},
"resume_vm": true,
"network_overrides": [
{
"iface_id": "eth0",
"host_dev_name": "vmtap2"
}
]
}"#;
let expected_config = LoadSnapshotParams {
snapshot_path: PathBuf::from("foo"),
mem_backend: MemBackendConfig {
backend_path: PathBuf::from("bar"),
backend_type: MemBackendType::Uffd,
},
enable_diff_snapshots: false,
resume_vm: true,
network_overrides: vec![NetworkOverride {
iface_id: String::from("eth0"),
host_dev_name: String::from("vmtap2"),
}],
};
let mut parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
assert!(
Expand All @@ -265,6 +308,7 @@ mod tests {
},
enable_diff_snapshots: false,
resume_vm: true,
network_overrides: vec![],
};
let parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
assert_eq!(
Expand Down
24 changes: 24 additions & 0 deletions src/firecracker/swagger/firecracker.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -1216,6 +1216,24 @@ definitions:
Type of snapshot to create. It is optional and by default, a full
snapshot is created.

NetworkOverride:
type: object
description:
Allows for changing the backing TAP device of a network interface
during snapshot restore.
required:
- iface_id
- host_dev_name
properties:
iface_id:
type: string
description:
The name of the interface to modify
host_dev_name:
type: string
description:
The new host device of the interface

SnapshotLoadParams:
type: object
description:
Expand Down Expand Up @@ -1247,6 +1265,12 @@ definitions:
type: boolean
description:
When set to true, the vm is also resumed if the snapshot load is successful.
network_overrides:
type: array
description: Network host device names to override
items:
$ref: "#/definitions/NetworkOverride"


TokenBucket:
type: object
Expand Down
4 changes: 2 additions & 2 deletions src/vmm/src/devices/virtio/net/persist.rs
Original file line number Diff line number Diff line change
Expand Up @@ -55,8 +55,8 @@ impl RxBufferState {
/// at snapshot.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct NetState {
id: String,
tap_if_name: String,
pub id: String,
pub tap_if_name: String,
rx_rate_limiter_state: RateLimiterState,
tx_rate_limiter_state: RateLimiterState,
/// The associated MMDS network stack.
Expand Down
18 changes: 17 additions & 1 deletion src/vmm/src/persist.rs
Original file line number Diff line number Diff line change
Expand Up @@ -405,7 +405,21 @@
params: &LoadSnapshotParams,
vm_resources: &mut VmResources,
) -> Result<Arc<Mutex<Vmm>>, RestoreFromSnapshotError> {
let microvm_state = snapshot_state_from_file(&params.snapshot_path)?;
let mut microvm_state = snapshot_state_from_file(&params.snapshot_path)?;
for entry in &params.network_overrides {
let net_devices = &mut microvm_state.device_states.net_devices;
if let Some(device) = net_devices
.iter_mut()
.find(|x| x.device_state.id == entry.iface_id)
{
device
.device_state
.tap_if_name
.clone_from(&entry.host_dev_name);
} else {
return Err(SnapshotStateFromFileError::UnknownNetworkDevice.into());

Check warning on line 420 in src/vmm/src/persist.rs

View check run for this annotation

Codecov / codecov/patch

src/vmm/src/persist.rs#L410-L420

Added lines #L410 - L420 were not covered by tests
}
}
let track_dirty_pages = params.enable_diff_snapshots;

let vcpu_count = microvm_state
Expand Down Expand Up @@ -480,6 +494,8 @@
Meta(std::io::Error),
/// Failed to load snapshot state from file: {0}
Load(#[from] crate::snapshot::SnapshotError),
/// Unknown Network Device.
UnknownNetworkDevice,
}

fn snapshot_state_from_file(
Expand Down
1 change: 1 addition & 0 deletions src/vmm/src/rpc_interface.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1269,6 +1269,7 @@ mod tests {
},
enable_diff_snapshots: false,
resume_vm: false,
network_overrides: vec![],
},
)));
check_unsupported(runtime_request(VmmAction::SetEntropyDevice(
Expand Down
15 changes: 15 additions & 0 deletions src/vmm/src/vmm_config/snapshot.rs
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,16 @@ pub struct CreateSnapshotParams {
pub mem_file_path: PathBuf,
}

/// Allows for changing the mapping between tap devices and host devices
/// during snapshot restore
#[derive(Debug, PartialEq, Eq, Deserialize)]
pub struct NetworkOverride {
/// The index of the interface to modify
pub iface_id: String,
/// The new name of the interface to be assigned
pub host_dev_name: String,
}

/// Stores the configuration that will be used for loading a snapshot.
#[derive(Debug, PartialEq, Eq)]
pub struct LoadSnapshotParams {
Expand All @@ -60,6 +70,8 @@ pub struct LoadSnapshotParams {
/// When set to true, the vm is also resumed if the snapshot load
/// is successful.
pub resume_vm: bool,
/// The network devices to override on load.
pub network_overrides: Vec<NetworkOverride>,
}

/// Stores the configuration for loading a snapshot that is provided by the user.
Expand All @@ -82,6 +94,9 @@ pub struct LoadSnapshotConfig {
/// Whether or not to resume the vm post snapshot load.
#[serde(default)]
pub resume_vm: bool,
/// The network devices to override on load.
#[serde(default)]
pub network_overrides: Vec<NetworkOverride>,
}

/// Stores the configuration used for managing snapshot memory.
Expand Down
2 changes: 2 additions & 0 deletions src/vmm/tests/integration_tests.rs
Original file line number Diff line number Diff line change
Expand Up @@ -261,6 +261,7 @@ fn verify_load_snapshot(snapshot_file: TempFile, memory_file: TempFile) {
},
enable_diff_snapshots: false,
resume_vm: true,
network_overrides: vec![],
}))
.unwrap();

Expand Down Expand Up @@ -344,6 +345,7 @@ fn verify_load_snap_disallowed_after_boot_resources(res: VmmAction, res_name: &s
},
enable_diff_snapshots: false,
resume_vm: false,
network_overrides: vec![],
});
let err = preboot_api_controller.handle_preboot_request(req);
assert!(
Expand Down
17 changes: 17 additions & 0 deletions tests/framework/microvm.py
Original file line number Diff line number Diff line change
Expand Up @@ -974,6 +974,7 @@ def restore_from_snapshot(
snapshot: Snapshot,
resume: bool = False,
uffd_path: Path = None,
rename_interfaces: dict = None,
):
"""Restore a snapshot"""
jailed_snapshot = snapshot.copy_to_chroot(Path(self.chroot()))
Expand Down Expand Up @@ -1001,11 +1002,27 @@ def restore_from_snapshot(
# Adjust things just in case
self.kernel_file = Path(self.kernel_file)

iface_overrides = []
if rename_interfaces is not None:
iface_overrides = [
{"iface_id": k, "host_dev_name": v}
for k, v in rename_interfaces.items()
]

optional_kwargs = {}
if iface_overrides:
# For backwards compatibility ab testing we want to avoid adding
# new parameters until we have a release baseline with the new
# parameter. Once the release baseline has moved, this assignment
# can be inline in the snapshot_load command below
optional_kwargs["network_overrides"] = iface_overrides

self.api.snapshot_load.put(
mem_backend=mem_backend,
snapshot_path=str(jailed_vmstate),
enable_diff_snapshots=snapshot.is_diff,
resume_vm=resume,
**optional_kwargs,
)
# This is not a "wait for boot", but rather a "VM still works after restoration"
if snapshot.net_ifaces and resume:
Expand Down
29 changes: 29 additions & 0 deletions tests/integration_tests/functional/test_snapshot_basic.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,19 +2,22 @@
# SPDX-License-Identifier: Apache-2.0
"""Basic tests scenarios for snapshot save/restore."""

import dataclasses
import filecmp
import logging
import os
import platform
import re
import shutil
import time
import uuid
from pathlib import Path

import pytest

import host_tools.cargo_build as host
import host_tools.drive as drive_tools
import host_tools.network as net_tools
from framework import utils
from framework.microvm import SnapshotType
from framework.properties import global_props
Expand Down Expand Up @@ -570,3 +573,29 @@ def test_physical_counter_reset_aarch64(uvm_nano):
break
else:
raise RuntimeError("Did not find CNTPCT_EL0 register in snapshot")


def test_snapshot_rename_interface(uvm_nano, microvm_factory):
"""
Test that we can restore a snapshot and point its interface to a
different host interface.
"""
vm = uvm_nano
base_iface = vm.add_net_iface()
vm.start()
snapshot = vm.snapshot_full()

# We don't reuse the network namespace as it may conflict with
# previous/future devices
restored_vm = microvm_factory.build(netns=net_tools.NetNs(str(uuid.uuid4())))
# Override the tap name, but keep the same IP configuration
iface_override = dataclasses.replace(base_iface, tap_name="tap_override")

restored_vm.spawn()
snapshot.net_ifaces.clear()
snapshot.net_ifaces.append(iface_override)
restored_vm.restore_from_snapshot(
snapshot,
rename_interfaces={iface_override.dev_name: iface_override.tap_name},
resume=True,
)