Skip to content

Adding runc jailing #249

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Oct 24, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
.idea/
bin/
runtime/logs
*stamp
5 changes: 4 additions & 1 deletion examples/etc/containerd/firecracker-runtime.json
Original file line number Diff line number Diff line change
Expand Up @@ -5,5 +5,8 @@
"root_drive": "/var/lib/firecracker-containerd/runtime/default-rootfs.img",
"cpu_count": 1,
"cpu_template": "T2",
"log_level": "Debug"
"log_level": "Debug",
"jailer": {
"runc_binary_path": "/usr/local/bin/runc"
}
}
107 changes: 74 additions & 33 deletions proto/firecracker.pb.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

7 changes: 6 additions & 1 deletion proto/firecracker.proto
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,8 @@ message CreateVMRequest {

// Whether the VM should exit after all tasks running in it have been deleted.
bool ExitAfterAllTasksDeleted = 9;

JailerConfig JailerConfig = 10;
}

message StopVMRequest {
Expand All @@ -54,4 +56,7 @@ message GetVMInfoResponse {
message SetVMMetadataRequest {
string VMID = 1;
string Metadata = 2;
}
}

message JailerConfig {
}
9 changes: 8 additions & 1 deletion runtime/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -47,11 +47,17 @@ type Config struct {
LogLevel string `json:"log_level"`
HtEnabled bool `json:"ht_enabled"`
Debug bool `json:"debug"`

// If a CreateVM call specifies no network interfaces and DefaultNetworkInterfaces is non-empty,
// the VM will default to using the network interfaces as specified here. This is especially
// useful when a CNI-based network interface is provided in DefaultNetworkInterfaces.
DefaultNetworkInterfaces []proto.FirecrackerNetworkInterface `json:"default_network_interfaces"`
JailerConfig JailerConfig `json:"jailer"`
}

// JailerConfig houses a set of configurable values for jailing
// TODO: Add netns field
type JailerConfig struct {
RuncBinaryPath string `json:"runc_binary_path"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Does the current schema allow us to specify network namepsace per VM? If not, can we please modify the interface to be able to do that? Doesn't need to be covered in this particular PR. We can follow it up in a separate issue as well.
  2. Does the caller get a handle back to the cgroup hierarchies created by the jailer? I'm mostly asking because for VMs that require fractional CPUs, the orchestrator process would want to enforce these by modifying cpu quota & period settings for the vmm process
  3. How does the caller specify the numa node for the VM?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xibz Can you answer @aaithal's questions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the current schema allow us to specify network namepsace per VM? If not, can we please modify the interface to be able to do that? Doesn't need to be covered in this particular PR. We can follow it up in a separate issue as well.

Currently no. That'll be added as a feature later.

Does the caller get a handle back to the cgroup hierarchies created by the jailer? I'm mostly asking because for VMs that require fractional CPUs, the orchestrator process would want to enforce these by modifying cpu quota & period settings for the vmm process

This does not currently return the cgroup path. We can add this as a feature request.

How does the caller specify the numa node for the VM?

Currently no numa node can be specified. We've talked about this offline and have decided to be able to specify the CPUs and mem nodes as a better option.

Please see here for a list of what is to come, #258

}

// LoadConfig loads configuration from JSON file at 'path'
Expand All @@ -76,6 +82,7 @@ func LoadConfig(path string) (*Config, error) {
CPUCount: defaultCPUCount,
CPUTemplate: string(defaultCPUTemplate),
}

if err := json.Unmarshal(data, cfg); err != nil {
return nil, errors.Wrapf(err, "failed to unmarshal config from %q", path)
}
Expand Down
12 changes: 11 additions & 1 deletion runtime/drive_handler.go
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,10 @@ type stubDriveHandler struct {
mutex sync.Mutex
}

func newStubDriveHandler(path string, logger *logrus.Entry, count int) (*stubDriveHandler, error) {
// stubDrivesOpt is used to make and modify changes to the stub drives.
type stubDrivesOpt func(stubDrives []models.Drive) error

func newStubDriveHandler(path string, logger *logrus.Entry, count int, opts ...stubDrivesOpt) (*stubDriveHandler, error) {
h := stubDriveHandler{
RootPath: path,
logger: logger,
Expand All @@ -60,6 +63,13 @@ func newStubDriveHandler(path string, logger *logrus.Entry, count int) (*stubDri
if err != nil {
return nil, err
}

for _, opt := range opts {
if err := opt(drives); err != nil {
h.logger.WithError(err).Debug("failed to apply option to stub drives")
return nil, err
}
}
h.drives = drives
return &h, nil
}
Expand Down
130 changes: 130 additions & 0 deletions runtime/firecracker-runc-config.json.example
Original file line number Diff line number Diff line change
@@ -0,0 +1,130 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How closely have you validated this config against the set of actions that Firecracker's jailer takes? In particular:

  • Are the capabilities the same and in the same sets?
  • Are the cgroups the same?
  • Are the namespaces the same? I saw that @sipsma already called out the missing "user" namespace.
  • Are the mounts the same?
  • Are the devices the same?
  • Are the masked paths the same?

Copy link
Contributor Author

@xibz xibz Aug 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samuelkarp -

Are the capabilities the same and in the same sets?

Firecracker's jailer doesn't explicitly use capabilities. They rely on seccomp and inherits its capabilities from the jailer binary

Are the cgroups the same?

cgroups are all inherited by the parent which reflects what Firecracker's jailer does.

Are the namespaces the same? I saw that @sipsma already called out the missing "user" namespace.

Firecracker's jailer only uses the mount and pid namespaces. And has the option to enter a pre-existing network namespace.

Are the mounts the same?

No, Firecracker's mounts are a lot more simple, as in they don't mount anything. In our runc config we specify a bunch of different mounts, like /dev.

Are the devices the same?

Yes, with the ones we provide. runc however will create some other devices, like /dev/null

Are the masked paths the same?

The masked paths that are used are the default ones provided by runc. I will take a look into what Firecracker's jailer does in terms of this
Looks like Firecracker does not mask any paths when jailing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For our initial integration, let's match what Firecracker's jailer does rather than match runc's defaults.

"ociVersion": "1.0.1",
"process": {
"terminal": false,
"user": {
"uid": 0,
"gid": 0
},
"args": [
"/firecracker",
"--api-sock",
"api.socket"
],
"env": [
"PATH=/"
],
"cwd": "/",
"capabilities": {
"effective": [
],
"bounding": [
],
"inheritable": [
],
"permitted": [
],
"ambient": [
]
},
"rlimits": [
{
"type": "RLIMIT_NOFILE",
"hard": 1024,
"soft": 1024
}
],
"noNewPrivileges": true
},
"root": {
"path": "rootfs",
"readonly": false
},
"hostname": "runc",
"mounts": [
{
"destination": "/proc",
"type": "proc",
"source": "proc"
}
],
"linux": {
"devices": [
{
"path": "/dev/kvm",
"type": "c",
"major": 10,
"minor": 232,
"fileMode": 438,
"uid": 0,
"gid": 0
},
{
"path": "/dev/net/tun",
"type": "c",
"major": 10,
"minor": 200,
"fileMode": 438,
"uid": 0,
"gid": 0
}
],
"resources": {
"devices": [
{
"allow": false,
"access": "rwm"
},
{
"allow": true,
"major": 10,
"minor": 232,
"access": "rwm"
},
{
"allow": true,
"major": 10,
"minor": 200,
"access": "rwm"
}
]
},
"namespaces": [
{
"type": "cgroup"
},
{
"type": "pid"
},
{
"type": "network"
},
{
"type": "ipc"
},
{
"type": "uts"
},
{
"type": "mount"
}
],
"maskedPaths": [
"/proc/asound",
"/proc/kcore",
"/proc/latency_stats",
"/proc/timer_list",
"/proc/timer_stats",
"/proc/sched_debug",
"/sys/firmware",
"/proc/scsi"
],
"readonlyPaths": [
"/proc/bus",
"/proc/fs",
"/proc/irq",
"/proc/sys",
"/proc/sysrq-trigger"
]
}
}
Loading