-
Notifications
You must be signed in to change notification settings - Fork 200
Adding runc jailing #249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding runc jailing #249
Conversation
05dbbcc
to
06f16de
Compare
runtime/config.go
Outdated
var ( | ||
// ErrNoUIDProvided returns when no UID was specified in the configuration | ||
// file | ||
ErrNoUIDProvided = fmt.Errorf("no uid provided in configuration") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
errors.New?
runtime/jailer.go
Outdated
} | ||
|
||
logger.Debugf("chdir to %v", jailPath) | ||
if err := os.Chdir(jailPath); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we execute this chdir after fork()? chdir is having process-wide effect. It is better to not depend on the working directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This must occur since we are using runc
as a jailer. I do not know of another way of doing this. So if you have any answers let me know :p.
Another option is to add cd $JAILPATH && runc run <id>
and that occurs in the fork. So that may be a good option.
Looks like you can specify the directory that the command is ran in!
4865958
to
b39da6d
Compare
Looks like the major and minor versions are different for |
runtime/jailer.go
Outdated
Fn: func(ctx context.Context, m *firecracker.Machine) error { | ||
// Create the proper paths needed for the runc jailer | ||
logger.Debugf("Creating root drive path %v", rootPath) | ||
if err := os.MkdirAll(rootPath, 0777); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using 0700? A bit scary to see the directory is writable for everyone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
"debug": true, | ||
"metrics_fifo": "/tmp/fc-metrics.fifo", | ||
"jailer": { | ||
"uid": 123, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use the names instead of the IDs? According to runtime-spec
Note: symbolic name for uid and gid, such as uname and gname respectively, are left to upper levels to derive (i.e. /etc/passwd parsing, NSS, etc)
Except for root (uid=0), it is hard to guess the meaning of the IDs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since runc uses uid
and gid
, I'd rather stick with that to be consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. runc and the spec are using uid/gid. There was a discussion regarding incorporating names, but that was dead-end.
containerd's config.toml is also using uid/gid on [grpc]
and [debug]
The only exception is https://github.com/containerd/containerd/pull/2257/files, which can accept non-numerical IDs.
Can we assume that most of customers are using 0/0? My concern is that the numerical IDs are not stable and would point something else unintentionally between distro changes and/or AMI updates.
c4b23ff
to
cd7e683
Compare
It's somewhat worse than that. Kernels older than 4.13 don't have a fixed minor node number for vhost-vsock, so it's unpredictable. More modern kernels have 241 hardcoded, so we can rely on this. We really should get our CI system onto a newer kernel. |
"debug": true, | ||
"metrics_fifo": "/tmp/fc-metrics.fifo", | ||
"jailer": { | ||
"uid": 123, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Firecracker's recommended production host configuration configuration involves running each MicroVM as a unique UID/GID pair:
When running multiple Firecracker instances it is recommended that each runs with its unique uid and gid to provide an extra layer of security for their individually owned resources in the unlikely case where any one of the jails is broken out of.
We should ensure that we follow that recommendation. Note that the easiest way to do this might be to take advantage of user namespaces, as suggested by @sipsma above
tools/docker/Dockerfile
Outdated
@@ -142,7 +142,8 @@ FROM debian:stretch as firecracker-vm-root-builder | |||
COPY --from=firecracker-vm-root / /vm | |||
RUN mkdir -p /output \ | |||
&& cd /output \ | |||
&& mkfs.ext4 -d /vm vm.ext4 65536 | |||
&& mkfs.ext4 -d /vm vm.ext4 65536 \ | |||
&& chown 0666 vm.ext4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why mode 0666?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any files provided to the jail need to be readable, another options was to change the uid
and gid
of the file, but that didn't seem like the behaviour we'd want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, but mode 0666 does much more than make the file readable.
- It makes it writable
- It makes it read/writable for everybody that can see it
If we're going to copy files around, then we should be setting owner and group bits to restrict them to the specific uid/gid needed, and we should be using the most restrictive permissions we can get away with. If we're going to bind mount files, then we should make sure that the mount is read-only.
Rather than setting file modes, you might consider using ACLs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out we didn't need to do this. I was running into permission issues earlier because I didn't call the chownr
function I had created. But it looks like this is no longer needed.
"linux": { | ||
"devices": [ | ||
{ | ||
"path": "/dev/kvm", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: There is some mixed up whitespace in this file. It's not visible in github, but you've got some spaces and tabs in this section. If you view this in an editor and tweak your tab width, you'll see it. Probably best to simply filter this file through 'jq' and let it handle the whitespace cleanup.
runtime/service.go
Outdated
s.config.JailerConfig.GID = firecracker.Int(int(request.JailerConfig.GID)) | ||
} | ||
} | ||
if !s.config.JailerConfig.DisableJailing { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if, instead of having a boolean for toggling jailing on/off, we might be able to support a more generic jailer. By defining a jailer interface, we could potentially support multiple types of jailers. Initially, we could support two jailers: a null jailer as a passthrough that would be the equavalent of having jailing disabled, and a runc jailer, as currently implemented. In the future, if we want to implement support for using firecracker's own jailer, or some other tool, we've then got a framework for doing so.
This would let us provide alternative implementations of the various jailer functions to facilitate testing, etc. and would reduce the amount of conditional code in service.go
runtime/config.go
Outdated
@@ -46,6 +47,15 @@ type Config struct { | |||
LogLevel string `json:"log_level"` | |||
HtEnabled bool `json:"ht_enabled"` | |||
Debug bool `json:"debug"` | |||
|
|||
JailerConfig JailerConfig `json:"jailer"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't think this needs to be separated from the other config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was talking about the blank line. You can move JailerConfig
inline with the other fields instead of having a blank line in between them.
runtime/config.go
Outdated
|
||
// Validate ensures that the configuration file has valid values | ||
func (c *Config) Validate() error { | ||
if !c.JailerConfig.DisableJailing { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It'd be a bit easier to read like this:
if c.JailerConfig.DisableJailing {
return nil
}
// other checks
"CAP_MKNOD" | ||
], | ||
"ambient": [ | ||
"CAP_NET_ADMIN", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need these to be ambient? Do we expect non-privileged programs to need to execute operations bound by these capabilities?
Ambient (since Linux 4.3):
This is a set of capabilities that are preserved across an execve(2) of a program that is not privileged. The ambient capability set obeys the invariant that no capability can ever be ambient if it is not both permitted and inheritable.The ambient capability set can be directly modified using prctl(2). Ambient capabilities are automatically lowered if either of the corresponding permitted or inheritable capabilities is lowered.
Executing a program that changes UID or GID due to the set-user-ID or set-group-ID bits or executing a program that has any file capabilities set will clear the ambient set. Ambient capabilities are added to the permitted set and assigned to the effective set when execve(2) is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I tried removing it, but I get this error
&{FaultMessage:Cannot open TAP device. Invalid name/permissions. CreateTap(Os { code: 1, kind: PermissionDenied, message: Operation not permitted })}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried removing which?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, the ambient
capability CAP_NET_ADMIN
and received that error.
@@ -0,0 +1,162 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How closely have you validated this config against the set of actions that Firecracker's jailer takes? In particular:
- Are the capabilities the same and in the same sets?
- Are the cgroups the same?
- Are the namespaces the same? I saw that @sipsma already called out the missing "user" namespace.
- Are the mounts the same?
- Are the devices the same?
- Are the masked paths the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the capabilities the same and in the same sets?
Firecracker's jailer doesn't explicitly use capabilities. They rely on seccomp and inherits its capabilities from the jailer binary
Are the cgroups the same?
cgroups are all inherited by the parent which reflects what Firecracker's jailer does.
Are the namespaces the same? I saw that @sipsma already called out the missing "user" namespace.
Firecracker's jailer only uses the mount and pid namespaces. And has the option to enter a pre-existing network namespace.
Are the mounts the same?
No, Firecracker's mounts are a lot more simple, as in they don't mount anything. In our runc config we specify a bunch of different mounts, like /dev
.
Are the devices the same?
Yes, with the ones we provide. runc however will create some other devices, like /dev/null
Are the masked paths the same?
The masked paths that are used are the default ones provided by runc. I will take a look into what Firecracker's jailer does in terms of this
Looks like Firecracker does not mask any paths when jailing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For our initial integration, let's match what Firecracker's jailer does rather than match runc's defaults.
runtime/jailer.go
Outdated
return err | ||
} | ||
|
||
dstFile, err := os.OpenFile(dst, os.O_CREATE|os.O_WRONLY, os.ModePerm) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want os.ModePerm
here? That's 0777.
runtime/jailer.go
Outdated
} | ||
|
||
func (j jailer) jailerCommand(containerName string) *exec.Cmd { | ||
cmd := exec.Command("runc", "run", containerName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be better to specify the full path instead of runc
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reasoning here? Basically we'd need to look up the path and see where runc is installed via exec.LookPath
, if we wanted to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we expecting to not already know the path of runc? If we just look on $PATH
, we could be running a different runc than we expect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I wasn't planning on having the runc path being part of the config, but it may not be a bad idea to have the user specify it. Is that what you were thinking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think having it be part of the config is reasonable.
runtime/jailer.go
Outdated
return cmd | ||
} | ||
|
||
// overwriteConfig will set the proper default values if a field had not been set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be better to call this defaultConfig
then? Or setDefaults
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will overwrite the configuration values like uid
and gid
. In addition when we support numa nodes, that will also be injected into the config. So I think overwriteConfig
is fitting? I can split out the logic with the defaults, like getDefaultConfig
and overwriteConfig
. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If someone has set uid
and gid
values, why do you want to overwrite them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we generate the uid and gid ourselves when we jail. This grants us a better security boundary between VMs, since they are all ran under a different uid and gid. We could allow for users to specify one and simply check if one had been provided, and if one has, use the given uid and gid. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should split out the defaulting logic from this. What does Firecracker's jailer do today? Does it pick uid/gid for you?
Instead of overwriting values for uid/gid, can you error if those are set with a clear error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it doesn't pick a uid/gid for you. You specify which UID and GID you want to use. However, we'd be doing the same logic, as in we'd still want to choose unique UIDs and GIDs per VM and pass that into Firecracker's jailer. This is more of an abstraction above the jailer.
Instead of overwriting values for uid/gid, can you error if those are set with a clear error message?
Yea! I really like that
@@ -0,0 +1,164 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between this file and runtime/firecracker-runc-config.json.example
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference lies in --privilege
being passed into docker granting CAP_SYS_ADMIN
and various other capabilities. Meaning the config file will need far less capabilities than the example
Thought this was the case, but since we are running runc
as sudo
anyways, it'll have a good amount of capabilities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what's the difference between the files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't a difference due to needing to run as sudo, sorry if that wasn't clear. Did we want to remove the example one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The example one is the one customers will see and edit for try out firecracker-containerd. This one is the one customers don't have to care, but we'd like to use that inside integration tests (inside Docker).
So, I'd like to keep both of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they're identical, I'd prefer that our integration tests copy the example file rather than keeping both of them in source control. It's hard to ensure that the files will remain identical otherwise.
(If they're intentionally different, which is fine, we should be clear about what those differences are.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may end up needing to generate, at least partially, the config.json that we use. If we decide to use user namespaces, we'll want to dynamically generate the UID/GID mappings from container -> host values. So in the longer term, this file might go away altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xibz Is this file being used? I don't see it referenced; instead I see COPY runtime/firecracker-runc-config.json.example /etc/containerd/firecracker-runc-config.json
in tools/docker/Dockerfile
.
opts := []firecracker.Opt{ | ||
firecracker.WithLogger(s.logger), | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a few other fields in the Go SDK's JailerConfig
that I think we need to fill out. The main one I'm thinking about is NetNS.
I guess for the runc implementation in particular, this would be a matter of checking whether a netns path is set in the config and, if so, setting it there too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xibz Can you open an issue to follow up on this and mark it as TODO here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I already had created one about a couple weeks ago, #258
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xibz Can you write a TODO comment in the code here?
26ac9d9
to
01355ac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code mostly looks good at this point, but there's a compliation error:
# github.com/firecracker-microvm/firecracker-containerd/runtime [github.com/firecracker-microvm/firecracker-containerd/runtime.test] ./runc_jailer_test.go:55:30: not enough arguments in call to newRuncJailer have (*logrus.Entry, string, string, number, number) want (context.Context, *logrus.Entry, string, string, uint32, uint32) FAIL github.com/firecracker-microvm/firecracker-containerd/runtime [build failed]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noticed one more thing on another pass that I think is worth fixing in this PR.
I also think your issue for follow ups should include updating the getting started guide with some guidance on how to enable the jailer, but probably doesn't make sense to do that in this PR given the interface may change w/ the followups.
1055d61
to
3241530
Compare
Fixed! |
Thanks! Can you squash all the commits that you want and prepare them as if you'd be submitting? I'd like to look at the final set of commits and commit messages you want to push. |
runtime/noop_jailer.go
Outdated
} | ||
|
||
func (j noopJailer) BuildJailedMachine(cfg *Config, machineConfig *firecracker.Config, vmID string) ([]firecracker.Opt, error) { | ||
return []firecracker.Opt{}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one seems resolved now.
} | ||
|
||
// BuildJailedMachine will return the needed options for a jailed Firecracker | ||
// instance. In addition, some configuration values will be overwritten to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"some configuration values" -> "Both cfg and machineConfig" to clarify what it does. BTW, the function is modifying the former. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea
for i, v := range m.Cfg.VsockDevices { | ||
filename := filepath.Base(v.Path) | ||
v.Path = filepath.Join("/", filename) | ||
m.Cfg.VsockDevices[i] = v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved?
runtime/runc_jailer.go
Outdated
} | ||
|
||
func mkdirAndChown(path string, mode os.FileMode, uid, gid uint32) error { | ||
if err := os.MkdirAll(path, mode); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using Mkdir() instead of MkdirAll()? MkdirAll can create /foo/bar/baz even there is no /fo, but Chown only modifies /baz in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
tools/docker/Dockerfile
Outdated
COPY _submodules/firecracker/target/$FIRECRACKER_TARGET/release/firecracker /usr/local/bin/ | ||
COPY _submodules/firecracker/target/$FIRECRACKER_TARGET/release/jailer /usr/local/bin/ | ||
COPY _submodules/firecracker/target/$FIRECRACKER_TARGET/release/firecracker /usr/local/bin/firecracker | ||
COPY _submodules/firecracker/target/$FIRECRACKER_TARGET/release/jailer /usr/local/bin/jailer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is okay to remove this jailer, if you modify the Dockerfile in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't hurt to leave it in there, imo. We are basically copying all of the built binaries of firecracker into the docker container's workspace, which I think makes sense.
3241530
to
acc3c6c
Compare
Done |
fda125e
to
e2ee150
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
bf4a14e
to
e74ac0e
Compare
There is a typo in the commit message for 45fa12c. Might as well fix it while we still can. Otherwise, LGTM |
This commit adds the jailer config to the proto defintion as well as regenerating of the models. Signed-off-by: xibz <[email protected]>
This adds jailing to the runtime. Users are now able to enable jailing by providing a jailer config and setting the "runc_binary" in the firecracker json file. Signed-off-by: xibz <[email protected]>
This commit updates the Dockerfile by setting the correct restrictive permissions to the kernel image and drive file. In addition this also adds the copy of firecracker-runc-config.json.example to the integ test work environments. Signed-off-by: xibz <[email protected]>
Adds runtime/logs to the .gitignore and updates the config example with enabling the jailer. Signed-off-by: xibz <[email protected]>
e74ac0e
to
fb4d1f6
Compare
firecracker: fix dropped test errors
Signed-off-by: xibz [email protected]
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.