Command inside container hangs forever #253
I just tried this and am able to reproduce it, though it seems inconsistent (sometimes a command will hang almost right away, other times you can run a bunch of commands first). This seems to be a bug. The stdio is going from ctr -> fifo -> shim -> vsock -> agent -> fifo -> container and back, so there are a lot of places this could be breaking. Will need more investigation.
I locally changed the code to have the VM agent write all output to /dev/console and to forward all VM stdout+stderr to containerd logs. I got similar behavior, though this time the command only hung for a little bit (maybe 15-30ish seconds, I wasn't counting) before the task just exited. The logs from the VM are actually showing what looks like corruption.
Here are the full containerd logs that snippet was extracted from, which include the full kernel stack trace: containerd.log. So the hung stdio may be a side effect of whatever underlying issue is happening with the block device, though it's worth reproducing again to make sure the block device corruption is the only issue.
I'm growing a bit more concerned about this issue. I have been hitting it again while doing unrelated manual tests, including runs with no interaction at all on my part. Out of curiosity, I tried running with the devmapper snapshotter to see if this is something specific to the naive snapshotter, but the kernel panic still happens.
I am able to reproduce this in an automated test case now by just doing repeated writes to the filesystem; see my fork here: https://github.com/sipsma/firecracker-containerd/blob/kernelpanic/runtime/service_integ_test.go#L791-L824 When the linked test case runs, it just hangs indefinitely, but the logs show what's actually happening. So this appears to have nothing to do with stdio. Here are the containerd logs with the relevant output from a run of the test:
Just tested an up-to-date Linux 4.14.147 kernel with the same results:
I think there's an issue with the stub drive patching. In @sipsma's test, we try to write 128 1 MB files; however, even if we remove the loop and write only a single 1 MB file, we still crash. We also see the following in the kernel output prior to the crash, which doesn't look right:
If I manually patch a drive, the right thing happens (this replaces a 512 B drive with a 100 MB drive):
But if the backing store for the drive changes from a plain file to a block device, the capacity drops to zero, just as we see during the panic-inducing test:
When firecracker-containerd patches a drive, Firecracker inspects the size of the backing file by calling std::fs::metadata, which ultimately calls stat() on Unix. However, since our stub drive is backed by a loopback device, the size reported by stat() is always zero. Luckily, there is an ioctl() we can use to get the size of a block device (thanks @sipsma!). Firecracker should call this ioctl() instead of relying on stat() when the backing file is a block device.
The fix has been merged! We're waiting for Firecracker's next release. No further action is needed on our end, assuming Firecracker 0.20 doesn't have any breaking changes :)
Seems 0.20.0 is coming soon. |
Use Config.VMID as Firecracker's instance ID

* Since firecracker-microvm/firecracker#2125, `cargo build` doesn't build the jailer by default. (firecracker-microvm#263)
* Fix benchmark goroutine (firecracker-microvm#259)
* Jailer configuration API cleanup and improved logging with Debug log level (firecracker-microvm#255)
* Firecracker internally has an instance ID, but the SDK didn't have a way to configure it. This change connects Config.VMID to the instance ID. (firecracker-microvm#253)
* Fixed an error that was not being tested against in `TestWait` (firecracker-microvm#251)
* Fixes an issue where the socket path may not be defined because the config file has yet to be loaded (firecracker-microvm#230)
* Fixed an error that was not being tested against in `TestNewPlugin` (firecracker-microvm#225)
* Download Firecracker 0.21.1 and its jailer from the Makefile (firecracker-microvm#218)

Signed-off-by: xibz <[email protected]>
I'm able to run a new busybox container using this command:

When I'm inside, I execute these commands:

The `ls` command hangs forever. The task can't be killed, paused, or destroyed. Logs from `firecracker-containerd`:

If I stop and start `firecracker-containerd` again, these lines are shown in the log:

Why does the `ls` command hang forever? What can I do to avoid this? Maybe set up a container timeout?