-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Fix the size of a virtio device, backed by a block device #1371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
What would be a good way to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your PR!
As a high-level comment for the way you introduced your bug fix, I would suggest to refactor a little bit with the following considerations:
- IMO
BLKGETSIZE64
ioctl definition should reside in other place than vmm crate, in the idea that we can place the definition better inside the code base. - If we take a look at net_util crate, inside tap.rs, there are definitions for some ioctls related to TAP.
- It would make sense to add a similiar crate, but it does not worth while we have just one ioctl definition, and no code logic or whatsoever.
I would suggest moving the BLKGETSIZE64
ioctl definition into a new file (e.g. block.rs), inside sys_util crate. There is the place for the unit test also (NOT_OK test).
As a final note, it would be really helpful if you can give us any heads up when you decide to provide a PR for an issue, in order to not overlap with the work of someone else who is assigned to that issue. I've just wrote the solution for this issue yesterday, and I was delaying the PR because of the integration tests (OK, NOT_OK), but while we're at it, we can move forward with this PR.
vmm/src/block_device.rs
Outdated
use std::path::Path; | ||
|
||
#[test] | ||
fn test_invalid_file() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For an OK assert test I would make an integration test.
You can find here: https://github.com/firecracker-microvm/firecracker/blob/master/tests/integration_tests/functional/test_drives.py#L14 a test which does block device rescan for a file backed virtio block device.
I thought about how we should make the OK test before you commited this PR, and I came up with renaming the test from the link to express that it does rescan for a file backed virtio block device
and add another similar test which will make rescan for a loop block device backed viritio block device
.
The new test should:
- Create a loop device which will expose the same file from the test from the link as a block device.
- Start Firecracker with this
loop device backed virtio block device
inserted. - Truncate the file.
- Update loop device size on host.
- Issue a BlockDeviceRescan to Firecracker, to update the virtio block device capacity accordingly.
- Check the virtio block device size, to make sure the size is updated.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the guidance. I basically followed your suggestion. Please take a look 706584d.
Sorry for taking the issue without let you know beforehand. The issue has been opened for a few weeks and I thought you were working on something else. I have made separate commits to make review easier. But it would be better to squash all of them into one or two (877f2f0 + other commits) before merging. What do you think? |
vmm/src/lib.rs
Outdated
.map_err(|_| DriveError::BlockDeviceUpdateFailed)? | ||
} else { | ||
metadata.len() | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using the fseek(SEEK_END) / ftell()
method for getting the file or block dev size? We wouldn't need to seccomp-whitelist another ioctl, and IMO the code would be simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know that but it works! Would it be better to update the file-backed device path as well? Since metadata() wouldn't need open(), I'd like to only update block.rs to use fseek (std::io::Seek in Rust).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed on 14ae545
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method should work on both files and block devices. With that in mind, do we really need an extra branch in the code to check if the source is a regular file or a block device and an extra function in sys_util
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metadata.len()
would be faster...? But yeah, I'm fine inclining open + seek here.
"""Verify that rescan works with a device-backed virtio device.""" | ||
test_microvm = test_microvm_with_ssh | ||
# To access a block device, seccomp_level = 2 is too restrictive | ||
test_microvm.jailer.seccomp_level = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests should always be run with seccomp_level = 2
(i.e. the most restrictive whitelist), since this is the recommended way to run Firecracker in production.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. The line is not needed anymore due to 14ae545.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am OK with how it looks now. Please consider the comment from the end of the test_rescan_dev
, if it would make sense. Basically, I saw that there is a loop device left, which points to a Firecracker session test file (f2
filesystem), which was deleted when test has finished.
Thanks @iulianbarbu. Will rebase the branch once CI is being green. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to mention that although the changes look good to me, in the end, these changes should be structured in commits that do not overlap, ideally. I noticed that changes from the first commit are dissmised by changes from the second commit, and so on. The commits should contain a title which ideally will reflect briefly the crate/area where the changea reside (e.g sys_util: added get_blk_dev_size), and a description, with respect to 50/70 rule. Also, every commit should build without compile errors, and it would be helpful if you can make sure that style test is succesful, which will remove the need for other commits which fix style.
Hmm, not so sure about |
@kzys , this is just a guess: I suspect that the machine for x86_64 CI is shared by multiple PRs. In this case, running this PR previously on the machine resulted in the creation of a loop device which is now conflicting in some way with a rerun of the integration test. I saw this happening on my local machine, while I was running multiple times |
I think the x86 CI host is not having enough loopback devices. Can you check the configuration? You might also need |
Synced-up with @aghecenco. The last commit skips the integration test on hosts that don't have loopback devices (incl. the x86_64 CI host). Hope that works... |
4795ebb
to
d8ad616
Compare
Seems like the aarch64 failure is coming from #1381, not this PR. |
The aarch64 check is not required to merge this PR (or others); it’s currently failing due to an unrelated issue and will not be mandatory until that’s fixed. The buildkite-aarch64 check validates each PR on Arm too, so nothing’s broken here. |
@kzys the coverage dropped a bit after merging latest upstream changes - can you add a new commit that lowers |
Hard-linking from /dev/loopN to a jailed directory doesn't work. Signed-off-by: Kazuyoshi Kato <[email protected]>
When a virtio device is backed by a block device, the actual size of the block device file doesn't represent the size as a virtio device. Fixes firecracker-microvm#1316 Signed-off-by: Kazuyoshi Kato <[email protected]>
@aghecenco Sure. Rebased and now looks fine. |
Instead of calling blockdev(1), this change uses os.File#Seek which would be more effecient. firecracker-microvm/firecracker#1371 Signed-off-by: Kazuyoshi Kato <[email protected]>
Reason for This PR
#1316
Description of Changes
When a virtio device is backed by a block device, the actual size of the block device file doesn't represent the size as a virtio device.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license.
PR Checklist
[Author TODO: Meet these criteria. Where there are two options, keep one.]
[Reviewer TODO: Verify that these criteria are met. Request changes if not]
git commit -s
).clearly provided.
doc changes are included in this PR. Docs in scope are all
*.md
fileslocated either in the repository root, or in the
docs/
directory.code is included in this PR.
reflected in
firecracker/swagger.yaml
.this PR have user impact and have been added to the
CHANGELOG.md
file.unsafe
code has been added, or, the newly addedunsafe
code is unavoidable and properly documented.