-
Notifications
You must be signed in to change notification settings - Fork 200
Add haveged to image-builder's rootfs. #326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
runtime/service_integ_test.go
Outdated
select { | ||
case exitStatus := <-exitCh: | ||
assert.NoError(t, exitStatus.Error(), "failed to retrieve exitStatus") | ||
assert.Equal(t, uint32(0), exitStatus.ExitCode()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we can get rid of cast with EqualValues
runtime/service_integ_test.go
Outdated
} | ||
rngtestStdin.Close() | ||
|
||
// rngtest will exit non-zero if any blocks fail its randomness tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I get this properly, rngtest will return 1 if no errors happen, but at least one block fails the FIPS tests
. Can this potentially be a flaky test that we'll have to retry? May be it worth to define some success threshold, like no more than 1% or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, I ran a test where I locally increased the number of blocks by a factor of 1024 and got 27 block failures out of 32863 total tested. So, presuming those are all false-negatives and that I'm remembering how to math, if we test 32 blocks in a single test case, the probability of there being 1 false-negative is ~2.6% and the probability of there being 2 false-negatives is ~0.03%.
2.6% is a little high, I agree, so I'll update the test case to look at the output and check how many failures there were in case of a non-zero exit code, only failing if there was more than 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as a note, I ended up actually testing 1023 blocks (not just 32) because I fixed a separate problem that allowed me to actually test the expected number of blocks (had to specify iflag=fullblock
in the dd command). So re-running the calculations I went with a failure tolerance of 4 blocks as the chances of 5 block failures drops down to ~0.1%
071af8b
to
7cb884d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -0,0 +1,9 @@ | |||
[Unit] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Newer version of the haveged package provide their own unit file. You might consider borrowing the file from buster, or at least some of its settings. https://salsa.debian.org/debian/haveged/blob/buster/debian/haveged.service
We should upgrade the VM image to buster one of these days...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buster upgrade is coming... d56b94b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, only difference between the service files now is I inlined the args to the daemon instead of maintaining a separate env file (just seems simpler for us) and left out apparmor.service systemd-random-seed.service systemd-tmpfiles-setup.service
from the After=
parameters as I don't see any of those services under image-builder's rootfs
haveged ensures that there is sufficient entropy available to processes running within the microvm. Without sufficient entropy, it is fairly easy for processes to get blocked on reading /dev/random or making getrandom() syscalls, including during boot (which can result in CreateVM calls to fail if the agent process gets blocked). haveged was chosen as it enforces no minimum kernel requirements, does not add CPU requirements (i.e. existence of RDRAND or similar instructions) and is currently used by Debian for related use cases such as seeding entropy in their installer. One other option was "rngd", which has versions that support use of RDRAND. It was not chosen as RDRAND is not universally trusted or portable. Similarly, use of the "random.trust_cpu=on" kernel boot parameter was ruled out for now as it relies on RDRAND and additionally enforces a minimum kernel version of 4.19. Signed-off-by: Erik Sipsma <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢 LGTM
…ependabot/go_modules/github.com/go-openapi/errors-0.20.0 Bump github.com/go-openapi/errors from 0.19.9 to 0.20.0
haveged ensures that there is sufficient entropy available to processes running
within the microvm. Without sufficient entropy, it is fairly easy for processes
to get blocked on reading /dev/random or making getrandom() syscalls, including
during boot (which can result in CreateVM calls to fail if the agent process
gets blocked).
haveged was chosen as it enforces no minimum kernel requirements, does not
add CPU requirements (i.e. existence of RDRAND or similar instructions) and is
currently used by Debian for related use cases such as seeding entropy in their
installer.
One other option was "rngd", which has versions that support use of RDRAND.
It was not chosen as RDRAND is not universally trusted or portable. Similarly,
use of the "random.trust_cpu=on" kernel boot parameter was ruled out for now
as it relies on RDRAND and additionally enforces a minimum kernel version of
4.19.
Signed-off-by: Erik Sipsma [email protected]
Addresses #325
Verified the test by running it w/out starting the haveged daemon, which resulted in the container just getting blocked on /dev/random and the test timing out.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.