crypto/rand: warn about very slow (60-second) /dev/urandom reads #22614
@HptmHavoc and/or @freespace it would be useful if you could kill the hung program (running the go-tip binary) with SIGQUIT, which makes the Go runtime dump all goroutine stacks to stderr.
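For anyone following along, sending the signal is a one-liner (`kill -QUIT <pid>`); the small Go sketch below does the same thing programmatically. The program name and argument handling are illustrative only, not part of the original report.

```go
// sigquit.go: send SIGQUIT to a hung Go process so its runtime dumps all
// goroutine stacks to stderr. Equivalent to `kill -QUIT <pid>`.
package main

import (
	"log"
	"os"
	"strconv"
	"syscall"
)

func main() {
	if len(os.Args) != 2 {
		log.Fatal("usage: sigquit <pid>")
	}
	pid, err := strconv.Atoi(os.Args[1])
	if err != nil {
		log.Fatalf("bad pid %q: %v", os.Args[1], err)
	}
	if err := syscall.Kill(pid, syscall.SIGQUIT); err != nil {
		log.Fatal(err)
	}
}
```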
Here is the output after sending SIGQUIT:
@ncw the output after killing b2get-go-tip with SIGQUIT is below:
@bradfitz based on your suggestion I checked
The backtrace does show getrandom blocking. It looks like it's probably blocking when generating the session-ticket key but, if it weren't that, it would quickly block in something else. TLS needs entropy and the system doesn't have any. Previously we ignored that state of affairs and ran insecurely. Now we'll block until the entropy pool has hit 128 bits at some point in the system's history, which is the minimal (and "correct") thing to do. Sadly, since there's a history of ignoring these issues in all sorts of software (due to OSes not providing the correct interface), VMs aren't correctly plumbed to seed entropy from the host. It is not an easy call to make. If the fallout proves to be too large, we may have made this move too soon.
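To make the failure mode concrete, here is a minimal, hypothetical way to observe the blocking described above: on a Linux kernel with getrandom(2), crypto/rand.Read does not return until the pool has been initialized once, so on an entropy-starved VM this program can sit in the Read call for a long time.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"time"
)

func main() {
	buf := make([]byte, 32)
	start := time.Now()
	// On Linux this goes through getrandom(2) and blocks until the kernel's
	// entropy pool has been initialized at least once.
	if _, err := rand.Read(buf); err != nil {
		fmt.Println("rand.Read failed:", err)
		return
	}
	fmt.Printf("read %d random bytes in %v\n", len(buf), time.Since(start))
}
```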
@agl, maybe we should make Go log to stderr after N seconds of blocking on gathering entropy, telling the user what's happening? Maybe.
I have done that in the past in a different context and the experience was that lots of automated processes were unhappy about the unexpected stderr output, since the condition was frequently transient. However, that case was triggered in a more flaky way than this, and perhaps the delay (which was only a handful of seconds, as I recall) was too small.
@bradfitz that seems like a nice pragmatic solution, and there is precedent for log messages like this, e.g. the log.Printf calls in net/http/transport.go. Not my call though!
Now that we know what this was about, I removed the release blocker label. We can decide what to do during the Go 1.11 cycle.
Ping @FiloSottile and @agl. What to do here? What's the rebuttal to the https://www.2uo.de/myths-about-urandom/ argument that we should always use urandom? |
@bradfitz The getrandom syscall (without the GRND_RANDOM flag) already fetches from the same source as /dev/urandom. It just waits until the kernel has gathered enough entropy once to provide a secure urandom service from there on. Go is doing exactly what that link recommends, no rebuttal needed; the page specifically endorses getrandom's behavior of blocking until the pool has been seeded once and never blocking again.
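As an illustration of that behavior (not code from this issue), golang.org/x/sys/unix exposes the raw syscall; without GRND_RANDOM it reads from the urandom pool but blocks until that pool has been seeded once:

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	buf := make([]byte, 32)
	// flags == 0: urandom source, but blocks until the pool is initialized.
	// Pass unix.GRND_NONBLOCK to get EAGAIN instead of blocking.
	n, err := unix.Getrandom(buf, 0)
	if err != nil {
		fmt.Println("getrandom:", err)
		return
	}
	fmt.Printf("got %d bytes: %x\n", n, buf[:n])
}
```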
Oh, and a quick concrete fix for a low-entropy box (usually some embedded SBC) is to save entropy from a previous run on disk and feed it back to the kernel at boot. haveged (userspace entropy sources) obviously helps too, but it might take a while to gather enough entropy; saving some from the previous run gets everything going faster. Resources: https://www.freedesktop.org/software/systemd/man/systemd-random-seed.service.html and http://man7.org/linux/man-pages/man4/random.4.html (search for e.g. RNDADDENTROPY).
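For completeness, a rough sketch of that seed-feedback idea follows. It is not from this issue: the seed path is hypothetical, it must run as root, and crediting the stored bytes as entropy (via the RNDADDENTROPY ioctl described in random(4)) is only sound if the seed file was kept secret and is replaced with fresh data on every boot; systemd-random-seed and similar tools handle this more carefully.

```go
package main

import (
	"log"
	"os"
	"unsafe"

	"golang.org/x/sys/unix"
)

// randPoolInfo mirrors struct rand_pool_info from <linux/random.h>:
// entropy_count, buf_size, followed by the seed words themselves.
type randPoolInfo struct {
	entropyCount int32
	bufSize      int32
	buf          [128]uint32 // room for 512 bytes of seed
}

func main() {
	const seedPath = "/var/lib/example-seed" // hypothetical location of the saved seed
	seed, err := os.ReadFile(seedPath)
	if err != nil || len(seed) < 512 {
		log.Fatalf("reading seed: %v (need at least 512 bytes)", err)
	}

	info := randPoolInfo{
		entropyCount: 512 * 8, // credit 8 bits per byte; only do this for a trusted seed
		bufSize:      512,
	}
	copy(unsafe.Slice((*byte)(unsafe.Pointer(&info.buf[0])), 512), seed)

	f, err := os.OpenFile("/dev/urandom", os.O_WRONLY, 0)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// RNDADDENTROPY = _IOW('R', 0x03, int[2]): mix the bytes into the input
	// pool and credit the claimed entropy. Requires CAP_SYS_ADMIN.
	const rndAddEntropy = 0x40085203
	if _, _, errno := unix.Syscall(unix.SYS_IOCTL, f.Fd(), rndAddEntropy,
		uintptr(unsafe.Pointer(&info))); errno != 0 {
		log.Fatalf("RNDADDENTROPY: %v", errno)
	}
	log.Println("seed mixed in and credited to the kernel entropy pool")
}
```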
Per discussion with proposal review:
What should N be? @agl says too small is annoying because it causes spurious warnings. What is a time interval after which it's likely that the entropy is just not coming? |
For NeedsDecision: the only decision left is to pick N (the number of seconds before printing the warning).
uuid() (which is used to set request ids) can be quite slow because of golang/go#22614: essentially, access to /dev/urandom has to be serialized, and when everyone is piling in, the cost of assigning a request id is (measured) 3.5% of the whole execution time! This change makes the access to /dev/urandom asynchronous, reading in 32k chunks (one read every 4k requests), which eliminates the cost completely. It has been used in anger, at a rate of 50k requests/second, and it has not misbehaved.
Change-Id: I47e524ef11344aadafff770444f214093fe6e008
Reviewed-on: http://review.couchbase.org/98891
Reviewed-by: Sitaram Vemulapalli <[email protected]>
Reviewed-by: Johan Larson <[email protected]>
Tested-by: Marco Greco <[email protected]>
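The change quoted above isn't shown here, but the batching idea is straightforward to sketch in plain Go (all names below are made up for illustration): pull 32 KiB from crypto/rand at a time and hand out request IDs from that buffer, so the entropy source is hit once per ~2000 IDs instead of once per request.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
)

var (
	mu  sync.Mutex
	buf []byte // unread random bytes from the last bulk read
)

// requestID returns a 16-byte hex ID, refilling the 32 KiB buffer only once
// every 2048 calls instead of reading from the entropy source per request.
func requestID() (string, error) {
	mu.Lock()
	defer mu.Unlock()
	if len(buf) < 16 {
		buf = make([]byte, 32*1024)
		if _, err := rand.Read(buf); err != nil {
			return "", err
		}
	}
	id := hex.EncodeToString(buf[:16])
	buf = buf[16:]
	return id, nil
}

func main() {
	for i := 0; i < 3; i++ {
		id, err := requestID()
		if err != nil {
			panic(err)
		}
		fmt.Println(id)
	}
}
```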
Let's start with N=60 seconds. This will happen mainly at boot time and you want a note about why the system has stopped booting. Most people will wait a minute in that context. That should eliminate any concern about flakes.
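A sketch of roughly what that warning could look like, assuming a simple timer around the blocking read (this is illustrative, not the code that went into crypto/rand):

```go
package main

import (
	"crypto/rand"
	"fmt"
	"os"
	"time"
)

// readWithWarning behaves like rand.Read but prints a note to stderr if the
// read has not completed after 60 seconds, e.g. when blocked at boot waiting
// for the kernel entropy pool to be initialized.
func readWithWarning(b []byte) (int, error) {
	t := time.AfterFunc(60*time.Second, func() {
		fmt.Fprintln(os.Stderr,
			"crypto/rand: blocked for 60 seconds waiting to read random data from the kernel")
	})
	defer t.Stop()
	return rand.Read(b)
}

func main() {
	b := make([]byte, 32)
	if _, err := readWithWarning(b); err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", b)
}
```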
Change https://golang.org/cl/139419 mentions this issue: |
What version of Go are you using (go version)?
go version go1.9.2 linux/amd64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
This issue shows itself on linux/arm.
What did you do?
Compile this program and run it on a linux/arm server.
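The program from the report isn't reproduced in this excerpt. As a stand-in only, any program that performs a TLS handshake exercises the same code path, since TLS draws from crypto/rand; something like the following would hang the same way on an entropy-starved machine (the URL is arbitrary):

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// The TLS handshake needs random bytes from crypto/rand, so this request
	// blocks if the kernel entropy pool has never been initialized.
	resp, err := http.Get("https://example.com/")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```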
What did you expect to see?
If it works, it should produce output similar to this:
What did you see instead?
However, when compiled with go1.9 or go-tip it hangs indefinitely on some ARM servers. It works fine with go1.8.
This issue came up as part of rclone/rclone#1794
I haven't managed to replicate the hang on my Raspberry Pi 3, but both @HptmHavoc and @freespace have on Scaleway ARM servers running different kernels.
I'm not 100% sure what is going on but it is clearly a regression.
From the above @freespace wrote:
Mine is a Scaleway ARM server.
And from the above @HptmHavoc wrote:
Pretty much the same here, also a Scaleway ARM server.