Added timeout for VMshutdown #272
@sipsma PTAL
f2ac15e
to
7e4c8c1
Compare
Looks great overall, thanks! Just some minor comments
runtime/service.go
Outdated
@@ -67,6 +67,7 @@ const (
    defaultVsockPort = 10789
    minVsockIOPort = uint32(11000)
    firecrackerStartTimeout = 5 * time.Second
    stopVMTimeoutSeconds = 5
nit: For this variable, I think it should be a time.Duration (so just 5 * time.Second) so it can stay agnostic of particular units.
Also, can this be named more like defaultStopVMTimeout? Just to make it clear it's a default value that can be optionally overridden.
runtime/service.go
Outdated
_, err = s.Shutdown(requestCtx, &taskAPI.ShutdownRequest{Now: true})
if err != nil {
    return nil, err
for {
I'm not sure we need the for statement wrapping the select. Unless I'm missing something, we should be good with just the select here.
Actually, I took a second look and think we will need a little bit more than just removing the for. We want to make sure that if we hit the timeout while blocked on the Shutdown call, the StopVM method doesn't stay blocked. I think the simplest way to do this will be running Shutdown in its own goroutine. Roughly sketched out:
    // Buffered so the goroutine can still send and exit if the timeout fires first.
    shutdownCh := make(chan error, 1)
    go func() {
        defer close(shutdownCh)
        // := keeps this err local to the goroutine.
        _, err := s.Shutdown(...)
        shutdownCh <- err
    }()
    select {
    case <-ctx.Done():
        ...
    case err = <-shutdownCh:
        if err != nil {
            return nil, err
        }
        return &empty.Empty{}, nil
    }
proto/firecracker.proto
Outdated
@@ -36,6 +36,7 @@ message CreateVMRequest {

message StopVMRequest {
    string VMID = 1;
    uint32 VMShutDownTimeOut = 2;
For this variable, units matter, so can we name it more like VMShutdownTimeoutSeconds?
Oh also, the tests are failing just because your commit |
7e4c8c1
to
577335b
Compare
@Zyqsempai Saw in the test runs of your latest commits that the The fix should be to not create a new
To explain what I think the actual problem is, containerd gives our code a context object during initialization in the NewService method, which we decided to call One place we give the As a result, the Putting all the above together, canceling |
@sipsma Yep, it's really tricky, but it totally makes sense.
Looks good, just one more minor comment. Also, can you squash the separate commits into a single one? That should be sufficient for this change. Then it should be good for me to ship!
runtime/service.go
Outdated
shutdownCh := make(chan error)
go func() {
    defer close(shutdownCh)
    _, err = s.Shutdown(requestCtx, &taskAPI.ShutdownRequest{Now: true})
Just noticed this: let's use _, err := s.Shutdown(...) (:= instead of =), just to avoid the confusion that can result from overwriting a named return val from within a separate goroutine.
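To illustrate the difference, here is a small standalone sketch. The functions run and doWork are hypothetical stand-ins for StopVM and s.Shutdown; only the := versus = distinction is taken from the review.

```go
package main

import (
	"errors"
	"fmt"
)

// doWork is a hypothetical stand-in for a call like s.Shutdown.
func doWork() (string, error) {
	return "", errors.New("work failed")
}

// run has a named return value err, like StopVM in the diff.
func run() (err error) {
	resultCh := make(chan error, 1)
	go func() {
		// := declares a new err scoped to this goroutine. Using = here
		// would instead write to run's named return value from another
		// goroutine, which is easy to misread and can race with run.
		_, err := doWork()
		resultCh <- err
	}()

	// The only write to the named return happens here, in run's own goroutine.
	err = <-resultCh
	return err
}

func main() {
	fmt.Println(run())
}
```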
c8cc629
to
cab9097
Compare
@sipsma Done, thanks for your comments. PTAL.
LGTM, thanks again!
Two (a bit bikeshed-y) questions:
|
Yes, this change essentially prevents StopVM from ever being blocked forever waiting for the shutdown, which I think is preferable as I can't think of any situation in which being blocked literally forever is the desired behavior for stopping the VM.
I think adding it to StopVM is particularly valuable because it ensures that users don't get into a situation where they are unable to clean up resources on their machine (once the timeout is hit, we just send SIGKILL). The cases where other operations besides StopVM get blocked forever are still not good, but the worst-case scenario is that the user can't create a VM or get information about their VM, which is somewhat less harmful than being unable to remove resources from their machine. That being said, I agree that adding timeouts to those other operations could still be valuable and is something to consider in the future (perhaps just not in this PR).
Thanks. I agree that blocking literally forever won't be what customers want. How about renaming the field from |
Can this not be handled by setting a context deadline when making the request, rather than by an API change?
I think having the client specify the timeout in the Also see this comment; from a purely internal code perspective, using context timeouts here creates more confusion than it's worth IMO. I think in practice it ends up being simpler to just have it be part of the API. |
Agree. @Zyqsempai I think that's worth making before we merge here.
Agree with changing the var name but LGTM otherwise.
Had one minor nit comment, but otherwise looks good.
runtime/service.go
Outdated
@@ -499,6 +500,13 @@ func (s *service) createVM(requestCtx context.Context, request *proto.CreateVMRe
// created yet and the timeout is hit waiting for it to exist, an error will be returned but the shim will
// continue to shutdown.
func (s *service) StopVM(requestCtx context.Context, request *proto.StopVMRequest) (_ *empty.Empty, err error) {
    var timeout time.Duration
I think this is a little bit cleaner:
    timeout := defaultStopVMTimeout
    if request.VMShutdownTimeoutSeconds > 0 {
        timeout = time.Duration(request.VMShutdownTimeoutSeconds) * time.Second
    }
Signed-off-by: bpopovschi <[email protected]>
cab9097
to
1ac8f63
Compare
Done!
Issue #247
Added timeout for VM Shutdown
Added a special error type for the case when the timeout is exceeded
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.