fix: backoff retry and timeouts in HNS restart #3540
Force-pushed from 60c3734 to 6c690ed.
/azp run Azure Container Networking PR
Azure Pipelines successfully started running 1 pipeline(s).
Pull Request Overview
This PR addresses backoff and retry improvements to enhance the stability of HNS restart operations on Windows.
- Introduces exponential backoff delay in retry operations for stopping and starting the HNS service.
- Refactors timeout creation in the tryStartServiceFn and tryStopServiceFn functions by using a deadline function.
Comments suppressed due to low confidence (4)
platform/os_windows.go:335
- [nitpick] The variables 'n' and 'limit' are ambiguous; consider renaming them to 'attemptCount' and 'maxAttempts' to improve clarity.
var n, limit time.Duration = 0, 3
platform/os_windows.go:355
- [nitpick] The variable 'deadline' shadows the 'deadline' function; consider renaming it (e.g., 'ctxWithTimeout') to avoid confusion.
deadline, cancel := deadline(ctx)
platform/os_windows.go:382
- [nitpick] Consider renaming 'n' and 'limit' to 'attemptCount' and 'maxAttempts' respectively for consistency and clarity in this context.
var n, limit time.Duration = 0, 3
platform/os_windows.go:402
- [nitpick] The reuse of the name 'deadline' causes shadowing of the deadline function; renaming it to something like 'ctxWithTimeout' could reduce ambiguity.
deadline, cancel := deadline(ctx)
Force-pushed from 6c690ed to 4ed0f83.
/azp run Azure Container Networking PR
Pull Request Overview
This PR refactors the HNS restart process by introducing deadlines for service stop/start operations and using an exponential backoff retry strategy. Key changes include:
- Adding a delay type for exponential backoff in the retry calls.
- Implementing deadlines with a 90-second timeout for both stopping and starting the service.
- Modifying the error handling to use the deadline's signal instead of the original context cancellation.
Comments suppressed due to low confidence (1)
platform/os_windows.go:361
- The error wrap message 'context cancelled' may be misleading for a timeout scenario. Consider changing it to 'deadline exceeded' to better reflect a timeout condition.
case <-deadline.Done():
Azure Pipelines successfully started running 1 pipeline(s).
Signed-off-by: Evan Baker <[email protected]>
Force-pushed from 4ed0f83 to 1a397d8.
/azp run Azure Container Networking PR
Azure Pipelines successfully started running 1 pipeline(s).
#3529 refactored the HNS restart so that the stop/start could be retried. This PR adds a deadline so that a stop or start that hangs fails and triggers that retry, and changes the retry strategy from a constant delay to exponential backoff.