Add retries when response from IMDSv2 returns a 401 #244
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master     #244      +/-   ##
==========================================
+ Coverage   81.43%   81.60%   +0.16%
==========================================
  Files          10       10
  Lines         792      799       +7
==========================================
+ Hits          645      652       +7
  Misses        131      131
  Partials       16       16
Continue to review full report at Codecov.
Thanks for jumping in to help out with these code changes!!
I think a better spot for the retry logic would be in the Request() function, so that we don't need to duplicate the 401 retry logic in the individual paths. I would recommend moving the retry for-loop around this block:
aws-node-termination-handler/pkg/ec2metadata/ec2metadata.go
Lines 184 to 206 in 24ff89c
	if e.v2Token == "" || e.tokenTTL <= secondsBeforeTTLRefresh {
		e.Lock()
		token, ttl, err := e.getV2Token()
		if err != nil {
			e.v2Token = ""
			e.tokenTTL = -1
			log.Log().Msgf("Unable to retrieve an IMDSv2 token, continuing with IMDSv1: %v", err)
		} else {
			e.v2Token = token
			e.tokenTTL = ttl
		}
		e.Unlock()
	}
	if e.v2Token != "" {
		req.Header.Add(tokenRequestHeader, e.v2Token)
	}

	httpReq := func() (*http.Response, error) {
		return e.httpClient.Do(req)
	}
	resp, err := retry(e.tries, 2*time.Second, httpReq)
	if err != nil {
		return nil, fmt.Errorf("Unable to get a response from IMDS: %w", err)
	}
Within that loop, if the response comes back as a 401, you can reset e.v2Token (and set e.tokenTTL = 0) to do a retry on the request while fetching a new token.
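For illustration, here is a minimal sketch of what that suggestion could look like once the token/request block above is wrapped in a retry loop. The fields and helpers (v2Token, tokenTTL, tries, getV2Token, retry, tokenRequestHeader) come from the snippet above; the method name requestWithTokenRetry, its signature, and the maxTokenRetries bound are assumptions for this sketch, not the actual patch.

```go
// Sketch only: retry the whole token-plus-request sequence when IMDS answers
// 401, clearing the cached token so the next attempt fetches a fresh one.
func (e *Service) requestWithTokenRetry(req *http.Request) (*http.Response, error) {
	const maxTokenRetries = 2 // hypothetical bound on token refresh attempts

	for i := 0; i < maxTokenRetries; i++ {
		if e.v2Token == "" || e.tokenTTL <= secondsBeforeTTLRefresh {
			e.Lock()
			token, ttl, err := e.getV2Token()
			if err != nil {
				e.v2Token = ""
				e.tokenTTL = -1
				log.Log().Msgf("Unable to retrieve an IMDSv2 token, continuing with IMDSv1: %v", err)
			} else {
				e.v2Token = token
				e.tokenTTL = ttl
			}
			e.Unlock()
		}
		if e.v2Token != "" {
			// Set (not Add) so a retry replaces a stale token instead of appending a second header value.
			req.Header.Set(tokenRequestHeader, e.v2Token)
		}

		httpReq := func() (*http.Response, error) {
			return e.httpClient.Do(req)
		}
		resp, err := retry(e.tries, 2*time.Second, httpReq)
		if err != nil {
			return nil, fmt.Errorf("Unable to get a response from IMDS: %w", err)
		}
		if resp.StatusCode != http.StatusUnauthorized {
			return resp, nil
		}
		// 401: the cached token was rejected; clear it so the next iteration fetches a new one.
		e.Lock()
		e.v2Token = ""
		e.tokenTTL = 0
		e.Unlock()
	}
	return nil, fmt.Errorf("IMDS returned 401 after %d token refresh attempts", maxTokenRetries)
}
```

Keeping this in one place means every metadata path (spot ITN, scheduled events, etc.) gets the 401 handling for free.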
The panic should occur in the main pkg. An error returned by the GetSpotITNEvent() or GetScheduledMaintenanceEvents() functions will propagate up to this monitoring loop:
aws-node-termination-handler/cmd/node-termination-handler.go
Lines 100 to 111 in 24ff89c
	for _, fn := range monitoringFns {
		go func(monitor monitor.Monitor) {
			log.Log().Msgf("Started monitoring for %s events", monitor.Kind())
			for range time.Tick(time.Second * 2) {
				err := monitor.Monitor()
				if err != nil {
					log.Log().Msgf("There was a problem monitoring for %s events: %v", monitor.Kind(), err)
					metrics.ErrorEventsInc(monitor.Kind())
				}
			}
		}(fn)
	}
This reverts commit c1e3477.
move Panic to main function. add tests for 401 retries.
@bwagner5 - thanks for the feedback! I've moved this logic to the Request function and added two new local vars to the monitor loop to track the previous error; if the same error occurs 3 times in a row, it panics. This way it covers not just the 401 case but any error that repeats.
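A rough sketch of that consecutive-error tracking, layered onto the monitoring loop shown earlier. The variable names (previousErr, duplicateErrCount) and the threshold of 3 are illustrative and may not match the PR's exact code; monitor, log, and metrics are the identifiers from the snippet above.

```go
for _, fn := range monitoringFns {
	go func(monitor monitor.Monitor) {
		log.Log().Msgf("Started monitoring for %s events", monitor.Kind())
		var previousErr error
		duplicateErrCount := 0
		for range time.Tick(time.Second * 2) {
			err := monitor.Monitor()
			if err != nil {
				log.Log().Msgf("There was a problem monitoring for %s events: %v", monitor.Kind(), err)
				metrics.ErrorEventsInc(monitor.Kind())
				if previousErr != nil && err.Error() == previousErr.Error() {
					duplicateErrCount++
				} else {
					duplicateErrCount = 1
				}
				if duplicateErrCount >= 3 {
					// Same error three times in a row: give up and let the pod restart.
					log.Log().Msgf("Stopping NTH - the same error occurred %d times in a row", duplicateErrCount)
					panic(err)
				}
				previousErr = err
			} else {
				previousErr = nil
				duplicateErrCount = 0
			}
		}
	}(fn)
}
```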
Awesome! This looks great!
- Can you also run make fmt to correct the go report card issue?
Great work! Thanks!!
Issue #, if available:
#229
Description of changes:
Added a for loop for getting scheduled maintenance events and spot instance events. Set the retry limit to 1.
Not sure of the best way to add this to the test suite. Also curious whether throwing a panic to solve the second item in the issue is the best approach here or not.
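For reference, the general shape of that initial per-path retry (with the retry limit of 1 mentioned above) might look something like the sketch below. fetchEvents and isUnauthorized are hypothetical stand-ins for the real call sites (e.g. the code calling GetScheduledMaintenanceEvents()); none of this is the PR's literal code.

```go
// Illustrative only: retry an IMDS fetch once when it comes back unauthorized.
func fetchWithOneRetry(fetchEvents func() error, isUnauthorized func(error) bool) error {
	const retryLimit = 1
	var err error
	for attempt := 0; attempt <= retryLimit; attempt++ {
		if err = fetchEvents(); err == nil || !isUnauthorized(err) {
			break
		}
		// A 401 from IMDSv2 usually means a stale token; loop once more so a
		// refreshed token can be used on the next attempt.
	}
	return err
}
```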
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.