Intermittent issue on sdk v2 - Unable to load credentials from service endpoint #3448
Comments
Hello @striker50, thank you very much for your submission. Here, the "Unable to load credentials from service endpoint" error indicates that the InstanceProfileCredentialsProvider attempted to refresh the credentials but could not connect to the instance metadata service endpoint before the timeout. After some research, I found other issue submissions from users reporting high latency when refreshing credentials. Since you are using EMR, you can increase the IMDSv2 hop limit with the modify-instance-metadata-options command if you need a larger value. You can find more information on using IMDSv2 and configuring the hop limit in the linked documentation. Note: this appears to be a question about the AWS SDK for Java V2 rather than the AWS SDK for Java V1. To help other users searching for guidance, I will update the label accordingly and transfer the submission to the appropriate repository. Best, Yasmine
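Below is a minimal, JDK-only sketch of the IMDSv2 request sequence, to make the hop-limit point concrete. The instance-profile provider talks to the metadata service roughly this way. First comes a PUT to `http://169.254.169.254/latest/api/token` with the `X-aws-ec2-metadata-token-ttl-seconds` header to get a session token. Then come GETs that carry the `X-aws-ec2-metadata-token` header. The hop limit applies to the response of that PUT. If the limit is too low for the network path, for example when there is an extra hop through a container network, the response is dropped and the token request times out.

This is not SDK code. The class name, method names, and the 1-second timeouts are hypothetical choices for illustration, and the sketch only builds the requests without sending them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical helper that builds (but does not send) IMDSv2 requests with explicit timeouts. */
public class ImdsV2RequestSketch {

    // Link-local address of the EC2 instance metadata service.
    static final String IMDS_BASE = "http://169.254.169.254";

    // Opens a connection object; no network traffic happens until connect()/getInputStream().
    static HttpURLConnection open(String path, String method, int timeoutMillis) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(IMDS_BASE + path).openConnection();
            conn.setRequestMethod(method);
            conn.setConnectTimeout(timeoutMillis); // fail fast if the endpoint is unreachable
            conn.setReadTimeout(timeoutMillis);    // fail fast if the response never arrives
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Step 1: PUT request for a session token (its response is subject to the hop limit). */
    static HttpURLConnection tokenRequest(int ttlSeconds, int timeoutMillis) {
        HttpURLConnection conn = open("/latest/api/token", "PUT", timeoutMillis);
        conn.setRequestProperty("X-aws-ec2-metadata-token-ttl-seconds", Integer.toString(ttlSeconds));
        return conn;
    }

    /** Step 2: GET a metadata path, authenticated with the token from step 1. */
    static HttpURLConnection metadataRequest(String path, String token, int timeoutMillis) {
        HttpURLConnection conn = open(path, "GET", timeoutMillis);
        conn.setRequestProperty("X-aws-ec2-metadata-token", token);
        return conn;
    }

    public static void main(String[] args) {
        // 21600 seconds (six hours) is the maximum token TTL IMDSv2 accepts.
        HttpURLConnection put = tokenRequest(21600, 1000);
        System.out.println(put.getRequestMethod() + " " + put.getURL());
        // On an EC2 instance you would read put.getInputStream() to obtain the token, then call
        // metadataRequest("/latest/meta-data/iam/security-credentials/", token, 1000).
    }
}
```

The hop limit itself is an instance-side setting, not something the client controls. For example, `aws ec2 modify-instance-metadata-options --instance-id <instance-id> --http-put-response-hop-limit 2` raises it to 2.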
@yasminetalby We created a new EMR cluster with the configuration below but are still seeing the same issue.
@yasminetalby Any update on this issue? We are blocked from deploying our services to prod because of it. Can you please escalate it?
Hello @skumarstrike02 , @striker50 , Apologies for the delay. Please make sure to remove any sensitive information. It seems that the EC2 team behind the Metadata Service and are still investigating the latency issue mentioned above. Best, Yasmine |
I was also wondering if you have enabled the SDK metrics? Best, Yasmine
Hi @yasminetalby, we ran the service with updated logging settings. Below are the logs around the error:
Hello @skumarstrike02, thank you very much for the extra information and for your collaboration. Best, Yasmine
Hi @yasminetalby, thanks.
Hello Sumeet, is it possible for you to enable the metrics? This would allow us to see whether a configuration change could help in your case. Best, Yasmine
Hi @yasminetalby, can it be enabled on a running EMR cluster? If so, please point me to the steps to enable it. Thanks
Hello @skumarstrike02, you can find how to enable SDK metrics in the documentation provided in earlier comments: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/metrics.html. The metric of most interest in this case is CredentialsFetchDuration (see https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/metrics-list.html). Here is a similar issue where the customer provided the metrics: #1667 (comment). Best, Yasmine
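CredentialsFetchDuration records how long the SDK spends fetching signing credentials for each API call. In the SDK itself, the linked guide enables metrics by attaching a metric publisher (such as `CloudWatchMetricPublisher`) to the client through `ClientOverrideConfiguration`.

The JDK-only sketch below only illustrates what that metric captures. It wraps a hypothetical credential-fetching supplier and records the elapsed time of every call, including calls that fail. That way, slow or stalled fetches, like the connect timeouts in this issue, stand out. The class name is an illustrative assumption, not an SDK type.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical wrapper that times each call to a credential supplier. */
public class TimedCredentialsSupplier<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private final List<Duration> durations = Collections.synchronizedList(new ArrayList<>());

    public TimedCredentialsSupplier(Supplier<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public T get() {
        long start = System.nanoTime();
        try {
            return delegate.get();
        } finally {
            // Record the duration even when the fetch throws (e.g. a connect timeout).
            durations.add(Duration.ofNanos(System.nanoTime() - start));
        }
    }

    /** Snapshot copy of all recorded durations. */
    public List<Duration> durations() {
        synchronized (durations) {
            return new ArrayList<>(durations);
        }
    }

    /** Longest recorded fetch, or Duration.ZERO if nothing has been recorded yet. */
    public Duration max() {
        return durations().stream().max(Duration::compareTo).orElse(Duration.ZERO);
    }

    public static void main(String[] args) {
        TimedCredentialsSupplier<String> timed = new TimedCredentialsSupplier<>(() -> "fake-credentials");
        for (int i = 0; i < 3; i++) {
            timed.get();
        }
        System.out.println("fetches=" + timed.durations().size() + " max=" + timed.max());
    }
}
```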
Hi @yasminetalby, is there any update on the issue from the SDK team?
Hello @skumarstrike02, thank you very much for providing the documentation. An internal ticket is open to investigate this case. Have you tried updating the SDK to the latest version? If so, have you seen any improvement in the behavior? Best, Yasmine
Hi @yasminetalby
Hello @skumarstrike02, thank you very much for the update and for your collaboration. Sincerely, Yasmine
Hello @yasminetalby
Hello @skumarstrike02, thank you very much for bringing this to my attention, and I apologize for any confusing guidance. I am happy to keep this GitHub issue open if this is your preferred medium of communication. As the latest update, the behavior was raised to the service team to investigate the connectivity issue, and we are currently waiting on an update from the IMDS team regarding this case. Thank you very much for your time and collaboration. Please let me know your preferred medium of communication; I will keep providing updates here based on your preference and the latest service team update. Best regards,
It looks like this issue has not been active for more than five days. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please add a comment to prevent automatic closure, or if the issue is already closed please feel free to reopen it. |
Please keep the issue open until we get a resolution from aws support and/or sdk team. |
The AWS SDK team helped us optimize our code by making the SQS client static. That change seems to be working for us and resolved the credentials throttling issue. Thanks, everyone; this issue can be closed.
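The thread does not show the exact change. A common reading of "making the SQS client static" is to build one client per JVM and reuse it, rather than building a new client for every message or task. In Spark, that means one client per executor. If each new client also gets a freshly built credentials provider, each client may start with an empty credentials cache and call the instance metadata service again.

The sketch below assumes that reading. It uses a hypothetical `FakeSqsClient` in place of `SqsClient` so it runs without the SDK. It builds the shared instance with the initialization-on-demand holder idiom and counts how many clients each approach creates.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedClientSketch {

    /** Hypothetical stand-in for SqsClient; counts how many instances are created. */
    static final class FakeSqsClient {
        static final AtomicInteger CREATED = new AtomicInteger();

        FakeSqsClient() {
            // Stands in for building a real client, which (if given a freshly built
            // credentials provider) would start with an empty credentials cache.
            CREATED.incrementAndGet();
        }

        String sendMessage(String queueUrl, String body) {
            return "sent:" + body;
        }
    }

    /** Initialization-on-demand holder: the JVM builds CLIENT once, lazily and thread-safely. */
    private static final class Holder {
        static final FakeSqsClient CLIENT = new FakeSqsClient();
    }

    static FakeSqsClient client() {
        return Holder.CLIENT;
    }

    /** Anti-pattern: a new client (and a new credentials cache) for every message. */
    static String sendWithNewClient(String queueUrl, String body) {
        return new FakeSqsClient().sendMessage(queueUrl, body);
    }

    /** Shared client reused by every task running in this JVM. */
    static String sendWithSharedClient(String queueUrl, String body) {
        return client().sendMessage(queueUrl, body);
    }

    public static void main(String[] args) {
        String queueUrl = "https://sqs.example/hypothetical-queue"; // hypothetical URL
        for (int i = 0; i < 100; i++) {
            sendWithSharedClient(queueUrl, "m" + i);
        }
        System.out.println("clients created: " + FakeSqsClient.CREATED.get());
    }
}
```

A static holder like this is, in effect, a per-JVM singleton. The SDK v2 documentation describes service clients as thread-safe and recommends reusing them, which is what makes sharing one instance across tasks reasonable.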
Hello @skumarstrike02, we are happy to hear that the fix worked. Thank you very much for your collaboration and for letting us know that this issue could be resolved. Sincerely, Yasmine
|
@skumarstrike02 What was the change here? What do you mean by a static SQS client? Is it different from a singleton?
Describe the bug
I am running a Spark application on an EMR 5.30.1 cluster and am seeing intermittent credential access issues after upgrading the AWS SDK from 1.11.297 to 2.17.11.
We upgraded from v1 to v2 because v1 doesn't have stable support for configuring custom VPC endpoints, as recommended here: aws/aws-sdk-java#2135 (comment)
After moving to SDK v2, VPC endpoint access works fine, but we are seeing intermittent SQS sendMessage() failures because credential retrieval fails with a connection timeout.
Following https://docs.amazonaws.cn/en_us/sdk-for-java/latest/developer-guide/migration-client-credentials.html, I also enabled the asynchronous credential refresh with the code below during SQS client initialization, but the issue still occurs.
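The user's snippet is not preserved here. In SDK v2, asynchronous refresh for this provider is switched on through its builder, `InstanceProfileCredentialsProvider.builder().asyncCredentialUpdateEnabled(true).build()`.

The JDK-only sketch below shows the idea behind that option without the SDK. The class name is hypothetical, and it is not the SDK's implementation. The value is loaded once at construction, then reloaded on a background thread at a fixed interval, so callers read the cached value instead of waiting on the endpoint. A failed reload keeps the last good value. A background refresh only hides latency while the cached value is still usable, so on its own it cannot fix fetches that keep timing out.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical cache that refreshes its value on a background thread. */
public class BackgroundRefreshingCache<T> implements Supplier<T>, AutoCloseable {
    private final Supplier<T> loader;
    private final AtomicReference<T> current = new AtomicReference<>();
    private final ScheduledExecutorService scheduler;

    public BackgroundRefreshingCache(Supplier<T> loader, long refreshPeriodMillis) {
        this.loader = loader;
        // Load synchronously once so the first caller never sees null.
        current.set(loader.get());
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "credentials-refresher");
            t.setDaemon(true); // do not keep the JVM alive just for refreshing
            return t;
        });
        scheduler.scheduleAtFixedRate(this::refreshNow, refreshPeriodMillis,
                refreshPeriodMillis, TimeUnit.MILLISECONDS);
    }

    /** Reloads the value; on failure the previous value is kept. */
    public void refreshNow() {
        try {
            current.set(loader.get());
        } catch (RuntimeException e) {
            // Keep serving the last good value. Catching here also prevents the scheduler
            // from cancelling future runs. A real provider would also track expiry.
        }
    }

    @Override
    public T get() {
        return current.get(); // callers never block on the loader
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        try (BackgroundRefreshingCache<String> cache =
                     new BackgroundRefreshingCache<>(() -> "creds-" + calls.incrementAndGet(), 60_000)) {
            System.out.println(cache.get());
        }
    }
}
```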
Please advise on how to fix the intermittent credential access issue on aws sdk v2. We get the credentials from InstanceProfileCredentialsProvider.
Expected Behavior
Consistent credential access for the SQSClient when sending messages
Current Behavior
code snippet:
Error message:
java.util.concurrent.ExecutionException: software.amazon.awssdk.core.exception.SdkClientException: Unable to load credentials from service endpoint.
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    ......
    at java.lang.Thread.run(Thread.java:750)
Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to load credentials from service endpoint.
    at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:98)
    at software.amazon.awssdk.auth.credentials.HttpCredentialsProvider.refreshCredentials(HttpCredentialsProvider.java:110)
    at software.amazon.awssdk.utils.cache.CachedSupplier.refreshCache(CachedSupplier.java:132)
    at software.amazon.awssdk.utils.cache.CachedSupplier.get(CachedSupplier.java:89)
    at java.util.Optional.map(Optional.java:215)
    at software.amazon.awssdk.auth.credentials.HttpCredentialsProvider.resolveCredentials(HttpCredentialsProvider.java:146)
    at software.amazon.awssdk.awscore.client.handler.AwsClientHandlerUtils.createExecutionContext(AwsClientHandlerUtils.java:79)
    at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.createExecutionContext(AwsSyncClientHandler.java:68)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:99)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:169)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:95)
    at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
    at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:55)
    at software.amazon.awssdk.services.sqs.DefaultSqsClient.sendMessage(DefaultSqsClient.java:1528)
    .......
    .......
Caused by: java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:607)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
    at sun.net.www.http.HttpClient.New(HttpClient.java:339)
    at sun.net.www.http.HttpClient.New(HttpClient.java:357)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1228)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1207)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)
    at software.amazon.awssdk.regions.internal.util.ConnectionUtils.connectToEndpoint(ConnectionUtils.java:45)
    at software.amazon.awssdk.regions.util.HttpResourcesUtils.readResource(HttpResourcesUtils.java:112)
    at software.amazon.awssdk.regions.util.HttpResourcesUtils.readResource(HttpResourcesUtils.java:91)
    at software.amazon.awssdk.auth.credentials.HttpCredentialsProvider.refreshCredentials(HttpCredentialsProvider.java:79)
    ... 21 more
Reproduction Steps
code snippet for AWS SDK v2:
Possible Solution
No response
Additional Information/Context
No response
AWS Java SDK version used
2.17.11
JDK version used
1.8
Operating System and version
EMR clusters