
com.amazonaws.SdkClientException: Unable to load credentials from service endpoint #2521


Closed
cbcoutinho opened this issue Mar 9, 2021 · 6 comments
Labels
guidance Question that needs advice or information.

Comments

@cbcoutinho

I'm trying to set up a small Spark cluster using docker-compose, and vending my credentials to each of the containers via the ECS Task Metadata Endpoint. This is provided by another docker container using the https://github.com/awslabs/amazon-ecs-local-container-endpoints image.

Containers are able to cURL the endpoint (169.254.170.2/creds), and the env vars are respected by other SDKs such as Python's boto3, but I can't seem to get the Spark containers to reach the endpoint. I've tried the standard hadoop-aws jars as well as the latest 1.11.x versions of aws-sdk-java, to no avail.

Describe the bug

The Spark containers that I'm using to query some data on S3 locally are erroring out due to missing credentials. The following spark-shell command works on EMR clusters, but when I run the same thing locally with docker-compose, it seems like aws-sdk-java doesn't respect the ECS metadata endpoint.

The endpoint seems to be ignored or handled incorrectly by the Java SDK; the Python SDK (boto3) works as expected.
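For context, the SDKs that do honor this setup resolve the ECS credentials endpoint by appending the value of `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` to the fixed link-local address. A minimal illustrative sketch of that resolution (not any SDK's actual code; the real SDKs also honor `AWS_CONTAINER_CREDENTIALS_FULL_URI`):

```python
import os

# The ECS credentials endpoint lives at a fixed link-local address.
ECS_CREDENTIALS_HOST = "http://169.254.170.2"

def container_credentials_url(environ=os.environ):
    """Build the full credentials URL from the relative-URI env var, if set."""
    relative = environ.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    if relative is None:
        return None
    return ECS_CREDENTIALS_HOST + relative
```

With `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=/creds`, this yields `http://169.254.170.2/creds`, the same URL the curl transcript below hits successfully.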

Expected Behavior

The credentials available via the ECS task metadata endpoint should be usable by Java applications.

Current Behavior

After the spark.read.... job is started, the process hangs for roughly 10 seconds or more before failing entirely. I'm not sure whether the problem is with the ECS endpoint or with the timeout itself; the AWS CLI doesn't suffer from the same timeout.

Steps to Reproduce

Start the various docker-containers using docker-compose, and then launch a spark-shell from within either the spark-master or spark-worker containers:

$ export AWS_xxxx=....
$ docker-compose up -d # Credentials are passed to the ecs-endpoint container via the override file, see below
$ docker-compose exec spark-master bash
.
.
.
root@88ce36ed6089:/opt/bitnami/spark# env | grep AWS
AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=/creds
AWS_DEFAULT_REGION=eu-west-1
root@88ce36ed6089:/opt/bitnami/spark# curl 169.254.170.2/creds | jq
{
  "AccessKeyId": "ASIxxxxxx",
  "Expiration": "2021-03-09T00:59:46Z",
  "RoleArn": "",
  "SecretAccessKey": "+tiY/xxxxxx",
  "Token": "Fwoxxxxxx=="
}
root@88ce36ed6089:/opt/bitnami/spark# python3 -m pip install awscli
root@88ce36ed6089:/opt/bitnami/spark# aws s3 ls
my-bucket
root@88ce36ed6089:/opt/bitnami/spark# spark-shell --master spark://spark-master:7077 --packages org.apache.hadoop:hadoop-aws:2.10.1
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/opt/bitnami/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.hadoop#hadoop-aws added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-258bebeb-c8c6-4af9-b07f-7a7a06e058dd;1.0
        confs: [default]
        found org.apache.hadoop#hadoop-aws;2.10.1 in central
        found com.amazonaws#aws-java-sdk-bundle;1.11.271 in central
        found org.apache.commons#commons-lang3;3.4 in central
:: resolution report :: resolve 245ms :: artifacts dl 7ms
        :: modules in use:
        com.amazonaws#aws-java-sdk-bundle;1.11.271 from central in [default]
        org.apache.commons#commons-lang3;3.4 from central in [default]
        org.apache.hadoop#hadoop-aws;2.10.1 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-258bebeb-c8c6-4af9-b07f-7a7a06e058dd
        confs: [default]
        0 artifacts copied, 3 already retrieved (0kB/9ms)
21/03/09 00:37:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/03/09 00:37:13 WARN SparkContext: Please ensure that the number of slots available on your executors is limited by the number of cores to task cpus and not another custom resource. If cores is not the limiting resource then dynamic allocation will not work properly!
Spark context Web UI available at http://88ce36ed6089:4040
Spark context available as 'sc' (master = spark://spark-master:7077, app id = app-20210309003714-0001).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.2
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.read.format("json").load("s3a://my-bucket/my-prefix/*")
21/03/09 00:40:00 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
java.net.SocketTimeoutException: doesBucketExist on my-bucket: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint
  at org.apache.hadoop.fs.s3a.S3AUtils.translateInterruptedException(S3AUtils.java:342)
  at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:177)
  at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:111)
  at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:265)
  at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:322)
  at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:261)
  at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:236)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:375)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:311)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
  at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:46)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:376)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:297)
  at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:286)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:232)
  ... 47 elided
Caused by: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint
  at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:159)
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1166)
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:762)
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:724)
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
  at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
  at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4368)
  at com.amazonaws.services.s3.AmazonS3Client.getBucketRegionViaHeadRequest(AmazonS3Client.java:5129)
  at com.amazonaws.services.s3.AmazonS3Client.fetchRegionFromCache(AmazonS3Client.java:5103)
  at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4352)
  at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4315)
  at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1344)
  at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1284)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$verifyBucketExists$1(S3AFileSystem.java:376)
  at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
  ... 66 more
Caused by: com.amazonaws.SdkClientException: Unable to load credentials from service endpoint
  at com.amazonaws.auth.EC2CredentialsFetcher.handleError(EC2CredentialsFetcher.java:183)
  at com.amazonaws.auth.EC2CredentialsFetcher.fetchCredentials(EC2CredentialsFetcher.java:162)
  at com.amazonaws.auth.EC2CredentialsFetcher.getCredentials(EC2CredentialsFetcher.java:82)
  at com.amazonaws.auth.InstanceProfileCredentialsProvider.getCredentials(InstanceProfileCredentialsProvider.java:164)
  at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:137)
  ... 83 more
Caused by: java.net.SocketTimeoutException: connect timed out
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
  at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
  at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
  at java.net.Socket.connect(Socket.java:607)
  at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
  at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
  at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
  at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
  at sun.net.www.http.HttpClient.New(HttpClient.java:339)
  at sun.net.www.http.HttpClient.New(HttpClient.java:357)
  at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1226)
  at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1205)
  at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
  at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)
  at com.amazonaws.internal.ConnectionUtils.connectToEndpoint(ConnectionUtils.java:54)
  at com.amazonaws.internal.EC2CredentialsUtils.readResource(EC2CredentialsUtils.java:116)
  at com.amazonaws.internal.EC2CredentialsUtils.readResource(EC2CredentialsUtils.java:87)
  at com.amazonaws.auth.InstanceProfileCredentialsProvider$InstanceMetadataCredentialsEndpointProvider.getCredentialsEndpoint(InstanceProfileCredentialsProvider.java:189)
  at com.amazonaws.auth.EC2CredentialsFetcher.fetchCredentials(EC2CredentialsFetcher.java:122)
  ... 86 more

Possible Solution

None yet.

Context

Trying to use ECS endpoints in a docker-compose setting

Your Environment

# docker-compose.yml
version: '2'

services:
  spark-master:
    image: docker.io/bitnami/spark:3-debian-10
    user: root
    #volumes:
      #- ./spark:/data
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    ports:
      - '8080:8080'
      - '4040:4040'

  spark-worker:
    image: docker.io/bitnami/spark:3-debian-10
    user: root
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
# docker-compose.override.yml
version: "2"
networks:
  # This special network is configured so that the local metadata
  # service can bind to the specific IP address that ECS uses
  # in production
  credentials_network:
    driver: bridge
    ipam:
      config:
        - subnet: "169.254.170.0/24"
          gateway: 169.254.170.1

services:
  # This container vends credentials to your containers
  ecs-local-endpoints:
    # The Amazon ECS Local Container Endpoints Docker Image
    image: amazon/amazon-ecs-local-container-endpoints
    volumes:
      # Mount /var/run so we can access docker.sock and talk to Docker
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      # Share local AWS credentials with ECS service
      AWS_ACCESS_KEY_ID: "${AWS_ACCESS_KEY_ID}"
      AWS_SECRET_ACCESS_KEY: "${AWS_SECRET_ACCESS_KEY}"
      AWS_SESSION_TOKEN: "${AWS_SESSION_TOKEN}"
      AWS_DEFAULT_REGION: "eu-west-1"
    networks:
      credentials_network:
        # This special IP address is recognized by the AWS SDKs and AWS CLI
        ipv4_address: "169.254.170.2"

  # Here we reference the application container(s) that we are testing
  # You can test multiple containers at a time, simply duplicate this section
  # and customize it for each container, and give it a unique IP in 'credentials_network'.
  spark-master:
    depends_on:
      - ecs-local-endpoints
    networks:
      - credentials_network
      - default
    environment:
      AWS_DEFAULT_REGION: "eu-west-1"
      AWS_CONTAINER_CREDENTIALS_RELATIVE_URI: "/creds"

  spark-worker:
    depends_on:
      - ecs-local-endpoints
    networks:
      - credentials_network
      - default
    environment:
      AWS_DEFAULT_REGION: "eu-west-1"
      AWS_CONTAINER_CREDENTIALS_RELATIVE_URI: "/creds"
@cbcoutinho cbcoutinho added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Mar 9, 2021
@cbcoutinho
Author

Possibly related to other timeout issues (e.g. #2365), but those involve the EC2 metadata endpoint, which is different. I can't find it documented anywhere whether aws-sdk-java supports the ECS endpoint, but it may be related to that issue.

@debora-ito
Member

Hi @cbcoutinho thank you for the detailed report.

Caused by: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint
  at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:159)
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1166)

It looks like ContainerCredentialsProvider is not in the default list of credential providers of org.apache.hadoop.fs.s3a.AWSCredentialProviderList. Is that the provider that you were expecting to pick up the credentials? I'm sorry, I'm not super familiar with ECS environments.
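The failure mode can be pictured as a simple chain lookup. This is an illustrative sketch of provider-chain behavior (not Hadoop's actual implementation), with the provider names taken from the error message above:

```python
# Illustrative sketch: S3A asks each configured credential provider in turn
# and fails only when none of them can supply credentials.
DEFAULT_S3A_PROVIDERS = [
    "SimpleAWSCredentialsProvider",
    "EnvironmentVariableCredentialsProvider",
    "InstanceProfileCredentialsProvider",  # EC2 metadata only, not the ECS endpoint
]

def resolve_credentials(providers, working_providers):
    """Return the first provider able to supply credentials, else raise."""
    for provider in providers:
        if provider in working_providers:
            return provider
    raise RuntimeError("No AWS Credentials provided by " + " ".join(providers))
```

In this setup only ContainerCredentialsProvider can reach the ECS endpoint, so the default chain exhausts itself and raises, which matches the NoAuthWithAWSException seen in the stack trace.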

@cbcoutinho
Author

cbcoutinho commented Mar 9, 2021 via email

@debora-ito
Member

I'm glad I could help!

@debora-ito debora-ito added guidance Question that needs advice or information. and removed bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Mar 10, 2021
@github-actions

COMMENT VISIBILITY WARNING

Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

@cbcoutinho
Author

For completeness, this was the additional flag that I needed when invoking spark-jobs in my local cluster:

--conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.ContainerCredentialsProvider
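For anyone applying the same fix programmatically rather than on the command line: the `spark.hadoop.` prefix routes the value through to the Hadoop/S3A setting `fs.s3a.aws.credentials.provider`. A minimal sketch of the equivalent configuration entry (the pyspark usage is commented out and assumes pyspark is installed):

```python
# The same fix expressed as a Spark configuration entry. Spark strips the
# "spark.hadoop." prefix and hands the rest to the Hadoop configuration.
s3a_credentials_conf = {
    "spark.hadoop.fs.s3a.aws.credentials.provider":
        "com.amazonaws.auth.ContainerCredentialsProvider",
}

# With pyspark available, the entry could be applied when building a session:
# from pyspark.sql import SparkSession
# builder = SparkSession.builder
# for key, value in s3a_credentials_conf.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```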
