
S3 Transfer Manager upload causes memory issues #3726

@joernschumacher0001

Description


Describe the bug

Trying to upload a large amount of data (~50GB in about 400 files) with the TransferManager uses a huge amount of memory. The JVM process size (as reported by top) greatly exceeds the max heap size configured at startup (we start with -mx6G so we can stay below 16GB).

We tried adjusting the available configurations of both the transfer manager and the async S3 client, with no success; this involved a lot of trial and error because it was not clear exactly how they interact.

  • the TransferManager can be given an Executor, but it then seems to hand the requests off to the asynchronous client immediately?
  • we limited the S3 client's maxConcurrency (see the sketch after this list), but this did not seem to have any influence
  • we did not find any configuration for the total or per-connection buffer size
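For reference, this is roughly the combination of knobs we experimented with (a sketch only; variable names and the fixed thread-pool size are just illustrative, and the executor only seems to affect how quickly file requests are handed off to the async client, not buffering):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.transfer.s3.S3TransferManager;

// CRT-based async client with limited concurrency and a throughput target.
S3AsyncClient crtClient = S3AsyncClient.crtBuilder()
        .maxConcurrency(10)
        .targetThroughputInGbps(0.6)
        .build();

// Bounded executor handed to the transfer manager for its background tasks.
ExecutorService boundedExecutor = Executors.newFixedThreadPool(4);

S3TransferManager tm = S3TransferManager.builder()
        .s3Client(crtClient)
        .executor(boundedExecutor)
        .build();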

Expected Behavior

The transfer manager could throttle requests towards the S3 client if too many failed requests occur, and/or the configuration should allow setting a maximum total buffer size.

Current Behavior

In the best case this results in many of the uploads failing; in the worst case the application is shut down by the OS because of host RAM exhaustion.

If the uploads fail, a CompletedDirectoryUpload is returned, correctly containing the failed files, and the memory situation recovers.

Outputting the CompletedDirectoryUpload.failedTransfers() result looks like this:

[FailedFileUpload(request=UploadFileRequest(putObjectRequest=PutObjectRequest(Bucket=<our-bucket>, Key=<target-path>, source=<local_file>, configuration=[<our-logging-listener>]), exception=software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Java heap space), [...]]

The configured TransferListener logs success and failure; the failure case looks like this:

software.amazon.awssdk.core.exception.SdkClientException: Failed to send the request: A callback has reported failure.
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
	at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:43)
	at software.amazon.awssdk.services.s3.internal.crt.S3CrtResponseHandlerAdapter.handleError(S3CrtResponseHandlerAdapter.java:127)
	at software.amazon.awssdk.services.s3.internal.crt.S3CrtResponseHandlerAdapter.onFinished(S3CrtResponseHandlerAdapter.java:93)
	at software.amazon.awssdk.crt.s3.S3MetaRequestResponseHandlerNativeAdapter.onFinished(S3MetaRequestResponseHandlerNativeAdapter.java:24)

Our current workaround is to set the JVM memory low enough that the OS does not kill the process, so we get back the list described above; we can then retry the individual failed files, which works fine (see the sketch below).
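The retry step of that workaround is essentially this (a minimal sketch; tm is the transfer manager from the reproduction below):

import software.amazon.awssdk.transfer.s3.S3TransferManager;
import software.amazon.awssdk.transfer.s3.model.CompletedDirectoryUpload;
import software.amazon.awssdk.transfer.s3.model.FailedFileUpload;

// Re-upload each failed file one at a time so only a single file is in flight at once.
void retryFailedUploads(S3TransferManager tm, CompletedDirectoryUpload result) {
    for (FailedFileUpload failed : result.failedTransfers()) {
        // The original UploadFileRequest (bucket, key, source path, listeners) is reused as-is.
        tm.uploadFile(failed.request()).completionFuture().join();
    }
}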

Reproduction Steps

This is our setup:

var tm = S3TransferManager.builder()
   .s3Client(S3AsyncClient.crtBuilder()
      .credentialsProvider(awsCredentialsProvider)
      .maxConcurrency(10)
      .targetThroughputInGbps(0.6d)
      .build())
   .build();

// Wait for the directory upload to finish; the result lists any failed files.
var completedUpload = tm.uploadDirectory(builder -> builder
        .bucket(ourBucket)
        .source(Path.of(ourDirectory))
        .s3Prefix(s3DirectoryKey)
        .uploadFileRequestTransformer(transformer -> transformer.addTransferListener(loggingListener)))
    .completionFuture()
    .join();

The loggingListener really only logs success and failure:

class LoggingTransferListener implements TransferListener {
    @Override
    public void transferComplete(final Context.TransferComplete context) {
        logService.info(log, "transfer complete");
    }

    @Override
    public void transferFailed(final Context.TransferFailed context) {
        logService.error(log, "transfer failed", context.exception());
    }
}

We're assuming the main issue is the amount of data.

Possible Solution

  • allow configuring an overall memory limit
  • size the individual connections' buffers dynamically according to the available or configured memory (the closest existing knobs we know of are sketched below)
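Until something like that exists, the closest knobs we are aware of on the CRT builder are the part size, concurrency and throughput settings; a sketch of what we would try next (whether this actually bounds the native memory used is exactly what is unclear to us, and the values are guesses):

import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.transfer.s3.S3TransferManager;

// Attempt to reduce per-request buffering; not verified to cap total memory.
S3AsyncClient s3 = S3AsyncClient.crtBuilder()
        .minimumPartSizeInBytes(5L * 1024 * 1024)   // keep multipart parts small (S3 minimum is 5 MiB)
        .maxConcurrency(4)                          // fewer parts in flight at once
        .targetThroughputInGbps(0.5)
        .build();

S3TransferManager tm = S3TransferManager.builder()
        .s3Client(s3)
        .build();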

Additional Information/Context

Our use case is persisting a Kafka Streams state store, to give an idea of how the data to be stored is structured. Total size is about 20-25GB across ~400 files, as mentioned in the bug description.

AWS Java SDK version used

2.19.21

JDK version used

17.0.6

Operating System and version

Linux 4.14.301-224.520.amzn2.x86_64
