StsCredentialsProvider uses excessive number of threads in multi-tenant setup #3259

Closed
cloudshiftchris opened this issue Jun 21, 2022 · 3 comments · Fixed by #3275
Labels
feature-request A feature should be added or improved.

Comments

@cloudshiftchris

cloudshiftchris commented Jun 21, 2022

Describe the bug

When StsCredentialsProvider (or one of its subclasses, e.g. StsAssumeRoleCredentialsProvider) is configured for async credential refreshing, an instance of software.amazon.awssdk.utils.cache.NonBlocking is created for each credentials provider. Each NonBlocking instance in turn creates a ScheduledThreadPoolExecutor with a single thread to handle the async refreshing of credentials.

In a multi-tenant environment where a credentials provider per tenant is used (to provide scoped-down per-tenant IAM policies, e.g. https://aws.amazon.com/blogs/apn/isolating-saas-tenants-with-dynamically-generated-iam-policies/), there is a proliferation of ScheduledThreadPoolExecutors, each with a single thread. Threads are expensive resources to create and to keep lying around at scale.

It isn't necessary to have a thread-per-credentials-provider - a shared ScheduledThreadPoolExecutor with a small pool size would suffice.
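To make the shared-pool idea concrete, here is a minimal sketch using only JDK classes. `TenantProvider` is a hypothetical stand-in for a per-tenant credentials provider (it is not an SDK class), and the refresh body only increments a counter where a real provider would call STS. It assumes refreshes are short enough that a small pool can serve many tenants.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedRefreshDemo {

    /** Hypothetical stand-in for one tenant's credentials provider; not an SDK class. */
    static final class TenantProvider implements AutoCloseable {
        final AtomicInteger refreshCount = new AtomicInteger();
        private final ScheduledFuture<?> task;

        TenantProvider(ScheduledExecutorService scheduler, CountDownLatch firstRefresh) {
            // A real provider would fetch credentials here; this sketch only counts refreshes.
            this.task = scheduler.scheduleWithFixedDelay(() -> {
                if (refreshCount.incrementAndGet() == 1) {
                    firstRefresh.countDown();
                }
            }, 0, 20, TimeUnit.MILLISECONDS);
        }

        @Override
        public void close() {
            // Cancel only this tenant's task; the shared pool keeps serving other tenants.
            task.cancel(false);
        }
    }

    /** Runs `tenants` providers on one pool of `poolSize` threads; returns how many refreshed. */
    static int runSharedPool(int tenants, int poolSize) {
        ScheduledExecutorService shared = Executors.newScheduledThreadPool(poolSize);
        CountDownLatch firstRefresh = new CountDownLatch(tenants);
        List<TenantProvider> providers = new ArrayList<>();
        try {
            for (int i = 0; i < tenants; i++) {
                providers.add(new TenantProvider(shared, firstRefresh));
            }
            // Wait until every tenant has refreshed at least once (bounded wait).
            firstRefresh.await(5, TimeUnit.SECONDS);
            int refreshed = 0;
            for (TenantProvider provider : providers) {
                if (provider.refreshCount.get() > 0) {
                    refreshed++;
                }
                provider.close();
            }
            return refreshed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for refreshes", e);
        } finally {
            shared.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runSharedPool(100, 2) + " of 100 tenants refreshed on a 2-thread pool");
    }
}
```

With one executor per provider, the same 100 tenants would hold 100 refresh threads; here they hold two, at the cost of refreshes queueing behind each other if one of them is slow.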

Expected Behavior

Scaling the use of StsCredentialsProvider in a multi-tenant environment doesn't consume excessive/unnecessary thread resources.

Current Behavior

A background thread is created for each instance of StsCredentialsProvider for async refresh of credentials. Threads are expensive resources to create and to keep lying around at scale. Each thread consumes memory (its thread stack), and there are hard limits on the number of threads that can be created (these vary based on OS, configuration, and other use of threads inside an app).

Reproduction Steps

n/a. This is a non-functional/scalability design defect, evident in a cursory review of the StsCredentialsProvider code.

Possible Solution

Allow the StsCredentialsProvider builder to take an (optional) ScheduledThreadPoolExecutor, and skip the cleanup logic in NonBlocking for externally provided executors.

This allows consumers, in advanced use cases, to manage the background threads efficiently and avoid resource-starvation scenarios at scale.
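A rough sketch of what this proposal could look like, written against plain JDK types. Everything here is an assumption for illustration: `RefreshingProvider` and the `refreshExecutor(...)` builder option are hypothetical names, not SDK API, and the sketch accepts the `ScheduledExecutorService` interface rather than the concrete `ScheduledThreadPoolExecutor`. The point it shows is the ownership rule: the provider shuts down an executor on close only if it created that executor itself.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

public class OwnedExecutorDemo {

    /** Hypothetical provider; the builder option below is an assumption, not SDK API. */
    static final class RefreshingProvider implements AutoCloseable {
        private final ScheduledExecutorService executor;
        private final boolean ownsExecutor;

        private RefreshingProvider(Builder builder) {
            if (builder.executor != null) {
                // Externally provided: the caller manages its lifecycle.
                this.executor = builder.executor;
                this.ownsExecutor = false;
            } else {
                // Default: a private single-thread executor, as in the current behavior.
                this.executor = Executors.newSingleThreadScheduledExecutor();
                this.ownsExecutor = true;
            }
        }

        static Builder builder() {
            return new Builder();
        }

        ScheduledExecutorService executor() {
            return executor;
        }

        @Override
        public void close() {
            // Cleanup is skipped for executors the provider did not create.
            if (ownsExecutor) {
                executor.shutdownNow();
            }
        }

        static final class Builder {
            private ScheduledExecutorService executor;

            Builder refreshExecutor(ScheduledExecutorService executor) {
                this.executor = executor;
                return this;
            }

            RefreshingProvider build() {
                return new RefreshingProvider(this);
            }
        }
    }

    /** True if closing a provider leaves an externally supplied executor running. */
    static boolean externalExecutorSurvivesClose() {
        ScheduledExecutorService shared = Executors.newScheduledThreadPool(2);
        try {
            RefreshingProvider provider = RefreshingProvider.builder().refreshExecutor(shared).build();
            provider.close();
            return !shared.isShutdown();
        } finally {
            shared.shutdownNow();
        }
    }

    /** True if closing a provider shuts down the executor it created for itself. */
    static boolean ownedExecutorStopsOnClose() {
        RefreshingProvider provider = RefreshingProvider.builder().build();
        provider.close();
        return provider.executor().isShutdown();
    }

    public static void main(String[] args) {
        System.out.println("external executor survives close: " + externalExecutorSurvivesClose());
        System.out.println("owned executor stops on close: " + ownedExecutorStopsOnClose());
    }
}
```

Keeping the executor optional preserves the existing default for simple applications, while multi-tenant callers can pass one shared pool to every provider and shut it down once at application exit.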

Additional Information/Context

No response

AWS Java SDK version used

2.7.214

JDK version used

17

Operating System and version

Mac OS Catalina

@cloudshiftchris cloudshiftchris added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jun 21, 2022
@debora-ito debora-ito added the needs-review This issue or PR needs review from the team. label Jun 23, 2022
@debora-ito
Member

@cloudshiftchris thank you for the detailed report. Although we don't have a lot of experience with the multi-tenant model, your suggested solution makes sense from the perspective of a product feature.

I see you submitted a PR; we'll take a look.

@debora-ito debora-ito added feature-request A feature should be added or improved. and removed bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. needs-review This issue or PR needs review from the team. labels Jun 28, 2022
@debora-ito
Member

Changed to feature-request since it's an optimization.

millems added a commit that referenced this issue Jun 29, 2022
Fixes #3259.

1. Share thread pools across async credential providers (anything using CachedSupplier's NonBlocking prefetch strategy).
2. Log a warning if an extreme number of concurrent refreshes are happening, to help users detect when they're not closing their credential providers.

Even though this is an increase in resource sharing, it should not cause increased availability risks. Because these threads are only used for background refreshes, if one particular type of credential provider has availability problems (e.g. SSO or STS high latency), it only disables background refreshes, not prefetches or synchronous fetches.
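The second item in the commit message, warning when too many refreshes overlap, can be illustrated with a small counter. This is a sketch under stated assumptions, not the SDK's implementation: `RefreshTracker`, its threshold, and the warning text are all hypothetical, and the thread does not say what threshold the SDK uses. Nested calls stand in for overlapping refreshes so the example stays deterministic without spawning threads.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RefreshWarningDemo {

    /** Hypothetical tracker; the threshold and message are illustrative assumptions. */
    static final class RefreshTracker {
        private final int warnThreshold;
        private final AtomicInteger inFlight = new AtomicInteger();
        private final AtomicInteger warnings = new AtomicInteger();

        RefreshTracker(int warnThreshold) {
            this.warnThreshold = warnThreshold;
        }

        /** Runs one refresh, warning if more than `warnThreshold` refreshes overlap. */
        <T> T track(Supplier<T> refresh) {
            int current = inFlight.incrementAndGet();
            try {
                if (current > warnThreshold) {
                    warnings.incrementAndGet();
                    System.err.println("WARN: " + current + " concurrent credential refreshes; "
                        + "are credential providers being closed?");
                }
                return refresh.get();
            } finally {
                // Always release the slot, even if the refresh throws.
                inFlight.decrementAndGet();
            }
        }

        int warnings() {
            return warnings.get();
        }

        int inFlight() {
            return inFlight.get();
        }
    }

    /** Simulates `depth` overlapping refreshes by nesting them; returns the warning count. */
    static int warningsAtDepth(int depth, int threshold) {
        RefreshTracker tracker = new RefreshTracker(threshold);
        nest(tracker, depth);
        return tracker.warnings();
    }

    private static int nest(RefreshTracker tracker, int remaining) {
        if (remaining == 0) {
            return 0;
        }
        // The inner refresh starts while the outer one is still in flight.
        return tracker.track(() -> nest(tracker, remaining - 1));
    }

    public static void main(String[] args) {
        System.out.println("warnings with 5 overlapping refreshes, threshold 3: "
            + warningsAtDepth(5, 3));
    }
}
```

A production version would likely rate-limit the warning so that a sustained leak does not flood the log.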
@github-actions

github-actions bot commented Jul 9, 2022

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

aws-sdk-java-automation pushed a commit that referenced this issue Sep 18, 2024
…f4d5156d6

Pull request: release <- staging/f344cecd-49c8-4893-b772-753f4d5156d6
2 participants