StsCredentialsProvider uses excessive number of threads in multi-tenant setup #3259
Comments
@cloudshiftchris thank you for the detailed report. Although we don't have a lot of experience with the multi-tenant model, your suggested solution makes sense from the perspective of a product feature. I see you submitted a PR; we'll take a look.
Fixes #3259.
1. Share thread pools across async credential providers (anything using CachedSupplier's NonBlocking prefetch strategy).
2. Log a warning if an extreme number of concurrent refreshes is happening, to help users detect when they're not closing their credential providers.
Even though this is an increase in resource sharing, it should not cause increased availability risks. Because these threads are only used for background refreshes, if one particular type of credential provider has availability problems (e.g. SSO or STS high latency), it only disables background refreshes, not prefetches or synchronous fetches.
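The warning described in point 2 could be approximated with an in-flight counter. The following is only a rough sketch, not the SDK's implementation: the class name `ConcurrentRefreshMonitorSketch`, the threshold parameter, and the log message are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Logger;

/** Hypothetical helper (not part of the SDK) that counts refreshes running at the same time. */
class ConcurrentRefreshMonitorSketch {
    private static final Logger LOG =
            Logger.getLogger(ConcurrentRefreshMonitorSketch.class.getName());

    private final int warnThreshold; // assumed to be configurable; the real limit is not stated in the issue
    private final AtomicInteger inFlight = new AtomicInteger();

    ConcurrentRefreshMonitorSketch(int warnThreshold) {
        this.warnThreshold = warnThreshold;
    }

    /** Runs the refresh and returns true if a warning was logged for this call. */
    boolean runRefresh(Runnable refresh) {
        int now = inFlight.incrementAndGet();
        boolean warned = now > warnThreshold;
        try {
            if (warned) {
                LOG.warning(now + " concurrent credential refreshes are running; "
                        + "check that unused credential providers are being closed.");
            }
            refresh.run();
        } finally {
            // Always release the slot, even if the refresh throws.
            inFlight.decrementAndGet();
        }
        return warned;
    }

    /** Number of refreshes currently running. */
    int inFlight() {
        return inFlight.get();
    }
}
```

A refresh that starts while the count is already at the threshold triggers the warning; the count returns to zero once all refreshes finish.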
…f4d5156d6 Pull request: release <- staging/f344cecd-49c8-4893-b772-753f4d5156d6
Describe the bug
When configuring StsCredentialsProvider (or its subclasses, e.g. StsAssumeRoleCredentialsProvider) for async credential refreshing, an instance of software.amazon.awssdk.utils.cache.NonBlocking is created for each credentials provider. This in turn creates a ScheduledThreadPoolExecutor with a single thread to handle the async refreshing of credentials.
In a multi-tenant environment where a credentials-provider-per-tenant is used (to provide scoped-down per-tenant IAM policies, e.g. https://aws.amazon.com/blogs/apn/isolating-saas-tenants-with-dynamically-generated-iam-policies/) there is a proliferation of ScheduledThreadPoolExecutors (each with a single thread). Threads are expensive resources to create and have lying around at scale.
It isn't necessary to have a thread-per-credentials-provider; a shared ScheduledThreadPoolExecutor with a small pool size would suffice.
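As a rough illustration of that point, the sketch below schedules one simulated refresh per tenant on a single shared ScheduledThreadPoolExecutor and reports how many distinct threads did the work. It uses only the JDK; the class and method names are hypothetical and the "refresh" is a no-op standing in for an STS call.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo class; it does not use or mirror SDK internals. */
class SharedRefreshPoolSketch {

    /**
     * Schedules one simulated refresh per tenant on a shared pool and returns
     * the number of distinct threads that ran those refreshes.
     */
    static int distinctThreadsUsed(int tenants, int poolSize) {
        ScheduledThreadPoolExecutor shared = new ScheduledThreadPoolExecutor(poolSize, r -> {
            Thread t = new Thread(r, "shared-credential-refresh");
            t.setDaemon(true); // background refresh threads should not keep the JVM alive
            return t;
        });
        Set<Thread> threadsSeen = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(tenants);
        try {
            for (int i = 0; i < tenants; i++) {
                shared.schedule(() -> {
                    threadsSeen.add(Thread.currentThread()); // stand-in for a credential refresh
                    done.countDown();
                }, 10, TimeUnit.MILLISECONDS);
            }
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("simulated refreshes did not finish in time");
            }
            return threadsSeen.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for refreshes", e);
        } finally {
            shared.shutdownNow();
        }
    }
}
```

Calling `SharedRefreshPoolSketch.distinctThreadsUsed(200, 2)` services 200 simulated tenants with at most two threads, whereas the per-provider design would hold 200.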
Expected Behavior
Scaling the use of StsCredentialsProvider in a multi-tenant environment doesn't consume excessive/unnecessary thread resources.
Current Behavior
A background thread is created for each instance of StsCredentialsProvider for async refresh of credentials. Threads are expensive resources to create and have lying around at scale. Each thread consumes memory (thread stack), and there are hard limits to the number of threads that can be created (these vary based on OS, configuration, and other use of threads inside an app).
Reproduction Steps
n/a. This is a non-functional/scalability design defect, evident in a cursory review of the StsCredentialsProvider code.
Possible Solution
Allow the StsCredentialsProvider builder to take an (optional) ScheduledThreadPoolExecutor (and skip the cleanup logic in NonBlocking for externally provided executors).
This allows consumers, in advanced use cases, to manage the background threads efficiently and avoid resource-starvation scenarios at scale.
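A minimal sketch of what such a builder option might look like follows. Everything here is hypothetical: `RefreshingProviderSketch` is a stand-in class, not the SDK's StsCredentialsProvider, and it accepts the more general `ScheduledExecutorService` interface rather than `ScheduledThreadPoolExecutor` specifically. The point it illustrates is ownership: an executor the provider created itself is shut down on close, while an externally supplied one is left running.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a provider with background refresh; not the SDK's API. */
class RefreshingProviderSketch implements AutoCloseable {
    private final ScheduledExecutorService executor;
    private final boolean ownsExecutor;
    private final ScheduledFuture<?> refreshTask;

    private RefreshingProviderSketch(Builder b) {
        this.ownsExecutor = (b.executor == null);
        // Fall back to a private single-thread executor only when none was supplied.
        this.executor = ownsExecutor
                ? Executors.newSingleThreadScheduledExecutor(r -> {
                      Thread t = new Thread(r, "credential-refresh");
                      t.setDaemon(true);
                      return t;
                  })
                : b.executor;
        this.refreshTask = executor.scheduleWithFixedDelay(
                b.refresh, b.periodMillis, b.periodMillis, TimeUnit.MILLISECONDS);
    }

    static Builder builder(Runnable refresh) {
        return new Builder(refresh);
    }

    boolean executorShutDown() {
        return executor.isShutdown();
    }

    @Override
    public void close() {
        refreshTask.cancel(false);  // always stop this provider's own refresh task
        if (ownsExecutor) {
            executor.shutdownNow(); // never shut down an executor the caller supplied
        }
    }

    static final class Builder {
        private final Runnable refresh;
        private ScheduledExecutorService executor; // optional; null means "create my own"
        private long periodMillis = 60_000;

        private Builder(Runnable refresh) {
            this.refresh = refresh;
        }

        Builder executor(ScheduledExecutorService executor) {
            this.executor = executor;
            return this;
        }

        Builder periodMillis(long periodMillis) {
            this.periodMillis = periodMillis;
            return this;
        }

        RefreshingProviderSketch build() {
            return new RefreshingProviderSketch(this);
        }
    }
}
```

With this shape, a multi-tenant application could create one small scheduled pool at startup, pass it to every per-tenant provider, and shut the pool down itself when the application stops.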
Additional Information/Context
No response
AWS Java SDK version used
2.7.214
JDK version used
17
Operating System and version
Mac OS Catalina