Stuck forever in List Object call with rate limiter #806

Closed
newpoo opened this issue Feb 21, 2018 · 6 comments
Labels
help wanted: We are asking the community to submit a PR to resolve this issue.

Comments

@newpoo

newpoo commented Feb 21, 2018

The stack trace is below. It's sleeping in DefaultRateLimiter for an insanely large number of seconds. The aws-sdk-cpp version is 1.2.7.

The issue goes away if we disable rate limiter.

What platform/OS are you using?

Ubuntu 14.04

What compiler are you using? What version?

gcc 5.4

What are your CMake arguments?

```
cmake -DCMAKE_BUILD_TYPE=Release -DCUSTOM_MEMORY_MANAGEMENT=0 -DSTATIC_LINKING=1 -DBUILD_ONLY="s3"
```

Can you provide a TRACE level log? (sanitize any sensitive information)

```
#0  0x00007fbc30d44b9d in nanosleep () from /mnt/rockstore/bin/libpthread.so.0
#1  0x0000000000705462 in sleep_for<long, std::ratio<1l, 1000l> > (__rtime=<synthetic pointer>) at /usr/include/c++/5/thread:292
#2  Aws::Utils::RateLimits::DefaultRateLimiter<std::chrono::_V2::system_clock, std::chrono::duration<long, std::ratio<1l, 1l> >, true>::ApplyAndPayForCost (this=0x7fbc18e680d0, cost=5850) at /usr/local/include/aws/core/utils/ratelimiter/DefaultRateLimiter.h:115
#3  0x00007fbc32a43344 in Aws::Http::CurlHttpClient::WriteData(char*, unsigned long, unsigned long, void*) [clone .part.32] () from /mnt/rockstore/bin/libaws-cpp-sdk-core.so
#4  0x00007fbc2d3ebe10 in ?? () from /mnt/rockstore/bin/libcurl.so.4
#5  0x00007fbc2d405700 in ?? () from /mnt/rockstore/bin/libcurl.so.4
#6  0x00007fbc2d3fff37 in ?? () from /mnt/rockstore/bin/libcurl.so.4
#7  0x00007fbc2d409c2c in ?? () from /mnt/rockstore/bin/libcurl.so.4
#8  0x00007fbc2d40a3d1 in curl_multi_perform () from /mnt/rockstore/bin/libcurl.so.4
#9  0x00007fbc2d401a13 in curl_easy_perform () from /mnt/rockstore/bin/libcurl.so.4
#10 0x00007fbc32a45d84 in Aws::Http::CurlHttpClient::MakeRequest(Aws::Http::HttpRequest&, Aws::Utils::RateLimits::RateLimiterInterface*, Aws::Utils::RateLimits::RateLimiterInterface*) const () from /mnt/rockstore/bin/libaws-cpp-sdk-core.so
#11 0x00007fbc32a0e615 in Aws::Client::AWSClient::AttemptOneRequest(Aws::Http::URI const&, Aws::AmazonWebServiceRequest const&, Aws::Http::HttpMethod, char const*) const () from /mnt/rockstore/bin/libaws-cpp-sdk-core.so
#12 0x00007fbc32a11f3c in Aws::Client::AWSClient::AttemptExhaustively(Aws::Http::URI const&, Aws::AmazonWebServiceRequest const&, Aws::Http::HttpMethod, char const*) const () from /mnt/rockstore/bin/libaws-cpp-sdk-core.so
#13 0x00007fbc32a15394 in Aws::Client::AWSXMLClient::MakeRequest(Aws::Http::URI const&, Aws::AmazonWebServiceRequest const&, Aws::Http::HttpMethod, char const*) const () from /mnt/rockstore/bin/libaws-cpp-sdk-core.so
#14 0x00007fbc32667189 in Aws::S3::S3Client::ListObjects(Aws::S3::Model::ListObjectsRequest const&) const () from /mnt/rockstore/bin/libaws-cpp-sdk-s3.so
```
@marcomagdy
Contributor

I see that you're using the DefaultRateLimiter. What values are you instantiating it with?

@newpoo
Author

newpoo commented Feb 21, 2018

We use 64 * 1024 * 1024 as the rate parameter. This doesn't happen consistently; it seems to be triggered by some kind of race condition.
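
For reference, a minimal sketch of how a DefaultRateLimiter at that rate might be wired into the S3 client configuration; the allocation tag and variable names below are illustrative placeholders, not code from this report:

```cpp
#include <aws/core/Aws.h>
#include <aws/core/client/ClientConfiguration.h>
#include <aws/core/utils/ratelimiter/DefaultRateLimiter.h>
#include <aws/s3/S3Client.h>

int main()
{
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        // 64 MiB/s, the rate mentioned above.
        auto limiter = Aws::MakeShared<Aws::Utils::RateLimits::DefaultRateLimiter<>>(
            "rate-limiter-demo", 64 * 1024 * 1024);

        Aws::Client::ClientConfiguration config;
        config.readRateLimiter = limiter;   // throttles response bodies
        config.writeRateLimiter = limiter;  // throttles request bodies

        Aws::S3::S3Client s3(config);
        // ... issue ListObjects / GetObject calls as usual ...
    }
    Aws::ShutdownAPI(options);
    return 0;
}
```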

@marcomagdy
Contributor

Wondering if your system clock gets skewed over time, which would throw off the calculation.
We did a bad job of logging in that particular class, but to rule it out you can enable verbose logging and check whether a clock skew is reported during requests. We do detect and log clock skew when computing the SigV4 signature for requests.
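
One way to capture that verbose log, as a sketch: raise the SDK's log level to Trace before Aws::InitAPI and reproduce the call. The default logger typically writes aws_sdk_*.log files into the working directory.

```cpp
#include <aws/core/Aws.h>
#include <aws/core/utils/logging/LogLevel.h>

int main()
{
    Aws::SDKOptions options;
    // Trace-level logging; the default logger writes aws_sdk_*.log files
    // into the current working directory.
    options.loggingOptions.logLevel = Aws::Utils::Logging::LogLevel::Trace;
    Aws::InitAPI(options);

    // ... reproduce the ListObjects call here, then search the log
    //     for clock-skew messages ...

    Aws::ShutdownAPI(options);
    return 0;
}
```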

@newpoo
Author

newpoo commented Feb 21, 2018

That's possible. We are running on EC2 instances with the TSC clock source, which is not stable.
We just wrote a simple rate limiter that can handle an unstable clock. We will see whether we hit this issue again with that limiter.
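
A minimal sketch of that general idea (not the implementation linked later in this thread): drive the token bucket off std::chrono::steady_clock, which is monotonic, so a stepped or skewed wall clock can never translate into an enormous sleep. To plug something like this into the SDK it would still need to be adapted to Aws::Utils::RateLimits::RateLimiterInterface.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <thread>

// Leaky-bucket limiter driven by the monotonic steady_clock, so system-clock
// jumps (e.g. an unstable TSC source on EC2) cannot produce huge sleeps.
class SteadyClockRateLimiter
{
public:
    explicit SteadyClockRateLimiter(int64_t bytesPerSecond)
        : m_rate(bytesPerSecond),
          m_tokens(bytesPerSecond),
          m_lastRefill(std::chrono::steady_clock::now()) {}

    // Charges `cost` bytes and sleeps long enough to pay off any debt.
    void ApplyAndPayForCost(int64_t cost)
    {
        std::chrono::milliseconds delay(0);
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            Refill();
            m_tokens -= cost;  // may go negative; the debt is paid by sleeping
            if (m_tokens < 0)
            {
                delay = std::chrono::milliseconds(-m_tokens * 1000 / m_rate);
            }
        }
        if (delay.count() > 0)
        {
            std::this_thread::sleep_for(delay);
        }
    }

private:
    // Adds rate * elapsed tokens, capped at one second's worth of burst.
    void Refill()
    {
        const auto now = std::chrono::steady_clock::now();
        const auto elapsedMs =
            std::chrono::duration_cast<std::chrono::milliseconds>(now - m_lastRefill).count();
        if (elapsedMs <= 0)
        {
            return;
        }
        m_tokens = std::min<int64_t>(m_rate, m_tokens + m_rate * elapsedMs / 1000);
        m_lastRefill = now;
    }

    std::mutex m_mutex;
    const int64_t m_rate;    // bytes per second
    int64_t m_tokens;        // available bytes; negative means debt
    std::chrono::steady_clock::time_point m_lastRefill;
};
```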

@marcomagdy
Contributor

Any updates?

@newpoo
Author

newpoo commented Feb 28, 2018

We haven't seen this issue again since we migrated to our own rate limiter.
In case it benefits others, this is the rate limiter we are using:
https://github.com/pinterest/rocksplicator/blob/master/common/aws_s3_rate_limiter.h

@newpoo closed this as completed Feb 28, 2018
@justnance added the help wanted label (We are asking the community to submit a PR to resolve this issue.) and removed the help wanted label Apr 19, 2019