-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Unbalanced InitializeGlobalStatics/CleanupGlobalStatics() calls in AWSClient and race in implementation #815
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Dumb question, are you calling InitApi() before you do any of this? |
:-) Yes. It is possible that this logic is invoked outside of "aws init" guard object scope. But I've checked -- there are 8 guard objects active at this moment -- SDK is definitely initialized. There is at least a dozen of parallel threads retrieving other objects using the same one more note: crash is access violation at 0x00000020 address -- i.e. something somewhere is NULL when it shouldn't be. Offending code:
|
Also, nuget defaults don’t upgrade everything when you pull in a new dependency, can you check that all of the versions match? |
We don't keep pdb files around from earlier versions unfortunately. Also, Microsoft's debugger does some annoying timestamp checking to ensure the pdb file was created for the corresponding dll. So even if the code is the same it won't let you use the pdb. Your best bet is to build from source. |
Everything seems in order:
Sigh... I'll have to figure out how to do it -- we were always using nuget packages on windows. Too bad this problem popped up like literally one day before major release. :-\ Even .map file can be useful here -- I'd be able to find in which function this happens using it's address.
Custom streambuf is used only for upload -- for download I opted for piece-by-piece download using |
@crusader-mike what credentials provider are you using? |
@marcomagdy I pass credentials directly into S3Client ctor:
|
Huh... Upgrading SDK to 1.4.1 caused the problem to go away, it seems -- I can no longer reproduce it. What is more interesting -- CPU usage went down ~2 times and network utilization went up ~5 times (debug build). On the bad side -- our QA will chew my head off. They've just finished multi-week regression tests. |
Actually, taking it back... With latest SDK it shows up in another place and is even harder to reproduce. I spotted something though -- AWSClient ctor/dtor call InitializeGlobalStatics/CleanupGlobalStatics. And everything seems fine... Until you realize that generated move ctor is not gonna do that! Guess what -- my code moves S3Client instances. I am not sure that this is the root cause, but I'll keep digging. Also, InitializeGlobalStatics/CleanupGlobalStatics implementation looks very weird to me (and I've done a fair amount of lockless programming) -- I highly doubt it is correct. |
The compiler won't generate a move ctor if there's a user-defined dtor; and in this case, there is one defined. Which means we have a different kind of bug. When the client is copied, we don't increment the refcount. |
... but it will generate copy ctor, right? which won't call |
Yes, correct. That's probably the bug. See my updated comment above. |
Working on getting rid of "S3Client move" logic in our code and testing. (Edit: nope, crash is still there) Pretty sure there are 2 bugs here -- Some background -- I have a S3Client cache:
As you see -- after all is done, I could end up with "global statics" not being initialized even though I have live instances of S3Client in cache. I think my crashes happen during destruction of this cache. |
@marcomagdy What cmake cmdline I should use to build the same binary as one you publish in nuget? I tried different combinations -- every time I end up with dll size that differs from dll in nuget package. (my download speed ends up being 4 times slower too) |
Caught it -- same problem as #781
|
To summarize:
Thank you. |
Fixed in 1.4.55 |
This fix isn't good -- there are still race conditions here. For example, imagine a case of two threads calling Similarly, there is a race between dtor and ctor. Edit: I can't reopen this issue (and too lazy to file a new one), gotta leave this on your conscience :-) |
Whoops, that’s embarrassing. We’ll get that fixed. Thanks for spotting it. |
@crusader-mike can you take a look at #879 see if I missed a use case? |
Nope. You can't use atomic variable here to properly protect global state (unless your entire state is atomic variable). Think about it -- for other threads at any point in time there should be only two observable states: "s_refCount == 0 and global state is completely unitialized" and "s_refCount != 0 and global state is completely initialized". You are guaranteed to break this invariant by modifying s_refCount and global state separately (from other thread perspective). You have to use mutex here to "merge" these modifications into one (in eyes of observers). P.S. I am probably repeating myself -- but stay away from lockless programming unless you are ready to sink significant time into it. Even if you get to the point when you can comfortably write such code -- necessary knowledge will take so much space in your brain that it will probably push out most of other C++-related stuff :) |
I think I'll take @singku's suggestion and initialize this container in Aws::init. This whole ordeal was about trying to optimize that container so that it doesn't grow unbounded. But that reasoning is not strong enough to complicate the constructor's logic (and definitely doesn't warrant a mutex).
I appreciate your concern. But if I'm going to stay away from hard programming paradigms, I might as well go program in JavaScript or Ruby 😄. |
It isn't just a concern -- I think every time I stumbled upon an atomic variable in SDK code there was a race condition nearby. You can achieve similar effect using static variables or std::call_once -- though afaik they are implemented using mutex. Yes, moving this into AwsInit sounds like a good idea. atomic variables isn't necessarily better than mutex -- each atomic variable has a "mutex" associated with it (google MESI protocol) and price of using it grows with number of processors in your system and gets especially bad on modern servers (due to NUMA). A lot of wait-free algorithms turned out to be slower than straightforward mutex-protected code simply because with mutex you pay this price only twice (lock + unlock) but with atomic variables -- on every access. If you are interested in lockless programming I suggest reading/watching these:
|
IIRC this code was written before aws init existed. WRT luckless stuff, I agree. Another aspect here in hindsight, lockless algorithms aren’t great for long term code viability. Even if they are correct when written, it increases cognitive load for all future developers and will inevitably be broken by future changes. |
luckless™ code is amazing for generating cognitive load, lol. And sometimes it takes people years to realize that "it broke". |
That is not true. An atomic read of register-wide variables, like pointers and integers (at least on x86) is simply a memory load. Locking a mutex is a system call. |
My low-lvl x86 is a bit rusty, but I am sure there is a fence on read -- even if you don't observe in generated asm because x86 is naturally consistent (and acquire-read is just a read), it still exists on higher level (compiler can't reorder certain things around it). Also, afaik, on modern systems locking mutex is system call (or better to say -- switch to kernel mode) only if you have to wait. Grabbing mutex that isn't locked is inexpensive. In pre-futex times on Linux mutex was expensive, yes. |
Unfortunately I’ve had to read a ton of clang and llvm source code lately and pthread_mutex using the fast mutex init is only a system call under contention. The same is true for SRWLocks on windows... std::mutex simply calls into those apis. The days of critical sections and full mutex objects on windows are, thankfully, behind us.. |
Also, fwiw, you’re screwed on ARM, SPARC, and PowerPC regardless of which you use. |
@JonathanHenson Are you referring to this or the fact that on these platforms memory model is more relaxed than C++11 model (which means you can't use C++ to squeeze the maximum performance-wise)? |
Fixed in v1.4.57 |
Small note: since you've left |
I have to leave the dtor so it can be marked as virtual. I did have a defaulted copy/move ctors and assignment operators, but the code failed to build on VS2013, which unfortunately we still support. It is by far the worst C++11 compliant compiler. |
Yeah, VS2013 was rough around the edges... Can you work around that like this:
? |
addresses the issue raised in aws#815
see discussion at aws#815
Uh oh!
There was an error while loading. Please reload this page.
Platform: Windows
SDK version: 1.3.31
I am chasing a very rare crash (deep in AWS SDK) that happens in
GetObject()
call. After spending more than a day trying to reproduce it I finally caught one incidence in a debugger. Unfortunately nuget package doesn't contain .pdb files and symbols package was not refreshed since 2016.If you have them laying around -- can you provide me with pdb files for x64 v140 Debug 1.3.31 build of -core and -s3 dlls, please?
Also, it would be really helpful if .symbols nuget packages were released along with redist packages (or maybe even included into them).
Code is very simple and I suspect a bug in SDK itself:
Stack:
The text was updated successfully, but these errors were encountered: