Skip to content

[D114119][libc++] Fix potential lost wake-up in counting semaphore #77659

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ldionne opened this issue Jan 10, 2024 · 0 comments · Fixed by #79265
Closed

[D114119][libc++] Fix potential lost wake-up in counting semaphore #77659

ldionne opened this issue Jan 10, 2024 · 0 comments · Fixed by #79265
Assignees
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. phabricator Related to migration from Phabricator

Comments

@ldionne
Copy link
Member

ldionne commented Jan 10, 2024

This issue tracks picking up https://reviews.llvm.org/D114119 from the Phabricator archive.

The "futex table" in src/atomic.cpp contains 256 platform native futexes
(something like Linux/OpenBSD/NetBSD futex, macOS _ulock*, FreeBSD
umtx, Windows WaitOnAddress). This table is needed because some platforms
don't natively support waiting on atomics of sizes 8/16/32/64 bits. On Linux,
for example, futexes are always 32 bit. On these platforms, calling
atomic_wait on an atomic of unsupported size then waits using one of
the table futexes "under the hood".
Each futex of the table has an associated "waiter count" (__contention_state)
which tracks the number of waiters on that futex. This count is used to avoid
calling the platform wake function needlessly.
If the platform does support the size of the atomic natively then the
wait/notify syscalls (e.g. futex) are called directly on the atomic instead
of using one from the table. However, the "waiter count" of the table futex
is still used to minimize native wake up calls.
The part of __libcpp_atomic_wait_backoff_impl where the waiting actually
happens looks like this:

auto const __monitor = __libcpp_atomic_monitor(__a);
if(__test_fn())
    return true;
__libcpp_atomic_wait(__a, __monitor);

...so the current value of the futex (either the table one if the size is not
natively supported, or the actual one otherwise) is saved. Then the test
function is called. If the test is unsuccessful (in case of semaphores the
atomic is "0") then we wait on the futex, telling it the __monitor value to
be free of races.
This will work absolutely fine if one of the table futexes is used. Waking one
of them up will always increase the futex value by one, so the futex wait will
instantly return if the current atomic value is different than __monitor
(which indicates that a wakeup must have happened after the call to
__test_fn but before going to sleep). There is a corner case when there are
~4 billion wakeups in this short interval. This is mitigated by waiting for at
most 2s when the native platform futex size is only 32 bit (explained by
Olivier Giroux here: ogiroux/atomic_wait#3).
Now, if the native atomic/futex is used (which is the case on macOS for
counting_semaphore!), deadlocks are much more likely to occur, because now
the futex is no longer a simple counter that always increments, but instead
the "real" value (in case of counting_semaphore just the semaphore value
itself). There might be executions like the following:

- T1 calls acquire()
- T2 calls try_acquire()
- T3 calls release() two times
- semaphore starts with "0"

T1                        T2                        T3              sem
------------------------------------------------------------------------
                                                                     0
------------------------------------------------------------------------
acquire():
sees that sema=0
goes into
__libcpp_atomic_wait_backoff_impl                                    0
------------------------------------------------------------------------
                                                    release()        1
------------------------------------------------------------------------
__monitor=1                                                          1
------------------------------------------------------------------------
                          try_acquire():
                          acquires one
                          successfully,
                          returns true                               0
------------------------------------------------------------------------
__test_fn():
returns false
since sem==0                                                         0
------------------------------------------------------------------------
                                                    release()        1
------------------------------------------------------------------------
calls
__libcpp_atomic_wait(__a, 1);
--> deadlock!                                                        1
------------------------------------------------------------------------

Here, __libcpp_atomic_wait(__a, 0); should have been called, because
__test_fn() made a decision to sleep based on the value "0" of sem. But
instead, the value of __monitor is used which is "1" and leads to this ABA
style problem where T1 sleeps although the semaphore is unlocked.
I think to solve this problem you could require the "test function" to save its
current understanding of the atomic into an in/out argument so that the atomic
wait function knows which value to use for its second argument. In case of the
semaphores, it could look like this:

     _LIBCPP_AVAILABILITY_SYNC _LIBCPP_INLINE_VISIBILITY
     void acquire()
     {
-        auto const __test_fn = [this]() -> bool {
-            auto __old = __a.load(memory_order_relaxed);
-            return (__old != 0) && __a.compare_exchange_strong(__old, __old - 1, memory_order_acquire, memory_order_relaxed);
+        auto const __test_fn = [this](ptrdiff_t& __old) -> bool {
+            while (true) {
+                if (__old == 0)
+                    return false;
+                if (__a.compare_exchange_weak(__old, __old - 1, memory_order_acquire, memory_order_relaxed))
+                    return true;
+            }
         };
-        __cxx_atomic_wait(&__a.__a_, __test_fn);
+        __cxx_atomic_wait_fn(&__a.__a_, __test_fn, memory_order_relaxed);
     }

For semaphores, the resulting __old is always "0" if "false" is returned. So
the second argument to futex is always "0" which should be correct..
I've been hacking on a patch which is very much WIP but I could post it if
there's interest (and if this analysis even makes sense!).

@ldionne ldionne added libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. awaiting-review Has pending Phabricator review labels Jan 10, 2024
@ldionne ldionne added phabricator Related to migration from Phabricator and removed awaiting-review Has pending Phabricator review labels Jan 10, 2024
agozillon pushed a commit to agozillon/llvm-project that referenced this issue Feb 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. phabricator Related to migration from Phabricator
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants