Fix ValueObservation deadlock #1364
Thank you very much @mattgallagher. This is a deadlock indeed 😖 Many observations are started. Some of them (set A) are able to access the writer and register a transaction observer. The remaining observations (set B) have not yet obtained writer access, and each of them has captured a reader. Before the observations of set B are enqueued for write access, an actual write is performed, which triggers one of the early observations. From the write access that has just performed the modification, this observation waits until it can find a reader and fetch the fresh values. But no reader is available, because they have all been captured by the observations of set B. The writer waits for a reader, and all readers are held by observations of set B that are waiting for the writer. That's the deadlock. Apologies, of course.
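The circular wait described above can be reproduced in miniature. This is a minimal sketch, not GRDB code: `writerCanFetch`, the semaphore standing in for the reader pool, and the serial queue standing in for the writer are all hypothetical names chosen for illustration. The writer waits with a timeout so that the sketch reports the stall instead of hanging.

```swift
import Foundation

/// Hypothetical model: returns true when a writer that has just committed
/// can obtain a reader while `capturedReaders` observations (set B) each
/// hold one reader of a pool of `poolSize` readers.
func writerCanFetch(poolSize: Int, capturedReaders: Int) -> Bool {
    precondition(capturedReaders <= poolSize)

    // Stand-ins for the reader pool and the serialized writer (assumptions).
    let readers = DispatchSemaphore(value: poolSize)
    let writerQueue = DispatchQueue(label: "hypothetical.writer")

    // Set B: each starting observation captures a reader and keeps it
    // while it waits for its turn on the writer.
    for _ in 0..<capturedReaders { readers.wait() }

    // The writer has committed and now wants a reader to fetch fresh values.
    // In the reported deadlock this wait has no timeout and never returns.
    let gotReader = writerQueue.sync { () -> Bool in
        let result = readers.wait(timeout: .now() + .milliseconds(50))
        if result == .success {
            readers.signal() // give the reader back after the fetch
            return true
        }
        return false
    }

    // Release the captured readers so the semaphore is balanced again.
    for _ in 0..<capturedReaders { readers.signal() }
    return gotReader
}
```

With every reader captured, the writer cannot proceed. Leaving a single reader free is enough for it to fetch and return.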
OK, so I had a good shower, and I have a plan :-) We're just paying the price of the early days of observations, when I was obsessed with notifying every single change without ever missing one. This is an atavism that was first broken in GRDB 5: ValueObservation on a DatabasePool has learned to start even if write transactions are currently running, with a price: changes performed during those early writes are not individually notified. But after those early moments, an observation still exhibits a legacy behavior: it insists on notifying every single change. That's why the triggered observation of set A above blocks the writer until it can acquire a reader: it insists on fetching from the exact same state of the database that was left by the committed changes. My plan is to have this observation perform an asynchronous fetch (or, more precisely, something more akin to "setNeedsFetching", so that fast subsequent writes don't trigger more fetches than needed). This will remove the deadlock, without breaking any documented behavior of ValueObservation. I hope this is clear 🤞
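To make the "setNeedsFetching" idea concrete, here is a minimal sketch of a coalescing asynchronous fetch. It is not the GRDB implementation: `CoalescingFetcher`, its `queue` parameter, and the `fetch` closure are hypothetical names used only for illustration. The point is that the writer never blocks, and several writes that land before the pending fetch runs produce a single fetch.

```swift
import Foundation

/// Hypothetical helper: schedules at most one pending fetch at a time.
final class CoalescingFetcher {
    private let lock = NSLock()
    private var fetchScheduled = false
    private let fetchQueue: DispatchQueue
    private let fetch: () -> Void

    init(queue: DispatchQueue = DispatchQueue(label: "hypothetical.fetch"),
         fetch: @escaping () -> Void) {
        self.fetchQueue = queue
        self.fetch = fetch
    }

    /// Called from the writer after a commit. Returns immediately.
    func setNeedsFetching() {
        lock.lock()
        let alreadyScheduled = fetchScheduled
        fetchScheduled = true
        lock.unlock()

        // A fetch is already pending: it will see the latest changes.
        if alreadyScheduled { return }

        fetchQueue.async {
            // Clear the flag before fetching, so that a write landing
            // during the fetch schedules another one.
            self.lock.lock()
            self.fetchScheduled = false
            self.lock.unlock()
            self.fetch()
        }
    }
}
```

Clearing the flag before the fetch starts is a deliberate choice in this sketch: a fetch may be redundant, but a change is never left without a later fetch.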
Sounds okay to me. For my part, I'm happy to receive slightly fewer observations. I leave it to you to decide if this requires breaking behavior. I assume that TransactionObserver would still receive every update for those situations where it is required.
This reminds me of #966 by @steipete. I'm not promising I'll beat both horses at the same time, but this definitely belongs to the same topic.
The described changes are not supposed to be breaking, because the documentation clearly states (in order to support the behavior introduced by GRDB 5):
Those rules are supposed to give us enough freedom. Of course, some people may rely on some aspects of the current behavior. We'll see, then. I don't want to lose what we have progressively gained for DatabasePool observations over the years. I think it's pretty cool:
Yes, totally. TransactionObserver is very low-level, and it must keep on exposing everything.
Rewrote the test and removed the FIXME. Now I can look for the solution.
Last CI tests have passed, but the previous ones did not. Things have turned flaky. I have to check if tests are badly written, or if the fix for the deadlock has introduced a bug.
OK, I think we're good.
Oops, wrong destination branch 😬😅
The fix has shipped in v6.12.0. Thanks again for your help, @mattgallagher 👍
This test demonstrates the possible deadlock between observers and writers as reported in issue #1362. The observers all hold the `Pool.itemSemaphore` and the writer holds the `SerializedDatabase.queue`, and neither can proceed since each is waiting for the other's resource.
To get this test reliable, I needed to insert a `Thread.sleep` inside the `ValueConcurrentObserver`. Obviously, this PR is not for merging. However, this `Thread.sleep` does not change the overall logic and exists primarily to simulate the system being under load.
The `observerCount` of 10 is not arbitrary. If you reduce the `observerCount` by 1, then the code doesn't deadlock, so you can confirm that the test is otherwise logically sound.