Skip to content

Conversation

@mattgallagher
Copy link
Collaborator

@mattgallagher mattgallagher commented Apr 18, 2023

This test demonstrates the possible deadlock between observers and writers as reported in issue #1362 . The observers all hold the Pool.itemSemaphore and the writer holds the SerializedDatabase.queue and neither can proceed since each is waiting for the other's resource.

To get this test reliable, I needed to insert a Thread.sleep inside the ValueConcurrentObserver. Obviously, this PR is not for merging. However, this Thread.sleep does not change the overall logic and exists primarily to simulate the system being under load.

The observerCount of 10 is not arbitrary. If you reduce the observerCount by 1, then the code doesn't deadlock so you can confirm that the test is otherwise logically sound.

@mattgallagher mattgallagher marked this pull request as draft April 18, 2023 02:57
@groue
Copy link
Owner

groue commented Apr 18, 2023

Thank you very much @mattgallagher.

This is a deadlock indeed 😖

Many observations are started. Some of them (set A) are able to access the writer and register a transaction observer. The remaining observations (set B) could not yet access a writer access. Each one of them has captured a reader. Before observations in the set B are enqueued for a write access, an actual write is performed, that triggers one of the early observations. From the write access that has just performed the modification, this observation waits until it can find a reader and fetch the fresh values. But no reader is available, because they've all been captured by the observations of set B.

The writer waits for a reader, and all readers are held by observations of the set B that are waiting for the writer. That's the deadlock.

Apologies, of course.

@groue
Copy link
Owner

groue commented Apr 18, 2023

OK, so I had a good shower, and I have a plan :-) We're just paying the price of early days of observations, when I was obsessed about notifying every single change without ever ever missing one.

This is an atavism that was first broken in GRDB 5: ValueObservation on a DatabasePool has learned to start even if write transactions are currently running, with a price: individual changes performed during those early writes are not individually notified.

But after those early moments, an observation still exhibits a legacy behavior: it insists on notifying every single change.

That's why the triggered observation of the set A above blocks the writer until it could acquire a reader: it insists on fetching from the exact same state of the database that was left by the committed changes.

My plan is to have this observation perform an asynchronous fetch (or, more precisely, something more akin to "setNeedsFetching", so that fast subsequent writes don't trigger more fetches than needed). This will remove the deadlock, without breaking any documented behavior of ValueObservation.

I hope this is clear 🤞

@mattgallagher
Copy link
Collaborator Author

mattgallagher commented Apr 18, 2023

Sounds okay to me. For my part, I'm happy to receive slightly fewer observations. I leave it to you to decide if this requires breaking behavior. I assume that TransactionObserver would still receive every update for those situations where it is required.

@groue
Copy link
Owner

groue commented Apr 18, 2023

Sounds okay to me. For my part, I'm happy to receive slightly fewer observations.

This reminds me of #966 by @steipete. I'm not promising I'll beat both horses at the same time, but this definitely belongs to the same topic.

I leave it to you to decide if this requires breaking behavior.

The described changes are not supposed to be breaking, because the documentation clearly states (in order to support the behavior introduced by GRDB 5):

ValueObservation may coalesce subsequent changes into a single notification.

ValueObservation may notify consecutive identical values.

Those rules are supposed to give us enough freedom. Of course, some people may rely on some aspects of the current behavior. We'll see, then.

I don't want to lose what we have progressively gained for DatabasePool observations over the years. I think it's pretty cool:

  1. #736: The ability to fetch the initial value even if there is a concurrent write (some writes are very slow, and people were not happy: #601). This has introduced two side effects: the two relaxed rules above, but also the double notification of the initial value.
  2. #1248: The removal of this double notification when the version of SQLite allows it. This has introduced a side effect: the creation of more reader connections.
  3. #1350: The removal of those extra connections. Side effect: our deadlock 😬
  4. And now, let's now remove the deadlock, in the context of the two relaxed rules! 🚀

I assume that TransactionObserver would still receive every update for those situations where it is required.

Yes, totally. TransactionObserver is very low-level, and it must keep on exposing everything.

@groue
Copy link
Owner

groue commented Apr 18, 2023

Rewrote the test and removed the FIXME. Now I can look for the solution.

@groue groue changed the title A test demonstrating issue #1362 Fix ValueObservation deadlock Apr 19, 2023
@groue
Copy link
Owner

groue commented Apr 19, 2023

Last CI tests have passed, but the previous ones did not. Things have turned flaky.

I have to check if tests are badly written, or if the fix for the deadlock has introduced a bug.

@groue
Copy link
Owner

groue commented Apr 29, 2023

OK, I think we're good.

@groue groue marked this pull request as ready for review April 29, 2023 14:34
@groue groue added the bug label Apr 29, 2023
@groue groue merged commit f955e5f into master Apr 29, 2023
@groue groue deleted the issue1362-test branch April 29, 2023 14:35
@groue
Copy link
Owner

groue commented Apr 29, 2023

Oops, wrong destination branch 😬😅

@groue
Copy link
Owner

groue commented Apr 29, 2023

The fix has shipped in v6.12.0. Thanks again for your help, @mattgallagher 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants