Fix ValueObservation deadlock #1364

mattgallagher · 2023-04-18T02:55:10Z

This test demonstrates the possible deadlock between observers and writers as reported in issue #1362 . The observers all hold the Pool.itemSemaphore and the writer holds the SerializedDatabase.queue and neither can proceed since each is waiting for the other's resource.

To get this test reliable, I needed to insert a Thread.sleep inside the ValueConcurrentObserver. Obviously, this PR is not for merging. However, this Thread.sleep does not change the overall logic and exists primarily to simulate the system being under load.

The observerCount of 10 is not arbitrary. If you reduce the observerCount by 1, then the code doesn't deadlock so you can confirm that the test is otherwise logically sound.

…abasePool.

groue · 2023-04-18T06:32:57Z

Thank you very much @mattgallagher.

This is a deadlock indeed 😖

Many observations are started. Some of them (set A) are able to access the writer and register a transaction observer. The remaining observations (set B) could not yet access a writer access. Each one of them has captured a reader. Before observations in the set B are enqueued for a write access, an actual write is performed, that triggers one of the early observations. From the write access that has just performed the modification, this observation waits until it can find a reader and fetch the fresh values. But no reader is available, because they've all been captured by the observations of set B.

The writer waits for a reader, and all readers are held by observations of the set B that are waiting for the writer. That's the deadlock.

Apologies, of course.

groue · 2023-04-18T07:22:21Z

OK, so I had a good shower, and I have a plan :-) We're just paying the price of early days of observations, when I was obsessed about notifying every single change without ever ever missing one.

This is an atavism that was first broken in GRDB 5: ValueObservation on a DatabasePool has learned to start even if write transactions are currently running, with a price: individual changes performed during those early writes are not individually notified.

But after those early moments, an observation still exhibits a legacy behavior: it insists on notifying every single change.

That's why the triggered observation of the set A above blocks the writer until it could acquire a reader: it insists on fetching from the exact same state of the database that was left by the committed changes.

My plan is to have this observation perform an asynchronous fetch (or, more precisely, something more akin to "setNeedsFetching", so that fast subsequent writes don't trigger more fetches than needed). This will remove the deadlock, without breaking any documented behavior of ValueObservation.

I hope this is clear 🤞

mattgallagher · 2023-04-18T07:40:14Z

Sounds okay to me. For my part, I'm happy to receive slightly fewer observations. I leave it to you to decide if this requires breaking behavior. I assume that TransactionObserver would still receive every update for those situations where it is required.

groue · 2023-04-18T08:03:24Z

Sounds okay to me. For my part, I'm happy to receive slightly fewer observations.

This reminds me of #966 by @steipete. I'm not promising I'll beat both horses at the same time, but this definitely belongs to the same topic.

I leave it to you to decide if this requires breaking behavior.

The described changes are not supposed to be breaking, because the documentation clearly states (in order to support the behavior introduced by GRDB 5):

ValueObservation may coalesce subsequent changes into a single notification.

ValueObservation may notify consecutive identical values.

Those rules are supposed to give us enough freedom. Of course, some people may rely on some aspects of the current behavior. We'll see, then.

I don't want to lose what we have progressively gained for DatabasePool observations over the years. I think it's pretty cool:

#736: The ability to fetch the initial value even if there is a concurrent write (some writes are very slow, and people were not happy: #601). This has introduced two side effects: the two relaxed rules above, but also the double notification of the initial value.
#1248: The removal of this double notification when the version of SQLite allows it. This has introduced a side effect: the creation of more reader connections.
#1350: The removal of those extra connections. Side effect: our deadlock 😬
And now, let's now remove the deadlock, in the context of the two relaxed rules! 🚀

I assume that TransactionObserver would still receive every update for those situations where it is required.

Yes, totally. TransactionObserver is very low-level, and it must keep on exposing everything.

groue · 2023-04-18T16:23:51Z

Rewrote the test and removed the FIXME. Now I can look for the solution.

groue · 2023-04-19T11:58:16Z

Last CI tests have passed, but the previous ones did not. Things have turned flaky.

I have to check if tests are badly written, or if the fix for the deadlock has introduced a bug.

groue · 2023-04-29T14:34:07Z

OK, I think we're good.

groue · 2023-04-29T14:36:10Z

Oops, wrong destination branch 😬😅

groue · 2023-04-29T16:31:10Z

The fix has shipped in v6.12.0. Thanks again for your help, @mattgallagher 👍

A test demonstrating deadlock when observer count exceeds size of Dat…

c159638

…abasePool.

mattgallagher marked this pull request as draft April 18, 2023 02:57

Removed some unneeded configuration from the test.

7feaa74

mattgallagher mentioned this pull request Apr 18, 2023

Deadlock in Pool.swift in v6.10.1 #1362

Closed

The scenario that leads to the #1362 deadlock

d07e8bd

groue force-pushed the issue1362-test branch from a5c4f47 to 0041f2d Compare April 18, 2023 18:19

groue changed the title ~~A test demonstrating issue #1362~~ Fix ValueObservation deadlock Apr 19, 2023

groue added 4 commits April 29, 2023 12:47

Refactor ValueObservationRecorder matching test

45edf58

Disable xcpretty on $CI

bf28d1b

Fix tests that relied on observation of every single changes

8b1b3b0

Fix #1362

65c803a

groue force-pushed the issue1362-test branch from 8d86f59 to 65c803a Compare April 29, 2023 11:25

groue marked this pull request as ready for review April 29, 2023 14:34

groue added the bug label Apr 29, 2023

groue merged commit f955e5f into master Apr 29, 2023

groue deleted the issue1362-test branch April 29, 2023 14:35

groue mentioned this pull request Jun 1, 2023

SQL logic error when starting a ValueObservation immediately after opening a DatabasePool #1383

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Fix ValueObservation deadlock #1364

Fix ValueObservation deadlock #1364

Uh oh!

mattgallagher commented Apr 18, 2023 •

edited

Loading

Uh oh!

groue commented Apr 18, 2023 •

edited

Loading

Uh oh!

groue commented Apr 18, 2023 •

edited

Loading

Uh oh!

mattgallagher commented Apr 18, 2023 •

edited

Loading

Uh oh!

groue commented Apr 18, 2023 •

edited

Loading

Uh oh!

groue commented Apr 18, 2023

Uh oh!

groue commented Apr 19, 2023

Uh oh!

groue commented Apr 29, 2023

Uh oh!

groue commented Apr 29, 2023 •

edited

Loading

Uh oh!

groue commented Apr 29, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Fix ValueObservation deadlock #1364

Fix ValueObservation deadlock #1364

Uh oh!

Conversation

mattgallagher commented Apr 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

groue commented Apr 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

groue commented Apr 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mattgallagher commented Apr 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

groue commented Apr 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

groue commented Apr 18, 2023

Uh oh!

groue commented Apr 19, 2023

Uh oh!

groue commented Apr 29, 2023

Uh oh!

groue commented Apr 29, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

groue commented Apr 29, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

mattgallagher commented Apr 18, 2023 •

edited

Loading

groue commented Apr 18, 2023 •

edited

Loading

groue commented Apr 18, 2023 •

edited

Loading

mattgallagher commented Apr 18, 2023 •

edited

Loading

groue commented Apr 18, 2023 •

edited

Loading

groue commented Apr 29, 2023 •

edited

Loading