Skip to content

Replace DirtyNIF execution model with a different mechanism #192

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
warmwaffles opened this issue Feb 2, 2022 · 56 comments
Open

Replace DirtyNIF execution model with a different mechanism #192

warmwaffles opened this issue Feb 2, 2022 · 56 comments
Milestone

Comments

@warmwaffles
Copy link
Member

The dirtynif execution model is okay for most cases, but we have an issue where long running writes need to timeout / be interrupted mid execution and aborted. The only way to do that is to utilize https://sqlite.org/c3ref/progress_handler.html

@warmwaffles warmwaffles added this to the 1.0 milestone Feb 2, 2022
@josevalim
Copy link

I wonder if you may be able to stick with the process model as long as you assume each instance of the database is backed by one process. This way you make Elixir processes your thread pool, instead of having to deal with it in C/C++/Rust.

@warmwaffles warmwaffles changed the title Replace DirtyNIF execution model with threadpool Replace DirtyNIF execution model with a different mechanism Jan 19, 2023
@warmwaffles
Copy link
Member Author

@josevalim I think that's a good idea where only one process in the supervision tree sits there waiting for another DbConnection call. This could make it easier to cancel existing running queries.

Let me noodle on this a little more.

@josevalim
Copy link

Right, I think in this sense DBConnection may be getting more in the way than helping. Because you need serial access around SQLite3 and having a pool around serial access is not giving you anything. And when you don't need serial access (i.e. read operations), you don't need a pool at all.

DBConnection is necessary for Ecto but, even then, it should not be a requirement. There are only three features in Ecto.Adapters.SQL that require DBConnection (sandbox, stream, and disconnect_all) but they are all addressable.

Then there is the part of DBConnection as a convenience. I can think of two features it gives:

  1. a transaction manager - if the process doing a transaction crashes, we need to abort it. but perhaps sqlite3 api itself already has conveniences for handling dangling transactions
  2. the ownership pool

Of course there is a lot of work here, so I am not saying we should yank DBConnection, but it may be worth considering moving the DBConnection bits to ecto_sqlite3. In this project, based on the analysis above (which may be incomplete), you would only be missing the transaction manager (and it may be provided by sqlite3).

@LostKobrakai
Copy link
Contributor

There might be an additional benefit to managing transactions in a sqlite specific way. Currently there's no information returned to changesets about which constraint was violated, but there seems to be some pragmas to figure those out as sqlite doesn't close transactions on errors.

What I'm wondering about @josevalim's suggestion is how using processes to serialize or not serialize access would work for transactions which only become write transactions mid execution. Wouldn't the suggested approach require knowing if a transaction does writes or not in advance?

@ruslandoga
Copy link
Contributor

ruslandoga commented Feb 9, 2023

Wouldn't having one queuing process per db instance negate WAL benefits? There still needs to be a pool of connections for concurrent reads. It's not possible to open a single connection and start multiple transactions on it. And if we try to work around that with snapshots, we'd start losing isolation.

@josevalim
Copy link

@ruslandoga the reads can go directly to the NIF. Only the writes and operations inside a transaction would go through the process.

@ruslandoga
Copy link
Contributor

ruslandoga commented Feb 9, 2023

But how exactly would reads go directly to the NIF? Using what database connection? It can't be the one from the "writer" process since it might be in the middle of a transaction, and then we need synchronisation of reads and writes.

Ah, I think I understand now. Since the reads are "scoped" to a prepared statement, it might be possible to reuse a single database connection for multiple statements at once. I'll try it out.

UPDATE it works: ruslandoga#3; and reads seemingly became faster

iex(1)> {:ok, conn} = Exqlite.start_link(database: "demo.db", debug: [:trace])
{:ok, #PID<0.245.0>}
iex(2)> Exqlite.write(conn, "create table users(name text not null) strict")
# *DBG* <0.245.0> got call checkout from <0.252.0>
# *DBG* <0.245.0> sent ok to <0.252.0>, new state {[],<0.252.0>}
# *DBG* <0.245.0> got cast checkin
{:ok, []}
# *DBG* <0.245.0> new state {[],nil}

iex(3)> Exqlite.write(conn, "insert into users(name) values(?)", ["john"])
{:ok, []}
iex(4)> Exqlite.write(conn, "insert into users(name) values(?)", ["david"])
{:ok, []}
iex(5)> Exqlite.write(conn, "insert into users(name) values(?)", ["peter"])
{:ok, []}

iex(6)> {:ok, task_sup} = Task.Supervisor.start_link
{:ok, #PID<0.254.0>}

iex(7)> Task.Supervisor.async_stream_nolink(task_sup, 1..10000, fn _ -> Exqlite.query(conn, "select * from users") end, max_concurrency: 1000) |> Enum.into([])
[
  ok: {:ok, [["john"], ["david"], ["peter"]]},
  ok: {:ok, [["john"], ["david"], ["peter"]]},
  ok: {:ok, [["john"], ["david"], ["peter"]]},
  ok: {:ok, [["john"], ["david"], ["peter"]]},
  ok: {:ok, [["john"], ["david"], ["peter"]]}
# ...

I wonder how the statements cache would work with this approach. It'd need to be serialised somehow as well.

@LostKobrakai
Copy link
Contributor

LostKobrakai commented Feb 9, 2023

I think the tricky thing would be supporting concurrent reads from within transactions, which didn't yet do writes:
https://www.sqlite.org/lang_transaction.html#deferred_immediate_and_exclusive_transactions

Not sure how exqlite could know which statement within a transaction requires a SHARED LOCK (reads) vs one of the ones required for writing. Maybe it could naively check for SELECT … in the sql.

@ruslandoga
Copy link
Contributor

ruslandoga commented Feb 9, 2023

@LostKobrakai I think all transactions would need to go through the queue. Even if a read transaction never becomes a write one, it still can't be executed without syncing with the writers first. Otherwise multiple transactions can be attempted:

BEGIN; -- from writer in process 1
INSERT INTO ...
INSERT INTO ...

                 BEGIN; -- a reader starts a transaction in process 2
                 -- SQLite complains about re-entry: "cannot start a transaction within a transaction"

COMMIT; -- writer finishes the transaction from process 1

This is actually what I was worried about in the comments above. But since reads don't have to be wrapped in transactions, single connection approach works.

It might be possible to do snapshots though, for read-only transactions without BEGIN / COMMIT, at the NIF level.

@LostKobrakai
Copy link
Contributor

I just tried this with two table plus instances on the same database and it allowed me to start a read transaction just fine while another write transaction was started. Trying to start a write transaction I got database is locked. Though I guess that's two "connections" to sql, is it?

@ruslandoga
Copy link
Contributor

ruslandoga commented Feb 9, 2023

two table plus instances

That means two connections. I think the approach discussed in this issue is having only a single connection open to the database.


$ sqlite3 demo.db

SQLite version 3.40.1 2022-12-28 14:03:47
Enter ".help" for usage hints.
sqlite> begin;
sqlite> begin;
Runtime error: cannot start a transaction within a transaction

@LostKobrakai
Copy link
Contributor

Given DBConnection is an abstraction over connections I'm not so sure. I'd for sure appreciate if it actually could do concurrent reads (in a transaction) given how much code in a business application requires to be run in a transaction, which usually start reading a bunch of things before they write.

@ruslandoga
Copy link
Contributor

ruslandoga commented Feb 9, 2023

I think there is danger of those transactions failing with SQLITE_BUSY so serialising all transactions seems like a safer way.

-- conn 1                                            -- conn 2
sqlite> begin;
                                                     sqlite> begin;
                                                     sqlite> select * from users;
sqlite> delete from users where name = 'patrick';
                                                     sqlite> insert into users(name) values('jacob');
                                                     Runtime error: database is locked (5)

If it's about performance, we can try coming up with some benchmarks to see at what write/read ratio a single connection approach wins / loses. My hot take is that (read)write transactions are rare and reads are common and having un-pooled reads would end up being faster on average. Better memory usage too, probably.

@LostKobrakai
Copy link
Contributor

That's why I was wondering if exqlite could catch writes before they queue for "BUSY WAIT", but the queuing happens in elixir land. In the end this wouldn't be much different to e.g. a timeout due to a to long running query with e.g. postgres/ecto. Another option would be possible if exqlite can timeout a transaction from the outside and setting busy_timeout to a value larger than elixirs internal timeout.

@ruslandoga
Copy link
Contributor

ruslandoga commented Feb 9, 2023

I don't understand. How would busy_timeout help in the above example? No matter how long we wait, if we started a deferred transaction before some other connection made a write to a table we read from, we won't be able to commit it even if the other connection's writer is finished.

-- conn 1                                            -- conn 2
sqlite> begin;
                                                     sqlite> begin;
                                                     sqlite> select * from users;
sqlite> delete from users where name = 'patrick';
sqlite> commit;
                                                     sqlite> insert into users(name) values('jacob');
                                                     Runtime error: database is locked (5)

We can however remove all queueing in Elixir and rely on what SQLite returns (error code or ok), but that seems more difficult to develop for. One would need to know and understand SQLite specifics. And we probably wouldn't be able to rollback exiting writers or provide :checkout_timeout per query since pragma busy_timeout is global (I think).


I think busy_timeout would make concurrent read+write begin immediate; transactions work by queuing them, but that'd disable concurrent reads as well.

@LostKobrakai
Copy link
Contributor

That's unexpected, I would've expected this to work based on the documentation on the topic. If this is the case I'd wonder why have busy_timeout in the first place, if waiting makes no sense.

@ruslandoga
Copy link
Contributor

ruslandoga commented Feb 9, 2023

What if we try updating a record we read but that since has been deleted? I don't think it can work at all.

-- conn 1                                            -- conn 2
sqlite> begin;
                                                     sqlite> begin;
                                                     sqlite> select rowid from users where name = 'john';
sqlite> delete from users where name = 'john';
sqlite> commit;
                                                     sqlite> update users set name = 'john jr.' where rowid = 1;
                                                     Runtime error: database is locked (5)

@LostKobrakai
Copy link
Contributor

It there are conflicting changes I think most databases will roll back one transaction. Though deleting one row and adding another shouldn't really be conflicting, but that might just be sqlite.

@ruslandoga
Copy link
Contributor

ruslandoga commented Feb 9, 2023

I'm back to thinking that a single connection with un-pooled reads wouldn't work. If we issue reads directly to the NIF while a writer is in a transaction, we lose isolation guarantees as we can read data from unfinished transaction.

iex> spawn(fn ->
  Exqlite.transaction(db, fn conn ->
    Exqlite.query(conn, "insert into users(name) values(?)", ["new"])
    :timer.sleep(:timer.seconds(1000))
  end)
end)
iex> Exqlite.query(db, "select * from users")
{:ok,
 [
   ["david"],
   ["peter"],
   ["new"]
 ]}

I'm thinking now about using two connections, one for writers and one for readers. The readers connection would accept un-pooled reads. But even when un-pooled, reads would still limited by the number of dirty schedulers.


but that might just be sqlite.

Yeah, I think in the stock version SQLite "marks" whole database as dirty (or rather it compares the transaction ids when it started and at the first write) if it has been modified (outside of the current transaction) since the transaction has started. There is a begin-concurrent branch that allows marking pages as dirty (but that still doesn't help when the databases are small and fit in a few pages), and now I think a high-concurrency branch is in the works with row-level locking like in the other dbs.

@ruslandoga
Copy link
Contributor

ruslandoga commented Feb 24, 2023

I'm thinking now about using two connections, one for writers and one for readers. The readers connection would accept un-pooled reads.

That didn't end up working either. Reads start an implicit transaction and that's a problem since concurrent reads can force that transaction to be indefinitely open and that connection would never get new data.

conn1 (readonly)                                            conn2 (writeonly)
prepare(stmt1)
step(stmt1) <- tx opened                                                 
                                                            insert row
step step step(stmt1) <- no row, ok
prepare(stmt2) 
step step step (stmt2) <- no row
finish(stmt1) <- tx not closed since stmt2 still stepping
prepare(stmt3) <- no row, not ok since current tx was opened for stmt1                                                                          

@josevalim
Copy link

josevalim commented Jul 13, 2023

Sorry for staying out of this conversation and thanks for exploring @ruslandoga.

It seems like we would need the process to work as read-write lock. While reads are happening, we allow them to run concurrently. Once a write arrives, we start queueing events and draining the reads. Once they are all drained, execute the write, and then move to the next in the queue. If next is a read, we start doing concurrent work again. All queries can still run on the client. You can find an implementation of serialized reads-writes from NIF running from the client here: https://github.com/elixir-explorer/adbc/blob/main/lib/adbc_connection.ex

Having a concurrent reads version should an expansion of the API above (but hey, you can even start with a full serialized version and add the read concurrency later). API wise, you could keep "query" as a write operation but you can introduce a "read_query" where the user guarantees they are exclusively reading.

In any case, the main point is that I think this library can be cleaner and easier to maintain if it doesn't try to do NIFs, process management, and DBConnection at the same time. I think NIFs + process management would make this usable across more scenarios and DBConnection is an Ecto concern (which perhaps we can move all the way up to Ecto itself).

@ruslandoga
Copy link
Contributor

ruslandoga commented Jul 13, 2023

@josevalim @warmwaffles if there are no other takers, I'm interested in exploring this approach :)

@josevalim
Copy link

Definitely go for it from my side - but that's @warmwaffles decision to make. :)

@warmwaffles
Copy link
Member Author

warmwaffles commented Jul 13, 2023

Yea sure why not. @ruslandoga if you want to give it a shot I'm all for it. I'm really swamped with day job work at the moment, but I have time to do reviews and testing implementations.

@ruslandoga
Copy link
Contributor

ruslandoga commented Jul 19, 2023

I've started a PR #255 for #192 (comment) and while working on it I had one concern. With this approach we miss out on WAL allowing multiple readers when there's a writer, but I guess it could be tackled in ecto_sqlite3 by starting two connections and, for example, dedicating one for writes and the other for reads. To avoid #192 (comment) these connections would be "cooperative" so that after a write happens, the read connection would be drained from reads to close the implicit transaction and allow for the new data to become visible.

@ruslandoga
Copy link
Contributor

ruslandoga commented Jul 21, 2023

The only thing the "reader genserver" does in "normal" state is preparing the statements. Since there is no cache, we can move that to the caller as well.

Then the bottleneck would only be able to appear during read connection draining after a locked query finishes (assuming it was a write). Just to recap, we do this to make any implicit transactions opened in that read connection to be closed by finishing/releasing all ongoing statements.

During this draining all incoming reads are stored in the queue inside genserver, but it doesn't appear to be all that different from how db_connection works, with the difference that it probably uses ets to queue the callers: the only reason for the read queue to grow are long reads started before the locked query finished, maybe during draining we can switch into codel mode... I'll try to cause this scenario in the benchmarks to see how big of an issue this is.


I'll also try creating a new read connection each time we start draining the old one instead of queuing reads.

@warmwaffles
Copy link
Member Author

One of the main reasons I wanted to look into replacing the DirtyNIF execution model is that I would like to be able to cancel queries that overrun the timeout. Ecto would timeout requests if they took too long to run while modifying records.

One use case was a developer I was talking to was updating / cleaning up records and the query was touching thousands of records overrunning the default timeout. Although it looked like it timed out, instead it was still running in the background and causing compounding latency issues.

What my hope is that with the new implementation that you are working on @ruslandoga we are not going to timeout the connection at all, but instead keep it open until the request finishes.

@ruslandoga
Copy link
Contributor

ruslandoga commented Oct 9, 2023

Right now I'm exploring the following approach:

  • exqlite becomes just the NIF wrapper, like https://github.com/ruslandoga/sqlite.ex, but exposes some more utility functions
  • ecto_sqlite3 starts two processes, a reader and a writer (for a readwrite db), with reader multiplexing select statements and waiting for transaction close on changes to db, and with writer just queueing writes and Repo.query calls

I'll try to finish it up by the end of the week and open new PRs.

@ruslandoga
Copy link
Contributor

ruslandoga commented Oct 9, 2023

One of the main reasons I wanted to look into replacing the DirtyNIF execution model is that I would like to be able to cancel queries that overrun the timeout. Ecto would timeout requests if they took too long to run while modifying records.

If we go with drop-DBConnection then there wouldn't be any need for timeouts, I think. But if we still want to provide them, it can be done while keeping dirty schedulers but with a slightly different INSERT handling: instead of building a single multi-row INSERT INTO ... (?, ?, ?), (?, ?, ?), (?, ?, ?), etc.. statement we can prepare an insert for a single row and re-bind it for every row. We can do it similar to multi_step with a limited number of steps per each invocation (if the caller times out and exits, we stop invoking and rollback). I called it multi_bind_step and it seems to be ~x2 faster than the current approach (benched in an in-memory DB, to measure the "nif overhead" instead of actual performance):

mix run bench/insert_all.exs from a GitHub Actions run
Benchmark: https://github.com/ruslandoga/sqlite.ex/blob/master/bench/insert_all.exs

Results are copied from https://github.com/ruslandoga/sqlite.ex/actions/runs/6361581237/job/17276103771

Operating System: Linux
CPU Information: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Number of Available Cores: 2
Available memory: 6.76 GB
Elixir 1.15.6
Erlang 26.1.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: 10, 100, 1000, 10000
Estimated total run time: 56 s

Benchmarking Exqlite.Sqlite3 single insert with input 10 ...
Benchmarking Exqlite.Sqlite3 single insert with input 100 ...
Benchmarking Exqlite.Sqlite3 single insert with input 1000 ...
Benchmarking Exqlite.Sqlite3 single insert with input 10000 ...
Benchmarking SQLite.multi_bind_step with input 10 ...
Benchmarking SQLite.multi_bind_step with input 100 ...
Benchmarking SQLite.multi_bind_step with input 1000 ...
Benchmarking SQLite.multi_bind_step with input 10000 ...

##### With input 10 #####
Name                                    ips        average  deviation         median         99th %
SQLite.multi_bind_step              26.96 K       37.09 μs   ±189.66%       35.20 μs       60.82 μs
Exqlite.Sqlite3 single insert       11.72 K       85.34 μs    ±67.27%       85.20 μs      113.40 μs

Comparison: 
SQLite.multi_bind_step              26.96 K
Exqlite.Sqlite3 single insert       11.72 K - 2.30x slower +48.25 μs

##### With input 100 #####
Name                                    ips        average  deviation         median         99th %
SQLite.multi_bind_step               6.84 K      146.22 μs    ±38.67%      141.50 μs      301.60 μs
Exqlite.Sqlite3 single insert        3.05 K      327.79 μs    ±48.37%      301.80 μs      967.90 μs

Comparison: 
SQLite.multi_bind_step               6.84 K
Exqlite.Sqlite3 single insert        3.05 K - 2.24x slower +181.57 μs

##### With input 1000 #####
Name                                    ips        average  deviation         median         99th %
SQLite.multi_bind_step               807.23        1.24 ms     ±9.70%        1.22 ms        1.55 ms
Exqlite.Sqlite3 single insert        411.44        2.43 ms     ±9.29%        2.35 ms        3.00 ms

Comparison: 
SQLite.multi_bind_step               807.23
Exqlite.Sqlite3 single insert        411.44 - 1.96x slower +1.19 ms

##### With input 10000 #####
Name                                    ips        average  deviation         median         99th %
SQLite.multi_bind_step                84.01       11.90 ms    ±15.12%       12.21 ms       14.41 ms
Exqlite.Sqlite3 single insert         41.14       24.31 ms     ±3.27%       24.34 ms       26.87 ms

Comparison: 
SQLite.multi_bind_step                84.01
Exqlite.Sqlite3 single insert         41.14 - 2.04x slower +12.40 ms

And I forgot about UPDATE/DELETE (bad ClickHouse influence)

For UPDATE/DELETE, indeed, we'd need to do yielding nifs that would check if the caller pid is still alive. I'll try it in my fork.


What my hope is that with the new implementation that you are working on @ruslandoga we are not going to timeout the connection at all, but instead keep it open until the request finishes.

That's my current approach. And additionally, new writes don't start until the previous one returns.

Although it looked like it timed out, instead it was still running in the background and causing compounding latency issues.

Actually, this leads me to wonder why there were compounding issues in the timed out query case since SQLite wouldn't allow parallel write transactions either and with busy_timeout pragma set (in exqlite master it's set to two seconds by default) it would "queue" them in the same way. Meaning the latencies would've had stayed bounded to two seconds for the writes until the long write query finished. I guess there were some retries involved which made the long query get re-queued.

@warmwaffles
Copy link
Member Author

  • ecto_sqlite3 starts two processes, a reader and a writer (for a readwrite db), with reader multiplexing select statements and waiting for transaction close on changes to db, and with writer just queueing writes and Repo.query calls

I'll try to finish it up by the end of the week and open new PRs.

This is intriguing to me. I thought it was required to implement db_connection, but looking at it now, it's not really necessary.

@ruslandoga
Copy link
Contributor

ruslandoga commented Oct 11, 2023

ecto_sqlite3 starts two processes, a reader and a writer (for a readwrite db), with reader multiplexing select statements and waiting for transaction close on changes to db, and with writer just queueing writes and Repo.query calls

This doesn't seem to improve much as I'm checking pragma data_version before each SELECT which slows down small queries ~x2. And without checking pragma data_version before each SELECT we run into a possibility of failing to read own writes. So it seems like multiplexing SELECT statements on the same connection won't work.

binch/fetch_all.exs
rows = fn count ->
  Enum.map(1..count, fn i -> [i, to_string(i), i / 1.0] end)
end

Benchee.run(
  %{
    "data_version + SQLite.fetch_all (prepared)" =>
      {fn {db, data_version_stmt, stmt} ->
         {:ok, _} = SQLite.fetch_all(db, data_version_stmt, 10)
         {:ok, _rows} = SQLite.fetch_all(db, stmt, 100)
       end,
       before_scenario: fn input ->
         {:ok, db} = SQLite.open(~c":memory:", 0x2)
         :ok = SQLite.execute(db, "create table demo(a integer, b text, c real) strict")
         :ok = SQLite.insert_all(db, "insert into demo(a, b, c) values (?, ?, ?)", input)
         {:ok, data_version_stmt} = SQLite.prepare(db, "pragma data_version")
         {:ok, stmt} = SQLite.prepare(db, "select * from demo")
         {db, data_version_stmt, stmt}
       end,
       after_scenario: fn {db, _data_version_stmt, _stmt} -> :ok = SQLite.close(db) end},
    "SQLite.fetch_all (prepared)" =>
      {fn {db, stmt} -> {:ok, _rows} = SQLite.fetch_all(db, stmt, 100) end,
       before_scenario: fn input ->
         {:ok, db} = SQLite.open(~c":memory:", 0x2)
         :ok = SQLite.execute(db, "create table demo(a integer, b text, c real) strict")
         :ok = SQLite.insert_all(db, "insert into demo(a, b, c) values (?, ?, ?)", input)
         {:ok, stmt} = SQLite.prepare(db, "select * from demo")
         {db, stmt}
       end,
       after_scenario: fn {db, _stmt} -> :ok = SQLite.close(db) end},
    "Exqlite.Sqlite3.fetch_all" =>
      {fn {db, stmt} -> {:ok, _rows} = Exqlite.Sqlite3.fetch_all(db, stmt, 100) end,
       before_scenario: fn input ->
         {:ok, db} = Exqlite.Sqlite3.open(":memory:")
         :ok = Exqlite.Sqlite3.execute(db, "create table demo(a integer, b text, c real) strict")

         sql = [
           "insert into demo(a, b, c) values ",
           "(?, ?, ?)"
           |> List.duplicate(length(input))
           |> Enum.intersperse(?,)
         ]

         {:ok, stmt} = Exqlite.Sqlite3.prepare(db, sql)
         :ok = Exqlite.Sqlite3.bind(db, stmt, List.flatten(input))
         :done = Exqlite.Sqlite3.step(db, stmt)

         {:ok, stmt} = Exqlite.Sqlite3.prepare(db, "select * from demo")

         {db, stmt}
       end,
       after_scenario: fn {db, _stmt} -> :ok = Exqlite.Sqlite3.close(db) end}
  },
  inputs: %{
    "1" => rows.(1),
    "5" => rows.(5),
    "10" => rows.(10),
    "50" => rows.(50)
    # "1000" => rows.(1000),
    # "10000" => rows.(10000)
  }
  # profile_after: true
)
benchmark results
Operating System: macOS
CPU Information: Apple M1
Number of Available Cores: 8
Available memory: 8 GB
Elixir 1.15.6
Erlang 26.1.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: 1, 10, 5, 50
Estimated total run time: 1.40 min

Benchmarking Exqlite.Sqlite3.fetch_all with input 1 ...
Benchmarking Exqlite.Sqlite3.fetch_all with input 10 ...
Benchmarking Exqlite.Sqlite3.fetch_all with input 5 ...
Benchmarking Exqlite.Sqlite3.fetch_all with input 50 ...
Benchmarking SQLite.fetch_all (prepared) with input 1 ...
Benchmarking SQLite.fetch_all (prepared) with input 10 ...
Benchmarking SQLite.fetch_all (prepared) with input 5 ...
Benchmarking SQLite.fetch_all (prepared) with input 50 ...
Benchmarking data_version + SQLite.fetch_all (prepared) with input 1 ...
Benchmarking data_version + SQLite.fetch_all (prepared) with input 10 ...
Benchmarking data_version + SQLite.fetch_all (prepared) with input 5 ...
Benchmarking data_version + SQLite.fetch_all (prepared) with input 50 ...

##### With input 1 #####
Name                                                 ips        average  deviation         median         99th %
SQLite.fetch_all (prepared)                     380.14 K        2.63 μs   ±550.88%        2.50 μs        2.88 μs
Exqlite.Sqlite3.fetch_all                       330.86 K        3.02 μs   ±492.09%        2.46 μs       10.33 μs
data_version + SQLite.fetch_all (prepared)      208.34 K        4.80 μs   ±303.00%        4.58 μs        5.33 μs

Comparison:
SQLite.fetch_all (prepared)                     380.14 K
Exqlite.Sqlite3.fetch_all                       330.86 K - 1.15x slower +0.39 μs
data_version + SQLite.fetch_all (prepared)      208.34 K - 1.82x slower +2.17 μs

##### With input 10 #####
Name                                                 ips        average  deviation         median         99th %
SQLite.fetch_all (prepared)                     138.75 K        7.21 μs   ±203.62%        6.33 μs       18.04 μs
Exqlite.Sqlite3.fetch_all                       136.85 K        7.31 μs   ±183.38%        6.50 μs       18.46 μs
data_version + SQLite.fetch_all (prepared)      105.23 K        9.50 μs   ±131.89%        8.46 μs       34.25 μs

Comparison:
SQLite.fetch_all (prepared)                     138.75 K
Exqlite.Sqlite3.fetch_all                       136.85 K - 1.01x slower +0.1000 μs
data_version + SQLite.fetch_all (prepared)      105.23 K - 1.32x slower +2.30 μs

##### With input 5 #####
Name                                                 ips        average  deviation         median         99th %
SQLite.fetch_all (prepared)                     217.70 K        4.59 μs   ±335.99%        4.17 μs        6.50 μs
Exqlite.Sqlite3.fetch_all                       217.10 K        4.61 μs   ±335.68%        4.21 μs        6.79 μs
data_version + SQLite.fetch_all (prepared)      147.42 K        6.78 μs   ±186.21%        6.29 μs       12.25 μs

Comparison:
SQLite.fetch_all (prepared)                     217.70 K
Exqlite.Sqlite3.fetch_all                       217.10 K - 1.00x slower +0.0127 μs
data_version + SQLite.fetch_all (prepared)      147.42 K - 1.48x slower +2.19 μs

##### With input 50 #####
Name                                                 ips        average  deviation         median         99th %
SQLite.fetch_all (prepared)                      34.16 K       29.28 μs    ±84.26%          24 μs      136.13 μs
Exqlite.Sqlite3.fetch_all                        33.99 K       29.42 μs    ±84.59%       24.21 μs      136.75 μs
data_version + SQLite.fetch_all (prepared)       31.10 K       32.15 μs    ±81.88%       26.29 μs      115.41 μs

Comparison:
SQLite.fetch_all (prepared)                      34.16 K
Exqlite.Sqlite3.fetch_all                        33.99 K - 1.00x slower +0.141 μs
data_version + SQLite.fetch_all (prepared)       31.10 K - 1.10x slower +2.88 μs

@josevalim
Copy link

And without checking pragma data_version before each SELECT we run into a possibility of failing to read own writes.

What do you mean, can you expand? I am not well versed on sqlite pragmas. :)

@ruslandoga
Copy link
Contributor

ruslandoga commented Oct 11, 2023

Since ongoing selects piggy back on the same read transaction we need to know when it's safe to issue new reads and when it's not. By safety here I mean reading own writes.If we write and then read from the read-only connection that has started the read transaction before our write, the data we wrote would be invisible to us. So to avoid that I'm checking data_version pragma which, if changed since the last invocation, means other connections wrote something into the database. If it has changed, I'm blocking/queuing checkouts until the current read transaction closes (i.e. until all statements finish). Since I'm checking that pragma before each select, it negatively affects read performance.

@josevalim
Copy link

For Ecto, I would say transactions always go to the write-based connection. Would that solve this problem?

@ruslandoga
Copy link
Contributor

ruslandoga commented Oct 11, 2023

Unfortunately this wouldn't help since selects open a read transaction implicitly if there isn't one already. That read transaction is closed when the select statement finishes stepping through rows. If another select is started before the previous one finishes (on the same connection), the transaction is extended.

@josevalim
Copy link

I see, thank you, very helpful. So I guess we have no other option than having a single process that we send everything through?

@warmwaffles
Copy link
Member Author

Or perhaps, we just live with the DirtyNIF for now. 😞

@josevalim
Copy link

I believe we most likely still need DirtyNIFs. It is more of a question of removing the DBConnection dependency. :)

@josevalim
Copy link

I am reading this once more and I am wondering, how we can improve and simplify things. We have three options:

  1. This library only exposes Dirty NIFs. It is up to users to wrap them in processes and guarantee whatever isolation model they care about. The reason why this makes sense is that, even if we provide safe guards here, someone can run two operating system processes and reproduce some of the isolation issues. As long as the NIFs are safe (no segfault, memory leak, etc), we should be good.

  2. This library provides a DBConnection wrapper, with serial reads and serial writes, and a single connection. Which is what happens today.

  3. This library provides a DBConnection wrapper, with a serial connection that allows concurrent reads and serial writes. We allow multiple reads to run at the same time, as soon as there is a write, we wait for all reads to finish, perform the write, and then start executing the reads again. This should be an improvement over the status quo today (and justifies keeping DBConnection).

Pretty much all other designs we tried were not good enough.

Have I summed up things properly? :)

@LostKobrakai
Copy link
Contributor

I don't think the assessment of what is the current state with point 2 is correct: What we currently have with ecto_sqlite3 is DBConnection with concurrent reads and writes. Concurrent writes block at the sqlite level though and are either successfuly serialized with sqlites busy mechanic or some writes run into busy timeouts and crash.

@josevalim
Copy link

Thank you @LostKobrakai. If that's the case, then we might as well have gone with Dirty NIFs all the way? What would DBConnection gives us if we allow both concurrent reads and writes?

@LostKobrakai
Copy link
Contributor

To my knowledge that was done to integrate with ecto (hence it being in ecto_sqlite3 and not this repo). Not sure if there would've been other / simpler options to make ecto work with exqlite.

@warmwaffles
Copy link
Member Author

warmwaffles commented Dec 20, 2023

The DbConnection implementation is done in this project. I intended this library to be as a mechanism by which to interact with sqlite directly. My intention was to keep the ecto implementation away from this repo so that in theory a different sqlite NIF library could replace this should it be necessary.

We could strip this implementation down even further and lift the DbConnection out and into the ecto_sqlite3 library.

The DirtyNIF mechanism works decently, until it doesn't. I usually run into issues where a query operation times out via ecto, but it is still running in the background consuming memory with no way to stop it without hard killing the erlang process.

@josevalim
Copy link

@warmwaffles I don't think you can replace dirty nifs though. That's the way to do C integration. Your issue with query operations may be addressed by the following:

  1. return a NIF resource (a reference) on every query from the NIF
  2. pass this NIF resource to read the query result
  3. set a destructor to the NIF resource that will cancel the query if it was not cancelled yet

WDYT? Could that work? We are solving similar issues in Apache ADBC.

I would probably still say then that the DBConnection abstraction could be moved to the ecto_sqlite3 repo, it is not really necessary for SQLite3 itself, and then it should be debated if ecto_sqlite3 itself needs DBConnection. :)

And in case I haven't seen it before, thank you for your work and everyone's feedback and inputs on the thread. ❤️

@warmwaffles
Copy link
Member Author

That may work. Handing back a handle to cancel a running query and doing that transparently.

I would probably still say then that the DBConnection abstraction could be moved to the ecto_sqlite3 repo, it is not really necessary for SQLite3 itself, and then it should be debated if ecto_sqlite3 itself needs DBConnection.

I agree with this. I'll start looking at moving it out of this repository soon. Probably over the holiday break I'll knock it out.

@josevalim
Copy link

Concurrent writes block at the sqlite level though and are either successfuly serialized with sqlites busy mechanic or some writes run into busy timeouts and crash.

Apparently sqlite does not automatically serializes it for you, you are respnsible to do it, which pretty much means a DBConnection with a pool of 1 is the way to go. So perhaps this can be closed? More info here: https://www.sqlite.org/rescode.html#busy

@warmwaffles
Copy link
Member Author

Yea sadly at the current state of things, I don't know if I can interrupt the current executing query to cancel it. I haven't had a whole lot of time to dedicate to exploring a way to timeout/cancel executions.

@LostKobrakai
Copy link
Contributor

Yea sadly at the current state of things, I don't know if I can interrupt the current executing query to cancel it.

https://www.sqlite.org/c3ref/interrupt.html

This seems to be an explicit feature of sqlite. Couldn't this be used?

@warmwaffles
Copy link
Member Author

Yes, that's the documentation I was looking at that spurred this ticket.

@dimitarvp
Copy link

So I have started and stopped writing this comment probably 17 times during the last two weeks. 😃 I hate talking before I have a 100% solution but this time around, and since I finally have inspiration and some time + energy to work on my Elixir<=>Rust SQLite library, and as it is progressing much faster and nicer than I thought (basic feature completeness achieved a few days ago without stuff like sessions, backups, UDFs and the like that I don't find critical for v1), I figured it's time to try and address everything in this thread -- which took me 4 full reads and taking notes like a school student; hey, I never claimed I was smart.

Let me know if I missed something important. Please!

--- Single library vs. two or more ---

I agree with Jose that trying to put everything in a single library presents architectural challenges (too long to outline in this very likely overloaded comment IMO, but ask if you are curious). I already have xqlite working quite fine and it covers the "raw" NIF access to SQLite with some very nice guard rails (never aborts / panics) -- though test coverage needs little more work, which I'll put in VerySoon™. Next thing planned is an xqlite_ecto3 library which should be done in several weeks... right after I add a few more features that are extremely important for Ecto / DBConnection integration. Would love critical reviews of it, with me naively assuming you guys will have the time and desire to do so.

--- Read / write serialization ---

I grumpily acknowledge the problems with read and write serialization, with or without WAL, and how that can be a major pain for the users of the library. I thought a lot about it and can't find a perfect solution. Mine follows:

  • xqlite_ecto3 will integrate closely with DBConnection and pool_size will definitely be more than one (unless the use Ecto.Repo, ... block specifies 1, of course).
  • WAL / WAL2 mode is hugely preferred and I'll make that blindingly obvious in the README file(s).
  • Ditto for using PRAGMA busy_timeout = 5000; (configurable). I am still mulling over it but I am fairly convinced that I should just transparently put that in place in the Ecto / DBConnection integration layer (i.e. the future xqlite_ecto3 lib). Without it almost everything in this list becomes rather moot. Is my thinking correct? 🤔
  • My NIF library already specifically catches the busy or locked DB cases and returns a proper error tuple (with atoms) that you can easily pattern-match on if you want to do f.ex. several recursive retries (or you can crash 🤷). My main priority is to give you the choice. Still not sure how would I expose that choice inside an integration layer library and then to the end-user of the library... any ideas? I kind of got stuck here as I really don't want to use Config in a library. Might even ask an LLM and laugh at its expense on this topic. I need some reassurance that there's something out there that's stupider than me.

And just to make this crystal clear: my Elixir + Rust code already took care of synchronization. No separate Elixir processes will access the connection handle at the same time -- I fought very hard to avoid a Mutex but was ultimately unable to do so because TL;DR the BEAM VM can move an Elixir process to another OS thread at any yield point, and the Rust compiler does not take kindly to that at all.

The Mutex lock is per-handle, of course. Which implies SQLite must always be open in "no mutex" mode. I'll make sure this is spelled out in the Rust code, even though me using rusqlite's defaults already does it, it's just implicit.

So yeah, my code will not do database-level locking, ever. I let SQLite do it via its innate one-writer-at-a-time limitation. We handle it via timeouts and cancellations.

--- Zombie queries / executes ---

I have experimented with cancellation via using SQLite's set_progress_handler mechanism. I have, rather non-creatively, called it CancelToken (it can be created and cancelled and unregistered). It works very well already! (Not pushed to the repo yet.) I spent an entire afternoon wondering about the perfect value to pass to SQLite (== number of SQLite VM instructions that have to be executed before the progress handler is called [again]) and, some mailings list threads later, I settled at 1000. Which roughly means "on CPU-intensive queries / executes you might get 50 progress handler calls during a single millisecond, yet you might only get called once every 20 milliseconds on disk-heavy queries / executes". Far from what we would call ideal but... researched as much as I could before burning out and for now it's going to be like this.

Crucially, and thanks to Rust's RAII mechanisms, I now have draft code that automatically unregisters an SQLite progress handler either on success or error (tested). Unless I am thinking about this wrong, panics are not interesting to handle because most of any state you would have would be at this point either corrupted or about-to-be-corrupted.

To clarify: I am combining SQLite's set_progress_handler with Rust's atomic boolean facilities and have the progress handler check for its value becoming true (and that will be done via a NIF function i.e. cancel(token)); if that happens then sqlite3_interrupt is called. As far as I am aware, that's the way to do it and I've seen people report success doing it similarly.

EDGE CASE: timeout for a query / execute expires but the NIF operation finishes before SQLite calls the progress handler and sqlite3_interrupt right after. Gosh, I agonized hours on this but I can't find a way out. If it happens, I suggest we just trust the BEAM VM facilities (i.e. if you use Task.await with a timeout) and if a timeout is hit then a timeout is hit and that's that; silently discard that now-outdated success result. Still, this scenario should be extremely rare so this tradeoff should be fine, I believe. Thoughts?

--- Streaming ---

I would have forgotten about this if I haven't read this thread several times. But yes, streaming is absolutely getting added in the next few days. It's laborious and annoying to add (a lot of Rust boilerplate) but I love the idea and I came to Elixir for stuff like this in the first place i.e. to have lag-less and smooth operation with oft yields. So I feel quite strongly about this and it is getting done (basically through prepared statements lifecycle management). Very soon.

--- End ---

Again, please let me know if I neglected a worry expressed here. I really want to keep pushing my work further and provide a full-blown SQLite library and as perfect an Ecto / DBConnection integration as at all permitted by both SQLite and the BEAM VM. I want to get this done and for it to be done as well as my limited ability allows me.

@warmwaffles Hope you don't find me talking about a competing library in your own library's issue tracker rude. Just wanted to go where the really important discussion is. I took some lessons from your code, by the way. Nice work.

And cheers all. Loved the discussion. Opened my eyes about a few things.

@josevalim / @LostKobrakai / @ruslandoga

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants