Performance issues / possible memory leak #153
In my experience and benchmarks, if you don't have a need for multiple nodes or long-running (write) transactions, SQLite3 will generally be faster than Postgres and the like, due to the lack of client/server overhead. Most of the bad reputation comes from SQLite3 not having WAL on by default, for backward-compatibility reasons. Thanks for this data. I have a feeling we are not cleaning up statement objects correctly in the …
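For anyone who lands here later: turning WAL on with exqlite's low-level API is a one-line pragma per database. A minimal sketch (the `app.db` path is just a placeholder):

```elixir
# Minimal sketch: open a database with exqlite and switch it to WAL mode.
# journal_mode=WAL is persisted in the database file, so it only needs to run once.
{:ok, conn} = Exqlite.Sqlite3.open("app.db")

Exqlite.Sqlite3.execute(conn, "PRAGMA journal_mode=WAL;")
# Often paired with WAL to trade a little durability for fewer fsyncs.
Exqlite.Sqlite3.execute(conn, "PRAGMA synchronous=NORMAL;")

Exqlite.Sqlite3.close(conn)
```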
It could be the prepared statements not having their destructors called when the reference count goes to zero. Very odd. I'll take a look later in the day, but that's what I have a feeling is happening. @kevinlang it could also be that we aren't freeing some binary data for SQL statements / results. I haven't noticed any crazy amounts of memory climb in my apps, but if @rupurt is issuing a ton of updates and selects, that could be the culprit. EDIT: specifically the …
That's a good point. Most of the anecdotes I read about GC issues in the BEAM usually come down to large binaries causing an issue. If I have time I'll try to look into that. @rupurt can you comment more on the "lock up" issues? I think that may be a separate issue. Is the VM itself locking up (due to memory issues) or just the database? Typically for the latter you should get an error message - what sort are you getting? The most common cause of lock up is usually due to upgrading a READ transaction to a WRITE transaction. https://sqlite.org/isolation.html
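To make that upgrade failure mode concrete, here is a rough two-connection sketch using exqlite directly (the `orders` table and its columns are hypothetical):

```elixir
# Two connections to the same database; the table and columns are hypothetical.
{:ok, reader} = Exqlite.Sqlite3.open("app.db")
{:ok, writer} = Exqlite.Sqlite3.open("app.db")

# The reader starts a DEFERRED transaction and only takes a read lock.
Exqlite.Sqlite3.execute(reader, "BEGIN DEFERRED;")
Exqlite.Sqlite3.execute(reader, "SELECT count(*) FROM orders;")

# Another connection commits a write in the meantime.
Exqlite.Sqlite3.execute(writer, "INSERT INTO orders (status) VALUES ('open');")

# When the reader now tries to upgrade to a write, SQLite can refuse with
# SQLITE_BUSY, because the snapshot it read under is no longer current.
Exqlite.Sqlite3.execute(reader, "UPDATE orders SET status = 'closed';")

# Starting with BEGIN IMMEDIATE takes the write lock up front, avoiding the upgrade.
Exqlite.Sqlite3.execute(reader, "ROLLBACK;")
Exqlite.Sqlite3.execute(reader, "BEGIN IMMEDIATE;")
Exqlite.Sqlite3.execute(reader, "UPDATE orders SET status = 'closed';")
Exqlite.Sqlite3.execute(reader, "COMMIT;")
```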
Cool. Thanks gents for looking into it so quickly. Interesting to know about the performance. I'm only basing that on what I hear from other folks, so that encourages me to keep pushing forward with the current strategy :)
The "lock up" problem isn't a deadlock. I've run into deadlocks in the past but they were problems with my code. The "lock up" behavior I'm seeing now is not a total VM lock. It's a connection lock. I can remote attach to the instance and issue commands in IEx, but I can't query anything from the SQLite DB. Everything is working fine for ~30 mins or so, memory climbs linearly and then eventually the connection just locks up. I don't have any logs handy with the error message. But I think it eventually timed out after 15 seconds or so. FWIW I'm running this on a GCP |
@rupurt do you have a chart of the number of queries executed along with the memory usage? I'm just wanting to get a sense of the volume.
@warmwaffles I don't. But that does sound handy so let me figure out if I can add one. I'll also post the logs for the timeout error on this next run.
This is the error I get
I've been walking through the nif code and can't spot any egregious leaks. I wonder if the statements are never being freed because something is still holding a reference to them.
Shoot, sorry @warmwaffles. I ended up resolving this on my side. It was a bug in my code... :/
Haha, well @rupurt it actually sent me down a rabbit hole and I think this is more friendly now. Give the latest version a shot.
@warmwaffles @kevinlang Now K8s shows that our pod uses all of its memory, 5 GB. I've attached a screenshot (pink line). I'm wondering whether it's possible that we have a memory leak in SQLite. We're using:
Repo Config:
When the pod runs out of memory (the limit is 5 GB), I see errors like this in the log:
Thank you for your time. The code that does the reading:
I don't know if Kevin is going to be helping anymore; he's been pretty silent for the last few months. I am, however, going to take a look at this again. Side note, this is news to me.
Does that actually set those values during compilation?
Thanks a lot. About DSQLITE_DEFAULT_JOURNAL_SIZE_LIMIT: it seems so.
@rupurt do you recall how you resolved the memory leak issue on your side of things? @lauragrechenko Are you doing any other large-ish queries to sqlite3? I wonder if this is tied to the timeout issues that @LostKobrakai was experiencing a few weeks back.
Each node has "audit.db".
Just to clear up some potential confusion here: NIF memory usage is not tracked by the Erlang VM unless it is specifically hooked into it, so it's expected that Erlang doesn't report the memory usage of sqlite. This does not, however, mean that there's no memory leak. I've seen some strange behaviour this week around a large delete query causing reboots in our Nerves system, though I'm not yet sure of the root cause.
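A quick way to see that gap is to compare what the BEAM reports against the OS's view of the process. A rough sketch, assuming a Linux or macOS host where `ps` is available:

```elixir
# What the VM accounts for itself vs. the process's resident set size as the OS sees it.
# Memory that sqlite allocates inside the NIF only shows up in the OS number
# unless a custom allocator routes it through enif_alloc.
beam_bytes = :erlang.memory(:total)

{rss_kb, 0} = System.cmd("ps", ["-o", "rss=", "-p", System.pid()])
os_bytes = rss_kb |> String.trim() |> String.to_integer() |> Kernel.*(1024)

IO.puts("BEAM-reported memory: #{beam_bytes} bytes")
IO.puts("OS resident set size: #{os_bytes} bytes")
```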
Are those 100 entries being written in batches or sequentially? If they are happening sequentially, it may not be fulfilling all of the writes in time and slowly backing up. Do you know how big the database is on each node?
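For reference, a batched write with Ecto might look roughly like the sketch below; the `audit_entries` table and `MyApp.Repo` are hypothetical names:

```elixir
# Hypothetical batched write path: each chunk becomes a single insert_all call,
# i.e. one write transaction, instead of one transaction per row.
events = ["order.created", "order.cancelled", "order.filled"]

rows =
  for event <- events do
    %{
      event: event,
      inserted_at: NaiveDateTime.utc_now() |> NaiveDateTime.truncate(:second)
    }
  end

rows
|> Enum.chunk_every(100)
|> Enum.each(fn batch -> MyApp.Repo.insert_all("audit_entries", batch) end)
```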
For writing we use Ecto.Repo.insert_all/3. But even if I stop all writing, so that only this one process is reading, it's the same: memory grows. The DB is now ~5.5 GB.
I need to look into using a custom memory allocator for sqlite that goes through Erlang's machinery so we can get some better telemetry on it, and add telemetry to exqlite in general.
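exqlite itself didn't emit telemetry at this point, but as a stopgap you can hang off Ecto's built-in repo events. A sketch, assuming a repo named `MyApp.Repo` whose default event prefix is `[:my_app, :repo, :query]`:

```elixir
# Log queries that take longer than 100 ms using Ecto's built-in telemetry event.
:telemetry.attach(
  "log-slow-sqlite-queries",
  [:my_app, :repo, :query],
  fn _event, measurements, metadata, _config ->
    total_ms = System.convert_time_unit(measurements.total_time, :native, :millisecond)

    if total_ms > 100 do
      IO.puts("slow query (#{total_ms} ms): #{metadata.query}")
    end
  end,
  nil
)
```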
@warmwaffles the memory leak from my initial post was 110% my own crappy application code :) I was cancelling and creating many orders. The orders that were cancelled were in their final resting state. So I just cleared out the callbacks for those orders as they should never get executed.
Yes, it does. :) It's kind of hack-ish, but I couldn't do it any other way.
This is a decent option that I'll need to add to the documentation so others can utilize it if they want to enable / disable features when compiling sqlite.
@lauragrechenko I haven't forgotten about this, I just did not have time this weekend to dig into it more. I'm going to build a benchmarking / load testing suite soon to try and pinpoint the issue.
@warmwaffles I was about to write to you. Maybe it'll help someone. So now, instead of …
… and it seems to be working just fine :)
Heh, I still need to build a good benchmark and test suite, along with adding better telemetry for memory usage and whatnot.
@lauragrechenko I've opened a PR here to utilize a custom memory allocator for sqlite. If you could give it a run in your environment for tests or something, feedback would be extremely welcome.
@warmwaffles
@lauragrechenko released v0.10.0, which has the custom allocator in place now; Erlang VM memory usage will now include sqlite's usage.
@warmwaffles
@warmwaffles
The three modes allowed are `deferred`, `immediate`, and `exclusive`. By default the mode is set to `deferred`. For more information about why you would use `immediate` over `deferred`, take a look at the following articles:

* https://fractaledmind.github.io/2024/04/15/sqlite-on-rails-the-how-and-why-of-optimal-performance/
* https://highperformancesqlite.com/watch/transaction-modes

Closes: #153
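For anyone wanting to opt in from an Ecto repo, newer ecto_sqlite3 versions document a `:default_transaction_mode` option; a sketch (`MyApp.Repo` is a placeholder, and the exact option key should be verified against the version you're running):

```elixir
# config/config.exs -- ask the adapter to issue BEGIN IMMEDIATE instead of BEGIN,
# so write transactions take the write lock up front rather than upgrading later.
import Config

config :my_app, MyApp.Repo,
  database: "path/to/app.db",
  journal_mode: :wal,
  default_transaction_mode: :immediate
```

Immediate transactions trade a little concurrency on transaction start for predictable lock acquisition, which is usually the right call for write-heavy workloads like the one described in this issue.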
Howdy 👋
I'm in the process of converting a market making trading system that creates many orders from ETS to Ecto. I plan on supporting any database that Ecto currently has support for, but I've started with SQLite because I want to be able to distribute it without any external dependencies.
SQLite is probably not the best choice for this kind of system due to the high number of writes. But I've pushed forward with the philosophy that if I can make it perform reasonably well in SQLite, it should be great with other DBs.
Everything has gone pretty well so far. However, now that I'm running it in production for long periods of time, I'm noticing severe performance degradation over time that eventually causes a total lock of the SQLite DB. I've attached a graph of my telemetry output below. It shows:
I'm also attaching a graph of query times in the hope that it's helpful.
The following SQL statements are executed regularly on the hot path:
They're issued from this Elixir module https://github.com/fremantle-industries/tai/blob/orders-ecto-repo/apps/tai/lib/tai/new_orders/services/apply_order_transition.ex