bpo-42213: Remove redundant cyclic GC hack in sqlite3 #26462
Pablo/Victor: Whenever you have time 🙏🏻 |
I'm not sure if this should be backported to 3.10. It's just a cleanup, so I guess it's fine to backport it, but on the other hand there is no harm in not backporting it. I have no strong preference here. |
The Windows failures look very similar to the GC issues in #24203. https://github.com/python/cpython/pull/26462/checks?check_run_id=2709446243
Adding 209196eaaa44d02551dad00516ff32fb447c1d8f fixes the test issues on Windows. @vstinner, what do you make of this? |
Lib/sqlite3/test/dbapi.py
Outdated
@@ -190,6 +193,8 @@ def test_open_uri(self):
with sqlite.connect('file:' + TESTFN + '?mode=ro', uri=True) as cx:
with self.assertRaises(sqlite.OperationalError):
cx.execute('insert into test(id) values(1)')
cx.close()
gc_collect()
Why the calls to gc_collect() here? These should not be necessary.
I know; it's the same issue as in #24203. I'll try my best to reproduce this on my Mac.
FYI, I discovered #26475 while debugging this, but unfortunately it does not fix this issue.
The sqlite3 module now fully implements the GC protocol, so there's no need for this workaround anymore.
209196e to 70511f3
FYI, rebased onto |
Did you run |
Windows (x64) CI job failed:
|
Ah, there is also a warning on Windows (same CI job):
|
Yes, I did :) |
Yes, this is the reason for the draft/WIP status. On Windows, the GC uses a lot more time to break the ref. cycle between the statement cache and the connection. From what I can see, it seems that it just loops longer; there's a load of traverse calls before the cycle is broken. On Mac and Linux, the cycle is broken much faster. This results in the objects living slightly longer on Windows, so sqlite3 is clinging on to |
In general, I dislike relying on implicit resource management. I would prefer to emit a ResourceWarning if a resource is not released explicitly (by calling a close() method for example). Maybe some tests should call the close() method explicitly? |
That's the workaround I've added in #24203: see d2078bc. It's exactly the same problem: The LRU statement cache ( Previously (or should I say currently), |
Keeping Python objects in memory is fine. What is not fine is to hold an OS resource, like a file. Do you mean that calling close() does not close the file on disk?
It does close it, but not until GC is done. |
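For readers following along, here is a minimal sketch of why an object caught in a reference cycle outlives its last external reference. `Connection` and `Cache` are hypothetical stand-ins, not the sqlite3 types, and the timing shown is CPython's reference-counting behaviour.

```python
import gc
import weakref


class Cache:
    """Hypothetical stand-in for the statement cache."""

    def __init__(self, owner):
        self.owner = owner  # strong reference back to the connection


class Connection:
    """Hypothetical stand-in for a connection that owns a cache."""

    def __init__(self):
        self.cache = Cache(self)  # connection -> cache -> connection


gc.disable()  # keep the automatic collector out of the demonstration
try:
    cx = Connection()
    probe = weakref.ref(cx)  # lets us observe the object without owning it
    del cx
    # Reference counting alone cannot free the cycle.
    alive_after_del = probe() is not None
    gc.collect()
    # The cyclic collector breaks the cycle and frees both objects.
    alive_after_collect = probe() is not None
finally:
    gc.enable()
```

Any resource released only in the finalizer of such an object stays open until the collector runs, which is the delay described above.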
|
The connection object is decref'ed after
My computer time today is very fragmented, so my replies will be short and hopefully not too messy :) |
Well, again, IMO sqlite3 must behave as the io module: emit a ResourceWarning if a connection is not closed explicitly. All tests must explicitly close the connection. Closing a file must not depend on whether a sqlite object is part of a reference cycle or not. There are too many ways to create reference cycles in Python.
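As a point of comparison, a small sketch of the io behaviour referred to here: a file dropped without close() triggers a ResourceWarning when it is finalized, while one closed explicitly does not. The temporary file is an assumption made purely for illustration.

```python
import gc
import os
import tempfile
import warnings

# Scratch file used only for this demonstration.
fd, path = tempfile.mkstemp()
os.close(fd)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # ResourceWarning is ignored by default

    f = open(path)
    del f          # dropped without close(): io warns on finalization
    gc.collect()   # only matters on implementations without refcounting

    with open(path) as g:  # closed explicitly by the context manager
        g.read()

os.remove(path)

resource_warnings = [
    w for w in caught if issubclass(w.category, ResourceWarning)
]
```

Running a test suite with `-W error::ResourceWarning` makes such leaks much harder to miss.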
In that case, I believe explicitly closing the connection and explicitly triggering gc.collect is the best thing to do here. |
NOBODY EXPECTS THE SPANISH INQUISITION (I'm Spanish so I can say that :) ) I think there is some confusion between @vstinner and @erlend-aasland. Let me try to clarify what the expectations are so that it is easier to redirect the discussion:
One of the things that @vstinner is saying (and I very much agree with) is that we should never depend on calling Furthermore, @vstinner thinks that this implicit resource management (closing on destruction) is so bad that a warning should be emitted if we ever reach that code with the connection open, and all users should explicitly call In short:
|
I understand that it is possible that the database file remains open after sqlite3_close_v2() is called, if some other sqlite objects still exist. Oh, this is counterintuitive. How can a programmer know if a database file is closed or not? Try to delete it on Windows? :-) Would it help to emit a warning or even a hard exception if this case happens?
Not the Spanish inquisition!!! 😂 Thanks, @pablogsal! :) I believe we're getting closer to understanding the various issues. Victor:
Yes, it is counter-intuitive, and no, there's no API for detecting if a database file is closed (AFAIK). |
I understand that the close() method must destroy/release/close all sqlite objects before calling sqlite3_close_v2(). Said differently, would it be possible to implement a public or private close() method on each Python sqlite object that somehow prevents the database from being closed (sqlite3_close_v2), so that its resources are released without having to destroy the Python object? See the file object: even if the Python object is not destroyed, the inner resource is released. It requires all file methods to raise an exception if the file is closed.
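The file-object pattern described here could look roughly like the sketch below. `Resource`, `_check_open()` and `read()` are hypothetical names for illustration, not part of sqlite3 or io.

```python
class Resource:
    """Hypothetical wrapper; the names are illustrative, not a real API."""

    def __init__(self):
        self._handle = object()  # stands in for an OS or library handle
        self.closed = False

    def _check_open(self):
        # Every public method calls this first, as file methods do.
        if self.closed:
            raise ValueError("operation on closed resource")

    def read(self):
        self._check_open()
        return "data"

    def close(self):
        # Idempotent: release the handle now, while the Python object
        # itself stays alive for as long as something references it.
        if not self.closed:
            self._handle = None
            self.closed = True


r = Resource()
r.close()
r.close()  # a second close() is a no-op

try:
    r.read()
except ValueError as exc:
    error = str(exc)
```

The point of the design is that releasing the resource no longer depends on when, or whether, the wrapper object is destroyed.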
Anyway, I must take a break now; back at the computer in 5/6 hours :) |
The Python project is developed asynchronously. You don't have to reply in less than 1 hour. It's perfectly fine to take 1 week or even 1 month to reply to a review. Your message sends notifications which are counter-productive. I go to the PR to see the comment... to say that you will reply later, ah. On IRC, I only get a notification that you wrote a comment, but I don't get the content. |
- add wrapper for sqlite3_close_v2()
- explicitly free pending statements before close()
Looks like I got it right. PTAL, @vstinner & @pablogsal. Thanks for your patience and guidance. |
* up cursors, as they may have strong refs to statements. */
Py_CLEAR(self->statement_cache);
Py_CLEAR(self->statements);
Py_CLEAR(self->cursors);
What happens if you use the connection after you call close()? Are we checking for NULL for these variables everywhere else? (Cannot check myself as I'm on my phone)
You'll get a sqlite3.ProgrammingError. There are a lot of sanity checks in the code for this. The sqlite3 test suite runs fine, but of course, we do not have 100% code coverage yet. (Digression: see issue 43553 for improving sqlite3 code coverage.)
After close, if you try to use a cursor, fetch from a pending statement, or manipulate the connection, you'll get a sqlite3.ProgrammingError:
>>> cx = sqlite3.connect(":memory:")
>>> cu = cx.cursor()
>>> res = cu.execute("select 1")
>>> cx.close()
>>> res.fetchall()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
sqlite3.ProgrammingError: Cannot operate on a closed database.
>>> cu.execute("select 1")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
sqlite3.ProgrammingError: Cannot operate on a closed database.
>>> cx.create_function("test", 1, lambda x: x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
sqlite3.ProgrammingError: Cannot operate on a closed database.
>>>
I added some more tests wrt. this: 1b23df1
Also, there are some regression tests that exercise operations on closed connections (and closed cursors).
Let me know if you want me to add more tests.
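For illustration only, a sketch of what tests on a closed connection can look like. These are not the tests from 1b23df1; the class and method names are hypothetical, and the assertions mirror the interactive session above.

```python
import sqlite3
import unittest


class ClosedConnectionTests(unittest.TestCase):
    """Hypothetical test case; not taken from the CPython test suite."""

    def setUp(self):
        self.cx = sqlite3.connect(":memory:")
        self.cu = self.cx.cursor()
        self.cx.close()  # every test starts from a closed connection

    def test_cursor_execute(self):
        with self.assertRaises(sqlite3.ProgrammingError):
            self.cu.execute("select 1")

    def test_connection_execute(self):
        with self.assertRaises(sqlite3.ProgrammingError):
            self.cx.execute("select 1")

    def test_create_function(self):
        with self.assertRaises(sqlite3.ProgrammingError):
            self.cx.create_function("test", 1, lambda x: x)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClosedConnectionTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```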
Here's a strange observation:
I did a ref leak test on Windows yesterday, and it segfaulted on test_bpo31770. After some testing, I reverted the change highlighted in this conversation; I removed the three Py_CLEAR, added pysqlite_do_all_statements(self, ACTION_FINALIZE, 1) back, but kept connection_close(). That fixed the ref leak test on Windows, and the test suite still worked; it did not "leak" open test database files. So, for the new PR, I'll keep this modification. We'll see how the CI fares :)
cc. @vstinner
CI seems to be fine with this: erlend-aasland#9
The following methods check indirectly if the connection is closed:
* _sqlite3.Connection.execute(): call self.cursor().execute()
Other methods call pysqlite_check_connection().
Hum. Can you try to split your PR in two parts? First PR to rewrite the C code, second PR to rewrite the tests.
Fixed in 7d8014a.
Yes, I could do that, but that would make the CI fail on the first PR. |
...and sort imports
Hum, I am not strictly thinking about C vs Python, but more something like that: First PR:
Second PR:
|
This PR is no longer only about "Remove redundant cyclic GC hack in sqlite3" and "The sqlite3 module now fully implements the GC protocol; there's no need for this workaround anymore." It's now way more than that. |
True. Can I re-use the same issue number for both PRs? |
Yes. It's a good practice to put related changes in the same bpo. |
The sqlite3 module now fully implements the GC protocol; there's no
need for this workaround anymore.
https://bugs.python.org/issue42213