Skip to content

bpo-42213: Remove redundant cyclic GC hack in sqlite3 #26462

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

erlend-aasland
Copy link
Contributor

@erlend-aasland erlend-aasland commented May 31, 2021

The sqlite3 module now fully implements the GC protocol; there's no
need for this workaround anymore.

https://bugs.python.org/issue42213

@erlend-aasland
Copy link
Contributor Author

Pablo/Victor: Whenever you have time 🙏🏻

@erlend-aasland
Copy link
Contributor Author

I'm not sure if this should be backported to 3.10. It's just a cleanup, so I guess it's fine to backport it, but on the other hand there is no harm in not backporting it. I have no strong preference here.

@erlend-aasland erlend-aasland marked this pull request as draft May 31, 2021 09:54
@erlend-aasland erlend-aasland changed the title bpo-42213: Remove redundant cyclic GC hack in sqlite3 [WIP] bpo-42213: Remove redundant cyclic GC hack in sqlite3 May 31, 2021
@erlend-aasland
Copy link
Contributor Author

erlend-aasland commented May 31, 2021

The Windows failures looks very similar to the GC issues in #24203.
The macOS failure is a spurious ssl failure.

https://github.com/python/cpython/pull/26462/checks?check_run_id=2709446243

@erlend-aasland
Copy link
Contributor Author

erlend-aasland commented May 31, 2021

Adding 209196eaaa44d02551dad00516ff32fb447c1d8f fixes the test issues on Windows.

@vstinner, what do you make of this?

@@ -190,6 +193,8 @@ def test_open_uri(self):
with sqlite.connect('file:' + TESTFN + '?mode=ro', uri=True) as cx:
with self.assertRaises(sqlite.OperationalError):
cx.execute('insert into test(id) values(1)')
cx.close()
gc_collect()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the falls to GC collect here? These should not be necessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know; it's the same issue as in #24203. I'll try my best to reproduce this on my Mac.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, discovered #26475 while debugging this, but it does unfortunately not fix this issue.

The sqlite3 module now fully implements the GC protocol, so there's no
need for this workaround anymore.
@erlend-aasland erlend-aasland force-pushed the sqlite-remove-gc-hack branch from 209196e to 70511f3 Compare June 1, 2021 10:55
@erlend-aasland
Copy link
Contributor Author

FYI, rebased onto main.

@vstinner
Copy link
Member

vstinner commented Jun 1, 2021

Did you run ./python -m test test_sqlite -R 3:3 to check for reference leaks?

@vstinner
Copy link
Member

vstinner commented Jun 1, 2021

Windows (x64) CI job failed:


2021-06-01T11:02:07.4501179Z ======================================================================
2021-06-01T11:02:07.4503272Z ERROR: test_open_uri (sqlite3.test.dbapi.ConnectionTests)
2021-06-01T11:02:07.4581189Z ----------------------------------------------------------------------
2021-06-01T11:02:07.4598093Z Traceback (most recent call last):
2021-06-01T11:02:07.4600441Z   File "D:\a\cpython\cpython\lib\test\support\os_helper.py", line 238, in unlink
2021-06-01T11:02:07.4633846Z     _unlink(filename)
2021-06-01T11:02:07.4636229Z   File "D:\a\cpython\cpython\lib\test\support\os_helper.py", line 278, in _unlink
2021-06-01T11:02:07.4637841Z     _waitfor(os.unlink, filename)
2021-06-01T11:02:07.4639071Z   File "D:\a\cpython\cpython\lib\test\support\os_helper.py", line 246, in _waitfor
2021-06-01T11:02:07.4640337Z     func(pathname)
2021-06-01T11:02:07.4642275Z PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: '@test_1508_tmp�'
2021-06-01T11:02:07.4644053Z 
2021-06-01T11:02:07.4645203Z ======================================================================
2021-06-01T11:02:07.4646483Z ERROR: test_open_with_path_like_object (sqlite3.test.dbapi.ConnectionTests)
2021-06-01T11:02:07.4648280Z Checks that we can successfully connect to a database using an object that
2021-06-01T11:02:07.4650124Z ----------------------------------------------------------------------
2021-06-01T11:02:07.4652596Z Traceback (most recent call last):
2021-06-01T11:02:07.4655019Z   File "D:\a\cpython\cpython\lib\sqlite3\test\dbapi.py", line 182, in test_open_with_path_like_object
2021-06-01T11:02:07.4670040Z     cx.execute('create table test(id integer)')
2021-06-01T11:02:07.4672512Z sqlite3.OperationalError: table test already exists
2021-06-01T11:02:07.4688676Z 
2021-06-01T11:02:07.4708289Z ======================================================================
2021-06-01T11:02:07.4724689Z ERROR: test_open_with_path_like_object (sqlite3.test.dbapi.ConnectionTests)
2021-06-01T11:02:07.4728000Z Checks that we can successfully connect to a database using an object that
2021-06-01T11:02:07.4811284Z ----------------------------------------------------------------------
2021-06-01T11:02:07.4831832Z Traceback (most recent call last):
2021-06-01T11:02:07.4834116Z   File "D:\a\cpython\cpython\lib\test\support\os_helper.py", line 238, in unlink
2021-06-01T11:02:07.4864854Z     _unlink(filename)
2021-06-01T11:02:07.4867085Z   File "D:\a\cpython\cpython\lib\test\support\os_helper.py", line 278, in _unlink
2021-06-01T11:02:07.4868728Z     _waitfor(os.unlink, filename)
2021-06-01T11:02:07.4870312Z   File "D:\a\cpython\cpython\lib\test\support\os_helper.py", line 246, in _waitfor
2021-06-01T11:02:07.4871577Z     func(pathname)
2021-06-01T11:02:07.4873516Z PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: '@test_1508_tmp�'
2021-06-01T11:02:07.4875042Z 
2021-06-01T11:02:07.4876184Z ======================================================================
2021-06-01T11:02:07.4877721Z ERROR: test_trace_callback_content (sqlite3.test.hooks.TraceCallbackTests)
2021-06-01T11:02:07.4879746Z ----------------------------------------------------------------------
2021-06-01T11:02:07.4881969Z Traceback (most recent call last):
2021-06-01T11:02:07.4884465Z   File "D:\a\cpython\cpython\lib\test\support\os_helper.py", line 238, in unlink
2021-06-01T11:02:07.4898652Z     _unlink(filename)
2021-06-01T11:02:07.4900840Z   File "D:\a\cpython\cpython\lib\test\support\os_helper.py", line 278, in _unlink
2021-06-01T11:02:07.4931605Z     _waitfor(os.unlink, filename)
2021-06-01T11:02:07.4947677Z   File "D:\a\cpython\cpython\lib\test\support\os_helper.py", line 246, in _waitfor
2021-06-01T11:02:07.4949998Z     func(pathname)
2021-06-01T11:02:07.5029259Z PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: '@test_1508_tmp�'
2021-06-01T11:02:07.5046117Z 
2021-06-01T11:02:07.5047969Z ----------------------------------------------------------------------

@vstinner
Copy link
Member

vstinner commented Jun 1, 2021

Ah, there is also a warning on Windows (same CI job):

2021-06-01T11:11:08.6974894Z Warning -- files was modified by test_sqlite
2021-06-01T11:11:08.6975521Z   Before: []
2021-06-01T11:11:08.6977644Z   After:  ['@test_1748_tmp�'] 

@erlend-aasland
Copy link
Contributor Author

Did you run ./python -m test test_sqlite -R 3:3 to check for reference leaks?

Yes, I did :)

@erlend-aasland
Copy link
Contributor Author

Windows (x64) CI job failed:

Yes, this is the reason for the draft/WIP status. On Windows, the GC uses a lot more time to break the ref. cycle between the statement cache and the connection. From what I can see, it seems that it just loops longer; there's a load of traverse calls before the cycle is broken. On Mac and Linux, the cycle is broken much faster. This results in the objects living slightly longer on Windows, so sqlite3 is clinging on to TESTFN, preventing unlink from removing it (as seen in the first traceback). Calling unlink repeatedly until it succeeds works, because eventually GC manages to clean up, and sqlite3 releases the file. It seems that the GC just behaves differently on Windows.

@vstinner
Copy link
Member

vstinner commented Jun 2, 2021

In general, I dislike relying on implicit resource management. I would prefer to emit a ResourceWarning if a resource is not released explicitly (by calling a close() method for example). Maybe some tests should call the close() method explicitly?

@erlend-aasland
Copy link
Contributor Author

erlend-aasland commented Jun 2, 2021

Maybe some tests should call the close() method explicitly?

That's the workaround I've added in #24203: see d2078bc.

It's exactly the same problem:

The LRU statement cache (pysqlite_Connection.statement_cache) has a "back-reference" to sqlite3.Connection (pysqlite_Cache.factory), because statement creation happens in sqlite3.Connection.__call__. I find that design strange and unfortunate.

Previously (or should I say currently), sqlite3.Connection.__init__ decrefs self after creating the LRU cache, and then tells sqlite3.Cache to not decref the statement factory (pysqlite_Cache.factory) when the cache is deallocated. In practice the cache is using a borrowed ref. to the connection.

@vstinner
Copy link
Member

vstinner commented Jun 2, 2021

Keeping Python objects in memory is fine. What is not fine is to hold an OS release, like a file. Do you mean that calling close() does not close the file on disk?

@erlend-aasland
Copy link
Contributor Author

Keeping Python objects in memory is fine. What is not fine is to hold an OS release, like a file. Do you mean that calling close() does not close the file on disk?

It does close it, but not until GC is done.

@vstinner
Copy link
Member

vstinner commented Jun 2, 2021

with sqlite.connect(path) as cx: doesn't close the connection at the end of the with block? Is it a deliberate behavior?

@erlend-aasland
Copy link
Contributor Author

erlend-aasland commented Jun 2, 2021

with sqlite.connect(path) as cx: doesn't close the connection at the end of the with block? Is it a deliberate behavior?

__exit__ does not close the connection, it only tries to commit or rollback what happened inside it:

  • https://docs.python.org/3/library/sqlite3.html#using-the-connection-as-a-context-manager
  • /*[clinic input]
    _sqlite3.Connection.__exit__ as pysqlite_connection_exit
    type as exc_type: object
    value as exc_value: object
    traceback as exc_tb: object
    /
    Called when the connection is used as a context manager.
    If there was any exception, a rollback takes place; otherwise we commit.
    [clinic start generated code]*/
    static PyObject *
    pysqlite_connection_exit_impl(pysqlite_Connection *self, PyObject *exc_type,
    PyObject *exc_value, PyObject *exc_tb)
    /*[clinic end generated code: output=0705200e9321202a input=bd66f1532c9c54a7]*/
    {
    const char* method_name;
    PyObject* result;
    if (exc_type == Py_None && exc_value == Py_None && exc_tb == Py_None) {
    method_name = "commit";
    } else {
    method_name = "rollback";
    }
    result = PyObject_CallMethod((PyObject*)self, method_name, NULL);
    if (!result) {
    return NULL;
    }
    Py_DECREF(result);
    Py_RETURN_FALSE;
    }

The connection object is decref'ed after __exit__ (AFAICS), which closes the conneciton. But, arriving at connection dealloc takes a lot of time because of the garbage collector using a lot of time to break the ref. cycle. Maybe I'm terrible at explaining this, but what's happening (as I see it) is:

  1. __enter__
  2. do stuff
  3. __exit__ => triggers connection clean up
  4. traverse loop (ref. cycle)
  5. unlink => fails, because GC is still traversing
  6. traverse done => connection dealloc/clean => sqlite3 finally releases TESTFN, but it's too late

My computer time today is very fragmented, so my replies will be short and hopefully not too messy :)

@vstinner
Copy link
Member

vstinner commented Jun 2, 2021

exit does not close the connection, it only tries to commit or rollback what happened inside it

Well, again, IMO sqlite3 must behave as the io module: emit a ResourceWarning if a connection is not closed explicitly.

All tests must explicitly close the connection.

Closing a file must not depend if a sqlite object is part of a reference cycle or not. There are too many ways to create reference cycles in Python.

@erlend-aasland
Copy link
Contributor Author

Well, again, IMO sqlite3 must behave as the io module: emit a ResourceWarning if a connection is not closed explicitly.

All tests must explicitly close the connection.

Closing a file must not depend if a sqlite object is part of a reference cycle or not. There are too many ways to create reference cycles in Python.

In that case, I believe explicitly closing the connection and explicitly triggering gc.collect is the best thing to do here.

@pablogsal
Copy link
Member

pablogsal commented Jun 2, 2021

NOBODY EXPECTS THE SPANISH INQUISITION (I'm Spanish so I can say that :) )

Spanish_Inquisition

I think there is some confusion between @vstinner and @erlend-aasland. Let me try to clarify what are the expectations so is easier to reconduct the discussion:

  • It seems that the object calls close on destruction.
  • Being in a gc cycle prevents destruction.
  • The context manager doesn't close the connection.

One of the things that @vstinner is saying (and I very much agree with) is that we should never depend on calling gc.collect() to clean resources in the test suite. Is very unreliable and is almost always hiding bugs. For instance, in this case as the connexion is closed on destruction, is preferable to schedule a cleanup method (calling close() on the connection) but not relying on it being closed on destruction, as that is implicit resource management, which is bad.

Furthermore, @vstinner thinks that this implicit resource management (closing on destruction) is so bad that a warning should be emmited if we ever reach that code with the connection open, and all users should explicitly call close() or alternative we should change the context manager to close the connection, which is probably backwards compact.

In short:

  • Don't use gc.collect() in the test suite to do cleanups.
  • We should think how to ensure the connection is closed using explicit resource management.
  • We should understand:
    • Why there is a cycle in the first place
    • Why the test is hanging (this should not happen).

@vstinner
Copy link
Member

vstinner commented Jun 2, 2021

I understand that it is possible that the database file remains open after sqlite3_close_v2() is called, if some other sqlite objects still exist. Oh, this is counter intuitive. How can a programmer know if a database file is closed or not? Try to delete it on Windows? :-)

Would it help to emit a warning or even an hard exception if this case happen?

@erlend-aasland
Copy link
Contributor Author

Not the Spanish inquisition!!! 😂 Thanks, @pablogsal! :) I believe we're getting closer to understanding the various issues.

Victor:

Oh, this is counter intuitive. How can a programmer know if a database file is closed or not? Try to delete it on Windows? :-)

Yes, it is counter-intuitive, and no, there's no API for detecting if a database file is closed (AFAIK). fstat? :)

@vstinner
Copy link
Member

vstinner commented Jun 2, 2021

I understand that the close() method must destroy/release/close all sqlite objects before calling sqlite3_close_v2().

Said differently, would it be possible to implement a public or private close() method on each Python sqlite object which somehow prevents to close a database (sqlite3_close_v2), to release the resources, without having to destroy the Python object?

See my file object: even if the Python object is not destroyed, the inner resource is released. It requires all file methods to raise an exception if the file is closed.

@erlend-aasland
Copy link
Contributor Author

Anyway, I must take a break now; back at the computer in 5/6 hours :)

@vstinner
Copy link
Member

vstinner commented Jun 2, 2021

Anyway, I must take a break now; back at the computer in 5/6 hours :)

The Python project is developed asynchronously. You don't have to reply in less than 1 hour. It's perfectly fine to take 1 week or even 1 month to reply to a review. Your message sends notifications which are counter-productive. I go to the PR to see the comment... to say that you will reply later, ah. On IRC, I only get a notification that you wrote a comment, but I don't get the content.

Erlend E. Aasland added 2 commits June 2, 2021 20:32
- add wrapper for sqlite3_close_v2()
- explicitly free pending statements before close()
@erlend-aasland erlend-aasland marked this pull request as ready for review June 2, 2021 19:02
@erlend-aasland
Copy link
Contributor Author

Looks like I got it right. PTAL, @vstinner & @pablogsal. Thanks for your patience and guidance.

@erlend-aasland erlend-aasland changed the title [WIP] bpo-42213: Remove redundant cyclic GC hack in sqlite3 bpo-42213: Remove redundant cyclic GC hack in sqlite3 Jun 2, 2021
* up cursors, as they may have strong refs to statements. */
Py_CLEAR(self->statement_cache);
Py_CLEAR(self->statements);
Py_CLEAR(self->cursors);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you use the connection after you call close()? Are we checking for NULL for these veriables everywhere else? (Cannot check myself as I'm on my phone)

Copy link
Contributor Author

@erlend-aasland erlend-aasland Jun 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll get a sqlite3.ProgrammingError. There's a lot of sanity checks in the code for this. The sqlite3 test suite runs fine, but of course, we do not have 100% code coverage yet. (Digression: see issue 43553 for improving sqlite3 code coverage.)

After close, if you try to use a cursor, fetch from a pending statement, or manipulate the connection, you'll get a sqlite3.ProgrammingError:

>>> cx = sqlite3.connect(":memory:")
>>> cu = cx.cursor()
>>> res = cu.execute("select 1")
>>> cx.close()
>>> res.fetchall()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
sqlite3.ProgrammingError: Cannot operate on a closed database.
>>> cu.execute("select 1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
sqlite3.ProgrammingError: Cannot operate on a closed database.
>>> cx.create_function("test", 1, lambda x: x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
sqlite3.ProgrammingError: Cannot operate on a closed database.
>>> 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some more tests wrt. this: 1b23df1
Also, there's some regression tests that exercise operations on closed connections (and closed cursors).

Let me know if you want me to add more tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a strange observation:
I did a ref leak test on Windows yesterday, and it segfaulted on test_bpo31770. After some testing, I reverted the change highlighted in this conversation; I removed the three Py_CLEAR, added pysqlite_do_all_statements(self, ACTION_FINALIZE, 1) back, but kept connection_close(). That fixed the ref leak test on Windows, and the test suite still worked; it did not "leak" open test database files. So, for the new PR, I'll keep this modification. We'll see how the CI fares :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc. @vstinner

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI seems to be fine with this: erlend-aasland#9

@vstinner
Copy link
Member

vstinner commented Jun 2, 2021

_sqlite3.Connection.__enter__() doesn't call pysqlite_check_connection() (check if the DB is open). IMO it should.

The following methods check indirectly if the connection is closed.

*_sqlite3.Connection.execute(): call self.cursor().execute()

  • _sqlite3.Connection.executemany(): call self.cursor().executemany()
  • _sqlite3.Connection.executescript(): call self.cursor().executescript()
  • _sqlite3.Connection.__exit__(): call commit() or rollback()

Other methods call pysqlite_check_connection().

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum. Can you try to split your PR in two parts? First PR to rewrite the C code, second PR to rewrite the tests.

@erlend-aasland
Copy link
Contributor Author

_sqlite3.Connection.__enter__() doesn't call pysqlite_check_connection() (check if the DB is open). IMO it should.

Fixed in 7d8014a.

Hum. Can you try to split your PR in two parts? First PR to rewrite the C code, second PR to rewrite the tests.

Yes, I could do that, but that would make the CI fail on the first PR.

...and sort imports
@vstinner
Copy link
Member

vstinner commented Jun 2, 2021

Yes, I could do that, but that would make the CI fail on the first PR.

Hum, I am not strictly thinking about C vs Python, but more something like that:

First PR:

  • Add Py_CLEAR in pysqlite_connection_close_impl
  • Add connection_close()
  • pysqlite_connection_enter_impl() change
  • unit tests for closed connections/cursors

Second PR:

  • Remove decref_factory
  • Remove gc.collect() calls
  • Add managed_connect() in test

@vstinner
Copy link
Member

vstinner commented Jun 2, 2021

This PR is no longer only about "Remove redundant cyclic GC hack in sqlite3" and "The sqlite3 module now fully implements the GC protocol; there's no need for this workaround anymore." It's now way more than that.

@erlend-aasland
Copy link
Contributor Author

erlend-aasland commented Jun 2, 2021

True. Can I re-use the same issue number for both PRs?

@erlend-aasland erlend-aasland deleted the sqlite-remove-gc-hack branch June 2, 2021 22:33
@vstinner
Copy link
Member

vstinner commented Jun 2, 2021

Can I use the same issue number?

Yes. It's a good practice to put related changes in the same bpo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants