-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
gh-108590: Fix sqlite3 iterdump() for table columns containing invalid Unicode sequences #108683
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-108590: Fix sqlite3 iterdump() for table columns containing invalid Unicode sequences #108683
Conversation
…e sequences This also reverts 400a1ce.
cc. @CorvinM |
Sorry 'bout the unneeded churn, Serhiy; I should have caught this earlier. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is still not good.
It modifies the input connection object, is not thread-safe, and even if iterdump()
will no longer fail, you need a way to feed its result back to SQLite without errors. There will also be new encoding errors in attempt to save the dump in file or print to stdout.
For now, it may be easier to revert the previous changes and start to work on proper fix from clear page. |
Good point; Corvin added the context manager for this; let's throw that back in.
The underlying calls should take care of that; you cannot share a |
I agree; I'm reverting the previous change in |
There is no encoding errors when |
Does the recreation from dump work here? I was encountering encoding errors when attempting it with surrogateescape but not with the old chr() for some reason. def test_dump_recreation(self):
self.cu.executescript("""
CREATE TABLE foo (id INTEGER, text TEXT, blob BLOB);
INSERT INTO foo VALUES (0, CAST(X'619f' AS TEXT), X'619f');
INSERT INTO foo VALUES (1, 'Hello SQLite!', X'98194eff46ab29f79064');
""")
original_dump = list(self.cx.iterdump())
with memory_database() as cx2:
query = "".join(original_dump)
cx2.executescript(query)
recreation_dump = list(cx2.iterdump())
self.assertEqual(original_dump, recreation_dump) |
Well, using the So the fix is valid, technically. But from a user perspective; requiring the user to pass |
Yes, I can recreate it with this: $ sqlite3 test.db
SQLite version 3.39.5 2022-10-14 20:58:05
Enter ".help" for usage hints.
sqlite> CREATE TABLE t(t TEXT);
sqlite> INSERT INTO t VALUES(CAST(X'619f' AS TEXT));
sqlite> ^D
$ cat > test.py
import sqlite3
cx = sqlite3.connect("test.db")
for line in cx.iterdump():
print(line)
cx.close()
$ ./python.exe test.py > dump1.sql
$ sqlite3 test.db .dump > dump2.sql
$ hexdump -C dump1.sql
00000000 42 45 47 49 4e 20 54 52 41 4e 53 41 43 54 49 4f |BEGIN TRANSACTIO|
00000010 4e 3b 0a 43 52 45 41 54 45 20 54 41 42 4c 45 20 |N;.CREATE TABLE |
00000020 74 28 74 20 54 45 58 54 29 3b 0a 49 4e 53 45 52 |t(t TEXT);.INSER|
00000030 54 20 49 4e 54 4f 20 22 74 22 20 56 41 4c 55 45 |T INTO "t" VALUE|
00000040 53 28 27 61 9f 27 29 3b 0a 43 4f 4d 4d 49 54 3b |S('a.');.COMMIT;|
00000050 0a |.|
00000051
$ hexdump -C dump2.sql
00000000 50 52 41 47 4d 41 20 66 6f 72 65 69 67 6e 5f 6b |PRAGMA foreign_k|
00000010 65 79 73 3d 4f 46 46 3b 0a 42 45 47 49 4e 20 54 |eys=OFF;.BEGIN T|
00000020 52 41 4e 53 41 43 54 49 4f 4e 3b 0a 43 52 45 41 |RANSACTION;.CREA|
00000030 54 45 20 54 41 42 4c 45 20 74 28 74 20 54 45 58 |TE TABLE t(t TEX|
00000040 54 29 3b 0a 49 4e 53 45 52 54 20 49 4e 54 4f 20 |T);.INSERT INTO |
00000050 74 20 56 41 4c 55 45 53 28 27 61 9f 27 29 3b 0a |t VALUES('a.');.|
00000060 43 4f 4d 4d 49 54 3b 0a |COMMIT;.|
00000068 Apart from the preamble and table name quoting, they're equal. The VALUES string is |
|
Unless I did something wrong I think this PR will fail the test I posted but works for print() What do you think about breaking consistency with sqlite cli for 3.11, 3.12 and just dumping |
FTR, with #108695 and the recreation sequence outlined in #108683 (comment), I'm not able to recreate the database. The resulting VALUES string becomes: |
Both correct.
I'm not sure if it is worth it for this corner case. At the moment, I think this may be best solved in documentation. Let's think about it for some days; there is no hurry in getting this fixed. |
Thanks for helping investigate this, @CorvinM. |
This partially reverts 400a1ce, except for the test, which is amended.