@@ -1123,6 +1123,10 @@ Connection objects
                 f.write('%s\n' % line)
         con.close()
 
+      .. seealso::
+
+         :ref:`sqlite3-howto-encoding`
+
 
    .. method:: backup(target, *, pages=-1, progress=None, name="main", sleep=0.250)
 
@@ -1189,6 +1193,10 @@ Connection objects
 
       .. versionadded:: 3.7
 
+      .. seealso::
+
+         :ref:`sqlite3-howto-encoding`
+
    .. method:: getlimit(category, /)
 
       Get a connection runtime limit.
@@ -1410,39 +1418,8 @@ Connection objects
       and returns a text representation of it.
       The callable is invoked for SQLite values with the ``TEXT`` data type.
       By default, this attribute is set to :class:`str`.
-      If you want to return ``bytes`` instead, set *text_factory* to ``bytes``.
 
-      Example:
-
-      .. testcode::
-
-         con = sqlite3.connect(":memory:")
-         cur = con.cursor()
-
-         AUSTRIA = "Österreich"
-
-         # by default, rows are returned as str
-         cur.execute("SELECT ?", (AUSTRIA,))
-         row = cur.fetchone()
-         assert row[0] == AUSTRIA
-
-         # but we can make sqlite3 always return bytestrings ...
-         con.text_factory = bytes
-         cur.execute("SELECT ?", (AUSTRIA,))
-         row = cur.fetchone()
-         assert type(row[0]) is bytes
-         # the bytestrings will be encoded in UTF-8, unless you stored garbage in the
-         # database ...
-         assert row[0] == AUSTRIA.encode("utf-8")
-
-         # we can also implement a custom text_factory ...
-         # here we implement one that appends "foo" to all strings
-         con.text_factory = lambda x: x.decode("utf-8") + "foo"
-         cur.execute("SELECT ?", ("bar",))
-         row = cur.fetchone()
-         assert row[0] == "barfoo"
-
-         con.close()
+      See :ref:`sqlite3-howto-encoding` for more details.
 
 
    .. attribute:: total_changes
 
@@ -1601,7 +1578,6 @@ Cursor objects
              COMMIT;
          """)
 
-
    .. method:: fetchone()
 
       If :attr:`~Cursor.row_factory` is ``None``,
@@ -2580,6 +2556,47 @@ With some adjustments, the above recipe can be adapted to use a
 instead of a :class:`~collections.namedtuple`.
 
 
+.. _sqlite3-howto-encoding:
+
+How to handle non-UTF-8 text encodings
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+By default, :mod:`!sqlite3` uses :class:`str` to adapt SQLite values
+with the ``TEXT`` data type.
+This works well for UTF-8 encoded text, but it might fail for other encodings
+and invalid UTF-8.
+You can use a custom :attr:`~Connection.text_factory` to handle such cases.
+
+Because of SQLite's `flexible typing`_, it is not uncommon to encounter table
+columns with the ``TEXT`` data type containing non-UTF-8 encodings,
+or even arbitrary data.
+To demonstrate, let's assume we have a database with ISO-8859-2 (Latin-2)
+encoded text, for example a table of Czech-English dictionary entries.
+Assuming we now have a :class:`Connection` instance :py:data:`!con`
+connected to this database,
+we can decode the Latin-2 encoded text using this :attr:`~Connection.text_factory`:
+
+.. testcode::
+
+   con.text_factory = lambda data: str(data, encoding="latin2")
+
+For invalid UTF-8 or arbitrary data stored in ``TEXT`` table columns,
+you can use the following technique, borrowed from the :ref:`unicode-howto`:
+
+.. testcode::
+
+   con.text_factory = lambda data: str(data, errors="surrogateescape")
+
+.. note::
+
+   The :mod:`!sqlite3` module API does not support strings
+   containing surrogates.
+
+.. seealso::
+
+   :ref:`unicode-howto`
+
+
 .. _sqlite3-explanation:
 
 Explanation
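
For anyone who wants to try the two `text_factory` recipes from the new how-to outside the doctest setup, here is a minimal sketch. It is not part of the diff. The helper `fetch_text`, the table name `entries`, and the sample strings are hypothetical, and it assumes that `CAST(? AS TEXT)` on a bound `bytes` value stores the raw bytes as a `TEXT` value without validating the encoding.

```python
import sqlite3


def fetch_text(raw: bytes, text_factory):
    """Store raw bytes as a TEXT value, then read it back through text_factory.

    Hypothetical helper for illustration only; not part of sqlite3.
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE entries(word TEXT)")
        # Assumption: casting a BLOB to TEXT keeps the bytes as they are,
        # so the column ends up holding text that is not valid UTF-8.
        con.execute("INSERT INTO entries VALUES (CAST(? AS TEXT))", (raw,))
        con.text_factory = text_factory
        return con.execute("SELECT word FROM entries").fetchone()[0]
    finally:
        con.close()


# Recipe 1: the stored text is known to be ISO-8859-2 (Latin-2).
czech = "žlutý kůň"
decoded = fetch_text(
    czech.encode("latin2"),
    lambda data: str(data, encoding="latin2"),
)

# Recipe 2: invalid UTF-8 of unknown origin; undecodable bytes are kept
# as lone surrogates instead of raising an error.
invalid = b"caf\xe9"
escaped = fetch_text(
    invalid,
    lambda data: str(data, errors="surrogateescape"),
)

# The original bytes can be recovered with the same error handler.
roundtrip = escaped.encode("utf-8", errors="surrogateescape")
```

As the note in the how-to says, strings containing surrogates cannot be passed back through the `sqlite3` API, so a value such as `escaped` would need to be re-encoded to `bytes` (as in `roundtrip`) before being written to the database again.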