Compile sqlite with FTS3 enhanced query syntax enabled #562
Thank you!
The failures look like #567 — we can ignore those.
I know this was a while ago, but
I don't think this is right - it seems to me that this compile-time flag impacts the default setting of the connection-level flag, and it can still be enabled via
$ uvx [email protected]
Python 3.12.8 (main, Jan 14 2025, 22:49:14) [Clang 19.1.6 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> s = sqlite3.connect(":memory:")
>>> c = s.cursor()
>>> s.getconfig(sqlite3.SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER)
False
>>> c.execute("SELECT fts3_tokenizer('mytokenizer', x'deadbeefdeadbeef')")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
sqlite3.OperationalError: fts3tokenize disabled
>>> s.setconfig(sqlite3.SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER)
>>> c.execute("SELECT fts3_tokenizer('mytokenizer', x'deadbeefdeadbeef')")
<sqlite3.Cursor object at 0x78ac33227d40>
>>> ^D
$ uvx [email protected]
Python 3.12.9 (main, Mar 17 2025, 21:01:58) [Clang 20.1.0 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> s = sqlite3.connect(":memory:")
>>> c = s.cursor()
>>> s.getconfig(sqlite3.SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER)
True
>>> c.execute("SELECT fts3_tokenizer('mytokenizer', x'deadbeefdeadbeef')")
<sqlite3.Cursor object at 0x7d354b927cc0>
Are you actually using the two-argument form of
In everyone's defense, the SQLite documentation seems very wrong about this - it claims that SQLite 3.11 continues to enable the two-argument form by default but requires the blob to be a bound parameter instead of a literal, and it doesn't discuss the
I reported the doc bug at https://sqlite.org/forum/forumpost/923cdbe766, and I might raise this to Debian if I get confirmation that I'm understanding this right.
* astral-sh#309: -DSQLITE_ENABLE_DBSTAT_VTAB
* astral-sh#449: serialize/deserialize (on by default, was just a compile-time detection issue)
* astral-sh#550/astral-sh#562: -DSQLITE_ENABLE_FTS3_PARENTHESIS and -DSQLITE_ENABLE_FTS3_TOKENIZER
OK, I misunderstood this slightly but I think the overall point about leaving the compile-time flag off still applies. The SQLite docs are correct that the flag only affects whether you must use a bound parameter when calling
$ uvx [email protected]
Python 3.12.8 (main, Jan 14 2025, 22:49:14) [Clang 19.1.6 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> s = sqlite3.connect(":memory:")
>>> c = s.cursor()
>>> s.getconfig(sqlite3.SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER)
False
>>> c.execute("SELECT fts3_tokenizer('mytokenizer', ?)", (0xdeadbeefdeadbeef.to_bytes(8, "little"),))
<sqlite3.Cursor object at 0x7e4aac023ec0>
>>> c.execute("SELECT fts3_tokenizer('mytokenizer')").fetchone()
(None,)
>>> c.execute("SELECT fts3_tokenizer('notmytokenizer')").fetchone()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
sqlite3.OperationalError: unknown tokenizer: notmytokenizer
I would expect that anyone who's registering a tokenizer would prefer to use a bound parameter instead of string interpolation, honestly.
The SQLite docs are incorrect about the effect of the
I plan on turning the compile-time flag off in python-build-standalone unless there's a good reason (like existing code) that needs it, and I'll file a new Debian bug about this.
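The difference between the blob-literal and bound-parameter forms shown in the sessions above can also be probed from a script. This is only a minimal sketch: the helper names `try_register` and `probe_tokenizer_registration` are made up for illustration, the tokenizer name and the placeholder pointer value mirror the sessions, and the result depends on how the SQLite library behind the `sqlite3` module was built.

```python
import sqlite3
import struct

# Placeholder "pointer" sized to the platform's pointer width. It is only
# registered; nothing in this sketch creates a table that would use it.
FAKE_POINTER = (0xDEADBEEF).to_bytes(struct.calcsize("P"), "little")


def try_register(sql, params=()):
    """Return True if the fts3_tokenizer() statement runs without an error."""
    conn = sqlite3.connect(":memory:")  # fresh connection per attempt
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


def probe_tokenizer_registration():
    """Try the two-argument form with a blob literal and with a bound parameter."""
    literal_sql = f"SELECT fts3_tokenizer('mytokenizer', x'{FAKE_POINTER.hex()}')"
    bound_sql = "SELECT fts3_tokenizer('mytokenizer', ?)"
    return {
        "literal": try_register(literal_sql),
        "bound": try_register(bound_sql, (FAKE_POINTER,)),
    }


if __name__ == "__main__":
    print(probe_tokenizer_registration())
```

Going by the sessions above, a build with the compile-time flag off would be expected to accept the bound form and reject the literal one, while a build with the flag on would accept both. Other SQLite versions or builds without FTS3 may behave differently.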
…le options (#791)

As noted in the discussion in #562, compiling SQLite with the -DSQLITE_ENABLE_FTS3_TOKENIZER flag is equivalent to using `connection.setconfig(sqlite3.SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER)` at runtime. The purpose of this option, in either syntax, is to disable a security measure to provide backwards compatibility for older code. Specifically, the `fts3_tokenizer()` function can accept or return a native-code pointer to a structure containing callback functions, which makes it an attractive target for SQL injection attacks to escalate to arbitrary native code execution. The more-secure behavior is to require the use of bound parameters with this function; the backwards-compatible behavior allows the function to be called with blob literals or computed values.

Because of a documentation shortcoming, some applications thought they needed this option on at compile time, and so Debian's SQLite build, used by e.g. the `python` container on Dockerhub, has it on. But there is no functionality that is only enabled by having this option on at compile time. Ideally, applications should use bound parameters when calling this function. If that code change is hard, they can alternatively set the option themselves at runtime to preserve compatibility with existing code, but that still doesn't need anything turned on at compile time.

So the right decision for us is not to enable this flag at compile time and preserve the secure behavior.

Add a test that `fts3_tokenizer()` is usable with bound parameters but not with blob literals, and also add tests for a couple of other previously-requested SQLite flags for compatibility with other implementations:

* #309: -DSQLITE_ENABLE_DBSTAT_VTAB
* #449: serialize/deserialize (on by default, was just a compile-time detection issue)
* #550: -DSQLITE_ENABLE_FTS3_PARENTHESIS
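The checks for the other requested features listed in the commit message could look roughly like the following. This is a hedged sketch and not the test that was actually added to the repository: the function name `probe_sqlite_features` and the table name `t` are invented here. It assumes that the `dbstat` virtual table is queryable only when SQLite was built with -DSQLITE_ENABLE_DBSTAT_VTAB, and that `Connection.serialize()` / `Connection.deserialize()` are present on the connection object only when Python's `sqlite3` module detected support at build time.

```python
import sqlite3


def probe_sqlite_features():
    """Report whether dbstat and serialize/deserialize appear to be usable."""
    features = {}
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (x)")
        conn.commit()

        # Querying dbstat fails with "no such table" when the vtab is not built in.
        try:
            conn.execute("SELECT name FROM dbstat LIMIT 1").fetchall()
            features["dbstat"] = True
        except sqlite3.Error:
            features["dbstat"] = False

        # Round-trip the database through bytes if the methods are available.
        if hasattr(conn, "serialize") and hasattr(conn, "deserialize"):
            data = conn.serialize()
            copy = sqlite3.connect(":memory:")
            try:
                copy.deserialize(data)
                tables = copy.execute(
                    "SELECT name FROM sqlite_master WHERE type = 'table'"
                ).fetchall()
                features["serialize"] = tables == [("t",)]
            finally:
                copy.close()
        else:
            features["serialize"] = False
    finally:
        conn.close()
    return features


if __name__ == "__main__":
    print(probe_sqlite_features())
```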
Fixes #550
This adds three compilation flags to the embedded sqlite module:
* -DSQLITE_ENABLE_FTS3 to explicitly enable FTS3 support (which AFAIK is already enabled by default)
* -DSQLITE_ENABLE_FTS3_PARENTHESIS to enable the enhanced FTS3 query syntax (which isn't enabled by default)
* -DSQLITE_ENABLE_FTS3_TOKENIZER to enable the two-argument version of the `fts3_tokenizer()` interface. It still has to be enabled on the connection-level using `Connection.setconfig`.
All those flags are enabled by default in the Debian build of SQLite, and probably in other distributions as well.
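One way to see which of these flags a given Python's SQLite was compiled with is to read `PRAGMA compile_options`, which reports option names without the `SQLITE_` prefix. The snippet below is a small sketch along those lines; the function name `fts3_compile_flags` is an invention for this example, and a build that omits compile-option diagnostics would simply report every flag as absent.

```python
import sqlite3

# Option names as PRAGMA compile_options reports them (no "SQLITE_" prefix).
FLAGS = ("ENABLE_FTS3", "ENABLE_FTS3_PARENTHESIS", "ENABLE_FTS3_TOKENIZER")


def fts3_compile_flags():
    """Map each FTS3-related flag to whether the linked SQLite reports it."""
    conn = sqlite3.connect(":memory:")
    try:
        options = {row[0] for row in conn.execute("PRAGMA compile_options")}
    finally:
        conn.close()
    return {flag: flag in options for flag in FLAGS}


if __name__ == "__main__":
    for flag, enabled in fts3_compile_flags().items():
        print(f"SQLITE_{flag}: {'on' if enabled else 'off'}")
```

Running this under both a distribution-provided Python and a python-build-standalone build would be one way to compare the two configurations discussed here.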