sandhose
Contributor

Fixes #550

This adds three compilation flags to the embedded sqlite module:

  • -DSQLITE_ENABLE_FTS3 to explicitly enable FTS3 support (which AFAIK is already enabled by default)
  • -DSQLITE_ENABLE_FTS3_PARENTHESIS to enable the enhanced FTS3 query syntax (which isn't enabled by default)
  • -DSQLITE_ENABLE_FTS3_TOKENIZER to enable the two-argument version of the fts3_tokenizer() interface. It still has to be enabled on the connection-level using Connection.setconfig

All those flags are enabled by default in the Debian build of SQLite, and probably in other distributions as well.
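One way to see which of these flags a given Python build's SQLite was compiled with is to query `PRAGMA compile_options`, which lists option names without the `SQLITE_` prefix. This is a minimal sketch; the helper name `sqlite_compile_options` is made up for illustration, and a build that omits compile-option diagnostics may report nothing.

```python
import sqlite3


def sqlite_compile_options(conn):
    """Return the compile-time options reported by the linked SQLite library."""
    # Each row holds one option name, e.g. "ENABLE_FTS3" (no "SQLITE_" prefix).
    return {row[0] for row in conn.execute("PRAGMA compile_options")}


conn = sqlite3.connect(":memory:")
options = sqlite_compile_options(conn)
for flag in ("ENABLE_FTS3", "ENABLE_FTS3_PARENTHESIS", "ENABLE_FTS3_TOKENIZER"):
    print(f"{flag}: {'on' if flag in options else 'off'}")
conn.close()
```

The output depends on the build being inspected, so none is shown here.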

Member

@zanieb zanieb left a comment

Thank you!

The failures look like #567 — we can ignore those.

@zanieb zanieb merged commit 4ef6f72 into astral-sh:main Mar 17, 2025
382 of 390 checks passed
@geofft
Collaborator

geofft commented Sep 4, 2025

I know this was a while ago, but

  • -DSQLITE_ENABLE_FTS3_TOKENIZER to enable the two-argument version of the fts3_tokenizer() interface. It still has to be enabled on the connection-level using Connection.setconfig

I don't think this is right. It seems to me that this compile-time flag sets the default value of the connection-level flag: with the compile-time flag on, the feature does not need to be enabled via setconfig, and with it off, the feature can still be enabled via setconfig. Compare these versions from before/after this change:

$ uvx python@3.12.8
Python 3.12.8 (main, Jan 14 2025, 22:49:14) [Clang 19.1.6 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> s = sqlite3.connect(":memory:")
>>> c = s.cursor()
>>> s.getconfig(sqlite3.SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER)
False
>>> c.execute("SELECT fts3_tokenizer('mytokenizer', x'deadbeefdeadbeef')")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
sqlite3.OperationalError: fts3tokenize disabled
>>> s.setconfig(sqlite3.SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER)
>>> c.execute("SELECT fts3_tokenizer('mytokenizer', x'deadbeefdeadbeef')")
<sqlite3.Cursor object at 0x78ac33227d40>
>>> ^D
$ uvx python@3.12.9
Python 3.12.9 (main, Mar 17 2025, 21:01:58) [Clang 20.1.0 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> s = sqlite3.connect(":memory:")
>>> c = s.cursor()
>>> s.getconfig(sqlite3.SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER)
True
>>> c.execute("SELECT fts3_tokenizer('mytokenizer', x'deadbeefdeadbeef')")
<sqlite3.Cursor object at 0x7d354b927cc0>

Are you actually using the two-argument form of fts3_tokenizer in your code? (It seems kind of annoying to use from Python, to be honest.) If so, is running connection.setconfig(sqlite3.SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER) a workable option for your code? It sounds like this is disabled by default for a relatively good security reason, to prevent a SQL injection from turning into an arbitrary native-code pointer injection. If applications that need it can easily opt in, I'd like to turn it back off by default.
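For applications that do need the legacy behavior, the per-connection opt-in could look roughly like the sketch below. The helper name `allow_legacy_fts3_tokenizer_calls` is hypothetical; it assumes Python 3.12 or newer for `Connection.setconfig()` and `Connection.getconfig()`, and it reports `False` rather than failing when the running interpreter does not expose them.

```python
import sqlite3


def allow_legacy_fts3_tokenizer_calls(conn):
    """Opt one connection in to the legacy fts3_tokenizer() behavior.

    Returns True if the option is on afterwards, or False if this
    Python/SQLite combination does not expose the setting.
    """
    option = getattr(sqlite3, "SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER", None)
    if option is None or not hasattr(conn, "setconfig"):
        # Connection.setconfig() was added in Python 3.12.
        return False
    conn.setconfig(option, True)
    return conn.getconfig(option)


conn = sqlite3.connect(":memory:")
print("legacy fts3_tokenizer() calls enabled:", allow_legacy_fts3_tokenizer_calls(conn))
conn.close()
```

Keeping the opt-in next to the code that needs it limits the relaxed behavior to that one connection instead of every connection in the process.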

In everyone's defense, the SQLite documentation seems very wrong about this - it claims that SQLite 3.11 continues to enable the two-argument form by default but requires the blob to be a bound parameter instead of a literal, and it doesn't discuss the #define. In reality, SQLite 3.11 disables the two-argument form by default, and if you re-enable it (either with the #define or with setconfig), it doesn't distinguish between bound parameters and literals, as demonstrated above.

I reported the doc bug at https://sqlite.org/forum/forumpost/923cdbe766, and I might raise this to Debian if I get confirmation that I'm understanding this right.

geofft added a commit to geofft/python-build-standalone that referenced this pull request Sep 4, 2025
* astral-sh#309: -DSQLITE_ENABLE_DBSTAT_VTAB
* astral-sh#449: serialize/deserialize (on by default, was just a compile-time detection issue)
* astral-sh#550/astral-sh#562: -DSQLITE_ENABLE_FTS3_PARENTHESIS and -DSQLITE_ENABLE_FTS3_TOKENIZER
@geofft
Collaborator

geofft commented Sep 5, 2025

OK, I misunderstood this slightly but I think the overall point about leaving the compile-time flag off still applies.

The SQLite docs are correct that the flag only affects whether you must use a bound parameter when calling fts3_tokenizer or whether it also accepts literals/computed values. As you can see, my example code above does not use a bound parameter. If you do use a bound parameter, you can register a tokenizer without the compile-time flag and without the setconfig runtime flag.

$ uvx python@3.12.8
Python 3.12.8 (main, Jan 14 2025, 22:49:14) [Clang 19.1.6 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> s = sqlite3.connect(":memory:")
>>> c = s.cursor()
>>> s.getconfig(sqlite3.SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER)
False
>>> c.execute("SELECT fts3_tokenizer('mytokenizer', ?)", (0xdeadbeefdeadbeef.to_bytes(8, "little"),))
<sqlite3.Cursor object at 0x7e4aac023ec0>
>>> c.execute("SELECT fts3_tokenizer('mytokenizer')").fetchone()
(None,)
>>> c.execute("SELECT fts3_tokenizer('notmytokenizer')").fetchone()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
sqlite3.OperationalError: unknown tokenizer: notmytokenizer

I would expect that anyone who's registering a tokenizer would prefer to use a bound parameter instead of string interpolation, honestly.
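The registration check from the transcript above can be wrapped in a small helper that passes the tokenizer name as a bound parameter too. This is only a sketch: the function name `tokenizer_is_registered` is invented here, and it relies on the "unknown tokenizer" error text shown in the transcript, which could differ in other SQLite versions.

```python
import sqlite3


def tokenizer_is_registered(conn, name):
    """Return True if the one-argument fts3_tokenizer() recognizes `name`."""
    try:
        # The name is passed as a bound parameter, not interpolated into SQL.
        conn.execute("SELECT fts3_tokenizer(?)", (name,)).fetchone()
    except sqlite3.OperationalError as exc:
        if "unknown tokenizer" in str(exc):
            return False
        raise  # e.g. FTS3 is not compiled into this SQLite build
    return True


conn = sqlite3.connect(":memory:")
for name in ("simple", "notmytokenizer"):
    try:
        print(name, tokenizer_is_registered(conn, name))
    except sqlite3.OperationalError as exc:
        print(name, "could not be checked:", exc)
conn.close()
```

`simple` is used as the known name on the assumption that FTS3's built-in tokenizer of that name is available in the build.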

The SQLite docs are incorrect about the effect of the setconfig flag (they currently claim that it turns the fts3_tokenizer function on or off entirely, as opposed to turning the requirement to use bound parameters off or on), which they will fix. They also don't mention the effect of the compile-time flag, which is indeed just to choose the default setting of the setconfig flag.

I plan on turning the compile-time flag off in python-build-standalone unless there's a good reason (like existing code) that needs it, and I'll file a new Debian bug about this.

geofft added a commit to geofft/python-build-standalone that referenced this pull request Sep 17, 2025
…le options

As noted in the discussion in astral-sh#562, compiling SQLite with the
-DSQLITE_ENABLE_FTS3_TOKENIZER flag is equivalent to using
`connection.setconfig(sqlite3.SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER)` at
runtime. The purpose of this option, in either syntax, is to disable a
security measure to provide backwards compatibility for older code.
Specifically, the `fts3_tokenizer()` function can accept or return a
native-code pointer to a structure containing callback functions, which
makes it an attractive target for SQL injection attacks to escalate to
arbitrary native code execution. The more-secure behavior is to require
the use of bound parameters with this function; the backwards-compatible
behavior allows the function to be called with blob literals or computed
values. Because of a documentation shortcoming, some applications
thought they needed this option on at compile time, and so Debian's
SQLite build, used by e.g. the `python` container on Dockerhub, has it
on. But there is no functionality that is only enabled by having this
option on at compile time. Ideally, applications should use bound
parameters when calling this function. If that code change is hard, they
can alternatively set the option themselves at runtime to preserve
compatibility with existing code, but that still doesn't need anything
turned on at compile time. So the right decision for us is not to enable
this flag at compile time and preserve the secure behavior.

Add a test that `fts3_tokenizer()` is usable with bound parameters but
not with blob literals, and also add tests for a couple of other
previously-requested SQLite flags for compatibility with other
implementations:

* astral-sh#309: -DSQLITE_ENABLE_DBSTAT_VTAB
* astral-sh#449: serialize/deserialize (on by default, was just a compile-time detection issue)
* astral-sh#550/astral-sh#562: -DSQLITE_ENABLE_FTS3_PARENTHESIS
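
A check along the lines described in this commit message might look like the sketch below. It is not the test that was actually added. The function name `probe_fts3_tokenizer` and the tokenizer names are invented, and the 8-byte placeholder blob (the same value used in the transcripts above) assumes a 64-bit build and is not a real tokenizer pointer, so it must never be used to create a table.

```python
import sqlite3

# Placeholder bytes only, assuming 8-byte pointers; NOT a usable tokenizer.
PLACEHOLDER_POINTER = bytes.fromhex("deadbeefdeadbeef")


def probe_fts3_tokenizer(conn):
    """Report how two-argument fts3_tokenizer() treats literals and bound values."""
    attempts = {
        "literal": ("SELECT fts3_tokenizer('probe_literal', x'deadbeefdeadbeef')", ()),
        "bound": ("SELECT fts3_tokenizer('probe_bound', ?)", (PLACEHOLDER_POINTER,)),
    }
    results = {}
    for label, (sql, params) in attempts.items():
        try:
            conn.execute(sql, params)
            results[label] = "accepted"
        except sqlite3.OperationalError as exc:
            results[label] = f"rejected: {exc}"
    return results


conn = sqlite3.connect(":memory:")  # throwaway connection, closed right away
for label, outcome in probe_fts3_tokenizer(conn).items():
    print(f"{label}: {outcome}")
conn.close()
```

On a build with the compile-time flag off, the discussion above suggests the literal form is rejected and the bound form is accepted; other builds may report something different.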
geofft added a commit that referenced this pull request Sep 18, 2025
…le options (#791)

As noted in the discussion in #562, compiling SQLite with the
-DSQLITE_ENABLE_FTS3_TOKENIZER flag is equivalent to using
`connection.setconfig(sqlite3.SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER)` at
runtime. The purpose of this option, in either syntax, is to disable a
security measure to provide backwards compatibility for older code.
Specifically, the `fts3_tokenizer()` function can accept or return a
native-code pointer to a structure containing callback functions, which
makes it an attractive target for SQL injection attacks to escalate to
arbitrary native code execution. The more-secure behavior is to require
the use of bound parameters with this function; the backwards-compatible
behavior allows the function to be called with blob literals or computed
values. Because of a documentation shortcoming, some applications
thought they needed this option on at compile time, and so Debian's
SQLite build, used by e.g. the `python` container on Dockerhub, has it
on. But there is no functionality that is only enabled by having this
option on at compile time. Ideally, applications should use bound
parameters when calling this function. If that code change is hard, they
can alternatively set the option themselves at runtime to preserve
compatibility with existing code, but that still doesn't need anything
turned on at compile time. So the right decision for us is not to enable
this flag at compile time and preserve the secure behavior.

Add a test that `fts3_tokenizer()` is usable with bound parameters but
not with blob literals, and also add tests for a couple of other
previously-requested SQLite flags for compatibility with other
implementations:

* #309: -DSQLITE_ENABLE_DBSTAT_VTAB
* #449: serialize/deserialize (on by default, was just a compile-time
detection issue)
* #550: -DSQLITE_ENABLE_FTS3_PARENTHESIS

Successfully merging this pull request may close these issues.

Compile SQLite with -DENABLE_FTS3_PARENTHESIS