You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
For reference, note that on this particular installation (MariaDB 10.3), the default collation for utf8mb4 is utf8mb4_general_ci, NOT utf8mb4_unicode_ci.
The bug in this case is that when using mysqlclient, it is IMPOSSIBLE to set the correct collation. There is an option charset, which whether it is provided, blank, or empty, issues a command as such:
The problem is that certain SQL queries, such as a CAST as generated by the Django Cast function, will infer the connection's collation, and can fail if the connection's collation does not match the table:
(1267, "Illegal mix of collations (utf8mb4_unicode_ci,IMPLICIT) and (utf8mb4_general_ci,IMPLICIT) for operation '='")
The simple documented solution is to also specify the collation:
SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci
I have tried several workarounds to set the collation, which currently fail, because set_character_set is always run AFTER the workaround, therefore negating it.
Workaround 1
Use the init_command option to send SET NAME utf8mb4 COLLATE utf8mb4_unicode_ci after connecting. Unfortunately, from examining the query log, charset is set AFTER init_command, therefore negating it. The query log looks as such:
31 Query SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci -- run by init_command31 Query SET NAMES utf8mb4 -- from mysqlclient
Workaround 2
Use the init_connect option on the MySQL server. Unfortunately, the story is the same, as the charset is issued after the init_connect script is run, therefore negating it.
32 Query SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci -- run by the server32 Query SET NAMES utf8mb4 -- from mysqlclient
Possible fixes
Fix 1: Do not set charset if empty.
If charset is empty, or possibly None, do not set it. This way, it does not override the server's default behavior or the init_command
It seems this behavior (which is the source of my bug) was introduce in 2.1: #509
Fix 2: Add a collation option.
This deviates from the MySQL C connector behavior, but it would be really nice to be able to specify both "charset" and "collation". Special implementation in Python might be required for this, e.g. to issue a SQL command after connecting.
Environment
MySQL Server
MariaDB (Debian 11, Windows 10)
MariaDB 10.3.35
MySQL Client
Debian, Windows
Python tested on multiple versions: 3.7+
The text was updated successfully, but these errors were encountered:
Uh oh!
There was an error while loading. Please reload this page.
Describe the bug
Use case: the database is configured with a character set, and a non-default collation for that character set.
For reference, note that on this particular installation (MariaDB 10.3), the default collation for
utf8mb4
isutf8mb4_general_ci
, NOTutf8mb4_unicode_ci
.The bug in this case is that when using
mysqlclient
, it is IMPOSSIBLE to set the correct collation. There is an optioncharset
, which whether it is provided, blank, or empty, issues a command as such:SET NAME
will ALWAYS use the default collation for the charset if the collation is not specified, which is documented here: https://mariadb.com/kb/en/set-names/, and much more deeply documented here: https://dev.mysql.com/doc/refman/8.0/en/charset-connection.html.In this case we are specifying
charset="utf8mb4"
. The command is being generated is:SET NAMES utf8mb4
When COLLATION is omitted, this uses the default collation for
utf8mb4
whichutf8mb4_general_ci
. This is hard-coded into MariaDB and cannot be changed: https://mariadb.com/kb/en/change-is_default-value-for-information_schemacollations/The problem is that certain SQL queries, such as a CAST as generated by the Django Cast function, will infer the connection's collation, and can fail if the connection's collation does not match the table:
The simple documented solution is to also specify the collation:
SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci
I have tried several workarounds to set the collation, which currently fail, because
set_character_set
is always run AFTER the workaround, therefore negating it.Workaround 1
Use the
init_command
option to sendSET NAME utf8mb4 COLLATE utf8mb4_unicode_ci
after connecting. Unfortunately, from examining the query log,charset
is set AFTERinit_command
, therefore negating it. The query log looks as such:Workaround 2
Use the
init_connect
option on the MySQL server. Unfortunately, the story is the same, as thecharset
is issued after theinit_connect
script is run, therefore negating it.Possible fixes
Fix 1: Do not set charset if empty.
If charset is empty, or possibly
None
, do not set it. This way, it does not override the server's default behavior or theinit_command
mysqlclient/MySQLdb/connections.py
Lines 195 to 197 in 5c04abf
It seems this behavior (which is the source of my bug) was introduce in 2.1: #509
Fix 2: Add a
collation
option.This deviates from the MySQL C connector behavior, but it would be really nice to be able to specify both "charset" and "collation". Special implementation in Python might be required for this, e.g. to issue a SQL command after connecting.
Environment
MySQL Server
MySQL Client
Debian, Windows
Python tested on multiple versions: 3.7+
The text was updated successfully, but these errors were encountered: