Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mysqli Driver - Undesired Collations Cannot be Altered for collation_connection Variable #6350

Open
RyanPrussin opened this issue Apr 1, 2024 · 0 comments

Comments

@RyanPrussin
Copy link

RyanPrussin commented Apr 1, 2024

Bug Report

It is currently not possible to change collations from the default collation via DBAL's Mysqli driver, particularly for the collation_connection variable in MySQL / MariaDB. This is not ideal, as from what I understand it is supposed to be the client's responsibility to manage this setting and therefore from the perspective of the server it must be managed via DBAL / mysqli when connecting to the DB.

Q A
DBAL Version 3.6.1 (likely affects all versions)
DB Version 10.11.6-MariaDB

Summary

Some background: there are 3 collation variables that need to be set when changing from the default collation. The query below displays all 3, plus all of the charset settings for reference:

SHOW VARIABLES
WHERE LOWER(Variable_name) REGEXP 'character_set_'
   OR LOWER(Variable_name) REGEXP 'collation';

Outputs when using DBAL:

Variable_name Value Works? My Notes
character_set_client utf8mb4 Currently set without issue
character_set_connection utf8mb4 Currently modified on client / DBAL
character_set_database utf8mb4 Set on DB create or via ALTER query
character_set_results utf8mb4 Currently modified without issue
character_set_server utf8mb4 Must be modified by server
character_set_system utf8mb3 Cannot be changed (this is normal)
collation_connection utf8mb4_general_ci Must be modified on client / DBAL
collation_database utf8mb4_unicode_520_ci Set on DB create or via ALTER query
collation_server utf8mb4_unicode_520_ci Must be modified by server

We want collation_connection to match the collations used by the server (in this case, utf8mb4_unicode_520_ci); when using DBAL this is currently not possible.

To be fair, mysqli doesn't make this easy to begin with; it's documentation states in multiple places that you should always use mysqli::set_charset(...) over the query form, e.g. Set Names utf8;. See the excerpt below pulled from this page for clarifications on why:
image

Despite this, invoking mysqli::set_charset(...) will also reset the collation back to the server default for the supplied charset, and mysqli appears to offer no API for updating the collation along with set_charset(). Additionally, it seems that the server default cannot be modified in many version of MariaDB / MySQL; MariaDB only introduced functionality in 11.2 to support changing this default collation, and MySQL only introduced the configuration setting default_collation_for_utf8mb4 in 8.0.11.

Here is a helpful comment from the PHP docs explaining this further, and another great comment elaborating on the details.

There are two methods I found via mysqli to accomplish this goal, and both involve running this single query (or alternatively a grouping of other queries, but I think this is the most minimalistic approach) AFTER executing mysqli::set_charset('utf8mb4'):

SET NAMES utf8mb4 COLLATE utf8mb4_unicode_520_ci;
  1. Use the mysqli option MYSQLI_INIT_COMMAND to add the query to execute automatically upon connecting to the DB.
    --OR--
  2. Immediately after you have connected to the DB via mysqli, execute the query via $mysqli->query(...).

DBAL theoretically supports the 1st method above, and can be configured as follows via yaml:

doctrine:
    dbal:
        host: '%db.host%'
        port: '%db.port%'
        dbname: '%db.dbname%'
        user: '%db.user%'
        password: '%db.password%'
        driver: '%db.driver%'
        charset: '%db.charset%'
        server_version: '%db.server_version%'
        options:
            # This part tells Doctrine what to execute on connecting to MariaDB
            !php/const MYSQLI_INIT_COMMAND: 'SET NAMES %db.charset% COLLATE %db.collation%'
        default_table_options:
            charset: '%db.charset%'
            collation: '%db.collation%'
            engine: InnoDB

The problem is that this MYSQLI_INIT_COMMAND above is successfully executed on connection in the Doctrine\DBAL\Driver\Mysqli\Driver class, but is immediately overwritten in the same class after connecting when DBAL sets the charset during the $this->compilePostInitializers($params) loop, due to that loop invoking mysqli::set_charset(...) which resets the collation back to the default collation for the supplied charset.

This leaves the only current way to update the collation via the DBAL Connection class as a manual execution of that query as the first query at every location in code before executing commands, which is less than ideal (or perhaps there is a way using a wrapper Connection, but this seems like something that ought to be built into the default Connection class and DBAL configuration directly in my opinion).

Suggested Fix 1:

  • doctrine.yaml should support a collation option for the purpose of ensuring that collation_connection value can be adjusted by the end-user to other collations.
  • In the MySQL driver class, after the Charset initializer is executed, if a collation was set by the user, the following query should be run to apply that collation:
SET NAMES {$sameCharset} COLLATE {$suppliedCollation}

Suggested Fix 2 (possibly a bad idea -- needs more investigation):

  • Using mysqli, the default settings just work if you skip mysqli::set_charset(). Unfortunately, DBAL does not allow you to do this, and will force a series of defaults in these cases; these defaults could be removed. My suspicion is that under the hood the set_charset() call might be required by mysqli due to the underlying functionality of mysqli::real_escape_string, but it doesn't look like this function is used in DBAL's code base so perhaps there is little actual risk involved?

Current behaviour

Doctrine DBAL does not allow you to change the collation_connection DB variable.

How to reproduce

  1. Use MariaDB (or possible MySQL pre-8.0 as well -- untested).
  2. Try to set the charset to utf8mb4 and the collation to utf8mb4_unicode_520_ci.
  3. Execute the following query to observe that it does not work:
SHOW VARIABLES
WHERE LOWER(Variable_name) REGEXP 'character_set_'
   OR LOWER(Variable_name) REGEXP 'collation';

Expected behaviour

DBAL should allow you to change configuration settings such that collation_connection is properly set to a collation other than your DB's default value.

Additional Notes:

  • This behavior likely affects MariaDB more than MySQL as MariaDB has only started to add tools to address this in 11.2 which came out this past year.
  • If this problem does affect MySQL, it is probably most impactful in versions before MySQL 8.0, as I believe some charset + collation defaults were upgraded in that version. Since that version has been out for 6 years already, I would suspect the problem is less prevalent for MySQL users on average.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant