Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema diffing breaks since 3.2.0 when defining custom collation on column #5349

Closed
ruudk opened this issue Apr 8, 2022 · 12 comments
Closed

Comments

@ruudk
Copy link

ruudk commented Apr 8, 2022

Bug Report

Q A
doctrine/dbal 3.3.5
doctrine/doctrine-bundle 2.6.2
doctrine/doctrine-migrations-bundle 3.2.2
doctrine/orm 2.11.2
doctrine/migrations 3.5.0

Summary

After upgrading from dbal 3.1.5 I noticed that diffing schema changes (by Doctrine Migrations Bundle) would always result in updates that never go away after applying them.

This issue has been reported numerous times:

Still, after following those issues and waiting for a new release, the problem is not gone for me.

Current behaviour

I know this is a DBAL repository, and it should only talk about DBAL, but my table configuration is coming from ORM. To give all the context, this is my ORM configuration. Afterwards I'll share a DBAL only reproducer.

#[Table(name: 'content_organizer_brand', options: ['charset' => 'utf8mb4', 'collate' => 'utf8mb4_unicode_ci'])]
#[Index(name: 'name', columns: ['name'])]
#[Entity]
class DoctrineOrganizerBrand
{
    #[Id]
    #[Column(name: 'id', type: 'string', length: 36, options: ['collate' => 'utf8mb4_unicode_ci'])]
    private string $id;

    //...
}

#[Table(name: 'content_organizer_brand_location', options: ['charset' => 'utf8mb4', 'collate' => 'utf8mb4_bin'])]
#[Entity]
class DoctrineOrganizerBrandLinkedLocation
{
    #[Id]
    #[Column(name: 'id', type: 'string', length: 36, options: ['collate' => 'utf8mb4_bin'])]
    private string $id;

    #[Column(name: 'location_id', type: 'integer')]
    private int $locationId;

    #[ManyToOne(targetEntity: DoctrineOrganizerBrand::class, inversedBy: 'linkedLocations')]
    #[JoinColumn(name: 'organizer_brand_id', nullable: false)]
    private DoctrineOrganizerBrand $organizerBrand;

    // ..
}

When running bin/console doctrine:schema:update --dump-sql I get the following:

$ bin/console doctrine:schema:update --dump-sql

 The following SQL statements will be executed:

     ALTER TABLE content_organizer_brand_location CHANGE organizer_brand_id organizer_brand_id VARCHAR(36) NOT NULL;

How to reproduce

This is a DBAL only reproducer that shows the same problem:

<?php

declare(strict_types=1);

use Doctrine\DBAL\DriverManager;
use Doctrine\DBAL\Schema\Schema;
use Doctrine\DBAL\Types\Types;

include __DIR__ . '/vendor/autoload.php';

$host = '127.0.0.1';
$port = 3306;
$username = 'root';
$password = 'mysql';
$database = 'test';

$dsn = 'mysql://' . $username . ':' . $password . '@' . $host . ':' . $port . '/' . $database . '?serverVersion=8.0';
$defaultTableOptions = [
    'charset' => 'utf8mb4',
    'collate' => 'utf8mb4_unicode_ci',
    'engine' => 'InnoDB',
];

$connection = DriverManager::getConnection(['url' => $dsn]);
$connection->executeQuery(<<<SQL
DROP DATABASE $database;
CREATE DATABASE $database;
SQL
);

$connection = DriverManager::getConnection(['url' => $dsn]);
$connection->executeQuery(<<<SQL
CREATE TABLE `content_organizer_brand` (
  `id` varchar(36) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  `name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `logo_url` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`),
  KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

CREATE TABLE `content_organizer_brand_location` (
  `id` varchar(36) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
  `organizer_brand_id` varchar(36) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  `location_id` int NOT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_19E72965A6049BAB` (`organizer_brand_id`),
  CONSTRAINT `FK_19E72965A6049BAB` FOREIGN KEY (`organizer_brand_id`) REFERENCES `content_organizer_brand` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

SQL
);

$schemaManager = $connection->createSchemaManager();
$fromSchema = $schemaManager->createSchema();

$toSchema = new Schema([], [], $schemaManager->createSchemaConfig()->setDefaultTableOptions($defaultTableOptions));
$table = $toSchema->createTable('content_organizer_brand')
    ->addOption('charset', 'utf8mb4')
    ->addOption('collate', 'utf8mb4_unicode_ci');
$table->addColumn(
    'id',
    Types::STRING,
    [
        'length' => 36,
        'notnull' => true,
        'platformOptions' => [
            'charset' => 'utf8mb4',
            'collate' => 'utf8mb4_unicode_ci',
        ],
        'customSchemaOptions' => [
            'collate' => 'utf8mb4_unicode_ci',
        ],
    ]
);

$table->addColumn(
    'name',
    Types::STRING,
    [
        'length' => 255,
        'notnull' => true,
    ]
);
$table->addColumn(
    'logo_url',
    Types::STRING,
    [
        'length' => 255,
        'notnull' => true,
    ]
);
$table->setPrimaryKey(['id']);
$table->addIndex(['name'], 'name');

$table = $toSchema->createTable('content_organizer_brand_location')
    ->addOption('charset', 'utf8mb4')
    ->addOption('collate', 'utf8mb4_bin');
$table->addColumn(
    'id',
    Types::STRING,
    [
        'length' => 36,
        'notnull' => true,
        'platformOptions' => [
            'charset' => 'utf8mb4',
            'collate' => 'utf8mb4_bin',
        ],
    ]
);
$table->addColumn(
    'organizer_brand_id',
    Types::STRING,
    [
        'length' => 36,
        'notnull' => true,
        'platformOptions' => [
            'charset' => 'utf8mb4',
            'collate' => 'utf8mb4_unicode_ci',
        ],
        'customSchemaOptions' => [
            'collate' => 'utf8mb4_unicode_ci',
        ],
    ]
);
$table->addColumn(
    'location_id',
    Types::INTEGER,
    [
        'notnull' => true,
    ]
);
$table->setPrimaryKey(['id']);
$table->addForeignKeyConstraint(
    'content_organizer_brand',
    ['organizer_brand_id'],
    ['id'],
    []
);
$up = $schemaManager->createComparator()->compareSchemas($fromSchema, $toSchema);
$diffSql = $up->toSql($connection->getDatabasePlatform());
if ($diffSql !== []) {
    echo "The schema should be sync, but the diff keeps on reporting changes:\n";
    var_dump($diffSql);
    exit(1);
}

$down = $schemaManager->createComparator()->compareSchemas($toSchema, $fromSchema);
$diffSql = $down->toSql($connection->getDatabasePlatform());
if ($diffSql !== []) {
    echo "The schema should be sync, but the diff keeps on reporting changes:\n";
    var_dump($diffSql);
    exit(1);
}

This results in the following queries:

The schema should be sync, but the diff keeps on reporting changes:
array(2) {
  [0]=>
  string(110) "ALTER TABLE content_organizer_brand_location CHANGE organizer_brand_id organizer_brand_id VARCHAR(36) NOT NULL"
}

Even when I apply them, it keeps on reporting them.

Expected behaviour

No changes.

@derrabus
Copy link
Member

derrabus commented Apr 8, 2022

In you column definitions, can you please use collate instead of collation?

@ruudk
Copy link
Author

ruudk commented Apr 8, 2022

@derrabus Thanks, I updated the issue and used collate everywhere, that seems to improve it, as it reduces the diff to this:

ALTER TABLE content_organizer_brand_location CHANGE organizer_brand_id organizer_brand_id VARCHAR(36) NOT NULL;

@ruudk
Copy link
Author

ruudk commented Apr 8, 2022

I'm not sure if those column options should be called collate actually... Are you sure?
I thought collate was the deprecated option and collation is the term to use.

Also good to mention as to why I'm specifying collation for the id columns when they are the same as the table default: Well, if you are coming from a legacy application you often have old tables that are still in a different charset/collation. The only way to get ORM's JoinColumns to work, is when the foreign column and local column have the same charset/collation. That used to work when defining the charset/collation on the column that is referenced, it would then use that for the other column as well.

When the column specifies the same as the Table defaults, they are removed by Doctrine\DBAL\Platforms\MySQL\Comparator::normalizeColumns. When comparing them to the database schema, it's always different that causes the diff tool to propose an update.

I think this normalization is naive. When a user explicitly added the values, they should not be removed (normalized).

@ruudk
Copy link
Author

ruudk commented Apr 8, 2022

After spending the whole day debugging, I came to the realization that the only reason I have custom charset/collation on the columns is that I want to setup the JoinColumns between tables in 2 different charset/collations.

This is actually a bug in ORM, and should be fixed like this:

@derrabus
Copy link
Member

derrabus commented Apr 8, 2022

I'm not sure if those column options should be called collate actually... Are you sure? I thought collate was the deprecated option and collation is the term to use.

I've just had this problem the other day, so I wanted to check if this fixes your problem as well. Apparently, it does. I think, @greg0ire has worked on this previously. This still seems to be a footgun. 😓

@ruudk
Copy link
Author

ruudk commented Apr 8, 2022

I think it does not solve the problem it just masks it... The column should use collation while the table should use collate. When you use collate on the column, it just adds it as an extra SQL argument. It's not comparing it against the table collate.

The root cause of my problems (in this issue) are drilled down to the following:

  • When your column charset and collation are the same as the table's default charset and collate, they are seen as default and removed.
  • If the SchemaTool compares them against the toSchema (based on your config) it sees it as a difference, and tries to add back the charset and collation to the column, but that already exists.

I think the real problem is that DBAL has no idea of knowing if the column (from the database) has explicitly defined it's charset/collation, or that it used the table's default.

It runs this SQL query:

SELECT 
COLUMN_NAME AS Field, COLUMN_TYPE AS Type, IS_NULLABLE AS `Null`, COLUMN_KEY AS `Key`, COLUMN_DEFAULT AS `Default`, EXTRA AS Extra, COLUMN_COMMENT AS Comment, 
CHARACTER_SET_NAME AS CharacterSet, 
COLLATION_NAME AS Collation 
FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'content_organizer_brand' ORDER BY ORDINAL_POSITION ASC

and from there, there is no difference between explicit and implicit charset/collation.

But when you use SHOW CREATE TABLE content_organizer_brand, it DOES know about this:

CREATE TABLE `content_organizer_brand_location` (
  `id` varchar(36) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
  `organizer_brand_id` varchar(36) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  `location_id` int NOT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_19E72965A6049BAB` (`organizer_brand_id`),
  CONSTRAINT `FK_19E72965A6049BAB` FOREIGN KEY (`organizer_brand_id`) REFERENCES `content_organizer_brand` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

If DBAL would use that information, it would know that the column already had the correct charset/collation.

@morozov
Copy link
Member

morozov commented Apr 12, 2022

If I use "collation" instead of "collate" in the organizer_brand_id column definition, the issue is not reproducible.

@morozov
Copy link
Member

morozov commented Apr 12, 2022

In you column definitions, can you please use collate instead of collation?

Shouldn't it be the opposite? See #5249.

@ruudk
Copy link
Author

ruudk commented Apr 15, 2022

Hello everyone, thanks for your help so far.

I created a reproducer repository where I was table to reproduce the problem for ORM specific:

$ php orm.php
The following SQL statements will be executed:
CREATE TABLE users (id INT AUTO_INCREMENT NOT NULL,
  firstName VARCHAR(255) CHARACTER SET ascii NOT NULL COLLATE `ascii_general_ci`,
  lastName VARCHAR(255) CHARACTER SET utf8mb4 NOT NULL COLLATE `utf8mb4_unicode_ci`,
  email VARCHAR(255) CHARACTER SET utf8mb4 NOT NULL COLLATE `utf8mb4_bin`,
  PRIMARY KEY(id)) DEFAULT CHARACTER SET ascii COLLATE `ascii_general_ci` ENGINE = InnoDB;
CREATE TABLE tags (id INT AUTO_INCREMENT NOT NULL,
  name VARCHAR(255) CHARACTER SET utf8mb4 NOT NULL COLLATE `utf8mb4_bin`,
  title VARCHAR(255) CHARACTER SET utf8mb4 NOT NULL COLLATE `utf8mb4_unicode_ci`,
  description VARCHAR(255) CHARACTER SET ascii NOT NULL COLLATE `ascii_general_ci`,
  PRIMARY KEY(id)) DEFAULT CHARACTER SET utf8mb4 COLLATE `utf8mb4_unicode_ci` ENGINE = InnoDB;

Updating database schema...
2 queries were executed

Database schema updated successfully!
Checking if database schema is sync...

Database schema is not sync!

The following differences were found:
ALTER TABLE users CHANGE firstName firstName VARCHAR(255) CHARACTER SET ascii NOT NULL COLLATE `ascii_general_ci`;
ALTER TABLE tags CHANGE name name VARCHAR(255) CHARACTER SET utf8mb4 NOT NULL COLLATE `utf8mb4_bin`,
  CHANGE title title VARCHAR(255) CHARACTER SET utf8mb4 NOT NULL COLLATE `utf8mb4_unicode_ci`;

I also tried to reproduce it for DBAL but couldn't get the exact same result but I think I'm not setting the schema like ORM does.

The conclusion is that as soon as your column defines charset / collation that is exactly the same as the table default, it will always be a difference, that you cannot ever get to sync. This is caused by the MySQL Comparator::normalizeColumns method.

I think that the normalization should be altered, so that it knows when a user defines charset / collation explicitly. If it is defined explicitly, it should not be removed from the column1 (the one from the real database).

Please have a look, thanks 💙

@morozov
Copy link
Member

morozov commented Apr 15, 2022

I created a reproducer repository where I was able to reproduce the problem for ORM specific […]

@ruudk, if you believe the problem exists and needs to be fixed in the DBAL, please reproduce it using only the DBAL APIs like you did in the issue description but without using the deprecated "collate" option as suggested in #5349 (comment).

Once the problem is reproduced, we can discuss the solution.

@morozov
Copy link
Member

morozov commented Apr 28, 2022

I'm going to close the issue as non-reproducible. Please feel free to provide the steps to reproduce and reopen.

@morozov morozov closed this as completed Apr 28, 2022
@morozov morozov closed this as not planned Won't fix, can't repro, duplicate, stale Jul 15, 2022
@github-actions
Copy link

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants