Suffered a major mysql database encoding issue upgrading to rails 6.1.2 ! #41403

Closed
thomasdarde opened this issue Feb 11, 2021 · 9 comments

@thomasdarde
Contributor

Steps to reproduce

Updating to Rails 6.1.2 (a minor update) caused encoding issues with a MySQL database.
We are still gathering data on exactly how to reproduce this issue.

Expected behavior

A minor Rails update should not affect this kind of very sensitive data.

Actual behavior

The database was accessed with an incorrect encoding, leading to bad reads from and bad writes to the database. This only showed up in production.

I suspect this change is linked to the issue:
9f39d00#diff-868f1dccfcbed26a288bf9f3fd8a39c863a4413ab0075e12b6805d9798f556d1

Still investigating

System configuration

Rails version: 6.1.2
Ruby version: 2.7.2

@eileencodes
Member

Sorry that happened @thomasdarde. However, we need more information here to determine whether the cause is Rails. I haven't heard of this problem from other apps that upgraded and use MySQL. Please provide the errors/stacktrace you saw and your version of MySQL. Also check your production database configuration for settings that are unique to your app/env.

@jeremy
Member

jeremy commented Feb 11, 2021

Ref #41232. /cc @kamipo

@thomasdarde
Contributor Author

First, thank you everyone for your support. I'm sorry for opening the ticket with so little information, but I wanted other people to be able to find an existing ticket if they ran into the same situation.

The MySQL version (AWS RDS) is 5.7.25, and I'm using the mysql2 gem.

I'm using this line in my database.yml:

    encoding: utf8mb4

The schema also carries this kind of encoding information for each table: charset: "utf8mb4", collation: "utf8mb4_unicode_ci"

In the MySQL adapter of Active Record, @config[:encoding] is no longer used at all (in Rails 6.1.2).
It's still not clear to me how this information is passed to the mysql2 gem.
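
For reference, here is a minimal sketch of reading such a configuration in plain Ruby and printing the two keys that describe the connection character set. Only `encoding: utf8mb4` comes from the comment above. The `production` block, the `database` name, and the explicit `collation` key are assumptions added for illustration.

```ruby
require "yaml"

# Hypothetical database.yml contents. Only `encoding: utf8mb4` is taken from
# the report; every other key and value is an illustrative assumption.
DATABASE_YML = <<~YAML
  production:
    adapter: mysql2
    encoding: utf8mb4
    collation: utf8mb4_unicode_ci
    database: example_production
YAML

config = YAML.safe_load(DATABASE_YML).fetch("production")

# Print the settings that describe the desired connection character set.
puts "encoding:  #{config["encoding"].inspect}"
puts "collation: #{config["collation"].inspect}"
```

Setting both keys explicitly leaves less to the defaults of the client library, although whether that is enough on a given Rails version is exactly what this thread is trying to establish.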

@masaosg

masaosg commented Feb 12, 2021

I had the same encoding problem with MySQL 5.7 after upgrading to Rails 6.1.2, but in my case, it was because I used the wrong client library version (MySQL 8.x client library) to build the mysql2 gem.
It seems that the 8.x library treats utf8mb4 as the default encoding and does not explicitly set it when connecting to the server, resulting in an encoding mismatch when paired with MySQL < 8.0 servers.
After I downgraded the client headers and shared libraries to MySQL 5.7 and rebuilt the mysql2 gem, it worked flawlessly with the latest Rails version.

Some newer Linux distros only provide MySQL 8.0 client libraries by default (like Ubuntu 20.04 or later), so I think maybe OP has the same problem.
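
A quick way to spot the situation described above is to compare the major versions of the client library and the server. The sketch below is plain Ruby and works on version strings only. The method names are hypothetical and the sample strings are illustrative.

```ruby
# Extract the major version from a version string such as "5.7.25-log".
def major_version(version_string)
  match = version_string.match(/\A(\d+)\./)
  raise ArgumentError, "unrecognized version: #{version_string.inspect}" unless match

  match[1].to_i
end

# True when the client library and the server report different major versions.
def major_version_mismatch?(client_version, server_version)
  major_version(client_version) != major_version(server_version)
end

# Illustrative version strings, not read from a live connection.
puts major_version_mismatch?("8.0.23", "5.7.25-log") # prints "true"
puts major_version_mismatch?("5.7.33", "5.7.25-log") # prints "false"
```

A mismatch is only a hint that the gem may have been built against a newer client library. It does not prove that the encoding is wrong.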

@kamipo kamipo self-assigned this Feb 13, 2021
@kamipo
Member

kamipo commented Feb 15, 2021

Can you share the result of running this in the affected environment?

    conn = ActiveRecord::Base.connection
    pp(
      server_version: conn.raw_connection.server_info[:version],
      client_version: conn.raw_connection.info[:version],
      encoding: conn.instance_variable_get(:@config)[:encoding],
      collation: conn.instance_variable_get(:@config)[:collation],
      character_set_client: conn.show_variable("character_set_client"),
      character_set_results: conn.show_variable("character_set_results"),
      character_set_connection: conn.show_variable("character_set_connection"),
      collation_connection: conn.show_variable("collation_connection"),
    )

If you have an encoding issue, the server variables may not be set by the connection (it works in our test case with a non-utf8mb4 encoding, though).

    def test_character_set_connection_is_configured
      run_without_connection do |orig_connection|
        configuration_hash = orig_connection.except(:encoding, :collation)
        ActiveRecord::Base.establish_connection(configuration_hash.merge!(encoding: "cp932"))
        connection = ActiveRecord::Base.connection
        assert_equal "cp932", connection.show_variable("character_set_client")
        assert_equal "cp932", connection.show_variable("character_set_results")
        assert_equal "cp932", connection.show_variable("character_set_connection")
        assert_equal "cp932_japanese_ci", connection.show_variable("collation_connection")
        expected = "こんにちは".encode(Encoding::CP932)
        assert_equal expected, connection.query_value("SELECT 'こんにちは'")
      end
    end

    def test_collation_connection_is_configured
      assert_equal "utf8mb4_unicode_ci", @connection.show_variable("collation_connection")
      assert_equal 1, @connection.query_value("SELECT 'こんにちは' = 'コンニチハ'")
      assert_equal "utf8mb4_general_ci", ARUnit2Model.connection.show_variable("collation_connection")
      assert_equal 0, ARUnit2Model.connection.query_value("SELECT 'こんにちは' = 'コンニチハ'")
    end
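
To read the output of the diagnostic snippet above, it can help to list only the character set variables that disagree with the configured encoding. This is a minimal sketch in plain Ruby that works on a hash of that shape. The constant and method names are hypothetical, and the sample values are made up for illustration.

```ruby
# Connection variables that should agree with the configured encoding.
CHARSET_VARIABLES = %i[
  character_set_client
  character_set_results
  character_set_connection
].freeze

# Returns the charset variables whose value differs from report[:encoding].
# An empty hash means the connection matches the configuration.
def charset_mismatches(report)
  expected = report[:encoding]
  report.slice(*CHARSET_VARIABLES).reject { |_name, value| value == expected }
end

# Illustrative values only, not taken from a real connection.
sample = {
  encoding: "utf8mb4",
  character_set_client: "utf8",
  character_set_results: "utf8mb4",
  character_set_connection: "utf8",
}

p charset_mismatches(sample)
```

With this sample, the result names `character_set_client` and `character_set_connection`, the two variables the connection did not set to the configured encoding.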

@thomasdarde
Contributor Author

Hello, thanks a lot for this incredible support.
This is the log:

 {:server_version=>"5.7.25-log",
 :client_version=>"8.0.23",
 :encoding=>"utf8mb4",
 :collation=>nil,
 :character_set_client=>"utf8mb4",
 :character_set_results=>"utf8mb4",
 :character_set_connection=>"utf8mb4",
 :collation_connection=>"utf8mb4_general_ci"}

So it seems I'm in the same situation as @masaosg; I will apply the same remedy.
(Also, this is surprising, but in a production console this line takes more than 30 seconds to return something:)

    conn = ActiveRecord::Base.connection

I will close this thread, as a solution is now available for anyone who has the same issue.

@JasonBarnabe
Contributor

I have this problem without a server/client version mismatch.

{:server_version=>"10.3.23-MariaDB-1:10.3.23+maria~focal",
 :client_version=>"10.3.27",
 :encoding=>"utf8mb4",
 :collation=>nil,
 :character_set_client=>"utf8",
 :character_set_results=>"utf8",
 :character_set_connection=>"utf8",
 :collation_connection=>"utf8_general_ci"}

If I run conn.execute('set names utf8mb4') then I get:

{:server_version=>"10.3.23-MariaDB-1:10.3.23+maria~focal",
 :client_version=>"10.3.27",
 :encoding=>"utf8mb4",
 :collation=>nil,
 :character_set_client=>"utf8mb4",
 :character_set_results=>"utf8mb4",
 :character_set_connection=>"utf8mb4",
 :collation_connection=>"utf8mb4_general_ci"}

and I can use utf8mb4 characters again.

In my case this happens on a CircleCI test run, where it's not easy to change the DB settings.
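
The difference between `utf8` and `utf8mb4` in the two outputs above matters because the `utf8` character set in MySQL and MariaDB stores at most 3 bytes per character, while `utf8mb4` also covers the 4-byte characters outside the Basic Multilingual Plane, such as most emoji. As a rough sketch, the following plain Ruby helper (the name is hypothetical) finds the characters in a string that need 4 bytes:

```ruby
# Returns the characters that need 4 bytes in UTF-8, i.e. the ones a
# 3-byte `utf8` connection cannot carry.
def four_byte_chars(text)
  text.encode(Encoding::UTF_8).each_char.select { |char| char.bytesize == 4 }
end

p four_byte_chars("こんにちは") # each hiragana character is 3 bytes
p four_byte_chars("hello 😀")   # the emoji is 4 bytes
```

The first call returns an empty array and the second returns only the emoji, which is why text like the Japanese sample in the tests above can survive a `utf8` connection while emoji do not.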

@rafaelfranca rafaelfranca reopened this Feb 16, 2021
@rafaelfranca
Member

I think we should revert that change, adding the SET NAMES back, and instead fix the syntax of that statement to fix #41232. For 7.0 I'm fine with not including the SET NAMES statement, but for 6.1.x we should not remove it.
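
For readers unfamiliar with the statement, MySQL's syntax is `SET NAMES charset [COLLATE collation]`. Below is a minimal sketch of building it from an encoding and an optional collation. This is not the code Rails uses; the helper and constant names are hypothetical, and the name check is an assumption added so the values are safe to interpolate.

```ruby
# Charset and collation names are restricted to word characters so they can
# be interpolated into SQL without quoting.
IDENTIFIER = /\A\w+\z/

def set_names_statement(encoding, collation = nil)
  [encoding, collation].compact.each do |name|
    raise ArgumentError, "invalid name: #{name.inspect}" unless IDENTIFIER.match?(name)
  end

  statement = +"SET NAMES #{encoding}"
  statement << " COLLATE #{collation}" if collation
  statement
end

puts set_names_statement("utf8mb4")
# prints "SET NAMES utf8mb4"
puts set_names_statement("utf8mb4", "utf8mb4_unicode_ci")
# prints "SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci"
```

The resulting string could be passed to `conn.execute`, as in the manual workaround earlier in the thread.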

rafaelfranca added a commit that referenced this issue Feb 16, 2021
@rafaelfranca
Member

Reverted in 4f5e6b5 for 6-1-stable. main still has that change, and we may need to revisit it for 7.0.

rafaelfranca added a commit that referenced this issue Feb 17, 2021
…aster"

This reverts commit 8b3fc5c, reversing
changes made to 668c140.

See #41403.