You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I’m trying to retrieve data from a MySQL v5.1 database in which most of the character settings are latin1. The trouble is with characters in the \x80 - \x9f range, the ones that are different between canonical latin1 (ISO-8859-1) and MySQL’s latin1 (cp1252). I’m using mysqlclient as part of a Django application but I’ve tried to replicate the problem with a more direct example below. The replication isn’t perfect, the Django application throws an error: "'utf-8' codec can't decode byte 0x92 in position 5: invalid start byte" when there is no charset option set. It prints "Women . s" (i.e. whitespace in place of the apostrophe) when the charset = 'latin1' option is provided. However, I hope the example below is close enough that the solution will work for Django as well.
In the example below, the word "Women’s", with the apostrophe as \x92, is being stored in a MySQL v5.7 database.
First, set up a local MySQL v5.7 server so it's similar to the MySQL v5.1 server that we're trying to hit.
mysql> CREATE DATABASE testchar;
mysql> USE testchar;
mysql> CREATE TABLE testchar (id INTEGER PRIMARY KEY NOT NULL AUTO_INCREMENT, data_value VARCHAR(80) CHARACTER SET latin1 ) ENGINE=MYISAM DEFAULT CHARSET=latin1;
mysql> INSERT INTO testchar(data_value) VALUES (UNHEX("576F6D656E9273"));
It looks ok with a CONVERT statement but not with a SELECT statement
Should see the connection, result, and client values change to latin1. This is close to what the v5.1 server is set to.
Now try the query
>>> c.execute("select * from testchar")
>>> c.fetchall()
((1, 'Women\x92s'),)
The same thing happens if I pass charset="latin1" to the db.connect(... statement.
Is there something I’m doing wrong in making this connection, or some setting I need to add to make this work as expected?
I did try posting to Stack Overflow, and got a response, but the suggestions didn’t fix the problem. Any suggestions you can provide would be great, thanks.
The text was updated successfully, but these errors were encountered:
You must not do this. It changes only server side variables.
In worst case, this cause charset mismatch between server and client, and cause SQL-injection attack.
I strongly recommend you to convert latin1 to utf8. I don't test charset other than utf8 well.
Using the SET NAMES was just a way to make the session variables in the connection to the test database match what we're seeing on the live database. We're not using SET NAMES in our live connection.
And unfortunately, I don't have control over the database we're accessing, and it's not clear that it will be upgraded. My fallback option is to schedule a nightly dump-and-convert to another DB, or a convoluted hack using Python's bytearray. I'll take a look at your suggestion for setting conn.encoding but I suspect Django doesn't give me an option to do that.
I’m trying to retrieve data from a MySQL v5.1 database in which most of the character settings are latin1. The trouble is with characters in the \x80 - \x9f range, the ones that are different between canonical latin1 (ISO-8859-1) and MySQL’s latin1 (cp1252). I’m using mysqlclient as part of a Django application but I’ve tried to replicate the problem with a more direct example below. The replication isn’t perfect, the Django application throws an error: "'utf-8' codec can't decode byte 0x92 in position 5: invalid start byte" when there is no charset option set. It prints "Women . s" (i.e. whitespace in place of the apostrophe) when the
charset = 'latin1'
option is provided. However, I hope the example below is close enough that the solution will work for Django as well.In the example below, the word "Women’s", with the apostrophe as \x92, is being stored in a MySQL v5.7 database.
First, set up a local MySQL v5.7 server so it's similar to the MySQL v5.1 server that we're trying to hit.
It looks ok with a CONVERT statement but not with a SELECT statement
Now set up the python environment
Should see lots of utf-8 settings. In order to simulate what is shown when this is done on the actual v5.1 server, need to set names.
Should see the connection, result, and client values change to latin1. This is close to what the v5.1 server is set to.
Now try the query
The same thing happens if I pass
charset="latin1"
to thedb.connect(...
statement.Is there something I’m doing wrong in making this connection, or some setting I need to add to make this work as expected?
I did try posting to Stack Overflow, and got a response, but the suggestions didn’t fix the problem. Any suggestions you can provide would be great, thanks.
The text was updated successfully, but these errors were encountered: