non-ASCII field name can be used only with utf-8 connections #210

Closed
llevequemxcf opened this issue Nov 17, 2017 · 4 comments

llevequemxcf commented Nov 17, 2017

Using the module with a latin-1 connection, I get this exception:

Traceback (most recent call last):
  File "export_data.py", line 26, in <module>
    cursor.execute(sql)
  File "/home/toto/.local/lib/python3.5/site-packages/MySQLdb/cursors.py", line 252, in execute
    self.errorhandler(self, exc, value)
  File "/home/toto/.local/lib/python3.5/site-packages/MySQLdb/connections.py", line 50, in defaulterrorhandler
    raise errorvalue
  File "/home/toto/.local/lib/python3.5/site-packages/MySQLdb/cursors.py", line 249, in execute
    res = self._query(query)
  File "/home/toto/.local/lib/python3.5/site-packages/MySQLdb/cursors.py", line 413, in _query
    rowcount = self._do_query(q)
  File "/home/toto/.local/lib/python3.5/site-packages/MySQLdb/cursors.py", line 377, in _do_query
    self._do_get_result()
  File "/home/toto/.local/lib/python3.5/site-packages/MySQLdb/cursors.py", line 189, in _do_get_result
    self.description = self._result and self._result.describe() or None
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 3: invalid continuation byte

Looking at _mysql.c, I found that the problem is on line 1265:
t = Py_BuildValue("(siiiiii)",

In the format string, the 's' character behaves differently across versions: the Python 2 C API builds a byte string from the C string without decoding it, whereas the Python 3 C API decodes the C string as UTF-8 to build a str.

Therefore, for Python 3 the 's' should be replaced by a 'y', which returns a Python bytes object. We could then decode those bytes on the Python side, where the encoding used by the database connection is known.
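
The Python-side decoding proposed above might look like the following minimal sketch. It assumes the C layer has already been changed to return field names as bytes. `decode_description` is a hypothetical helper, not part of MySQLdb, and the encoding is passed in as a Python codec name rather than read from a real connection.

```python
def decode_description(description, encoding):
    """Decode the name (first item) of each description tuple.

    Hypothetical helper: assumes the C extension returned names as bytes
    (format 'y') instead of decoding them as UTF-8 (format 's').
    """
    decoded = []
    for field in description:
        name = field[0]
        if isinstance(name, bytes):
            name = name.decode(encoding)
        decoded.append((name,) + tuple(field[1:]))
    return tuple(decoded)


# Raw bytes as a latin-1 connection would deliver them for a column named 'é'.
raw_description = ((b"\xe9", 3, 0, 11, 11, 0, 1),)

# Decoding as UTF-8 fails on the lone 0xE9 byte, as the 's' format does.
try:
    raw_description[0][0].decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Decoding with the connection's encoding recovers the column name.
description = decode_description(raw_description, "latin-1")
print(utf8_failed, description)
```

Only the name is touched here; the remaining items of each tuple are passed through unchanged.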

methane (Member) commented Nov 17, 2017

Encodings other than UTF-8 are neither tested nor maintained.
I'll look into it when I have time, but I strongly recommend that you use UTF-8.

methane (Member) commented Nov 17, 2017

BTW, how can I reproduce it?
Are you sure that the column name is always encoded in the connection encoding?

methane (Member) commented Nov 17, 2017

Note that my terminal encoding is utf-8:

$ mysql -uroot --default-character-set=utf8 test

mysql> create table nonascii_column (`こんにちは` int);
Query OK, 0 rows affected (0.02 sec)

mysql> insert into nonascii_column (`こんにちは`) values (42);
Query OK, 1 row affected (0.01 sec)

mysql> ^DBye

$ mysql -uroot --default-character-set=latin1 test -e 'SELECT * FROM nonascii_column'
+-----------------+
| こんにちは |
+-----------------+
|              42 |
+-----------------+

So, regardless of the connection encoding, the column name is arbitrary bytes.

llevequemxcf (Author) commented

Well, it would be interesting to see what happens when you query your nonascii_column table with mysqlclient-python. Here are the results of my test with a UTF-8 terminal:

$ mysql -p toto

MariaDB [toto]> SET NAMES 'utf8';
Query OK, 0 rows affected (0.00 sec)

MariaDB [toto]> create table nonascii_column (`é` int);
Query OK, 0 rows affected (0.41 sec)
MariaDB [toto]> use information_schema
MariaDB [information_schema]> SELECT HEX(COLUMN_NAME) FROM COLUMNS WHERE TABLE_NAME='nonascii_column';
+------------------+
| HEX(COLUMN_NAME) |
+------------------+
| C3A9             |
+------------------+
1 row in set (0.00 sec)

We can see that the column name is stored as UTF-8 (C3A9 is the UTF-8 encoding of é).

Now in Python:

Python 2.7.13 (default, Jan 19 2017, 14:48:08) 
[GCC 6.3.0 20170118] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from MySQLdb import connect
>>> conn = connect(host='127.0.0.1', user='root',
... password='toto', database='toto')
>>> with conn as cursor:
...     cursor.execute("SELECT * FROM nonascii_column")
...     print(cursor.description)
... 
0L
(('\xe9', 3, 0, 11, 11, 0, 1),)

Here we see that the Python client receives it as latin-1 ('\xe9' is the latin-1 encoding of é).
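
The two byte sequences in this comment, C3A9 from information_schema and '\xe9' from cursor.description, are the same character in two encodings. A short Python 3 check, independent of MySQLdb and of any server (the variable names are illustrative only):

```python
# The column name used in the test above, written as an escape so the
# example does not depend on the source file's encoding.
name = "\u00e9"  # é

utf8_bytes = name.encode("utf-8")      # what HEX(COLUMN_NAME) showed
latin1_bytes = name.encode("latin-1")  # what cursor.description showed

print(utf8_bytes.hex().upper())    # C3A9
print(latin1_bytes.hex().upper())  # E9

# The latin-1 byte is not valid UTF-8 on its own, which is why decoding
# field names as UTF-8 fails on a latin-1 connection.
try:
    latin1_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
print(decodes_as_utf8)  # False
```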

methane changed the title from "In Python 3, only supports utf-8 connections" to "non-ASCII field name can be used only with utf-8 connections" on Dec 10, 2018