You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
(tmp-a84c01591caa48e) [16:56|jerith@meklon] ~/.virtualenvs/tmp-a84c01591caa48e% python foo.py
Traceback (most recent call last):
File "foo.py", line 10, in <module>
react(main)
File "/Users/jerith/.virtualenvs/tmp-a84c01591caa48e/lib/python3.8/site-packages/twisted/internet/task.py", line 909, in react
finished = main(_reactor, *argv)
File "foo.py", line 7, in main
d = treq.get(URL, params={b'foo': 'Hello'.encode('utf-32')})
File "/Users/jerith/.virtualenvs/tmp-a84c01591caa48e/lib/python3.8/site-packages/treq/api.py", line 24, in get
return _client(**kwargs).get(url, headers=headers, **kwargs)
File "/Users/jerith/.virtualenvs/tmp-a84c01591caa48e/lib/python3.8/site-packages/treq/client.py", line 119, in get
return self.request('GET', url, **kwargs)
File "/Users/jerith/.virtualenvs/tmp-a84c01591caa48e/lib/python3.8/site-packages/treq/client.py", line 167, in request
query=parsed_url.query + tuple(_coerced_query_params(params))
File "/Users/jerith/.virtualenvs/tmp-a84c01591caa48e/lib/python3.8/site-packages/treq/client.py", line 351, in _coerced_query_params
value = value.decode('ascii')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
I didn't realize that arbitrary bytes were ever supported as part of params. I don't think that's really desirable, but I didn't mean to break it. I will have to add test coverage for this.
It looks like there is a similar issue if an oddly-encoded querystring is passed as part of the url parameter. You'll get a UnicodeDecodeError when passing a URL like:
It looks like we can avoid this by passing lazy=True when decoding the URL. Then Hyperlink won't try to decode the URL-encoded segments. treq should definitely do this. That doesn't help with params, though.
Unfortunately, #265 (comment) isn't the solution either. Decoding UTF-32 bytes to unicode using charmap (latin-1) and then encoding as UTF-8 gives mojibake rather than the same UTF-32 bytes.
I see two options:
Rewrite treq's URL-generation routines to use the low-level hyperlink.URL rather than hyperlink.DecodedURL.
Make hyperlink.DecodedURL accept bytes in addition to text.
Could you provide a little color on what you're trying to do to help with this?
I work with a wide variety of third-party messaging APIs, many of which are poorly designed and require all the message fields to be in url query parameters. This is fine when we're sending English-language text with no special characters (which fits into 7-bit ASCII), but can be a problem for languages like Swahili (often written in Arabic script) or Amharic when the API we're talking to expects UTF-16 or some weirdly-packed GSM encoding.
Starting from 20.4.1, I can no longer make requests with weirdly-encoded query parameters.
Here's a small example program that makes such a request:
When run with treq 20.3.0:
And with treq 20.4.1:
#265 (comment) seems like relevant context here.
The text was updated successfully, but these errors were encountered: