UnicodeEncodeError when passing a non-ascii string in "data" #3875

Closed
jamshid opened this issue Feb 15, 2017 · 2 comments

jamshid commented Feb 15, 2017

Using requests 2.13.0 on Python 2.7, I can't seem to pass a non-ASCII value as "data" on a PUT.

This might be a duplicate of issue https://github.com/kennethreitz/requests/issues/2638.

How should I send a unicode string in a request body? Calling .encode('utf-8') seems to work around the problem in the first example, but the second still fails.

>>> AUTH = requests.auth.HTTPBasicAuth("john", "password")
>>> requests.put('http://localhost:8084/', headers={'Content-type':'text/plain; charset=utf-8'}, data=u'\u24B6\u24B7\u24B8\u24B9', auth=AUTH)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/requests/api.py", line 124, in put
    return request('put', url, data=data, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 423, in send
    timeout=timeout
  File "/usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 356, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib64/python2.7/httplib.py", line 1020, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib64/python2.7/httplib.py", line 1054, in _send_request
    self.endheaders(body)
  File "/usr/lib64/python2.7/httplib.py", line 1016, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 871, in _send_output
    self.send(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 840, in send
    self.sock.sendall(data)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

By the way, the same PUT with AWS S3 auth (V2 or V4) fails with a similar error but a different stack trace (I added the "!!!" prints for debugging):

>>> AUTH=AWS4Auth('john', 'password', 'eu-west-1', 's3')
>>> requests.put('http://localhost:8084/',headers={'Content-type':'text/plain'}, data=u'\u24B6\u24B7\u24B8\u24B9',auth=AUTH)
!!!1 u'PUT / HTTP/1.1\r\nHost: localhost:8084\r\nConnection: keep-alive\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nUser-Agent: python-requests/2.13.0\r\ncontent-type: text/plain; charset=utf-8\r\nContent-Length: 12\r\nx-amz-date: 20170215T023125Z\r\nx-amz-content-sha256: 7ec37a06579472c0743b58bd45af589cca817f65bbd8c6e528bc5e3092166396\r\nAuthorization: AWS4-HMAC-SHA256 Credential=john/20170215/eu-west-1/s3/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date, Signature=e83f7e17d7c6e25a940965962df32e9c690681b158f6d9ad9f484bf9c09bb963\r\n\r\n'
!!!2 '\xe2\x92\xb6\xe2\x92\xb7\xe2\x92\xb8\xe2\x92\xb9'
!!!3 u'PUT / HTTP/1.1\r\nHost: localhost:8084\r\nConnection: keep-alive\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nUser-Agent: python-requests/2.13.0\r\ncontent-type: text/plain; charset=utf-8\r\nContent-Length: 12\r\nx-amz-date: 20170215T023125Z\r\nx-amz-content-sha256: 7ec37a06579472c0743b58bd45af589cca817f65bbd8c6e528bc5e3092166396\r\nAuthorization: AWS4-HMAC-SHA256 Credential=john/20170215/eu-west-1/s3/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date, Signature=e83f7e17d7c6e25a940965962df32e9c690681b158f6d9ad9f484bf9c09bb963\r\n\r\n\u24b6\u24b7\u24b8\u24b9'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/requests/api.py", line 124, in put
    return request('put', url, data=data, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 423, in send
    timeout=timeout
  File "/usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 356, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib64/python2.7/httplib.py", line 1020, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib64/python2.7/httplib.py", line 1054, in _send_request
    self.endheaders(body)
  File "/usr/lib64/python2.7/httplib.py", line 1016, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 865, in _send_output
    msg += message_body
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

@nateprewitt (Member)

Hey @jamshid, I believe this is a problem with how you're using unicode in Python 2.7 rather than something requests specific. The repro below fails with the same exception without any requests code. If you pass properly encoded utf-8, it should solve this issue.

with open('testfile.out', 'w') as f:
    f.write(u'\u24B6\u24B7\u24B8\u24B9')

As for why it's failing in the second example, that exception is raised when you try to encode the already UTF-8-encoded string a second time. It looks like AWS4Auth is doing that for you here.

All of this points to a misuse of unicode in Python 2 and possibly a bug with how AWS4Auth interacts with unicode bodies. I don't believe there's much to be done here in requests.
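
To make the "properly encoded utf-8" advice concrete, here is a minimal sketch using only the standard library. It assumes that the implicit conversion Python 2 performs on a unicode body is equivalent to an explicit ASCII encode; the variable names are illustrative and nothing here is requests code.

```python
# The four circled letters from the report, as a text (unicode) string.
body = u'\u24B6\u24B7\u24B8\u24B9'

# An ASCII encode cannot represent these code points. This mirrors the
# implicit conversion that fails in the first traceback (assumption: the
# socket layer falls back to the ASCII codec for unicode input).
try:
    body.encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = str(exc)

# Encoding explicitly yields bytes that can be sent as-is.
body_bytes = body.encode('utf-8')

print(ascii_error)
print(len(body_bytes))  # 12, the Content-Length seen in the debug output
```

The resulting `body_bytes` is what would be passed as `data=` in the original `requests.put(...)` call, together with the `charset=utf-8` content type already used in the report.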

jamshid commented Feb 15, 2017

Thanks, @nateprewitt! Turns out AWS4Auth is adding headers as unicode strings, but Python 2.7 httplib.py or requests do not expect that and fail when concatenating the (binary?) body string. There's an easy fix to AWS4Auth for it:
tedder/requests-aws4auth#29
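
The mismatch described above can be sketched roughly as follows. This is not the patch from the linked pull request, just an illustration under stated assumptions: the shortened header block is made up for the example, and the explicit `decode('ascii')` stands in for the implicit decode Python 2 performs when a unicode string is concatenated with a byte string.

```python
# Header block as text (unicode), the way the auth layer left it.
# Shortened, illustrative headers; not the full request from the report.
headers = u'PUT / HTTP/1.1\r\nHost: localhost:8084\r\n\r\n'

# Body as bytes, already UTF-8 encoded.
body = u'\u24B6\u24B7\u24B8\u24B9'.encode('utf-8')

# Python 2 evaluates `headers + body` by decoding `body` as ASCII first.
# Doing that decode explicitly reproduces the error from the second traceback.
try:
    body.decode('ascii')
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = str(exc)

# One possible remedy (an assumption, not the actual fix): keep the header
# block as bytes too, so no implicit decode of the body is ever needed.
message = headers.encode('ascii') + body

print(decode_error)
```

Keeping headers and body in the same string type is the design point here: once both sides are bytes, the concatenation in `_send_output` has nothing to convert.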

@jamshid jamshid closed this as completed Feb 15, 2017
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021