UnicodeEncodeError when passing a non-ascii string in "data" #29

Open
jamshid opened this issue Feb 15, 2017 · 2 comments

Comments

@jamshid

jamshid commented Feb 15, 2017

Sending a non-ASCII request body from Python 2.7 fails when using requests-aws4auth. At first I thought it was a general requests bug (https://github.com/kennethreitz/requests/issues/3875), but it only happens with requests-aws4auth. I'm seeing this with Python 2.7.5 on CentOS 7.2 and on macOS.

After some debugging, it seems to be triggered by string literals being forced to "unicode" in /usr/lib/python2.7/site-packages/requests_aws4auth/aws4auth.py.

from __future__ import unicode_literals

FIX/WORKAROUND: comment out that line.


The problem seems to be that requests doesn't expect the HTTP request headers to contain unicode strings. In Python 2.7, concatenating a unicode string with a str implicitly decodes the str using the default 'ascii' codec, so request_headers + request_body fails because request_body is already a UTF-8 encoded byte string (str).
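Python 3 no longer coerces between text and bytes implicitly, but the failing step can be mimicked there. The sketch below assumes that Python 2's `unicode + str` behaves like decoding the `str` operand with the default 'ascii' codec before concatenating (the behaviour under Python 2's default encoding); `py2_style_concat` is a hypothetical helper, not part of requests or httplib.

```python
def py2_style_concat(text, raw):
    """Mimic Python 2's `unicode + str`: the byte string is implicitly
    decoded with the default 'ascii' codec before concatenation."""
    return text + raw.decode("ascii")


headers = "PUT / HTTP/1.1\r\nHost: example.com\r\n\r\n"  # text (unicode) headers
body = "\u24b6\u24b7\u24b8\u24b9".encode("utf-8")          # 12 bytes of UTF-8

try:
    py2_style_concat(headers, body)
except UnicodeDecodeError as exc:
    # Same error as in the traceback: the first body byte is 0xe2.
    print(exc)
```

An ASCII-only body decodes cleanly, which is why the bug only shows up once the body is not 7-bit clean.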

By the way, I don't think aws4auth should be calling .encode('utf-8') on the body: it should already be bytes, right? At least HTTPBasicAuth and S3Auth expect the client calling requests.put() to pass data already encoded as UTF-8 bytes.
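For context, the x-amz-content-sha256 header in the debug output below carries the hex SHA-256 of the payload, so the signer only ever needs the body as bytes. A minimal sketch, written for Python 3 and assuming the caller may pass either text or bytes, of hashing that encodes only when the body is still text; `payload_sha256` is a hypothetical name, not requests-aws4auth's API.

```python
import hashlib


def payload_sha256(body):
    """Return the hex SHA-256 of a request body for the x-amz-content-sha256
    header. Bytes are hashed as-is; only text is UTF-8 encoded first."""
    if body is None:
        body = b""
    if isinstance(body, str):
        body = body.encode("utf-8")
    return hashlib.sha256(body).hexdigest()


text = "\u24b6\u24b7\u24b8\u24b9"
# Pre-encoded bytes and the equivalent text produce the same digest,
# so a caller that already encoded the body loses nothing.
print(payload_sha256(text) == payload_sha256(text.encode("utf-8")))
```

Leaving bytes untouched matches the HTTPBasicAuth/S3Auth convention described above: the caller owns the encoding.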

Finally, maybe this is still a bug in requests or in Python's httplib.py? Should it allow unicode header strings that contain only ASCII (or ISO-8859-1?) characters, with _send_output() in /usr/lib64/python2.7/httplib.py converting msg to str before appending the request body?
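One way to picture that httplib-side fix: serialize the header block to bytes before appending the body, so text headers and a byte body never meet. This is a sketch in Python 3 terms, assuming ISO-8859-1 for header text (the encoding Python 3's http.client applies to str header values); `build_request_bytes` is hypothetical and is not httplib's actual code.

```python
def build_request_bytes(request_line, headers, body):
    """Join an HTTP/1.1 request line, headers and body into one byte string.
    Header text is encoded as ISO-8859-1 up front, so the body bytes are
    appended to bytes rather than to text."""
    lines = [request_line] + ["%s: %s" % (name, value) for name, value in headers]
    head = ("\r\n".join(lines) + "\r\n\r\n").encode("iso-8859-1")
    if body is None:
        body = b""
    if isinstance(body, str):
        # Follow the convention that the caller passes an already-encoded body.
        raise TypeError("body must already be encoded to bytes")
    return head + body


body = "\u24b6\u24b7\u24b8\u24b9".encode("utf-8")
msg = build_request_bytes(
    "PUT / HTTP/1.1",
    [("Host", "example.com"), ("Content-Length", str(len(body)))],
    body,
)
print(msg)
```

Rejecting a text body, rather than guessing its encoding, keeps the header/body boundary explicit.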


Reproduction:

>>> import requests
>>> requests.__version__
'2.13.0'
>>> import requests_aws4auth
>>> requests_aws4auth.__version__
'0.9'
>>> AUTH=requests_aws4auth.AWS4Auth('testkey', 'secret', 'eu-west-1', 's3')
>>> requests.put('http://example.com/',headers={'Content-type':'text/plain; charset="UTF-8"'}, data=u'\u24B6\u24B7\u24B8\u24B9'.encode('utf-8'),auth=AUTH)

That should work, and it does when using requests.auth.HTTPBasicAuth or the S3 V2 signature package awsauth.S3Auth. With requests-aws4auth, however, it raises an exception:

>>> requests.put('http://example.com/',headers={'Content-type':'text/plain; charset="UTF-8"'}, data=u'\u24B6\u24B7\u24B8\u24B9'.encode('utf-8'),auth=AUTH)
!!!1 u'PUT / HTTP/1.1\r\nHost: example.com\r\nConnection: keep-alive\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nUser-Agent: python-requests/2.13.0\r\nContent-type: text/plain; charset="UTF-8"\r\nContent-Length: 12\r\nx-amz-date: 20170215T040027Z\r\nx-amz-content-sha256: 7ec37a06579472c0743b58bd45af589cca817f65bbd8c6e528bc5e3092166396\r\nAuthorization: AWS4-HMAC-SHA256 Credential=john/20170215/eu-west-1/s3/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date, Signature=833120dd7cbe023d12c8bd24c6a746ba8ebcf8279346c0e58485e56c1a9ab5a5\r\n\r\n'
!!!2 '\xe2\x92\xb6\xe2\x92\xb7\xe2\x92\xb8\xe2\x92\xb9'
!!!3 u'PUT / HTTP/1.1\r\nHost: example.com\r\nConnection: keep-alive\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nUser-Agent: python-requests/2.13.0\r\nContent-type: text/plain; charset="UTF-8"\r\nContent-Length: 12\r\nx-amz-date: 20170215T040027Z\r\nx-amz-content-sha256: 7ec37a06579472c0743b58bd45af589cca817f65bbd8c6e528bc5e3092166396\r\nAuthorization: AWS4-HMAC-SHA256 Credential=john/20170215/eu-west-1/s3/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date, Signature=833120dd7cbe023d12c8bd24c6a746ba8ebcf8279346c0e58485e56c1a9ab5a5\r\n\r\n\u24b6\u24b7\u24b8\u24b9'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/requests/api.py", line 124, in put
    return request('put', url, data=data, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 423, in send
    timeout=timeout
  File "/usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 356, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib64/python2.7/httplib.py", line 1020, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib64/python2.7/httplib.py", line 1054, in _send_request
    self.endheaders(body)
  File "/usr/lib64/python2.7/httplib.py", line 1016, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 865, in _send_output
    msg += message_body
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

The "!!!" lines are debugging output I added to _send_output() in /usr/lib64/python2.7/httplib.py:

        if isinstance(message_body, str):
            print('!!!1 '+repr(msg))
            print('!!!2 '+repr(message_body))
            print('!!!3 '+repr(msg + message_body.decode('utf-8')))
            msg += message_body

@reywood

reywood commented Mar 11, 2017

Here's a workaround that doesn't involve modifying the requests-aws4auth source code. Use the following wrapper class in place of the AWS4Auth class. It encodes the headers created by AWS4Auth into byte strings, thus avoiding the UnicodeDecodeError downstream.

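from requests_aws4auth import AWS4Auth


class AWS4AuthEncodingFix(AWS4Auth):
    """AWS4Auth wrapper that converts unicode header names and values to
    UTF-8 byte strings. Python 2.7 only: relies on the `unicode` builtin."""

    def __call__(self, request):
        # Let AWS4Auth sign the request and add its headers first.
        request = super(AWS4AuthEncodingFix, self).__call__(request)

        # Iterate over a snapshot of the header names, because the loop
        # body may delete and re-insert keys.
        for header_name in list(request.headers):
            self._encode_header_to_utf8(request, header_name)

        return request

    def _encode_header_to_utf8(self, request, header_name):
        value = request.headers[header_name]

        if isinstance(value, unicode):
            value = value.encode('utf-8')

        if isinstance(header_name, unicode):
            # Drop the unicode key before storing the header under a byte-string key.
            del request.headers[header_name]
            header_name = header_name.encode('utf-8')

        request.headers[header_name] = value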
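The header conversion in _encode_header_to_utf8 does not depend on requests and can be checked on its own. This is a rough Python 3 transposition, assuming `str` stands in for Python 2's `unicode` and a plain dict stands in for request.headers; `encode_headers_utf8` is a hypothetical helper, not part of requests-aws4auth.

```python
def encode_headers_utf8(headers):
    """Return a new dict whose text keys and values are UTF-8 bytes.
    Keys and values that are already bytes are kept unchanged."""
    encoded = {}
    # Build a new mapping instead of mutating the one being iterated.
    for name, value in headers.items():
        if isinstance(name, str):
            name = name.encode("utf-8")
        if isinstance(value, str):
            value = value.encode("utf-8")
        encoded[name] = value
    return encoded


print(encode_headers_utf8({"Host": "example.com", "x-amz-date": b"20170215T040027Z"}))
```

Returning a fresh mapping sidesteps the mutate-while-iterating concern that the wrapper class handles by iterating over a snapshot.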

@akuchling

I'm also seeing this bug with requests 2.18.4 (the latest as of today) and requests-aws4auth 0.9 on Python 2.7, when the body of the HTTP request isn't 7-bit-clean ASCII. It looks like requests doesn't expect header names to be Unicode, and at some point it ends up combining the Unicode headers with a UTF-8 encoded body, failing to decode the body with the default 'ascii' encoding.

Another fix would be to remove the from __future__ import unicode_literals declaration, but that's farther-reaching than just encoding the header keys and values.
