UnicodeEncodeError when using webob in middleware #1351

Open
danilito19 opened this issue Sep 23, 2021 · 1 comment

danilito19 commented Sep 23, 2021

I have a WSGI-based middleware for my Bottle app:

import os
import sys
from bottle import Bottle, run, redirect
import webob # WebOb==1.8.7

app = Bottle(__name__)


@app.route("/")
def index():
    redirect("/home")

class VeryBasicMiddleware:
    def __init__(self, wsgi_app, app_name="WSGI Application"):
        self.app_name = app_name
        self.wsgi_app = wsgi_app

    def __call__(self, environ, start_response):
        # I do some middleware work here

        my_request = webob.BaseRequest(environ)
        # call request.path here is fine
        my_request.path
        response = webob.Request(environ).get_response(self.wsgi_app)

        # call request.path here gives UnicodeEncodeError
        my_request.path

        # do more middleware work here

        return self.wsgi_app(environ, start_response)


app = VeryBasicMiddleware(app)

if __name__ == "__main__":
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 8000
    run(app=app, host=host, port=port)

As you can see, this middleware relies on webob.

If a request contains non-ASCII characters in its path, for example curl http://localhost:8000/full/assets/pdf/1394002_161121%20%E2%91%A0IFA%20ver%20%2012.2_SH0202_PDF%20version.pdf

the middleware fails with the following error:

Traceback (most recent call last):
  File "/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/wsgiref/handlers.py", line 137, in run
    self.result = application(self.environ, self.start_response)
  File "apps/bottle_app.py", line 22, in __call__
    return self.app(environ, start_response)
  File "apps/bottle_app.py", line 42, in __call__
    my_request.path
  File "/Users/dani/vulnenv/lib/python3.8/site-packages/webob/request.py", line 476, in path
    bpath = bytes_(self.path_info, self.url_encoding)
  File "/Users/dani/vulnenv/lib/python3.8/site-packages/webob/descriptors.py", line 70, in fget
    return req.encget(key, encattr=encattr)
  File "/Users/dani/vulnenv/lib/python3.8/site-packages/webob/request.py", line 165, in encget
    return bytes_(val, 'latin-1').decode(encoding)
  File "/Users/dani/vulnenv/lib/python3.8/site-packages/webob/compat.py", line 33, in bytes_
    return s.encode(encoding, errors)
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2460' in position 32: ordinal not in range(256)

The error is raised in webob, but I dug into each of the calls and found the cause: when my middleware calls webob.Request(environ).get_response(self.wsgi_app) and execution reaches bottle.py, bottle changes the value of environ['PATH_INFO'], here:

    def _handle(self, environ):
        path = environ['bottle.raw_path'] = environ['PATH_INFO']
        
        if py3k:
            try:
                environ['PATH_INFO'] = path.encode('latin1').decode('utf8')
            except UnicodeError:
                return HTTPError(400, 'Invalid path string. Expected UTF-8')
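To make the effect of that re-encoding concrete, here is a minimal sketch using only the standard library. It assumes the server hands over PATH_INFO as percent-decoded bytes read as latin-1 (which is what the WSGI convention implies), and it uses a shortened fragment of the URL above; the variable names are illustrative only.

```python
from urllib.parse import unquote_to_bytes

# Percent-encoded fragment taken from the request URL above.
raw = unquote_to_bytes("/%E2%91%A0IFA")

# Assumed server behaviour: expose the path bytes as a latin-1 decoded str.
wsgi_path = raw.decode("latin-1")

# The step from bottle's _handle: reinterpret the same bytes as UTF-8.
bottle_path = wsgi_path.encode("latin1").decode("utf8")

# webob still expects the latin-1 form and encodes with latin-1 again.
try:
    bottle_path.encode("latin-1")
    failed = False
except UnicodeEncodeError:
    failed = True
```

After the rewrite, the path contains the real character U+2460, which has no latin-1 representation, so the second latin-1 encode is the call that raises.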

I have temporarily solved my problem by storing the original path in my middleware, like so:

    def __call__(self, environ, start_response):
        # save the original PATH_INFO before bottle overwrites it
        self.orig_path_info = environ.get("PATH_INFO")

        # I do some work here

        my_request = webob.BaseRequest(environ)
        # call request.path here is fine
        my_request.path
        response = webob.Request(environ).get_response(self.wsgi_app)

        # restore path info so call to `my_request.path` works
        environ["PATH_INFO"] = self.orig_path_info
        # call request.path now works
        my_request.path

        # do more work here

        return self.wsgi_app(environ, start_response)

But this seems like a subpar solution. Is this a bug? I'm aware it could very well be a webob issue, but I have an identical Flask application with an identical middleware, and it does not have this issue.

defnull (Member) commented Sep 24, 2021

Some values (e.g. headers or the path string) are bytes on the HTTP layer with no encoding information attached, but the application needs these as str, which is unicode in Python 3. The WSGI spec uses latin-1 to decode these bytestrings, which is wrong in most cases, but at least latin-1 can be reversed without losing information. Bottle then re-interprets these as utf-8 because that is what all modern browsers do.
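The reversibility of latin-1 can be checked with a short standard-library sketch. The variable names are made up for illustration and nothing here is Bottle's or webob's own code.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# latin-1 maps each byte to the code point with the same number,
# so decoding never fails and encoding returns the original bytes.
as_text = all_bytes.decode("latin-1")
round_tripped = as_text.encode("latin-1")

# UTF-8 bytes for U+2460 that were read as latin-1 can be recovered
# by undoing the latin-1 step first and then decoding as UTF-8.
utf8_bytes = "\u2460".encode("utf-8")
recovered = utf8_bytes.decode("latin-1").encode("latin-1").decode("utf-8")
```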

Bottle is not a middleware, so it assumes it has authority over the environ dictionary and stores the re-encoded value back into the environ dict. This could be done better in Bottle (e.g. storing the re-encoded value in a special key instead of overwriting PATH_INFO) but it is more undefined behavior than a bug, really. You should make a copy of the environ dictionary before passing it to bottle if you want to still use it afterwards.
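A minimal sketch of the copy-the-environ suggestion follows. `path_rewriting_app` is a hypothetical stand-in that mutates PATH_INFO the same way Bottle does, and `CopyingMiddleware` is an illustrative name; neither comes from Bottle or webob.

```python
def path_rewriting_app(environ, start_response):
    """Hypothetical stand-in for Bottle: rewrites PATH_INFO in place."""
    path = environ["PATH_INFO"]
    environ["PATH_INFO"] = path.encode("latin1").decode("utf8")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


class CopyingMiddleware:
    """Hands the wrapped app a shallow copy of environ."""

    def __init__(self, wsgi_app):
        self.wsgi_app = wsgi_app

    def __call__(self, environ, start_response):
        # The wrapped app may mutate its copy; the caller's dict is untouched.
        # A shallow copy still shares objects such as wsgi.input.
        result = self.wsgi_app(environ.copy(), start_response)
        # environ["PATH_INFO"] is still in its original latin-1 form here.
        return result
```

In the middleware from the report, the equivalent change would presumably be to build the request from a copy, e.g. webob.Request(environ.copy()), so that my_request keeps reading the unmodified dictionary.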
