wsgi: Work around CPython bug when parsing non-ASCII headers #574

Open
wants to merge 1 commit into
base: master

Commits on Jun 4, 2019

  1. wsgi: Work around CPython bug when parsing non-ASCII headers

    Under CPython 3, when an out-of-spec client sends non-ASCII header
    *names*, email.feedparser halts parsing and assumes that the non-ASCII
    header must be part of a message body. This is despite the fact that
    http.client's parse_headers (which is called by the server protocol's
    parse_request) has already determined the boundary between headers and
    body and has *only* sent the headers to be parsed. This causes the first
    such header and *all* subsequent headers to be silently ignored.
    
    See also: https://bugs.python.org/issue37093
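A minimal sketch of how the behavior described above can be observed,
assuming a CPython 3 version that is still affected by the linked issue.
The request bytes and the header name `X-Café` are made up for
illustration; on an affected interpreter the offending line and everything
after it should show up in the payload rather than in the parsed headers.

```python
import io
import http.client

# Hypothetical header block from an out-of-spec client: the second
# header *name* contains UTF-8 encoded non-ASCII bytes.
raw = (
    b"Host: example.com\r\n"
    b"X-Caf\xc3\xa9: latte\r\n"
    b"Accept: */*\r\n"
    b"\r\n"
)

# parse_headers() reads up to the blank line, decodes the block as
# ISO-8859-1 and hands it to the email package for parsing.
msg = http.client.parse_headers(io.BytesIO(raw))

# Headers the email parser actually recognized.
print(msg.keys())

# Anything the parser decided was "body"; on an affected interpreter
# this holds the non-ASCII header line and every line after it.
print(repr(msg.get_payload()))
```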
    
    Under CPython 2, httplib would happily parse non-ASCII headers so long
    as there was a colon in the header line. As a result, py2 applications
    may have been written that not only allowed but even encouraged the use
    of UTF-8 in user-defined header names and values.
    
    Support such applications in moving to py3 by checking for a payload on
    the parsed headers; if found, parse it for more headers. A few things
    worth pointing out about this:
    
    - The parsing does not handle line folding; a folded line aborts
      parsing. Our code didn't handle folding well on py2 either.
    - Header lines without a colon will also abort parsing, but this is
      maybe preferable to py2's behavior where the offending line is
      interpreted as the separator between headers and body and is silently
      discarded, and the request is allowed to continue. At least on py3,
      the body will start after the first blank line rather than part way
      through the (bad) headers.
    - Building a WSGI environment normally involves upper-casing the header
      names, which should be safe due to their case-insensitivity, but it
      gets more complicated when considering non-ASCII headers:
      * While WSGI requires that the header names and values be interpreted
        as Latin-1 on py3, that isn't necessarily the encoding preferred by
        the application.
      * Even if the application wants Latin-1, upper-casing some
        Latin-1-encodable code points yields a code point that is not
        Latin-1-encodable, and so should not be used in a WSGI environment.
      So, preserve the existing py2 behavior on py3: Only upper-case 'a'-'z'.
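The points above can be sketched roughly as follows. This is an
illustration under stated assumptions, not the code in this commit:
`parse_extra_headers` and `wsgi_key` are hypothetical names, and the
payload is assumed to be the Latin-1-decoded text left over by the email
parser.

```python
import string

def parse_extra_headers(payload):
    """Parse leftover header lines out of a mis-detected "body".

    Hypothetical helper: stops at the first blank line, and aborts on a
    folded (continuation) line or a line without a colon.
    """
    headers = []
    # Split on '\n' only; str.splitlines() would also split on code
    # points such as '\x85', which can occur in Latin-1-decoded UTF-8.
    for line in payload.split('\n'):
        line = line.rstrip('\r')
        if not line:
            break  # blank line: end of the header block
        if line[0] in ' \t':
            break  # line folding is not handled: abort parsing
        name, sep, value = line.partition(':')
        if not sep:
            break  # no colon: abort parsing
        headers.append((name, value.strip(' \t')))
    return headers

# Translate only 'a'-'z' to upper case (and '-' to '_'); every other
# code point, including non-ASCII ones, is left untouched.
_ASCII_UPPER = str.maketrans(
    string.ascii_lowercase + '-', string.ascii_uppercase + '_')

def wsgi_key(name):
    """Build a WSGI environ key without a locale-aware str.upper()."""
    return 'HTTP_' + name.translate(_ASCII_UPPER)
```

The reason for avoiding `str.upper()` is visible with a single code point:
`'\xff'.upper()` is `'\u0178'`, which cannot be encoded as Latin-1, whereas
`wsgi_key('x-\xff')` keeps the original code point.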
    
    Drive-by: Be more explicit about when we're branching because of py2/3
    differences so when we eventually drop support for py2, we can remove
    the old path with confidence.
    tipabu committed Jun 4, 2019
    944d50a