wsgi: Work around CPython bug when parsing non-ASCII headers #574

Under CPython 3, when an out-of-spec client sends non-ASCII header *names*, email.feedparser halts parsing and assumes that the non-ASCII header must be part of a message body. This is despite the fact that http.client's parse_headers (which is called by the server protocol's parse_request) has already determined the boundary between headers and body and has *only* sent the headers to be parsed. This causes the first such header and *all* subsequent headers to be silently ignored.

See also: https://bugs.python.org/issue37093

Under CPython 2, httplib would happily parse non-ASCII headers so long as there was a colon in the header line. As a result, py2 applications may have been written that not only allowed but even encouraged the use of UTF-8 in user-defined header names and values.

Support such applications in moving to py3 by checking for a payload on the parsed headers; if found, parse it for more headers.

A few things worth pointing out about this:

- The parsing does not handle line folding, but our code didn't handle this well on py2 either. Abort parsing in that case.
- Header lines without a colon will also abort parsing, but this is arguably preferable to py2's behavior, where the offending line is interpreted as the separator between headers and body and is silently discarded, and the request is allowed to continue. At least on py3, the body will start after the first blank line rather than partway through the (bad) headers.
- Building a WSGI environment normally involves upper-casing the header names, which should be safe due to their case-insensitivity, but it gets more complicated when considering non-ASCII headers:

  * While WSGI requires that the header names and values be interpreted as Latin-1 on py3, that isn't necessarily the encoding preferred by the application.
  * Even if the application wants Latin-1, upper-casing some Latin-1-encodable code points yields a code point that is not Latin-1-encodable, and so should not be used in a WSGI environment.

  So, preserve the existing py2 behavior on py3: only upper-case 'a'-'z'.

Drive-by: Be more explicit about when we're branching because of py2/3 differences so that when we eventually drop support for py2, we can remove the old path with confidence.
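
The rules described above can be sketched roughly as follows. This is an illustrative approximation, not the code from this change: the helper names `ascii_upper` and `parse_leftover_headers` are hypothetical, and the sketch assumes that "abort parsing" means stopping at the offending line while keeping the headers already collected.

```python
import string

# Translation table that maps only 'a'-'z' to 'A'-'Z'; every other
# code point (including non-ASCII letters) is left untouched.
_ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def ascii_upper(name):
    """Upper-case only ASCII letters (hypothetical helper)."""
    return name.translate(_ASCII_UPPER)


def parse_leftover_headers(payload):
    """Recover (name, value) pairs from text that the header parser
    mistook for a message body (hypothetical helper).

    Assumption: parsing stops, keeping what was found so far, at the
    first folded line or the first line without a colon.
    """
    headers = []
    for line in payload.split('\n'):
        line = line.rstrip('\r')
        if not line:
            break  # blank line: end of the header block
        if line[0] in ' \t':
            break  # folded continuation line: not handled, stop
        name, sep, value = line.partition(':')
        if not sep:
            break  # no colon: not a header line, stop
        headers.append((name, value.strip()))
    return headers


if __name__ == '__main__':
    leftover = 'X-Caf\xe9: 1\r\nAccept: */*\r\n\r\n'
    for name, value in parse_leftover_headers(leftover):
        # Build a WSGI-style key without touching non-ASCII letters.
        print('HTTP_' + ascii_upper(name).replace('-', '_'), '=', value)

    # str.upper() can leave Latin-1: U+00FF upper-cases to U+0178.
    try:
        '\xff'.upper().encode('latin-1')
    except UnicodeEncodeError:
        print('str.upper() produced a non-Latin-1 code point')
```

Restricting the translation table to 'a'-'z' keeps any header name that was Latin-1-encodable before upper-casing Latin-1-encodable afterwards, which `str.upper()` does not guarantee.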

Commits on Jun 4, 2019
