Under CPython 3, when an out-of-spec client sends non-ASCII header
*names*, email.feedparser halts parsing and assumes that the non-ASCII
header must be part of a message body. This is despite the fact that
http.client's parse_headers (which is called by the server protocol's
parse_request) has already determined the boundary between headers and
body and has *only* sent the headers to be parsed. This causes the first
such header and *all* subsequent headers to be silently ignored.

See also: https://bugs.python.org/issue37093

Under CPython 2, httplib would happily parse non-ASCII headers so long
as there was a colon in the header line. As a result, py2 applications
may have been written that not only allowed but even encouraged the use
of UTF-8 in user-defined header names and values.

Support such applications in moving to py3 by checking for a payload on
the parsed headers; if found, parse it for more headers. A few things
worth pointing out about this:

- The parsing does not handle line folding; on encountering a folded
  line, abort parsing. Our code didn't handle this well on py2 either.
- Header lines without a colon will also abort parsing, but this is
  arguably preferable to py2's behavior, where the offending line is
  interpreted as the separator between headers and body and is silently
  discarded, and the request is allowed to continue. At least on py3,
  the body will start after the first blank line rather than part way
  through the (bad) headers.
- Building a WSGI environment normally involves upper-casing the header
  names, which should be safe due to their case-insensitivity, but it
  gets more complicated when considering non-ASCII headers:
  * While WSGI requires that the header names and values be
    interpreted as Latin-1 on py3, that isn't necessarily the encoding
    preferred by the application.
  * Even if the application wants Latin-1, upper-casing some
    Latin-1-encodable code points yields a code point that is not
    Latin-1-encodable, and so should not be used in a WSGI
    environment.
  So, preserve the existing py2 behavior on py3: only upper-case
  'a'-'z'.
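
For illustration only, the ASCII-only upper-casing might look roughly
like the sketch below. The helper name `header_to_environ_key` and the
`HTTP_` key layout are assumptions made for this example and are not
taken from the change itself; the point is just that `str.upper()` can
leave the Latin-1 range while a translation table limited to 'a'-'z'
cannot.

```python
import string

# Map only 'a'-'z' to 'A'-'Z'; every other code point is left untouched.
_ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def header_to_environ_key(name):
    """Build a WSGI-style environ key from a header name.

    Hypothetical helper for illustration; not the code in this change.
    """
    return 'HTTP_' + name.translate(_ASCII_UPPER).replace('-', '_')


# str.upper() can escape Latin-1: U+00FF upper-cases to U+0178.
assert '\xff'.upper() == '\u0178'

key = header_to_environ_key('x-caf\xff')
key.encode('latin-1')  # succeeds, because U+00FF was left as-is
```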

Drive-by: Be more explicit about when we're branching because of py2/3
differences so when we eventually drop support for py2, we can remove
the old path with confidence.
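
As a rough sketch of the payload check described earlier (not the
actual code in this change), the idea can be shown with the stdlib
alone. The function name `parse_all_headers` is hypothetical, and the
sketch calls `http.client.parse_headers` directly instead of going
through the server protocol's parse_request. It assumes the header
block has already been split off from the body, so anything the parser
reports as a payload is really more header lines.

```python
import http.client
import io


def parse_all_headers(raw):
    """Return (name, value) pairs for a raw header block.

    Hypothetical helper: recovers header lines that the stdlib parser
    may have left in the message payload.
    """
    msg = http.client.parse_headers(io.BytesIO(raw))
    headers = list(msg.items())
    payload = msg.get_payload()
    if payload:
        for line in payload.rstrip('\r\n').split('\n'):
            line = line.rstrip('\r')
            # Abort on blank lines, folded lines, and colon-less lines.
            if not line or line[0] in ' \t' or ':' not in line:
                break
            name, value = line.split(':', 1)
            headers.append((name, value.strip(' \t\r\n')))
    return headers


raw = b'Host: example.com\r\nX-Caf\xc3\xa9: 1\r\nX-After: 2\r\n\r\n'
for name, value in parse_all_headers(raw):
    print(repr(name), repr(value))
```

Names come back as Latin-1-decoded text, so an application that wants
UTF-8 would still need to re-encode and decode them itself.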