Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for 'Failed to parse headers' warning #1439

Merged
merged 9 commits into from Sep 19, 2018
3 changes: 3 additions & 0 deletions CHANGES.rst
Expand Up @@ -13,6 +13,9 @@ dev (master)

* Drop support for EOL Python 2.6 (Pull #1429 and Pull #1430)

* Fixed bug where responses with header Content-Type: message/* erroneously
raised HeaderParsingError, resulting in a warning being logged. (Pull #1439)

* ... [Short description of non-trivial change.] (Issue #)


Expand Down
3 changes: 3 additions & 0 deletions CONTRIBUTORS.txt
Expand Up @@ -266,6 +266,9 @@ In chronological order:
* Hugo van Kemenade <https://github.com/hugovk>
* Drop support for EOL Python 2.6

* Tim Bell <https://github.com/timb07>
* Bugfix for responses with Content-Type: message/* logging warnings

* Justin Bramley <https://github.com/jbramleycl>
* Add ability to handle multiple Content-Encodings

Expand Down
10 changes: 8 additions & 2 deletions src/urllib3/util/response.py
Expand Up @@ -59,8 +59,14 @@ def assert_header_parsing(headers):
get_payload = getattr(headers, 'get_payload', None)

unparsed_data = None
if get_payload: # Platform-specific: Python 3.
unparsed_data = get_payload()
if get_payload:
# get_payload is actually email.message.Message.get_payload;
# we're only interested in the result if it's not a multipart message
if not headers.is_multipart():
payload = get_payload()

if isinstance(payload, (bytes, str)):
unparsed_data = payload

if defects or unparsed_data:
raise HeaderParsingError(defects=defects, unparsed_data=unparsed_data)
Expand Down
42 changes: 34 additions & 8 deletions test/with_dummyserver/test_socketlevel.py
Expand Up @@ -1289,12 +1289,12 @@ def socket_handler(listener):
)
class TestBrokenHeaders(SocketDummyServerTestCase):

def _test_broken_header_parsing(self, headers):
def _test_broken_header_parsing(self, headers, unparsed_data_check=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why were these test cases changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean the addition of unparsed_data_check specifically?

I looked at the various Stackoverflow questions where this had come up, and realised we already had a test case where unparsed_data was non-empty, so I thought it would be good to check for that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant more the input data, it looks like you've changed the number of CRLF within the test case, what was the motivation for those changes? I'd be more comfortable adding a new testcase for unparsed_data than changing these existing ones.

Copy link
Contributor Author

@timb07 timb07 Sep 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the changes in the values of headers passed into _test_broken_header_parsing() and the subsequent generation of the value to pass into self.start_response_handler(), here's what two of the test cases were actually checking:

b'HTTP/1.1 200 OK\r\nContent-Length: 0\r\nContent-type: text/plain\r\n: Value\r\n\r\nAnother: Header\r\n\r\n'
b'HTTP/1.1 200 OK\r\nContent-Length: 0\r\nContent-type: text/plain\r\n:\r\n\r\nAnother: Header\r\n\r\n'

Looking in more detail, the '\r\n' at the end of each bytestring in this call:

        self._test_broken_header_parsing([
            b': Value\r\n',
            b'Another: Header\r\n',
        ])

combined with b'\r\n'.join(headers) in _test_broken_header_parsing() result in duplicate CRLFs between each of the headers supplied in the method call.

The duplicate CRLF before Another: Header meant that Another: Header wasn't a part of the headers anymore, and wasn't being checked. In all the present test cases that didn't affect the result of the test, but that was more by coincidence. For instance, in the example above, if the order of the two bytestrings in the list was reversed, the test would fail, that is, no warning would be logged.

The best fix seemed to me to remove the CRLFs from the test bytestrings, and ensure the headers are terminated with an explicit CRLFCRLF.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's next for this PR? Are you happy with my explanations, or would you prefer me to rework the tests?

self.start_response_handler((
b'HTTP/1.1 200 OK\r\n'
b'Content-Length: 0\r\n'
b'Content-type: text/plain\r\n'
) + b'\r\n'.join(headers) + b'\r\n'
) + b'\r\n'.join(headers) + b'\r\n\r\n'
)

pool = HTTPConnectionPool(self.host, self.port, retries=False)
Expand All @@ -1306,26 +1306,52 @@ def _test_broken_header_parsing(self, headers):
for record in logs:
if 'Failed to parse headers' in record.msg and \
pool._absolute_url('/') == record.args[0]:
return
if unparsed_data_check is None or unparsed_data_check in record.getMessage():
return
self.fail('Missing log about unparsed headers')

def test_header_without_name(self):
self._test_broken_header_parsing([
b': Value\r\n',
b'Another: Header\r\n',
b': Value',
b'Another: Header',
])

def test_header_without_name_or_value(self):
self._test_broken_header_parsing([
b':\r\n',
b'Another: Header\r\n',
b':',
b'Another: Header',
])

def test_header_without_colon_or_value(self):
self._test_broken_header_parsing([
b'Broken Header',
b'Another: Header',
])
], 'Broken Header')


class TestHeaderParsingContentType(SocketDummyServerTestCase):

def _test_okay_header_parsing(self, header):
self.start_response_handler((
b'HTTP/1.1 200 OK\r\n'
b'Content-Length: 0\r\n'
) + header + b'\r\n\r\n'
)

pool = HTTPConnectionPool(self.host, self.port, retries=False)
self.addCleanup(pool.close)

with LogRecorder() as logs:
pool.request('GET', '/')

for record in logs:
assert 'Failed to parse headers' not in record.msg

def test_header_text_plain(self):
self._test_okay_header_parsing(b'Content-type: text/plain')

def test_header_message_rfc822(self):
self._test_okay_header_parsing(b'Content-type: message/rfc822')


class TestHEAD(SocketDummyServerTestCase):
Expand Down