Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect headers parsing #5838

Closed
1 task done
mashony opened this issue Jun 28, 2021 · 6 comments
Closed
1 task done

Incorrect headers parsing #5838

mashony opened this issue Jun 28, 2021 · 6 comments
Labels

Comments

@mashony
Copy link

mashony commented Jun 28, 2021

Describe the bug

Faced with issue that if there is newline symbol (\n) in headers aiohttp drops other headers in GET request. Seems like should be validation for custom headers when sending request.
For websocket connection this bug leads to handshake error WSServerHandshakeError.

it's also possible to send extra data after new line which could be parsed by server as extra headers:

"someheader": "some\nsome: header"

this bug starts reproducing from aiohttp==3.6.3; yarl==1.5.1; multidict==4.7.6 and could be reproduced with latest aiohtp version
everything works as expected with aiohttp==3.6.2

To Reproduce

  1. Create simple client/server app (e.g. https://gist.github.com/mashony/cde3175da1e60d34d18cfe3a4669f41e)
  2. Send GET custom headers with \n characters
headers = {
    "SomeOtherHeader": "somecoolstr",
    "CustomHeader": "somebrokenstr\n",
    "AHeader": "somecoolstr",
}
  1. Print server`s request headers
Request.headers {'Host': '0.0.0.0:8080', 'SomeOtherHeader': 'somecoolstr', 'CustomHeader': 'somebrokenstr'}
  1. To reproduce handshake error for websocket connection use custom headers in ws_connect (e.g. https://gist.github.com/mashony/ea8891ffdd6457d18a853f61503e8a2b)

Expected behavior

Do not drop headers after \n characters. That should be parsed as "CustomHerder": "somebrokenstr" like it was so in 3.6.2 version

Or raise validation error or warning message that tells about incorrect characters in headers or filter out such characters on client side

Logs/tracebacks

client

Websocket handshake error:

Traceback (most recent call last):
  File "cli.py", line 32, in <module>
    loop.run_until_complete(main())
  File "python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "cli.py", line 19, in main
    async with session.ws_connect(URL, headers=headers) as ws:
  File "python3.8/site-packages/aiohttp/client.py", line 1014, in __aenter__
    self._resp = await self._coro
  File "python3.8/site-packages/aiohttp/client.py", line 723, in _ws_connect
    resp = await self.request(method, url,
aiohttp.client_exceptions.WSServerHandshakeError: 400, message='Invalid response status', url=URL('ws://0.0.0.0:8080/ws')

server

Traceback (most recent call last):
  File "python3.8/site-packages/aiohttp/web_protocol.py", line 314, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 546, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadStatusLine: 400, message="Bad status line 'invalid HTTP method'"

Python Version

$ python --version
Python 3.8.5

aiohttp Version

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.7.4.post0
Summary: Async http client/server framework (asyncio)

multidict Version

$ python -m pip show multidict
Name: multidict
Version: 5.1.0
Summary: multidict implementation

yarl Version

$ python -m pip show yarl
Name: yarl
Version: 1.6.3
Summary: Yet another URL library

OS

Ubuntu 20.04.2 LTS

Related component

Server, Client

Additional context

No response

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
@mashony mashony added the bug label Jun 28, 2021
@Hanaasagi
Copy link
Member

Hanaasagi commented Jun 29, 2021

Send the request with headers

headers = {
    "SomeOtherHeader": "somecoolstr",
    "CustomHeader": "somebrokenstr\n",
    "AHeader": "somecoolstr",
}

And use the tcpdump to capture the HTTP request data.

Notice this line.

        0x0090:  720a 0d0a 4148 6561 6465 723a 2073 6f6d  r...AHeader:.som

From the ASCII table:

  • 0x72 is r
  • 0x0a is LF
  • 0x0d is CR
  • 0x0a is LF
  • 0x41 is A
  • 0x48 is H

It seems that the server doen't handle CRLF correctly. A combination of CRLFCRLF means that the header ends and the body begins.

4.1 Message Types

HTTP messages consist of requests from client to server and responses from server to client.

   HTTP-message   = Request | Response     ; HTTP/1.1 messages

Request (section 5) and Response (section 6) messages use the generic message format of RFC 822 [9] for transferring entities (the payload of the message). Both types of message consist of a start-line, zero or more header fields (also known as "headers"), an empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields, and possibly a message-body.

    generic-message = start-line
                      *(message-header CRLF)
                      CRLF
                      [ message-body ]
    start-line      = Request-Line | Status-Line

In the interest of robustness, servers SHOULD ignore any empty line(s) received where a Request-Line is expected. In other words, if the server is reading the protocol stream at the beginning of a message and receives a CRLF first, it should ignore the CRLF.

Certain buggy HTTP/1.0 client implementations generate extra CRLF's after a POST request. To restate what is explicitly forbidden by the BNF, an HTTP/1.1 client MUST NOT preface or follow a request with an extra CRLF.

Reference: https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4


Test through aiohttp's vendor lib http-parser. Pass the raw HTTP request data

GET / HTTP/1.1\r\nHost: 0.0.0.0:8080\r\nSomeOtherHeader: somecoolstr\r\nCustomHeader: somebrokenstr\n\r\nAHeader: somecoolstr\r\nAccept: */*\r\nAccept-Encoding: gzip, deflate\r\nUser-Agent: Python/3.9 aiohttp/3.7.4\r\n\r\n

图片

@Hanaasagi
Copy link
Member

In RFC 2616:

Header fields can be extended over multiple lines by preceding each extra line with at least one SP or HT.

However, this RFC has been obsoleted by RFC 7230 which forbids folding:

Line folding has been deprecated in RFC7230 3.2.4 Field Parsing. A sender MUST NOT generate a message that includes line folding (i.e., that has any field-value that contains a match to the obs-fold rule) unless the message is intended for packaging within the message/http media type.

Reference: https://datatracker.ietf.org/doc/html/rfc7230#section-8.3.1


nodejs/http-parser support line folding. Send request with following headers will get 200 OK.

headers = {
    "SomeOtherHeader": "somecoolstr",
    "CustomHeader": "somebrokenstr\n ",  # tailing space
    "AHeader": "somecoolstr",
}

But I suggest that it is better to use base64 encoding.

@Dreamsorcerer
Copy link
Member

So, should the client side code be raising an error or encoding the headers in this situation?
Seems like the client shouldn't allow sending invalid headers.

@Hanaasagi
Copy link
Member

If check the characters in HTTP request header, it will have an impact on performance. Maybe it should be mentioned in the doc which characters should not be included in header and tell the user how to encode.

@Dreamsorcerer
Copy link
Member

I think this is exactly what dev mode is for:
https://docs.python.org/3/library/devmode.html#devmode

We can just enable the extra checks when run with -X dev.

@asvetlov
Copy link
Member

Duplicate of #4818
Fixed on master and 3.8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants