Customize max_line_size and max_field_size in HttpParser? #2988

alexandru · 2018-05-10T08:57:05Z

We're using aiohttp 2.x with Python 3.5.2 and trying to upgrade to the latest aiohttp 3.2.0 with Python 3.6.5.

We're getting errors such as these:

[2018-05-10 08:31:55 +0000] [12471] [ERROR] Error handling request
Traceback (most recent call last):
  File "/project/lib/python3.6/site-packages/aiohttp/web_protocol.py", line 235, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 297, in aiohttp._http_parser.HttpParser.feed_data
  File "aiohttp/_http_parser.pyx", line 425, in aiohttp._http_parser.cb_on_header_value
aiohttp.http_exceptions.LineTooLong: 400, message='Got more than 8190 bytes (9383) when reading Header value is too long.'

This was happening in aiohttp 2.x as well, because I believe the standard specifies that length as the maximum, however the real world doesn't always agree with standards 😒

To fix this, with aiohttp 2.x we were monkey-patching HttpParser in order to override max_line_size and max_field_size. Here's the horrible things we've been doing: gist — this file being imported in our gunicorn_worker.py.

Now that we've upgraded to 3.x, this monkey patching no longer works. Don't know why.

Any way to customize max_line_size and max_field_size?
Possibly without monkey patching, but not necessarily, atm I just want the thing to work?

Environment:

Python 3.6.5
aiohttp 3.2.0
Gunicorn 19.8.1 (via aiohttp.worker.GunicornWebWorker)

See the full requirements.txt if interested.

Cheers,

The text was updated successfully, but these errors were encountered:

asvetlov · 2018-05-10T09:33:17Z

Nobody implemented it yet.
We have the very similar issue in backlog: #2304

pposca · 2018-05-26T22:26:34Z

Where would be the best place to expose max_line_size and max_field_size ? Using an Application init param, like with client_max_size?

asvetlov · 2018-05-31T12:16:03Z

@pposca let's push params into BaseRequest first.
After that, we can think how to configure it the best way.
The library has several configuration parameters, would be nice to have a generalized solution for this.
The objection to Application.__init__ is that the class is used for sub-applications too, but parameters like max_line_size make sense only for root application.
Maybe we can live with it, or perhaps we will end up with config parameter which should be None for any sub-app. I don't know, let's discuss it later.

pposca · 2018-05-31T18:55:27Z

@asvetlov OK. There is also a max_headers param that seems related with max_line_size and max_field_size. Shouldn't it be added as well? It has sense for me and it would be free.

asvetlov · 2018-05-31T19:13:56Z

Agree, please support max_headers as well

EvgenyKhaliper · 2018-06-18T09:34:21Z

Guys, any progress with it ?

asvetlov · 2018-06-18T09:40:32Z

The issue is waiting for a champion

alexpantyukhin · 2018-06-19T08:01:20Z

I looked at it a little and found there is such chain Application._make_handler -> Server.call -> RequestHandler -> HttpRequestParser . _make_handler already takes some parameters with kwargs. Maybe it possible to merge kwargs with some config dictionary which are passed into __init__.

bmbouter · 2020-04-08T18:41:32Z

Our project is affected by the issue, and I'd like to open a PR to resolve it. Since 2018 I believe the need is no longer on Application._make_handler and instead to be configured on the ApplicationRunner created here instead. Do you agree?

Rather than adding options one-by-one, is there a way we can have many options specified by an application running with GunicornWebWorker? Here are some ideas:

Idea 1: A subclass pass extra kwargs to `ApplicationRunner`

To do this we could add an internal method named, e.g. _get_app_runner_instance(self) which return a configured instance of ApplicationRunner.

diff --git a/aiohttp/worker.py b/aiohttp/worker.py
index 73ba6e38..21fb671d 100644
--- a/aiohttp/worker.py
+++ b/aiohttp/worker.py
@@ -64,6 +64,17 @@ class GunicornWebWorker(base.Worker):
 
         sys.exit(self.exit_code)
 
+    def _get_application_runner_instance(self, app, **kwargs):
+        access_log = self.log.access_log if self.cfg.accesslog else None
+        runner = web.AppRunner(app,
+                               logger=self.log,
+                               keepalive_timeout=self.cfg.keepalive,
+                               access_log=access_log,
+                               access_log_format=self._get_valid_log_format(
+                                   self.cfg.access_log_format),
+                               **kwargs)
+        return runner
+
     async def _run(self) -> None:
         if isinstance(self.wsgi, Application):
             app = self.wsgi
@@ -73,13 +84,8 @@ class GunicornWebWorker(base.Worker):
             raise RuntimeError("wsgi app should be either Application or "
                                "async function returning Application, got {}"
                                .format(self.wsgi))
-        access_log = self.log.access_log if self.cfg.accesslog else None
-        runner = web.AppRunner(app,
-                               logger=self.log,
-                               keepalive_timeout=self.cfg.keepalive,
-                               access_log=access_log,
-                               access_log_format=self._get_valid_log_format(
-                                   self.cfg.access_log_format))
+
+        runner = self._get_application_runner_instance(app)
         await runner.setup()
 
         ctx = self._create_ssl_context(self.cfg) if self.cfg.is_ssl else None```

To allow GunicornWebWorker add additional "default" config options in the future without those config options also being added to every subclass there ever was, subclasses should use super() to inject the kwargs. Here would be an example of a subclass of CustomGunicornWebWorker that overrides max_line_size and max_field_size:

class CustomGunicornWebWorker(GunicornWebWorker):

    def _get_application_runner_instance(self, app):
        extra_kwargs = {"max_line_size" = 16000, "max_field_size": 16000}
        return super()._get_application_runner_instance(app, **extra_kwargs)

Idea 2: Have GunicornWebWorker accept a new `application_runner_kwargs` somehow

The application GunicornWebWorker patch to apply the patch (below) would be straightforward, but where does application_runner_kwargs come from?

diff --git a/aiohttp/worker.py b/aiohttp/worker.py
index 73ba6e38..b2c7194f 100644
--- a/aiohttp/worker.py
+++ b/aiohttp/worker.py
@@ -79,7 +79,8 @@ class GunicornWebWorker(base.Worker):
                                keepalive_timeout=self.cfg.keepalive,
                                access_log=access_log,
                                access_log_format=self._get_valid_log_format(
-                                   self.cfg.access_log_format))
+                                   self.cfg.access_log_format),
+                               **application_runner_kwargs)
         await runner.setup()
 
         ctx = self._create_ssl_context(self.cfg) if self.cfg.is_ssl else None

These changes allow the configuration of web.AppRunner object inside application code. This makes possible to use custom configs of Aiohttp server (like 'max_line_size' parameters setup) in pair with Gunicorn. Associated issue: aio-libs#2988 (reference)

kylemacfarlane · 2021-08-24T21:53:47Z

I just ran into this when using aiohttp as a client to read from a server which is returning a gigantic CSP header.

The source of the problem seems to be here where a hardcoded non-customisable parser is used:

aiohttp/aiohttp/client_proto.py

Lines 159 to 168 in 43c8168

    
           self._parser = HttpResponseParser( 
        
               self, 
        
               self._loop, 
        
               read_bufsize, 
        
               timer=timer, 
        
               payload_exception=ClientPayloadError, 
        
               response_with_body=not skip_payload, 
        
               read_until_eof=read_until_eof, 
        
               auto_decompress=auto_decompress, 
        
           )

I can't figure out how to get a override this method properly. It seems like you need a custom connector class providing a custom connection class providing a custom protocol which finally provides a custom parser, but I can't figure it out.

Monkey patching HttpResponseParser directly doesn't work, perhaps because of the C extensions.

In the end this monkey patch works but I hate it:

from typing import Optional

from aiohttp.client_exceptions import ClientPayloadError
from aiohttp.client_proto import ResponseHandler
from aiohttp.helpers import BaseTimerContext
from aiohttp.http import HttpResponseParser


def set_response_params(
    self,
    *,
    timer: Optional[BaseTimerContext] = None,
    skip_payload: bool = False,
    read_until_eof: bool = False,
    auto_decompress: bool = True,
    read_timeout: Optional[float] = None,
    read_bufsize: int = 2 ** 16
) -> None:
    self._skip_payload = skip_payload

    self._read_timeout = read_timeout
    self._reschedule_timeout()

    self._parser = HttpResponseParser(
        self,
        self._loop,
        read_bufsize,
        timer=timer,
        payload_exception=ClientPayloadError,
        response_with_body=not skip_payload,
        read_until_eof=read_until_eof,
        auto_decompress=auto_decompress,
        # These three lines are the important change
        max_line_size=8190 * 4,
        max_headers=32768 * 4,
        max_field_size=8190 * 4
    )

    if self._tail:
        data, self._tail = self._tail, b""
        self.data_received(data)


ResponseHandler.set_response_params = set_response_params

bmbouter mentioned this issue Apr 27, 2020

GunicornWebWorker subclasses can config AppRunner #4720

Closed

5 tasks

ClericPy mentioned this issue Mar 17, 2021

Cannot easily access max_field_size in client response #2304

Open

loven-doo mentioned this issue Apr 12, 2021

coroutine is allowed to return web.AppRunner #5611

Merged

5 tasks

patchback bot mentioned this issue Oct 28, 2021

[PR #5611/9a1c8623 backport][3.8] coroutine is allowed to return web.AppRunner #6150

Merged

5 tasks

starflows mentioned this issue Apr 27, 2022

#2304 add max_line_size, max_headers and max_field_size arguments to ClientSession #6720

Closed

5 tasks

starflows mentioned this issue May 9, 2022

Client session new args max_line_size and max_field_size #6744

Closed

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Customize max_line_size and max_field_size in HttpParser? #2988

Customize max_line_size and max_field_size in HttpParser? #2988

alexandru commented May 10, 2018 •

edited

asvetlov commented May 10, 2018

pposca commented May 26, 2018

asvetlov commented May 31, 2018

pposca commented May 31, 2018 •

edited

asvetlov commented May 31, 2018

EvgenyKhaliper commented Jun 18, 2018

asvetlov commented Jun 18, 2018

alexpantyukhin commented Jun 19, 2018

bmbouter commented Apr 8, 2020

kylemacfarlane commented Aug 24, 2021

Customize max_line_size and max_field_size in HttpParser? #2988

Customize max_line_size and max_field_size in HttpParser? #2988

Comments

alexandru commented May 10, 2018 • edited

asvetlov commented May 10, 2018

pposca commented May 26, 2018

asvetlov commented May 31, 2018

pposca commented May 31, 2018 • edited

asvetlov commented May 31, 2018

EvgenyKhaliper commented Jun 18, 2018

asvetlov commented Jun 18, 2018

alexpantyukhin commented Jun 19, 2018

bmbouter commented Apr 8, 2020

Idea 1: A subclass pass extra kwargs to ApplicationRunner

Idea 2: Have GunicornWebWorker accept a new application_runner_kwargs somehow

kylemacfarlane commented Aug 24, 2021

alexandru commented May 10, 2018 •

edited

pposca commented May 31, 2018 •

edited

Idea 1: A subclass pass extra kwargs to `ApplicationRunner`

Idea 2: Have GunicornWebWorker accept a new `application_runner_kwargs` somehow