check max_content_length consistently #2620

Merged
merged 1 commit on Mar 14, 2023
Conversation

davidism (Member)

@davidism commented Mar 14, 2023

max_content_length is now checked consistently for anything that accesses request.stream, such as request.data, request.form, and request.json. The check is done in get_input_stream.
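For example, with this change something roughly like the following should raise the 413 no matter whether the body is read through data, form, or json (a sketch of the user-facing behaviour; the limit value is only illustrative and exact attribute access may vary slightly between versions):

```python
from werkzeug.exceptions import RequestEntityTooLarge
from werkzeug.test import EnvironBuilder
from werkzeug.wrappers import Request

builder = EnvironBuilder(
    method="POST",
    data=b"x" * 2048,  # body larger than the limit below
    content_type="application/octet-stream",
)
request = Request(builder.get_environ())
request.max_content_length = 1024  # illustrative application-set limit

try:
    request.get_data()  # reads through request.stream, so the limit is enforced there
except RequestEntityTooLarge:
    print("413 Content Too Large")
```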

If request.max_content_length is set and the request declares a larger Content-Length, a 413 Content Too Large error (RequestEntityTooLarge) is raised immediately when request.stream is accessed. If Content-Length is not set (a streaming request), max_content_length is passed to LimitedStream so that it can raise once the limit is actually reached.
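Conceptually, the check works along these lines (a simplified sketch, not the actual get_input_stream source; the LimitedStream arguments are assumed from the description in this PR):

```python
from werkzeug.exceptions import RequestEntityTooLarge
from werkzeug.wsgi import LimitedStream


def limited_input(environ, max_content_length=None):
    stream = environ["wsgi.input"]
    content_length = environ.get("CONTENT_LENGTH")
    content_length = int(content_length) if content_length else None

    if content_length is not None:
        # Declared length: fail immediately if it already exceeds the maximum.
        if max_content_length is not None and content_length > max_content_length:
            raise RequestEntityTooLarge()
        return LimitedStream(stream, content_length)

    if max_content_length is not None:
        # Streaming request: let the wrapper raise once the limit is actually hit.
        return LimitedStream(stream, max_content_length, is_max=True)

    return stream
```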

LimitedStream now implements io.RawIOBase instead of the more basic io.IOBase. Only readinto needs to be implemented; read, readline, and readlines have default C implementations built on top of it. The default readall requires an extra read just to detect EOF, so that method is overridden as well.
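As a generic illustration of that io.RawIOBase pattern (not Werkzeug's LimitedStream itself):

```python
import io


class CountingStream(io.RawIOBase):
    """Wraps a binary stream and counts the bytes handed out."""

    def __init__(self, wrapped):
        self._wrapped = wrapped
        self.bytes_read = 0

    def readable(self):
        return True

    def readinto(self, b):
        # read, readline, and readlines all funnel through this single method.
        data = self._wrapped.read(len(b))
        b[: len(data)] = data
        self.bytes_read += len(data)
        return len(data)

    def readall(self):
        # Overriding readall avoids RawIOBase's default loop, which needs one
        # extra readinto call returning 0 just to detect EOF.
        data = self._wrapped.read()
        self.bytes_read += len(data)
        return data


stream = CountingStream(io.BytesIO(b"a\nb\nc\n"))
print(stream.readline())  # b'a\n', provided by the base class via readinto
print(stream.read())      # b'b\nc\n', remaining bytes via readall
print(stream.bytes_read)  # 6
```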

LimitedStream has a new is_max parameter to distinguish whether the limit is a client-reported Content-Length or an application-set max_content_length. If the limit is a maximum, reading from an exhausted stream raises a 413 error instead of returning empty. In that case, reading empty data from the underlying stream does not raise a ClientDisconnected error, since there's no way to tell whether the client stopped sending because the body was done or because it really disconnected.
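A small sketch of that behavioural difference (constructor arguments are taken from the description above; check the actual LimitedStream API in your Werkzeug version):

```python
import io

from werkzeug.exceptions import RequestEntityTooLarge
from werkzeug.wsgi import LimitedStream

body = io.BytesIO(b"hello world")

# Limit from a client-reported Content-Length: an exhausted stream reads empty.
declared = LimitedStream(body, 5)
print(declared.read())  # b'hello'
print(declared.read())  # b'', the stream is simply done

# Limit from max_content_length: exhausting it means the client sent too much,
# so reading again raises 413.
body.seek(0)
capped = LimitedStream(body, 5, is_max=True)
print(capped.read())  # b'hello'
try:
    capped.read()
except RequestEntityTooLarge:
    print("413 Content Too Large")
```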

The readlines implementation now handles the size parameter correctly. It reads at least one line, and stops reading further lines once the size hint is reached while reading the current line, so more than size bytes may be returned. This is still safe: each read goes through readinto, so the overall limit still applies.
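For reference, this matches the size-hint semantics of the standard io classes, e.g.:

```python
import io

stream = io.BytesIO(b"first line\nsecond line\nthird line\n")
# With a hint of 5 bytes, at least one full line is read; the total already
# exceeds the hint after the first line, so reading stops there.
print(stream.readlines(5))  # [b'first line\n']
```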

fixes #1513
fixes #2620
fixes #690


It is not possible for the WSGI application to safely exhaust the input stream. That is up to the WSGI/HTTP server that handles the socket, or to the client, which decides what happens when the server closes the input stream. Therefore the client may show a "connection reset" failure rather than a 413 error message.

All "exhaust input" handling is removed from the application side of Werkzeug. It now always raises an error once Content-Length or max_content_length is reached, and will not read past that. However, the development server will now continue to read up to 10GB, 10MB at a time, up to 1000 reads, for 10 seconds. This allows the client to see the 413 error in all but extreme cases. The development server is already not appropriate for production, so adding this does not seem unreasonable.

The server is still not able to handle HTTP/1.1 keep-alive connections. It was too complicated to handle exhausting the input while not accidentally reading the next request line.

I really think we should take all these things we've been adding on top of http.server and push them back upstream, if anyone wants to work on that.

@davidism added this to the 2.3.0 milestone on Mar 14, 2023
Development

Successfully merging this pull request may close these issues.

check max_content_length more consistently
max_content_length is not enforced in all cases