Add support for sendfile() based DATA frames. #236

Lukasa · 2016-06-04T23:12:04Z

Right now there is no support for HTTP/2 using sendfile to send DATA frames. This is potentially inefficient for implementations that are able to send really sizeable DATA frames. Given that sendfile allows users to send fixed length data, it would be nice to provide some way to say "send these bytes, then do a sendfile with this fobj and this length".

@njsmith's h11 library has this capacity, but this relies on the fact that h11 has no internal buffer: each "event" call returns the bytes required for that event directly without writing into a buffer. This allows for the subtle changing of return value in comparison to send, which is not so naturally achievable with h2.

Coming up with a good design here is a bit tricky. It may be that this should be an optional switch on the H2Connection class that affects the return value of data_to_send(), changing it to be an iterable of bytes and sentinel objects where each sentinel object is . Alternatively, we could go further and say that not only does the optional switch need to be enabled but there is also a separate "get the data I need" function that conforms to this new API.

Another possible option that leaps out to me is to have a subclass (eww) or some other type that implements this support as a wrapper around the base H2Connection object.

The final possible option is a dramatic API change. That changes the signature of receive_data to return two values: the events iterable and any bytes that may need to be sent. If we do that, we can then remove hyper-h2's internal buffer and then delegate entirely to the calling code by having all do_event() type functions simply return the data they generate rather than storing it internally. That's a very large API break, but it allows supporting this use-case more cleanly by simply emulating what h11 does.

I'd like opinions from @python-hyper/contributors. Any thoughts on API choices? Is this worth supporting at all?

The text was updated successfully, but these errors were encountered:

Lukasa · 2016-06-04T23:13:59Z

Another option is to have data_to_send() be called repeatedly until exhaustion, and each time it will return either bytes or the sentinel object. That allows for keeping the API change relatively small and isn't entirely absurdly surprising. I'd still want to ensure that sendfile support is explicitly opted in to, because otherwise the API represents a potentially surprising footgun. Essentially, you set a flag to say "I know what I'm doing", and then do it.

Lukasa · 2016-06-04T23:48:22Z

While we're here, we should note the caveats of sendfile in a H2 environment. In particular, we have the following constraints:

by default the maximum frame size is 16,348 bytes (2**14 bytes), and may never be larger than 16,777,215 bytes (2**24 - 1 bytes). This means that we can never sendfile more than ~17MB at once: definitely a decent amount of data, but not an awesome amount.
frequently this may be lowered by the stream and connection flow control window, which defaults to 65535 bytes and can never be larger than 2,147,483,647 bytes.

This somewhat increases the overhead and decreases some of the utility of sendfile.

jchampio · 2016-06-05T00:00:38Z

This somewhat increases the overhead and decreases some of the utility of sendfile.

Agreed. But I think the ability to send large resources from disk without buffering them in Python would be a good thing in and of itself, even if there were no speed/performance improvements.

Lukasa · 2016-06-05T17:01:46Z

The other concern about sendfile, as pointed out by @glyph, is that sendfile does not work when combined with a userspace TLS implementation like OpenSSL (at least at this time). That further limits the utility of this change to HTTP/2 in plaintext.

jchampio · 2016-06-05T17:28:17Z

Ugh... that is an excellent point. The danger of doing my development work in plaintext is that I sometimes forget that browers don't speak it...

So, for now, such a feature would only generally be useful for people who have non-browser clients and don't want/need TLS. Still a use case that exists, but much more niche than I was originally imagining.

Lukasa · 2016-06-06T14:23:59Z

Yup, agreed. I think it's still worth doing, especially as there's the possibility of AF_ALG-based sockets being used in the future which would be compatible with sendfile.

njsmith · 2016-06-06T16:16:28Z

I have to admit that h11's sendfile support is more of a cute trick than anything carefully considered. I am somewhat dubious about whether one can get all the other python overhead low enough for sendfile to actually make any difference (cf Amdahl's law), but would be very interested to hear if anyone tries it. In the mean time h11 mostly just supports sendfile because I realized it would be trivial to do :-)

rbtcollins · 2016-06-06T19:54:22Z

The main attraction for sendfile is zero-copy overhead (which can be gotten other ways), but yeah any crypto will imply at least one read of the memory. I'm extremely dubious that directly supporting sendfile makes sense: I'd much rather make sure we can deliver a zero-spurious-copies guarantee, that is that we won't do anything daft like take slices of the input data - and instead make sure we use memoryviews and the like right up until its handed off to TLS.

njsmith · 2016-06-06T20:42:13Z

I'd much rather make sure we can deliver a zero-spurious-copies guarantee, that is that we won't do anything daft like take slices of the input data - and instead make sure we use memoryviews and the like right up until its handed off to TLS.

So in a sense, this is exactly how h11's sendfile support works. Normally for convenience, h11.Connection.send returns a single flat buffer, which necessarily implies some copying to concatenate framing and payload. But there's also a send_with_data_passthrough method which returns a list of bytes-like objects. (Regular send is just a tiny wrapper that calls this method and then returns b"".join(fragments).) And we guarantee that if you pass in a payload buffer, then it will pass through the machinery and exactly the same object will come out the other side. If you pass in a memory view, then you get out that same memory view with no copying.

Then we also make the guarantee that the only thing we do with payload buffers is call len on them, so you can also pass through opaque handles pointing to data on disk if you want. But that's the easy part: once you have an interface that can support zero copying, then you can also support sendfile.

The problem for h2 is that the data_to_send interface can't do either, because it always flattens data into a single buffer, which in python generally requires two copies (once to append to the buffer, and again to pull stuff out).

glyph · 2016-06-06T20:58:31Z

We used to be very careful about this sort of thing in Twisted, and it was one of the major regressions in Python 3 that memoryview was missing a bunch of necessary features to get the appropriate zero-copy (or, really, one-copy) guarantees that we wanted. I am fairly sure that it's all fixed now, but someone would need to do the investigation to see what version of python 3 introduced all the necessary operations; this may involve carrying a compat shim or dropping <3.4 or something like that.

glyph · 2016-06-06T20:58:56Z

(Calling it a "regression" because the buffer builtin, now gone in favor of memoryview, did handle these cases correctly)

Lukasa · 2016-06-07T08:29:46Z

So data_to_send really only implies one spurious copy unless you use the optional amt argument. If you leave that argument as None we don't copy the buffer, we just hand it off to you, no questions asked. That means the only copy is from the byte object originally provided to send_data into the serialized buffer. However, that's also exactly one copy, which all things considered is pretty damn good.

Lukasa · 2017-03-23T11:57:46Z

I'm postponing this to 4.0.0.

Lukasa added Enhancement Question Performance labels Jun 4, 2016

Lukasa added this to the 3.0.0 milestone Jun 4, 2016

jchampio mentioned this issue Jun 5, 2016

[RFC] Discussion PR for HTTP/2 support (DO NOT MERGE). aio-libs/aiohttp#882

Closed

Lukasa modified the milestones: 4.0.0, 3.0.0 Mar 23, 2017

Kriechi removed this from the 4.0.0 milestone Sep 21, 2020

synodriver mentioned this issue Nov 14, 2021

can not use sendfile under http/2 nonebot/nonecorn#3

Open

synodriver mentioned this issue Dec 10, 2023

Re-evaluate "http.response.zerocopysend" django/asgiref#423

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for sendfile() based DATA frames. #236

Add support for sendfile() based DATA frames. #236

Lukasa commented Jun 4, 2016

Lukasa commented Jun 4, 2016

Lukasa commented Jun 4, 2016 •

edited

jchampio commented Jun 5, 2016

Lukasa commented Jun 5, 2016

jchampio commented Jun 5, 2016

Lukasa commented Jun 6, 2016

njsmith commented Jun 6, 2016

rbtcollins commented Jun 6, 2016

njsmith commented Jun 6, 2016

glyph commented Jun 6, 2016

glyph commented Jun 6, 2016

Lukasa commented Jun 7, 2016

Lukasa commented Mar 23, 2017

Add support for sendfile() based DATA frames. #236

Add support for sendfile() based DATA frames. #236

Comments

Lukasa commented Jun 4, 2016

Lukasa commented Jun 4, 2016

Lukasa commented Jun 4, 2016 • edited

jchampio commented Jun 5, 2016

Lukasa commented Jun 5, 2016

jchampio commented Jun 5, 2016

Lukasa commented Jun 6, 2016

njsmith commented Jun 6, 2016

rbtcollins commented Jun 6, 2016

njsmith commented Jun 6, 2016

glyph commented Jun 6, 2016

glyph commented Jun 6, 2016

Lukasa commented Jun 7, 2016

Lukasa commented Mar 23, 2017

Lukasa commented Jun 4, 2016 •

edited