Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rationale for API Choices? #78

Open
mentalisttraceur opened this issue Feb 13, 2019 · 4 comments
Open

Rationale for API Choices? #78

mentalisttraceur opened this issue Feb 13, 2019 · 4 comments

Comments

@mentalisttraceur
Copy link

Is there documentation anywhere for the rationale for the design of the h11 API, specifically how it deviates from other similar projects?

I've noticed for example that h11.Connection chooses to expose events using a simple next_event method, while two functionally similar abstractions, namely h2.connection.H2Connection and msgpack.Unpacker, work like iterators, which allows for getting events with either for event in ... loops or the builtin next.

Given other evidence that @njsmith does really careful API design with an eye for human friendliness and making correct code easier to write, I suspect that there is some good intuition, experience, or thinking behind this choice, and I would appreciate the documentation going more in-depth on it, as well as any other similarly considered choices.

@Lukasa
Copy link
Member

Lukasa commented Feb 14, 2019

This will be fun; I think @njsmith’s API is better. Specifically, even though it adds another interface hook, it turns the implementation into a “single step” parser: only one thing happens at any given time.

This is really, really useful. Specifically it ensures that users cannot make the mistake of responding to an event that is no longer valid due to an event later in the iterator.

h11’s API could very easily be wrapped in an iterator, so in general the new approach is better. (My new HTTP/2 implementation is a single step parser)

@njsmith
Copy link
Member

njsmith commented Feb 14, 2019

Yeah, the thing about h11.Connection is that it's stateful, and each call to .next_event() mutates this state. This is most noticeable in the .our_state and .their_state accessors, but also in that the set of legal operations in general depends on the state, like Cory mentions.

This wasn't any brilliant insight on my part. Originally .receive_data immediately processed everything and returned a list of event objects. And then the first time I tried to use it, I hit a nasty bug where my program did something like:

events = conn.receive_data(...)
for event in events:
    if conn.our_state is ...:
        ...

...and this was looking at events and conn.our_state from different points in time, and it blew up because they weren't consistent.

Oh, here's the PR: #4

Now... why is it called next_event and not __next__? That I'm not so sure of :-). I'm not sure if we ever thought about it.

@mentalisttraceur
Copy link
Author

mentalisttraceur commented Feb 14, 2019

Thanks @Lukasa and @njsmith: The very last bit in the last answer is what I had most in mind: "why is it called next_event and not __next__?", and your answers give some insights into the history behind it that clarifies that.

  1. I am looking at it from the perspective of already seeing the merit of everything else covered in these answers (because the overall merit of h11 not parsing data until each event is requested is pretty clear and mentioned in a few scattered places already), which means I have a "hindsight-like" perspective going into considering alternative interfaces to the same underlying implementation wisdom.

  2. I was thinking more of msgpack.Unpacker's approach, which seems to combine the two: nothing is parsed until the next "event" (serialized object) is requested, but the way you request that is by __next__ on the unpacker.

  3. I hadn't quite thought through the implications on human thinking of how h2 and early h11 code did things: of course if it was originally written to return a list or other iterator on pre-parsed events, then the exposure of an iterator-style interface would be liked to get mentally "linked" with pre-parsing, so it makes sense that both would be dropped at once.

Well, that satisfies my curiosity personally for now. We can close this if you'd like, or keep it open if you want to use it to keep track of any relevant additions to official documentation or whatever else.

@mentalisttraceur
Copy link
Author

@njsmith: I have a API design question I want to bounce off you, and it seems relevant-enough for here:

What do you think about a "bring your own IO"-library requiring a .send_succeeded(), instead of a .send_failed()?

Near as I can tell, there is one downside that is universal, and one that is specifically relevant to the way h11.Connection does things, and I'm curious if you had similar rationale:

The universal downside is that it puts somewhat tedious boilerplate code in the happy path of things working as expected.

The downside that applies specifically to the h11 style of doing things is that the caller doesn't get data-to-send in neat "this is all ready to be sent at once" chunks - the user gets bytes for every event fed into send, and it's meaningless to acknowledge them individually unless you're also feeding them into your IO routines individually, which can be inefficient.

The distinction is relevant because for example I think with h2-style connections with a .data_to_send() method, acknowledgement-of-success rather than reporting-of-failure makes a little more sense, because you can let the connection object do buffering for you and only acknowledge when you've successfully sent everything, while I think the above basically rules it out for h11-style connections.

The hypothetical advantage is that if the connection object refused to let you send more things while data-to-send acknowledgment was pending, it might force some users into doing error-handling logic that they'd otherwise skip.

But the cost is lower code "signal-to-noise" because of more boilerplate with explicit acknowledgments, more tedious interactive use (when/if that's relevant), and between that and other factors, reduction-of-programmer-joy (which I know is important because that is literally how Python wins my preference: by bringing enjoyment by having nice abstractions for many things that in other languages would require boilerplate and annoyance and tedious repetitive typing).

And honestly the person that was going to skip the tedium of error-checking is probably going to react to being forced into tedium with greater lack of desire to write tedious code, and thus many minds would be more incentivized to just "lie" and acknowledge data as successfully sent as soon as they grab it.

I was briefly thinking the tedium could be circumvented with some sort of context manager approach where __enter__ gives you the data and __exit__ treats whether or not it is exited with an exception as a sign of whether or not to mark the data send as failed, but on further thought it's only an improvement in situations where the sending fails with an exception (fairly common) and the exception is handled but in a way that doesn't account for the bad connection state (probably less common? - in my limited experience people either actually handle exceptions properly, even if that means a higher layer restarts everything from scratch, or they don't handle them at all expect perhaps to log them before blowing up).

So at this point I've talked myself into the idea that a send_succeeded instead of a send_failed is definitely not-good for IO-agnostic libraries unless the protocol(s) being modeled have, as a typical use-case, the queuing up of several logical "send" operations into one IO operation, and they provide the buffering for that internally, that it might be not-good even in those cases, and that even when it is good, it's probably best to provide a .safety_off() method that bypasses it for interactive use and quick-and-dirty work or situations where it's really not needed or appropriate.

Anyway, I think at least some of the above might be worthwhile/educational to have written up somewhere in connection to this project? Because it took me a little bit of playing with the would-be usage before I realized that a send_succeeded instead of a send_failed would be ergonomically bad in combination with h11.Connection.send.

But anyway, I'm personally curious if you had thought about doing a send_succeeded and what reasoning you took to get to send_failed instead if so?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants