Clean up state machine #90

mehaase · 2018-11-16T18:12:27Z

There are a lot of complex code paths for closing a websocket connection, mostly because the first implementation was overly simplistic and I kept hacking away on it to handle each new problem that arose. This could be cleaned up and made a lot more readable.

mehaase · 2018-11-17T14:54:42Z

Brain dumping some of the issues that make the close code complex:

There are multiple stages of closing. We can say that CLOSING is when we are asked to start the handshake and CLOSED is when the underlying stream is torn down. (Or maybe not—see below.)

Wsproto and WebSocketConnection have different meanings of "closing" and "closed". Wsproto is CLOSING when it writes a close frame to its internal buffer. The websocket state is CLOSING when it somebody calls aclose() or a ConnectionClosed event is received but the response is not yet sent. Wsproto is CLOSED when it has sent and received a close frame. WebSocketConnection is closed when the websocket handshake is complete and the underlying stream has been torn down. We probably should not rely on wsproto's state property at all. Instead, we should observe state from events (like ConnectionClosed) and infer state from sequences like wsproto.close(); self._write_pending().

There are two layers to the protocol: the application layer (WebSocket) and the stream layer (i.e. typically TCP). We generally want to close the application layer with a handshake before closing the stream, but sometimes that's not possible, i.e. if the underlying stream is torn down before the handshake occurs.

I wonder if tearing down the stream inside WebSocketConnection is even the right thing to do? It might be surprising for somebody who calls wrap_client_stream(stream) and finds out later the stream was closed... maybe they wanted to reuse that stream for some other protocol? We could define WebSocketConnection closed state as when the handshake completes, then put the responsibility for stream teardown somewhere else, i.e. inside APIs like open_websocket(…). These APIs are already responsible for creating a stream, why shouldn't they be responsible for tearing it down?

The local and remote endpoints each have a state. E.g. if we close with code 4001 then we send a frame with 4001. The server completes the handshake by sending back a frame, but it can send back whatever code it wants, i.e. 4002. Should the local close code property show the close code we sent or the close code we received... or both?

Closing behavior is different depending on which side initiates. E.g. if the remote endpoint initiates closing, then we keep the internal message channel open until the user has read all of the messages queued in it. If the local endpoint initiates closing, then we close the receive channel and the user cannot read any more messages.

We have to be careful not to block the reader task. One mistake I made several times is trying to call self.aclose() from an event handler. This never works because all of the event handlers run inside the reader task and aclose() waits for the connection closed event, so the reader task is deadlocked. There needs to be some method for internal close that is safe to call from the reader task.

The next step is to try to draw a diagram to identify all of the possible ways the connection can close. This should help me understand how to rewrite this.

mehaase · 2019-05-15T20:39:35Z

I'm going to broaden the scope of this to look at the entire state machine in the library. I'm sure there are bugs (or undefined behavior) in certain spots, e.g. what happens if you call accept() and reject() on the same request? You'll probably get an obscure wsproto error.

mehaase mentioned this issue Nov 16, 2018

Configure message buffer (#82) #92

Closed

mehaase added the refactoring This is a refactoring with no change to functionality label Nov 17, 2018

mehaase added this to the 1.0 milestone Nov 17, 2018

mehaase mentioned this issue Jan 22, 2019

ConnectionClosed exception from serve_websocket() #96

Closed

mehaase mentioned this issue Feb 12, 2019

proof of concept for deterministic close race #101

Merged

mehaase self-assigned this May 10, 2019

mehaase changed the title ~~Clean up connection close code~~ Clean up state machine May 15, 2019

mehaase removed their assignment May 20, 2019

mehaase mentioned this issue May 22, 2019

Server does not automatically close TCP connection #115

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clean up state machine #90

Clean up state machine #90

mehaase commented Nov 16, 2018

mehaase commented Nov 17, 2018 •

edited

mehaase commented May 15, 2019

Clean up state machine #90

Clean up state machine #90

Comments

mehaase commented Nov 16, 2018

mehaase commented Nov 17, 2018 • edited

mehaase commented May 15, 2019

mehaase commented Nov 17, 2018 •

edited