Notify client close #310

Merged: 1 commit merged into Pylons:master on Oct 31, 2020

Conversation

viktordick

Implementation of the idea discussed in #308. The tests don't pass yet and I should also implement additional tests for this, but hopefully this helps with the discussion.

@viktordick
Author

@bertjwregeer Is there a chance that this PR will be accepted if I fix the tests and add new ones for the new feature? Or do you think our request is too exotic and the added complexity should be avoided?

@digitalresistor (Member) left a comment

I am not opposed to adding the feature, but not in this current form.

  1. Adding an extra lock for something most applications won't use adds unnecessary overhead.
  2. I'd want this to be controlled by a flag set in the configuration (i.e. a new adjustment), so that app owners have to opt in to this behaviour; I do not like the idea of this being the default.
  3. I would be okay with a new key being added to the environment that is a callable the application can use to check whether a channel is now closed because of a client disconnection. This is the suggestion that @mmerickel made.

The task already has access to the channel, and since WSGITask creates the environ, adding a function like this:

def check_alive(self):
    return self.channel.client_disconnected

would be simple.

And in handle_read() we set client_disconnected if the client has disconnected (i.e. the read returns 0). Then in Task we add a check in finish() to see if self.channel.client_disconnected is true, to raise ClientDisconnected so that the logic in HTTPChannel.service() does the right thing to tear everything down.
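
Sketched out, the finish() side of that could look something like this (illustrative only, assuming the client_disconnected flag and the existing ClientDisconnected exception):

def finish(self):
    # bail out if handle_read() flagged a disconnect, so that
    # HTTPChannel.service() tears the channel down
    if self.channel.client_disconnected:
        raise ClientDisconnected
    ...  # existing finish() logic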

Regarding number 2, I don't mind the extra knobs being available in the WSGI environment by default, although adjustments is available inside of Task as well, so it could be added to the WSGI environment conditionally.

src/waitress/channel.py: inline review comment (outdated, resolved)
@viktordick
Author

@bertjwregeer Thanks for the review.

Regarding 3:
I guess there are three possibilities:

  A) The application inserts a callable into the environment that waitress then calls (in the main thread, not in the worker thread!).
  B) Waitress inserts a flag into the environment once the connection is closed and the application checks for it.
  C) Waitress inserts a callable into the environment that the application can use to check for client disconnection.

I was thinking that A) would be easiest to deal with on the application side, especially if we are in a situation where we spawned another process or are currently waiting for an SQL database query to be performed. When triggered, we could then terminate the spawned process or the database worker process.

However, this has the disadvantage mentioned above that the handler would run in (and block) waitress's main thread. And maybe it also adds unnecessary complexity (perhaps the additional lock was only necessary due to this approach and is not required with C, but I am not sure yet). I will try to think about this a bit more.

Regarding 1:
I think the additional lock is necessary because handle_read runs in the main thread while service runs in the task thread, and before this change they were never executed simultaneously for the same channel (while a request was being processed, readable always returned False). I will think about this a little more - maybe with strategy C it is not necessary. But I do not think it depends on the strategy: it is simply because we now have one thread that may create new requests (the main thread) and one that processes them, and they may now run simultaneously, which they did not do before.

Regarding 2:
I was not sure how to implement a flag that can be set via PasteDeploy, so in this draft I used the global variable max_requests_per_channel, which restores the default behaviour when set to 0 (the name should probably be changed to something like max_lookahead_requests). If I understand correctly, I can expose this as one of the parameters to waitress.serve and give it a default value of 0 there, which keeps the default behaviour unless explicitly enabled. I will try it.
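
Something like this, if I understand the adjustments machinery correctly (a sketch only, using the draft parameter name from this PR, which may still change):

from waitress import serve

def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]

# 0 (the default) keeps the current behaviour; a positive value enables the lookahead
serve(app, listen="127.0.0.1:8080", max_requests_per_channel=0)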

@viktordick
Author

Addendum: Regarding the difference between strategies B and C, I guess with C we do not need to make sure to update the environment of the currently running task as well as of any queued tasks. However, I guess we do not want to start any more tasks anyway if the client disconnected, do we? In any case, C seems to be superior to B. (C was also the strategy that @mmerickel suggested.)

@viktordick
Author

@bertjwregeer I tried to implement the requested changes. There is still a lot to be done regarding tests and documentation, but is the general idea now something that might be accepted?

Now that I looked over it again, the additional lock I had in the original attempt was due to race conditions with self.current_task, which was needed in order to implement strategy A as mentioned above. It is no longer needed with C (the remaining race condition is between received() adding requests to be processed and service() removing them, but self.requests.extend(requests) is already protected by the GIL).

@viktordick
Author

@bertjwregeer I changed the implementation so that service() only processes one request and re-inserts itself using add_task if there are more requests. This requires a lock to cleanly communicate whether the main thread or the task thread should call add_task, and to prevent race conditions. However, I am pretty sure I can re-use the task_lock for this. It is no longer necessary to guard the complete service method with this lock: by only re-inserting the next task at the very end of the function, it is ensured that no two requests of the same channel are processed in the wrong order or in parallel.
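
Schematically, the new flow in service() is something like this (heavily simplified, not the actual diff):

def service(self):
    # process exactly one request
    request = self.requests[0]
    task = self.task_class(self, request)
    task.service()
    with self.task_lock:
        # drop the finished request; if more are already queued,
        # re-add the channel so the next one is dispatched to a worker
        self.requests.pop(0)
        if self.requests:
            self.server.add_task(self)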

However, this makes the diff against master rather large, since most of the body of service() changed its indentation level.

One change I am not sure about is the handling of the output buffers. Due to the change, they are flushed after each request separately. Before, if the client sent multiple short requests in a pipeline such that they would be picked up with a single read(), the output buffers might only be flushed once all of them were processed (I think).

I made some obvious adjustments to the tests. I also dropped test_service_no_requests since it no longer works (service() now assumes that there is a request to be processed); in any case it did not cover a real-life situation, because add_task is only called if there is a request to be processed.

There are two more failing tests that I will check next. And, of course, I need to add tests for the new functionality.

src/waitress/channel.py: 3 inline review comments (outdated, resolved)
src/waitress/task.py: inline review comment (outdated, resolved)
@digitalresistor
Member

@viktordick thanks for updating it, I haven't taken a look yet but I saw that @mmerickel provided some feedback. I've been busy moving; hopefully once things calm down a little bit I'll get back to this and help figure out the best path forward.

@viktordick marked this pull request as ready for review on September 13, 2020, 07:38
@viktordick
Author

@bertjwregeer, @mmerickel So I tried to read through the code in order to get a feeling of when each variable is used and came up with the following:

task.close_on_finish

  • set via ClientDisconnected while trying to write part of the response when the socket is closed, or when the task fails to write a header.
  • sets close_when_flushed and closes all pending requests after the current task has been processed.

close_when_flushed

  • makes the channel writable and not readable
  • in handle_write, if there are no more buffers to be flushed or if there is an error, it sets will_close

will_close

  • makes the channel writable and not readable
  • makes the final step in handle_write a call to handle_close, which cleans up buffers (these might even be on disk)
  • handle_close sets connected = False after everything is cleaned up

connected

  • Skips the execution of handle_write if False
  • If found to be False inside functions that may be called from the task thread, ClientDisconnected is raised
  • Is never set to True inside channel, only in wasyncore.dispatcher when the channel is recycled for a new connection

In summary: connected is the signal that everything is closed and finished and nothing should be done on the channel until the dispatcher reuses it for a new connection. Using connected directly from the main thread on client disconnection might fail to clean up output buffers and rely on the garbage collector to do this, which might miss buffers that have overflowed into files. Worse, it might contaminate the next connection that reuses the same Channel instance, although I have not dug deeply enough into the code to verify this.

Even though the tests were successful once I merged client_disconnected into not connected, I guess it is better to restore the separation? Maybe under another name and with additional comments explaining the difference.

Curiously, the tests were successful after I pushed 41155fc, but after I changed one comment that was not yet up to date, a test failed. There seems to be a race condition that I have been unable to reproduce so far. I am not sure if it was already there before or if the problem is related to the handling of connected.

Could any of you please confirm my conclusion or show me where I am mistaken?

@mmerickel
Member

Thanks @viktordick for going through this, I think I agree with most of your analysis. A channel is never reused so there is no concern about the buffer cleanup or connected getting reset to True.

From what I'm seeing, connected is the right hook to use here to indicate that the connection is dead and the client is gone. This is what your current code is using. It sounds like this is a concern for you but I'm not clear why or what you think another variable will help capture better? Is there a lifecycle that you want to try to catch before it is closed?

I noticed that there's a potential race condition in channel.received where we are using _flush_some to write out a response. The underlying assumption in that block of code is that if we're reading, then no requests are active, but of course this PR breaks that assumption.

@viktordick
Author

> Thanks @viktordick for going through this, I think I agree with most of your analysis. A channel is never reused so there is no concern about the buffer cleanup or connected getting reset to True.

But why do we then need close_when_flushed and will_close? ClientDisconnected is raised if the task tries to write part of the response into the output buffers and the client has disconnected. But why does service then go through all the steps instead of simply setting connected to False, emptying self.requests and returning? Even if handle_close should be called from the main thread instead of the task thread (which I guess is the case), this only explains the existence of will_close, not of close_when_flushed - I think.

I could imagine some sort of graceful shutdown which might use close_when_flushed to process all currently running tasks, but not accept any more. However, I do not see any code that might implement something like this.

> I noticed that there's a potential race condition in channel.received where we are using _flush_some to write out a response. The underlying assumption in that block of code is that if we're reading, then no requests are active, but of course this PR breaks that assumption.

Thanks, I will check this out.

@mmerickel
Member

mmerickel commented Sep 25, 2020

> But why do we then need close_when_flushed and will_close?

So there's a bunch of reasons to close that are triggered via client, wsgi app, or process:

  • client not using keep-alive (close_when_flushed = True)
  • unhandled wsgi app exception (close_when_flushed = True)
  • client unexpected hangup (connected = False)
  • graceful server-triggered shutdown (not well supported but triggered via Channel.cancel / will_close = True)

> I could imagine some sort of graceful shutdown which might use close_when_flushed to process all currently running tasks, but not accept any more. However, I do not see any code that might implement something like this.

This is sort of supported right now via the Channel.cancel method which is invoked from the threadpool (TaskDispatcher). However it might not be as graceful as you'd expect. Right now it stops ASAP versus after draining the pending requests. I think as we get further into supporting a real graceful shutdown that behavior would change.

@mmerickel
Member

Also yes, will_close is being used to ensure we are invoking handle_close from within the event loop and not from the channel's thread. Like I said above I think that its meaning would change if we added a better graceful shutdown. Likely to something closer to "close after flushing the current list of requests plus one we might be reading right now, and do not start reading any new ones".

@digitalresistor (Member) left a comment

I like where this is going. More graceful shutdown would be fantastic, and it's something that I would like to add, so keeping that in mind while continuing down this path would be great.

src/waitress/channel.py: inline review comment (outdated, resolved)
tests/test_task.py: inline review comment (outdated, resolved)
@viktordick force-pushed the notify-client-close branch 2 times, most recently from 3859888 to f3176a8 on October 2, 2020, 18:07
@viktordick
Author

I protected the writes into the outbufs in received() with the outbuf_lock, which seems to work well.

I'd say this PR is ready for a re-review. One check seems to be stuck - I had this problem with another PR on GitHub once, and it seemed to be necessary to close and re-open the PR to get the tests to run again. Should I do this here as well?

@digitalresistor
Member

Maintainers have the ability to hit the "re-run all jobs" button, so there is no need to close/re-open this PR.

@viktordick
Author

> So this may work for the second or third request in a pipelined setup, but we should endeavour to send a reply back as soon as possible.

Yes, that is what the code does now. If there are no other requests currently being executed or queued for execution, the 100-continue is sent as soon as the request that expects it has finished submitting its headers (as before). Only if the request is pipelined behind other requests that are still being executed is the 100-continue delayed until all tasks before it have finished, which is necessary to keep the order of responses correct.

So the 100-continue is always written as soon as it is possible to write it without messing up the order of the responses.

src/waitress/channel.py: 2 inline review comments (outdated, resolved)
@mmerickel
Member

@viktordick this is awesome, in my mind the only thing missing is a changelog entry!

This inserts a callable `waitress.client_disconnected` into the
environment that allows the task to check, at strategic points during
execution, whether the client disconnected while waiting for the
response, allowing it to cancel the operation.

It requires setting the new adjustment `channel_request_lookahead` to a
value larger than 0, which continues to read requests from a channel up
to the given count even if a request is already being processed on that
channel, since a client disconnect is detected by reading from a
readable socket and receiving an empty result.
@viktordick
Author

@mmerickel Thanks! I wrote a changelog entry and squashed the commits.

@bertjwregeer Are you also happy with the result? There is still a change request from you recorded on the PR, but I think everything mentioned there has been addressed in the meantime.

@digitalresistor
Member

Thanks for being patient and working through this. I want to read this critically one last time and make sure that I've convinced myself that there are no locking issues/race conditions.

Thanks for all your hard work!

@digitalresistor
Member

Thanks so much @viktordick for your patience and working through this. This looks great, and I have adequately convinced myself that this won't break!

@digitalresistor merged commit 31d7498 into Pylons:master on Oct 31, 2020
@viktordick
Author

Thanks a lot for the help and patience!

Is a release already planned sometime soon?

@digitalresistor
Member

@viktordick https://pypi.org/project/waitress/2.0.0b0/

I need testers to validate this doesn't break anything, but there is now a released version of waitress you can try in your environment.

@viktordick
Author

Thanks.
We are actually already using it - we simply put

git+https://github.com/perfact/waitress@2.0.0dev0.perfact.2

in our requirements.txt, which is essentially the current state of upstream master plus a commit that adds a custom tag. The number of systems that actually use it already is still small, though (I think no production system yet).

@wondie

wondie commented Oct 27, 2021

I couldn't use it with waitress serve in version 2.0. I get a stack trace ValueError: Unknown adjustment 'client_disconnected' when using it as serve(MyFlask.app, host="111.111.111", port=os.environ["PORT"], channel_request_lookahead=5, client_disconnected=run_on_exit). Is it implemented yet?

@mmerickel
Member

That's not how this feature works. You can't inject your own callable into the environ.

@wondie

wondie commented Oct 27, 2021

Then how is it used? I couldn't find any example of it.

@mmerickel
Member

When using waitress, the WSGI environ provides an extra key with a callable that can be invoked to see whether the client is still connected. It is not a callback; you have to poll it in your request handling code.
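
For example, a minimal sketch (the sleep loop just stands in for long-running work; the key only reflects a real disconnect when channel_request_lookahead is greater than 0):

import time
from waitress import serve

def app(environ, start_response):
    # waitress puts this callable into the environ; poll it at strategic points
    client_disconnected = environ.get("waitress.client_disconnected", lambda: False)

    body = []
    for i in range(10):
        if client_disconnected():
            # the client hung up: stop doing expensive work
            break
        time.sleep(1)
        body.append(b"chunk %d\n" % i)

    start_response("200 OK", [("Content-Type", "text/plain")])
    return body

if __name__ == "__main__":
    serve(app, host="127.0.0.1", port=8080, channel_request_lookahead=5)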
