Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pipe: add server backlog for concurrent Accept() #291

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jeffhostetler
Copy link

@jeffhostetler jeffhostetler commented Jul 7, 2023

Teach pipe.go:ListenPipe() to create multiple instances of the server pipe in the kernel so that client connections are less likely to receive a windows.ERROR_PIPE_BUSY error. This is conceptually similar to the backlog argument of the Unix listen(2) function.

The current listenerRoutine() function works sequentially in response to calls to Accept(), such that there will only be at most one unbound server pipe in the NPFS present at any time. Even if the server application calls Accept() concurrently from a pool of application threads, the listenerRoutine() will process them sequentially.

In this model and because there is only one listenerRoutine() instance, there is an interval of time (immediately after a connection is made) where there are no available unbound/free server pipes. When ConnectNamedPipe() returns, listenerRoutine() sends the new pipe handle over a channel to the caller of Accept(). The application code then has an opportunity to dispatch/process it and then call Accept() again. Only at that point can listenerRoutine() create a new unbound server pipe in the file system and wait for the next connection. Anytime during this interval, a client application trying to connect will get a pipe busy error.

Code in DialPipe() hides this from GOLANG callers because it includes a busy retry loop. However, clients written in other languages without this assistance are likely to see the busy error and be forced to deal with it.

This change introduces an "accept queue" using a buffered channel and splits listenerRoutine() into a pool of listener worker threads. Each worker creates a new unbound pipe in the file system and waits for a client connection. The NPFS and kernel can then deliver the new connection to a random listener worker. The resulting connected pipe is delivered back to the caller Accept() as before.

A PipeConfig.QueueSize variable controls the number of listener worker threads and the maximum number of unbound/free pipes server pipes that will be present at any given time. Note that a listener worker will normally have an unbound/free pipe except during that same delivery interval. Having multiple active workers (and unbound pipes in the file system) gives us extra capacity to handle rapidly arriving connections and minimize the odds of a client seeing a busy error.

The server application is encouraged to call Accept() from a pool of application workers. The size of the application pool should be the same or larger than the queue size to take full advantage of the listener queue.

To preserve backwards compatibility, a queue size of 0 or 1 will behave as before.

Also for backwards compatibility, listener workers are required to wait for an Accept() call so that the worker has a return channel to send the connected pipe and error code. This implies that the number of unbound pipes will be the smaller of the queue size and the application pool size.

Finally, a Mutex was added to l.Close() to ensure that concurrent threads do not simultaneously try to shutdown the pipe.

@jeffhostetler jeffhostetler requested a review from a team as a code owner July 7, 2023 20:40
@jeffhostetler
Copy link
Author

I observed this problem while trying to send data from git.exe to a GOLANG server. The code calling CreateFile() saw the busy error (or some other error) and didn't expect to need to spin.

I did a little (incomplete) search of the issue backlog and found a few that it might be related:

@jeffhostetler
Copy link
Author

I added a test at the bottom of pipe_test.go that runs various geometries and shows the observed OK-vs-busy rate, but I wasn't sure if/how/when we wanted to throw an error. On my laptop, the legacy cases get busy errors about 33% of the time. With a moderate queue size, we don't get busy signals -- however they are still theoretically possible, I just didn't see any on my limited tests, so I hesitated asserting it.

Suggestions welcomed. Thanks!

@jeffhostetler
Copy link
Author

@microsoft/containerplat @msscotb @kevpar @helsaawy Hey, just a quick ping. I was wondering if anyone had had a chance to look at my PR and see if this functionality is of interest. (I just noticed that there is now a conflict with a recently merged change. I'll address that shortly.)

Thanks!

@jeffhostetler jeffhostetler force-pushed the concurrent-listen-pipe branch 2 times, most recently from c9410fc to 25b5c7c Compare August 9, 2023 20:42
Teach `pipe.go:ListenPipe()` to create multiple instances of the
server pipe in the kernel so that client connections are less likely
to receive a `windows.ERROR_PIPE_BUSY` error.  This is conceptually
similar to the `backlog` argument of the Unix `listen(2)` function.

The current `listenerRoutine()` function works sequentially and in
response to calls to `Accept()`, such that there will never be more
than one unbound server pipe in the NPFS present at any time.  Even if
the server application calls `Accept()` concurrrently from a pool of
application threads, the `listenerRoutine()` will process them
sequentially.

In this model, because there is only one `listenerRoutine()` instance,
there is an interval of time where there are no available unbound/free
server pipes.  This happens when `ConnectNamedPipe()` returns and
`listenerRoutine()` sends the new pipe handle via a channel to the
caller of `Accept()` where the application code has an opportunity to
use the pipe or give it to another goroutine and then call `Accept()`
again.  The subsequent `Accept()` call causes `listenerRoutine()` to
create a new unbound serer pipe instance in the file system and wait
for the next connection.  Anytime during this interval, a client will
get a pipe busy error.

Code in `DialPipe()` hides this from GOLANG callers because it
includes a busy-retry loop.  However, clients written in other
languages without this assistance are likely to see the busy error and
be forced deal with it.

This change introduces an "accept queue" using a buffered channel and
splits `listenerRoutine()` into a pool of listener worker threads.
Each worker creates a new unbound pipe instance in the file system and
waits for a client connection.  The NPFS and kernel handle connection
delivery to a random listener worker.  The resulting connected pipe is
then delivered back to the caller of `Accept()` as before.

A `PipeConfig.QueueSize` variable controls the number of listener
worker threads and the maximum number of unbound/free pipes server
pipes that will be present at any given time.  Note that a listener
worker will normally have an unbound/free pipe except during that same
delivery interval.  Having multiple active workers (and unbound pipes
in the file system) gives us extra capacity to handle rapidly arriving
connections and minimizes the odds of a client seeing a busy error.

The application is encouraged to call `Accept()` from a pool of
application workers.  The size of the application pool should be the
same or larger than the queue size to take full advantage of the
listener queue.

To preserve backwards compatibility, a queue size of 0 or 1 will
behave as before.

Also for backwards compatibility, listener workers are required to
wait for an `Accept()` call so that the worker has a return channel to
send the connected pipe and/or error code.  This implies that the
number of unbound pipes will be the smaller of the queue size and the
application pool size.

Finally, a Mutex was added to `l.Close()` to ensure that
concurrent threads do not simultaneously try to shutdown the
pipe.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants