Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Continue serving requests after failing to spawn a thread #220

Open
mleonhard opened this issue Feb 21, 2022 · 3 comments
Open

Continue serving requests after failing to spawn a thread #220

mleonhard opened this issue Feb 21, 2022 · 3 comments

Comments

@mleonhard
Copy link

When the server fails to spawn a thread, a mutex gets poisoned and the server stops handling requests. It does not close its socket and Server::run does not panic.

The mutex that gets poisoned is util::TaskPool.sharing.todo. TaskPool::spawn holds it while calling TaskPool::add_thread which can panic.

Let's change it to handle thread spawn failure.

  • Wait for an existing thread to become free and then hand the request to it.
  • If there are no threads, sleep for 100ms and then try to spawn again.
@bradfier
Copy link
Member

bradfier commented Mar 4, 2022

Being unable to spawn a thread is a transient condition so I agree that we could do with not stopping all further request handling when this happens.

We could replace the std::sync::Mutex with the one from parking_lot which doesn't become poisoned when the thread holding the lock panics but that would be a risk in the unlikely event the panic comes from somewhere other than the call to thread::spawn.

@rawler
Copy link
Collaborator

rawler commented Mar 4, 2022

I created a draft-PR with only a single commit, containing tests for the behavior when running out of threads or file-descriptors. I haven't had time to look at a proper fix, but if someone wants to have a go, maybe these tests might help get you started.

@mleonhard
Copy link
Author

@bradfier If you only change the code to not poison the mutex, then it continues to panic on thread spawn failure. The server will simply disconnect clients whenever it hits the thread limit. This is load-shedding behavior. Load shedding is usually unsuitable for servers that serve non-retrying clients like web browsers. Back-pressure is suitable. See the explanation in #221 (comment) .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants