Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add server-side configuration to tokio::spawn endpoint futures #695

Open
jgallagher opened this issue Jun 9, 2023 · 0 comments
Open

Add server-side configuration to tokio::spawn endpoint futures #695

jgallagher opened this issue Jun 9, 2023 · 0 comments
Assignees
Milestone

Comments

@jgallagher
Copy link
Contributor

We've run into async cancellation future problems in omicron several times; one of the more painful ones is oxidecomputer/omicron#3140. Quickly summarizing: the endpoint called a function which locked a tokio mutex and then performed a sequence of long-running operations. If the client disconnected early, the future encompassing the endpoint was cancelled, which unlocked the tokio mutex and left the data protected by the mutex in an unexpected and difficult-to-debug state.

Async cancellation is tricky at best, and currently dropshot's behavior means the futures returned by endpoint functions may be cancelled under the control of the client. We had a live discussion today, and decided that a reasonable short-term bandaid (as opposed to auditing all endpoints for cancel safety and figuring out some way to keep those audits up to date) is to give dropshot servers a configuration option to tokio::spawn the endpoint futures, preventing from being cancelled if the client disconnects. This is not foolproof and may have its own problems, but seems preferable to the kinds of data corruption we've seen.

For omicron in the short term, we can put in a server-wide configuration option. In the medium term, we could add annotations to allow individual endpoints to opt into specific behavior (either "spawn" or "drop on client disconnect"). This is very much a short term quick fix that will not address all problems, as dropshot clients are not the only source of future cancellations (tokio::select! is another common one).

@jgallagher jgallagher added this to the MVP milestone Jun 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants