This repository has been archived by the owner on Sep 14, 2021. It is now read-only.

Continue serving requests when highlighting an individual file fails #20

Merged
merged 1 commit into master from sg/no-exit-on-failure on Apr 19, 2019

Conversation

slimsag
Member

@slimsag slimsag commented Apr 19, 2019

When Syntect fails to highlight a file it often does so by panicking. Most
often these issues come from us making use of syntax definitions which simply
aren't supported by Syntect very well (e.g. they use some new or obscure ST3
feature). There is an upstream issue to [return a `Result` type instead of
panicking](trishume/syntect#98) when this occurs.

Prior to this change, a user requesting syntect_server to highlight a bad file
(i.e. hitting a case in the syntax definition not supported by Syntect) would
result in `syntect_server` dying. This has been a known issue for a while, but
in practice hasn't been that bad because these cases are relatively rare and
Kubernetes / Docker restarts the process very quickly anyway. However, when it
does occur it terminates all ongoing highlighting requests which causes blips
that users see.

After this change, we handle these panics by catching and unwinding the stack.
This isn't perfect / ideal / idiomatic Rust code (see the `catch_unwind` docs),
but it does solve the problem and is a better approach than e.g. adding more
replicas of this service.
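
For illustration only (the actual handler code differs, and the `highlight` helper below is a made-up stand-in), the shape of the fix is to contain the panic with `std::panic::catch_unwind` and hand back an ordinary error instead:

```rust
use std::panic;

// Hypothetical stand-in for the real syntect-based highlighting call,
// which can panic on sublime-syntax features syntect doesn't support.
fn highlight(code: &str) -> String {
    if code.contains("trigger-panic") {
        panic!("unsupported syntax feature");
    }
    format!("<pre>{}</pre>", code)
}

// Contain the panic at the request boundary: a panic inside syntect becomes
// an Err the handler can turn into an error response, instead of unwinding
// past the server loop and killing every in-flight request.
fn highlight_or_error(code: &str) -> Result<String, String> {
    panic::catch_unwind(|| highlight(code))
        .map_err(|_| String::from("panic while highlighting file"))
}

fn main() {
    assert!(highlight_or_error("fn main() {}").is_ok());
    assert!(highlight_or_error("trigger-panic").is_err());
}
```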

Fixes sourcegraph/sourcegraph#3164

@ijt ijt left a comment

LGTM

@slimsag slimsag merged commit 5e1efbb into master Apr 19, 2019
@slimsag slimsag deleted the sg/no-exit-on-failure branch April 19, 2019 05:09
@slimsag
Member Author

slimsag commented Apr 19, 2019

Published as sourcegraph/syntect_server:5e1efbb@sha256:6ec136246b302a6c8fc113f087a66d5f9a89a9f5b851e9abb917c8b5e1d8c4b1

slimsag added a commit that referenced this pull request Aug 30, 2019
* Revert "Continue serving requests when highlighting an individual file fails (#20)"

This reverts commit 5e1efbb.

* Revert "Dart+Kotlin support; various maintenance / improvements (#18)"

This reverts commit 40b42c9.
slimsag added a commit that referenced this pull request Oct 3, 2019
* use http-server-stabilizer + 4 worker subprocesses

This change makes syntect_server resilient to the two classes of problems we've
seen in production usage of it:

1. Some specific language grammar/file pairs can cause syntect to panic
   internally. This is usually because syntect doesn't fully support a specific
   sublime-syntax feature and [panics instead of returning
   result types](trishume/syntect#98).

2. Much more rarely, some specific language/grammar file pairs can cause syntect
   to get stuck in an infinite loop internally -- never returning, and consuming
   an entire CPU core until the process is restarted manually.

Previously we tried to solve problem 1 above through stack unwinding (c5773da),
but once the 2nd issue appeared as well, that alone proved not to be sufficient.
Stack unwinding is still useful, though, because it provides per-request recovery
from the first failure scenario above, and as such it will be added back in.

Even without stack unwinding, http-server-stabilizer helps both cases above by
running and monitoring replicas of syntect_server. See the README in
https://github.com/slimsag/http-server-stabilizer for details.
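
As a rough conceptual sketch only (http-server-stabilizer is a separate Go program; the binary name, port scheme, and `ROCKET_PORT` variable below are illustrative assumptions, and request proxying/load-balancing is omitted), its supervision loop amounts to spawning N workers and replacing any that die:

```rust
use std::process::{Child, Command};
use std::{thread, time::Duration};

// Assumption for illustration: each worker is a syntect_server process
// listening on its own port (Rocket reads ROCKET_PORT from the environment).
fn spawn_worker(port: u16) -> Child {
    Command::new("./syntect_server")
        .env("ROCKET_PORT", port.to_string())
        .spawn()
        .expect("failed to start worker")
}

fn main() {
    let base_port: u16 = 9238;
    let mut workers: Vec<Child> = (0..4).map(|i| spawn_worker(base_port + i)).collect();

    loop {
        for (i, child) in workers.iter_mut().enumerate() {
            // A worker that panicked, or was killed for hanging, shows up as
            // an exited child; replace it so the others keep serving requests.
            if let Ok(Some(status)) = child.try_wait() {
                eprintln!("worker {} exited ({}); restarting", i, status);
                *child = spawn_worker(base_port + i as u16);
            }
        }
        thread::sleep(Duration::from_secs(1));
    }
}
```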

It is important to note that all this does is stop these individual file
failures from harming other requests to syntect_server. They are still issues
in their own right, and logging and Prometheus monitoring are now in place so we
can identify when a failure occurs and in which file, track down the issue, and
put together small reproduction cases to file and fix upstream.

Since only one instance of syntect_server was previously running and we now run
multiple, more memory is needed. Each instance requires about 1.1 GB at peak
(depending on which languages are used). The default is now to run 4 workers,
so 4.4 GB is the minimum required and 6 GB is suggested. In the event only one
worker is run (via the env var `WORKERS=1`), stability is still greatly
improved, since the 2nd failure case above can only last a short period of time
instead of persisting until the container is restarted manually.
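
As a back-of-the-envelope check of those numbers (this is a guess at how a `WORKERS`-style knob might be read, not the stabilizer's actual code):

```rust
use std::env;

// Read WORKERS from the environment, defaulting to the 4 workers described above.
fn worker_count() -> usize {
    env::var("WORKERS")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(4)
}

fn main() {
    let workers = worker_count();
    // ~1.1 GB peak per worker, per the figures above.
    let min_gb = 1.1 * workers as f64;
    println!("{} workers -> roughly {:.1} GB peak memory", workers, min_gb);
}
```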

Part of sourcegraph/sourcegraph#5406

* Add Dart+Kotlin support; upgrade to Rocket v0.4

* Continue serving requests when highlighting an individual file fails (#20)

Successfully merging this pull request may close these issues.

syntect_server: Occasionally terminating with exit code 143 in production