
doc: update syntax-highlighter scalability info #32359

Merged
merged 1 commit into main from doc_update_syntax-highlighter_scalabilit on Mar 9, 2022

Conversation

michaellzc
Member

michaellzc commented Mar 9, 2022

Follow-up to https://sourcegraph.slack.com/archives/CHXHX7XAS/p1646783148867999

syntax-highlighter is a stateless service and has no external dependencies (e.g. pg), so it should be horizontally scalable.

Test plan

sg run docsite 

Go to http://localhost:5080/dev/background-information/architecture#diagram

The syntect server node should be surrounded by a box instead of a rectangle.

Should we experiment with it on dogfood? https://github.com/sourcegraph/deploy-sourcegraph-dogfood-k8s/pull/4008 (merged)
Syntax highlighting on dogfood seems fine to me.
Container logs look fine too.

cla-bot added the cla-signed label Mar 9, 2022
@michaellzc
Member Author

Current dependencies on/for this PR:

This comment was auto-generated by Graphite.

michaellzc marked this pull request as ready for review March 9, 2022 00:58
michaellzc force-pushed the doc_update_syntax-highlighter_scalabilit branch 3 times, most recently from b75139d to 9ce6c73 on March 9, 2022 01:13
michaellzc requested a review from slimsag March 9, 2022 16:30
@slimsag
Member

slimsag commented Mar 9, 2022

That's true, but the connection code does not currently allow horizontal scaling of this service. There are also some major issues with the service that make vertical scaling the better choice for now, regardless of whether we could scale horizontally.

I can provide more details if useful, but yeah, we can't scale this service right now except vertically.

@michaellzc
Member Author

michaellzc commented Mar 9, 2022

@slimsag

connection code does not currently allow horizontal scaling of this service

I would love to learn more about this.

Are you talking about how clients talk to syntax-highlighter? As in, clients won't be able to take advantage of having multiple instances of syntax-highlighter running?

Assuming I am looking at the right place, it looks like a pretty standard HTTP client. That said, I don't know Go or our product code well enough.
https://github.com/sourcegraph/sourcegraph/blob/main/internal/gosyntect/gosyntect.go

There are also some major issues with the service which make vertical scaling the better choice for now regardless of if we could

My intention is more about the ability to run multiple replicas across multiple nodes/zones to avoid a single point of failure (a worker node or zone going down), rather than to scale horizontally for performance.

@slimsag
Member

slimsag commented Mar 9, 2022

OK I see.

Yeah, I am talking about how clients connect. Right now we connect using the env var here, so it's a simple HTTP connection to http://syntect-server:9238.

There is no sharding logic in the code (like, for example, what we do for zoekt/gitserver), so I believe you would need to point that at a load balancer to get traffic directed to both pods (but my k8s is rusty, maybe the ingress does that for us?).

If the intention is just redundancy in case one zone goes down, then yeah that should work.
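
For concreteness, a minimal sketch of what that client-side wiring might look like (the env var name, container name, and exact URL here are assumptions on my part, not copied from the manifests; the gosyntect code linked above is authoritative):

```yaml
# Sketch only: excerpt of a client Deployment's container spec.
containers:
  - name: frontend
    env:
      # Assumed env var name; verify against internal/gosyntect in the repo.
      - name: SRC_SYNTECT_SERVER
        # Points at the Service DNS name rather than any single pod,
        # so any fan-out across pods has to happen at that layer.
        value: "http://syntect-server:9238"
```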

The three things to note if we are talking about scaling for perf, though:

  1. The service itself internally spins up ~8-16 servers, so it's actually running multiple replicas inside the single container (awful, I know, but that's what it does; more on this below).
  2. Each replica in the container has a habit of stalling requests (not dying), so there's a process monitor in there that does some revival based on whether or not requests have been hanging for too long. It's nasty, but it works. More on that here.
  3. Although it's not stateful, the server has to compile a lot of regular expressions as requests come in. These take a lot of CPU and memory: something like ~1.5GB per process (so for 1 k8s replica with 16 internal workers, that's up to 16*1.5 == 24GB of memory).

Those are the reasons vertical scaling for this service is better than horizontal in general: the service becomes more reliable the more resources it is given due to internal worker processes.

But it sounds like your goal is to have redundancy across AZs, which is great, so feel free to go ahead with that; we may need some LB in front that can divide up the requests or something.

The only thing I'd be worried about is if people in the future see we've got replicas: 2 in that chart and think "to improve perf or stability of the service, we can just increase that" and miss the "WORKERS" env var. Maybe just link to this message somewhere as a form of documentation to avoid that?
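
To make the replicas-vs-WORKERS distinction concrete, here's a hedged sketch of a deployment excerpt (the image tag, WORKERS value, and memory number are illustrative assumptions on my part, not recommendations or the real manifest):

```yaml
# Sketch only, not the actual manifest from the deploy repos.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: syntect-server
spec:
  # More replicas = redundancy across nodes/zones; it does not make
  # the service itself faster or more stable.
  replicas: 2
  selector:
    matchLabels:
      app: syntect-server
  template:
    metadata:
      labels:
        app: syntect-server
    spec:
      containers:
        - name: syntect-server
          image: sourcegraph/syntect_server:insiders  # assumed image/tag
          env:
            # Internal worker processes; raise this (together with
            # cpu/memory) when scaling for perf/stability.
            - name: WORKERS
              value: "16"
          resources:
            limits:
              # Rough upper bound from above: ~1.5GB per worker x 16 workers ~ 24GB.
              memory: 24G
```

The point for the docs would be the comments: bumping replicas buys redundancy, while bumping WORKERS (plus resources) buys per-instance capacity.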

@michaellzc
Member Author

michaellzc commented Mar 9, 2022

so you would need to have that point to a load balancer in order to get traffic directed to both pods I believe

;) k8s Service does all the magic you mention for you

To clarify, all inter-service communication in Sourcegraph is routed through k8s Services, which handle load balancing and route traffic intelligently for you (if a pod becomes unhealthy, the Service stops routing traffic to that broken pod).
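
A minimal sketch of what that Service looks like conceptually (the name and port come from this thread; the label selector and the rest are generic Kubernetes boilerplate, not the actual manifest):

```yaml
# Sketch only. A ClusterIP Service load-balances requests across all
# ready pods matching the selector, and stops sending traffic to pods
# whose readiness probes fail.
apiVersion: v1
kind: Service
metadata:
  name: syntect-server
spec:
  type: ClusterIP
  selector:
    app: syntect-server
  ports:
    - port: 9238
      targetPort: 9238
```

So as long as clients talk to the Service DNS name, traffic is already spread across however many replicas exist.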

@slimsag
Member

slimsag commented Mar 9, 2022

lol, you'd think that after ~3yrs on Delivery I wouldn't have forgotten that. My brain's RAM is running out :)

@michaellzc
Member Author

The only thing I'd be worried about is if people in the future see we've got replicas: 2 in that chart and think "to improve perf or stability of the service, we can just increase that" and miss the "WORKERS" env var. Maybe just link to this message somewhere as a form of documentation to avoid that?

Good point. Also, this would be a good addition to our scaling guide https://docs.sourcegraph.com/admin/install/kubernetes/scale

We probably won't be able to give recommendations based on repo size (at least I don't have the data), but maybe we can just say: increase replicas, and tune the WORKERS env var accordingly based on Prometheus metrics.

@slimsag
Member

slimsag commented Mar 9, 2022

I would phrase it as:

If you're trying to reduce the risk of a machine going down and taking down the service, increase replicas. If you're trying to increase stability/reliability/perf of the service, though, increase WORKERS and allocate more cpu/memory accordingly.

michaellzc force-pushed the doc_update_syntax-highlighter_scalabilit branch from 9ce6c73 to bff885b on March 9, 2022 20:19
michaellzc enabled auto-merge (squash) March 9, 2022 20:22
michaellzc merged commit f92c72f into main Mar 9, 2022
michaellzc deleted the doc_update_syntax-highlighter_scalabilit branch March 9, 2022 20:23