
⭐ Q3B1: Scale Testing – Manually reproducing customer issues #42068

Closed
54 of 58 tasks
Tracked by #42069 ...
jhchabran opened this issue Sep 26, 2022 · 1 comment

jhchabran commented Sep 26, 2022

➡️ Bet conclusion https://docs.google.com/document/d/1EikVa90v_itxaN5bKSfx59_Dn2q0ElEDhZA6Zp1aiCI/edit#bookmark=id.4thqlb3dusam

There is a critical need for a way to provide large instances of various code hosts for internal testing. These tools should be Sourcegraph-wide services, and all new features should go through scale testing to ensure they work for all customers, especially strategic customers.

See Codehost Scale Testing for more.

This is an essential pillar to reach strategic readiness. See Toward a confidently Strategic-ready code intelligence platform (post-4.0) 2022-08 for more.

Scratch Log

Problem

The context mentioned above is quite clear, but how to get there is very blurry.

For example, there are no defined use cases yet. What each team wants to test at scale is based purely on previous observations, with varying confidence levels. It is entirely unclear what kind of data is relevant enough, or what kind of synthetic data represents a customer instance accurately enough that testing against it gives us confidence a feature will work for that customer.

➡️ So we're navigating uncharted waters, and we need to provide tooling to make this happen as early as possible, so each team can use their findings to reshape their roadmap toward strategic readiness.

  • Repo management & IAM
    • Have an L/XL instance to play with, connected to GitHub, GitLab, Bitbucket, Perforce (sorted by priority)
  • CodeNav
    • Need to understand how scaling affects performance: have an instance you can scale up to whatever size you want and connect to any code host
  • Search Core
    • Find out how many concurrent searches we can support
    • What can 2,000 engineers do? What do they do to an instance?

See Strategic customer support, state of the world for L / XL definitions.
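The Search Core question above (how many concurrent searches can we support?) could be probed with a small harness along these lines. This is a hypothetical sketch: the `search` stub, query strings, and worker counts are placeholders, not the real Sourcegraph API, and a real harness would issue actual GraphQL search requests instead.

```python
import concurrent.futures
import time


def search(query: str) -> int:
    """Stub standing in for a real search request to the instance."""
    time.sleep(0.005)  # pretend network + server-side latency
    return len(query)  # pretend result count


def throughput(n_workers: int, n_requests: int) -> float:
    """Run n_requests searches across n_workers threads; return requests/sec."""
    queries = [f"repo:test query-{i}" for i in range(n_requests)]
    start = time.monotonic()
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(search, queries))
    assert len(results) == n_requests
    return n_requests / (time.monotonic() - start)


if __name__ == "__main__":
    # Ramp concurrency and watch where throughput stops scaling.
    for workers in (1, 10, 50):
        print(f"{workers:>3} workers: {throughput(workers, 100):.0f} req/s")
```

Ramping `workers` until requests/sec plateaus (or latency degrades) is one simple way to put a number on "how many concurrent searches we can support".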

Scope

At this stage, there is no need for any kind of automation, because teams can simply take turns using the scale testing environment.

  • We are to actively support teammates using the instance and document as we go.
  • We push for its use. If it's not being used, we seek to understand why.
  • We build tools that are useful to observe the instance if needed.
  • We are to pair with them to build tooling to feed data into the instances.
  • And we maintain the instance, of course.

Boundaries

  • Single instance only. Or if really needed, we manually duplicate it.
  • Limited automation for provisioning the instance: populate, snapshot and restore.
  • Non-goal: replicating customer traffic on the instance.
  • Non-goal: cost optimization; this is an investment, just don't overspend for no reason.
  • We maintain, but do not monitor, the scale testing instance; monitoring is up to the teams using it (but if you see something, tell them).

Definition of Done

  • An engineer familiar with infrastructure and observability can operate and monitor the instance while conducting some tests.
  • At least two teams got data relevant to strategic readiness out of the scale testing instance.
  • The instance can be fed with synthetic data:
    • Users: between 10k and 20k
    • Repositories: between 100k and 250k, with the largest between 16 GB and 30 GB
    • Code hosts: GitHub and GitLab at minimum; Perforce.
  • We know the limits of our scale testing instance: how big can we go?
  • We know how much this costs.
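To make the synthetic-data targets above concrete, a generator could sample one instance profile per test run from exactly those ranges. This is an illustrative sketch, not existing tooling; the field names and the deliberate single large-repo outlier are assumptions.

```python
import random


def synth_profile(seed: int = 0) -> dict:
    """Sample one target instance profile within the Definition of Done ranges."""
    rng = random.Random(seed)  # seeded so a test run is reproducible
    return {
        "users": rng.randint(10_000, 20_000),
        "repos": rng.randint(100_000, 250_000),
        # One deliberate outlier repo in the 16-30 GB band.
        "largest_repo_gb": round(rng.uniform(16, 30), 1),
        "code_hosts": ["github", "gitlab"],
    }


print(synth_profile(1))
```

Seeding the generator means a profile that surfaced a bug can be reproduced exactly by re-running with the same seed.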

Payout

Each participating team has collected feedback on how their domain operates at scale and knows how to create meaningful data. We can reproduce hypothetical use cases that approximate our customers' use cases (repo count and size, user count).

The best-case scenario is that each team has uncovered potential blockers and has mapped out how to reproduce customer scenarios. The worst-case scenario is that they have simply uncovered bugs that have affected customers in the past or would affect them.

We'll have enough data (cost, usage, reliability) to accurately decide if we want to keep focusing on improving the testing capabilities and/or if we want to focus on automating the tooling itself.

Approach

We previously implemented an MVP (see Scale testing), which is currently being tested by Eric Fritz and Stefan Hengl. This is our starting point.

  • Assist @efritz, @stefanhengl and @ryanslade in their usage of the scale testing instance.
    • 🔥 This is the most important thing here. Nobody using the scale testing instance means having zero impact.
  • Be able to populate more than 10k users on a code host.
    • In particular GitHub, which is a problem; @quinnhare and @jhchabran are in touch with their support.
  • Create synthetic data.
    • What do we have so far? What do we need? How can we plug this in easily?
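For the "populate more than 10k users" item, the bulk of the work is generating the account records to feed into the code host's admin API. A hypothetical sketch of that generation step (the field names mirror GitHub Enterprise Server's site-admin "create a user" endpoint, but treat that as an assumption and check the actual code host's docs before wiring anything up):

```python
def user_payloads(n: int, domain: str = "scaletest.example.com"):
    """Yield n synthetic user records to feed a code host's admin API.

    Zero-padded logins keep users sortable and make it obvious at a
    glance which accounts are synthetic and safe to delete."""
    for i in range(n):
        login = f"scale-user-{i:05d}"
        yield {"login": login, "email": f"{login}@{domain}"}


batch = list(user_payloads(10_000))
print(len(batch), batch[0]["login"], batch[-1]["login"])
```

Rate limits are the likely bottleneck at this volume (hence the ongoing conversation with GitHub support), so the actual API calls would need batching and backoff around this generator.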

Tracked issues

  • @unassigned (Completed)
  • @Piszmog (Completed)
  • @burmudar (Completed)
  • @davejrt (Completed)
  • @jhchabran (Completed)
  • @kalanchan (Completed)
  • @kopancek: 1.00d (Completed: 1.00d)
  • @miveronese: 1.00d (Completed: 1.00d)
  • @mucles (Completed)
  • @quinnhare (Completed)
  • @sanderginn: 1.00d (Completed: 1.00d)

Legend

  • 👩 Customer issue
  • 🐛 Bug
  • 🧶 Technical debt
  • 🎩 Quality of life
  • 🛠️ Roadmap
  • 🕵️ Spike
  • 🔒 Security issue
  • 🙆 Stretch goal
jhchabran changed the title from "Q3B1: Scale Testing – Manually reproducing customer issues" to "⭐ Q3B1: Scale Testing – Manually reproducing customer issues" on Sep 27, 2022.