Time to first byte? #592

lukehsiao · 2024-05-10T22:39:33Z

Suppose we want to load-test an API which uses server-sent events (SSE). Is it possible to measure the time-to-first-byte using Goose?

jeremyandrews · 2024-05-10T22:46:54Z

Can you provide some examples of how you’re using SSE and what metrics you’d want to measure? What technologies are you using?

lukehsiao · 2024-05-12T00:07:39Z

Hmmm, I can try to get more specific if you need, but one example would be load testing an API like ChatGPT's, which uses SSE so you can see the response stream back as it is generated rather than staring at a blank page until the entire response is complete.

In these use cases, time-to-first-token (essentially time-to-first-byte) is the interesting metric, because it is the latency between submitting a query and the moment the user starts receiving a response. This metric often dictates how responsive a streaming LLM API feels to a user.
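The following is a minimal sketch, independent of Goose and using only the Rust standard library, of the distinction drawn here. It times (a) the first byte of the HTTP response and (b) the first SSE `data:` line, because a server may flush response headers before the model has produced any tokens. `measure_stream`, `StreamTimings`, and `spawn_demo_sse_server` are hypothetical names, not Goose APIs. The sketch assumes plain HTTP (no TLS) and sends an HTTP/1.0 request so the body is not chunk-encoded; it would need a TLS-capable client before it could be pointed at a real HTTPS API such as ChatGPT's.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical timing record for one streamed request (not a Goose type).
#[derive(Debug)]
pub struct StreamTimings {
    /// Request written -> first byte of the response (usually the status line).
    pub time_to_first_byte: Duration,
    /// Request written -> first SSE `data:` line, a closer proxy for TTFT.
    pub time_to_first_event: Option<Duration>,
    /// Request written -> server closed the connection (end of the stream).
    pub total: Duration,
}

/// Sends a plain-HTTP GET to `addr` and times the streamed response.
/// Uses HTTP/1.0 so the server should not chunk-encode the body; no TLS.
pub fn measure_stream(addr: &str, path: &str) -> io::Result<StreamTimings> {
    let mut stream = TcpStream::connect(addr)?;
    let request =
        format!("GET {path} HTTP/1.0\r\nHost: {addr}\r\nAccept: text/event-stream\r\n\r\n");
    // Start the clock after the TCP connection is up, just before sending the request.
    let start = Instant::now();
    stream.write_all(request.as_bytes())?;

    let mut reader = BufReader::new(stream);
    // fill_buf blocks until at least one byte arrives; an empty slice means EOF.
    if reader.fill_buf()?.is_empty() {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "empty response"));
    }
    let time_to_first_byte = start.elapsed();

    let mut time_to_first_event = None;
    let mut line = String::new();
    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            break; // EOF: the server finished streaming and closed the connection
        }
        if time_to_first_event.is_none() && line.starts_with("data:") {
            time_to_first_event = Some(start.elapsed());
        }
    }
    Ok(StreamTimings { time_to_first_byte, time_to_first_event, total: start.elapsed() })
}

/// Hypothetical local stand-in for a streaming endpoint: it sends response
/// headers at once, then two SSE events `delay` apart, then closes.
pub fn spawn_demo_sse_server(delay: Duration) -> io::Result<String> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?.to_string();
    thread::spawn(move || {
        let (mut stream, _) = listener.accept().expect("accept");
        // Read the request up to the blank line so that closing the socket with
        // unread request bytes does not make the OS reset the connection.
        let mut reader = BufReader::new(stream.try_clone().expect("clone"));
        let mut line = String::new();
        while reader.read_line(&mut line).expect("read") > 0 && line != "\r\n" {
            line.clear();
        }
        stream
            .write_all(b"HTTP/1.0 200 OK\r\nContent-Type: text/event-stream\r\n\r\n")
            .expect("write headers");
        for token in ["hello", "world"] {
            thread::sleep(delay); // simulate the time needed to generate each token
            write!(stream, "data: {token}\n\n").expect("write event");
        }
        // Dropping both handles closes the socket, which ends the stream.
    });
    Ok(addr)
}
```

For example, `measure_stream(&spawn_demo_sse_server(Duration::from_millis(100))?, "/stream")?` should report a first byte almost immediately, a first event after roughly 100 ms, and a total of roughly 200 ms. Keeping the two first-arrival times separate is the point of the sketch: for a streaming API, the time to the first event is the number that tracks how responsive the stream feels.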

The Databricks post at https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices covers this under the "Important Metrics for LLM Serving" heading:

> Our team uses four key metrics for LLM serving:
>
> Time To First Token (TTFT): How quickly users start seeing the model's output after entering their query. Low waiting times for a response are essential in real-time interactions, but less important in offline workloads. This metric is driven by the time required to process the prompt and then generate the first output token.

So the question is: can Goose be used to load-test an API like this and measure time-to-first-byte as a proxy for time-to-first-token? Could I use Goose to try to reproduce some of the results in this Databricks blog post?

Does that help clarify?

jeremyandrews · 2024-05-12T17:49:45Z

That’s very helpful, yes. I’ll find some time to test and see what can be done. I expect it will take some code changes/additions to be useful.
