
Draft: chore: Add an end-to-end batching benchmark #20479

Open · wants to merge 2 commits into base: master
Conversation


@orf orf commented May 12, 2024

This is a bit of a weird PR, sorry. I have been playing a lot with the internals of Vector recently for a project of mine to integrate Vector as a Python library. Vector is a really nicely structured project, and after a few wrong turns I was able to integrate it in-process with Python as a library.

To explore how everything works (especially the service abstractions) I built the following dummy source and sink, and I figured I'd try and contribute it as an end-to-end Criterion benchmark for a couple of reasons:

  1. I found it quite hard to find completely minimal/bare-bones non-test sources and sinks, so perhaps this could form an example for contributors?
  2. An end-to-end benchmark involving producing and consuming might be a useful thing to include?
    • I was curious about the potential overhead of repeatedly allocating Vecs to call SourceSender::send_batch, and I had no idea where to start profiling this.
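
To make the allocation question in that last bullet concrete, here is a minimal sketch of the pattern being asked about: allocating a fresh `Vec` per batch (as handing ownership to a sender requires) versus reusing one cleared buffer. This does not use Vector's actual APIs, and the batch counts are arbitrary; it only illustrates the micro-measurement one might start from.

```rust
use std::time::Instant;

fn main() {
    const BATCHES: usize = 10_000;
    const BATCH_SIZE: u64 = 1_000;

    // Pattern 1: allocate a fresh Vec for every batch.
    let start = Instant::now();
    let mut total_fresh = 0usize;
    for _ in 0..BATCHES {
        let batch: Vec<u64> = (0..BATCH_SIZE).collect();
        total_fresh += batch.len();
        // `batch` is dropped here, mirroring handing ownership to a sender.
    }
    let fresh = start.elapsed();

    // Pattern 2: reuse a single buffer, clearing it between batches.
    let start = Instant::now();
    let mut buffer: Vec<u64> = Vec::with_capacity(BATCH_SIZE as usize);
    let mut total_reused = 0usize;
    for _ in 0..BATCHES {
        buffer.clear();
        buffer.extend(0..BATCH_SIZE);
        total_reused += buffer.len();
    }
    let reused = start.elapsed();

    assert_eq!(total_fresh, total_reused);
    println!("fresh allocation: {:?}, reused buffer: {:?}", fresh, reused);
}
```

Wall-clock timings like this are exactly where the noise problems discussed below come in, so treat any numbers it prints as indicative at best.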

Feel free to close this, but I figured it might be useful to someone in the future and I didn't want it to go to waste.

@bits-bot

bits-bot commented May 12, 2024

CLA assistant check
All committers have signed the CLA.

Member

@jszwedko jszwedko left a comment

Hi @orf !

Thanks for opening this for discussion!

Once upon a time we had actually planned on, and started, using criterion for end-to-end benchmarks, but ended up swapping it for an alternate approach, first proposed in https://github.com/vectordotdev/vector/blob/1b57acd6bb806a09556a8d5cd3a64709c5f44354/rfcs/2021-02-23-6531-performance-testing.md, that relies on an external benchmark tool to run. This was primarily due to issues we had with eliminating noise, and so that we could have more control over how the benchmark analysis is performed.

That being said, I do see it being useful to have a dummy source and sink to serve as examples for contributors. I just don't know that the benchmarks will end up being used much.

I think I'd be ok with accepting this contribution if we add a comment to benches/integration/mod.rs explaining what it is and what it should be used for. I'll tag some other team members for review too.

@jszwedko jszwedko self-assigned this May 29, 2024
@orf
Author

orf commented May 30, 2024

So after a bit of work on this I came to similar conclusions as the linked document - there’s a lot of noise, and while the timings are stable it’s not clear if the timings are fully accurate in measuring the things we want to measure.

There’s definitely value in an example like this I feel, so maybe it should just be an example rather than a benchmark?

@jszwedko
Member

> So after a bit of work on this I came to similar conclusions as the linked document - there’s a lot of noise, and while the timings are stable it’s not clear if the timings are fully accurate in measuring the things we want to measure.
>
> There’s definitely value in an example like this I feel, so maybe it should just be an example rather than a benchmark?

Agreed, I think there is some value in a heavily documented example that is in-tree so that it is compiled and kept up-to-date. We have some example sink code in this tutorial: https://github.com/vectordotdev/vector/tree/master/docs/tutorials/sinks but I think it suffers from the fact that it isn't in-tree and thus can become stale without notice.

Would you want to move the example implementations and drop the benchmarks? I think tests/examples/ could be a good place.
