enhancement(topology): Speed up vector startup with big remap transforms #20480
Conversation
Thanks for this @Zettroke! We had the same observation that the VRL compilation was being repeated every time the outputs were calculated, but we hadn't been able to address it yet. We'll review this shortly.
At a high level, this PR makes two relatively independent major changes to the remap loading code, and it would be helpful to be able to evaluate them separately. Could you untangle the compilation caching from the parallelization work? I, for one, would really want to address them separately.
Sure thing. I'll make two pull requests then.
@bruceg Can I base the parallelization branch on the caching branch? Or should I make them independent? Or should I wait until we finalize caching?
Make them independent if you want us to review them simultaneously; otherwise, run the process with this one and then PR the other.
Replaced by #20555 |
We have pretty big VRL transforms in our pipeline, and we started to encounter issues with how long Vector takes to start up. Startup time was reaching up to 10 minutes, which was troublesome.
The optimization consists of two things: caching the compiled VRL program, and parallelizing the work. The `outputs` method requires compiling the VRL program and is called many times; in the end, VRL compilation happens around 8 times for each remap transform.

Results
Attaching these configs. (Yes, we really have 2k lines of VRL 😂)
big_conf_clean-single.yaml.txt
big_conf_clean-6.yaml.txt
big_conf_clean-chain.yaml.txt
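The caching idea can be sketched roughly as follows. This is a minimal illustration, not Vector's actual code: `Program`, `compile_vrl`, and the fields on `RemapConfig` here are hypothetical stand-ins. The point is that the compiled result is stored behind a lock inside the config, so repeated `outputs()` calls reuse it instead of recompiling:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

// Stand-in for a compiled VRL program (hypothetical, not Vector's real type).
#[derive(Clone)]
pub struct Program {
    pub source_len: usize,
}

// Counts how many times the (expensive) compile step actually runs.
pub static COMPILE_CALLS: AtomicUsize = AtomicUsize::new(0);

fn compile_vrl(source: &str) -> Program {
    COMPILE_CALLS.fetch_add(1, Ordering::SeqCst);
    // Imagine several seconds of real compilation here for a 2k-line program.
    Program { source_len: source.len() }
}

// Hypothetical config mirroring the idea of caching inside `RemapConfig`.
pub struct RemapConfig {
    pub source: String,
    cache: Arc<Mutex<Option<Program>>>,
}

impl RemapConfig {
    pub fn new(source: &str) -> Self {
        Self {
            source: source.to_string(),
            cache: Arc::new(Mutex::new(None)),
        }
    }

    // Compile once, then hand out clones of the cached result.
    pub fn compiled(&self) -> Program {
        let mut guard = self.cache.lock().unwrap();
        guard.get_or_insert_with(|| compile_vrl(&self.source)).clone()
    }

    // `outputs` is called many times while the topology is built; with the
    // cache, it no longer triggers a fresh compilation each time.
    pub fn outputs(&self) -> usize {
        self.compiled().source_len
    }
}
```

With this shape, calling `outputs()` eight times still compiles only once. On recent Rust, `std::sync::OnceLock` would be an alternative to `Mutex<Option<_>>` for the cache slot.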
But there are a few questions/problems I have with the current implementation:

- The cache is stored in `RemapConfig`, as it feels kinda hacky, but I'm not sure where else it could live.
- The parallelization only happens at the top-level `compile`, but I don't want to introduce `par_iter` into every method either.

While startup time is not the most important metric, it's troubling when it takes minutes.
This optimization is extremely helpful for heavy use of vrl transforms and huge topologies, like my company's use case.
I hope we can resolve these questions/problems and merge it.