enhancement(topology): Speed up vector startup with big remap transforms #20480
Conversation
Thanks for this @Zettroke! We had the same observation that the VRL compilation was being repeated every time the outputs were calculated, but we hadn't been able to address it yet. We'll review this shortly.
At a high level, this PR makes two relatively independent major changes to the remap loading code, and it would be helpful to be able to evaluate them separately. Could you untangle the compilation caching from the parallelization work? I, for one, would really want to address them separately.
Sure thing. I'll make two pull requests then.
@bruceg Can I base the parallelization branch on the caching branch? Or should I make them independent? Or should I wait until we finalize caching?
Make them independent if you want us to review them simultaneously; otherwise, run the process with this one and then PR the other.
Replaced by #20555 |
We have pretty big VRL transforms in our pipeline, and we started to encounter issues with how long Vector takes to start up. Startup time was reaching up to 10 minutes, which was troublesome.
The optimization consists of two things: caching the compiled VRL program, and parallelizing the work. The `outputs` method requires compiling the VRL program and is called many times; in the end, VRL compilation happens around 8 times for each remap transform.

Results
Attaching these configs. (Yes, we really have 2k lines of VRL 😂)
big_conf_clean-single.yaml.txt
big_conf_clean-6.yaml.txt
big_conf_clean-chain.yaml.txt
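The caching idea can be sketched roughly as follows. This is a minimal illustration, not Vector's actual code: `Program`, `compile_vrl`, and the fields on `RemapConfig` here are hypothetical stand-ins. The point is that the compiled result is stored behind a lock inside the config, so repeated `outputs()` calls reuse it instead of recompiling:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

// Stand-in for a compiled VRL program (hypothetical, not Vector's real type).
#[derive(Clone)]
pub struct Program {
    pub source_len: usize,
}

// Counts how many times the (expensive) compile step actually runs.
pub static COMPILE_CALLS: AtomicUsize = AtomicUsize::new(0);

fn compile_vrl(source: &str) -> Program {
    COMPILE_CALLS.fetch_add(1, Ordering::SeqCst);
    // Imagine several seconds of real compilation here for a 2k-line program.
    Program { source_len: source.len() }
}

// Hypothetical config mirroring the idea of caching inside `RemapConfig`.
pub struct RemapConfig {
    pub source: String,
    cache: Arc<Mutex<Option<Program>>>,
}

impl RemapConfig {
    pub fn new(source: &str) -> Self {
        Self {
            source: source.to_string(),
            cache: Arc::new(Mutex::new(None)),
        }
    }

    // Compile once, then hand out clones of the cached result.
    pub fn compiled(&self) -> Program {
        let mut guard = self.cache.lock().unwrap();
        guard.get_or_insert_with(|| compile_vrl(&self.source)).clone()
    }

    // `outputs` is called many times while the topology is built; with the
    // cache, it no longer triggers a fresh compilation each time.
    pub fn outputs(&self) -> usize {
        self.compiled().source_len
    }
}
```

With this shape, calling `outputs()` eight times still compiles only once. On recent Rust, `std::sync::OnceLock` would be an alternative to `Mutex<Option<_>>` for the cache slot.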
But there are a few questions/problems I have with the current implementation:

- The cache is stored in `RemapConfig`, as it feels kinda hacky, but I'm not sure where else it could live.
- The parallelization only happens at the top-level `compile`, but I don't want to introduce `par_iter` into every method either.

While startup time is not the most important metric, it's troubling when it takes minutes.
This optimization is extremely helpful for heavy use of vrl transforms and huge topologies, like my company's use case.
I hope we can resolve these questions/problems and merge it.