Assess Kedro performance for complex pipelines #3866

Open
astrojuanlu opened this issue May 13, 2024 Discussed in #3790 · 0 comments

The Kedro-Viz team carried out a performance analysis using an internal QB pipeline; preliminary results are shown in kedro-org/kedro-viz#1064:

  • It takes a long time to initialise the Kedro modules and reach the actual kedro viz run command (this was already partly known, see "Improve Kedro CLI startup time", kedro#1476)
  • The expensive operation before starting the viz server is loading the data from the Kedro session (possibly related to "Lazy Loading of Catalog Items", kedro#2829)
  • Most of the data-loading time is spent on catalog and pipelines_dict resolution, and it grows as the pipeline count increases

(from kedro-org/kedro-viz#1064 (comment), summary of internal report).

There is preliminary evidence that the Kedro Framework CLI is a bottleneck for Kedro Viz.

This adds to the existing evidence that Kedro takes a long time to load, even for trivial commands or almost empty projects (#1476).
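To put numbers on the startup cost, one option is to time a command end to end and compare it with a bare interpreter launch. The following is a minimal sketch, not an agreed benchmark: `time_command` and `summarise` are hypothetical helper names, and the baseline only measures how long an empty Python process takes to start.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Run `cmd` several times and return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Output is discarded; a non-zero exit code raises CalledProcessError.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def summarise(durations):
    """Reduce a list of durations to min / median / max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


if __name__ == "__main__":
    # Baseline: an interpreter that starts and exits immediately.
    baseline = time_command([sys.executable, "-c", "pass"])
    print("python baseline:", summarise(baseline))
```

Assuming `kedro` is installed and on the `PATH`, the same helper could be pointed at a trivial command such as `["kedro", "info"]` from inside a project directory, so that the difference from the baseline approximates the framework's own startup overhead.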

We noted that there are several factors that make a pipeline "complex":

  • Lots of nodes
  • Lots of pipelines
  • Lots of datasets
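As an illustration of how these factors could be varied systematically, here is a rough sketch that enumerates a grid of synthetic project sizes. Everything in it is an assumption made for illustration: `ProjectSpec`, `chain_node_specs` and `project_family` are hypothetical names, each pipeline is modelled as a simple linear chain, and the output is plain dictionaries rather than real Kedro objects.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ProjectSpec:
    """Size parameters of one synthetic project (hypothetical structure)."""

    n_pipelines: int
    nodes_per_pipeline: int

    @property
    def n_nodes(self):
        return self.n_pipelines * self.nodes_per_pipeline

    @property
    def n_datasets(self):
        # A linear chain of k nodes touches k + 1 datasets.
        return self.n_pipelines * (self.nodes_per_pipeline + 1)


def chain_node_specs(pipeline_idx, n_nodes):
    """Describe a linear chain: node i reads dataset i and writes dataset i + 1."""
    return [
        {
            "name": f"p{pipeline_idx}_node{i}",
            "inputs": f"p{pipeline_idx}_ds{i}",
            "outputs": f"p{pipeline_idx}_ds{i + 1}",
        }
        for i in range(n_nodes)
    ]


def project_family(pipeline_counts, node_counts):
    """Return one ProjectSpec per combination of the given sizes."""
    return [ProjectSpec(p, n) for p, n in product(pipeline_counts, node_counts)]


if __name__ == "__main__":
    for spec in project_family([1, 10], [10, 100]):
        print(spec, spec.n_nodes, spec.n_datasets)
```

In this sketch the dataset count follows from the node count, which keeps the grid small; a fuller version would probably vary datasets independently, for example by giving nodes several inputs.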

In I expanded on @AhdraMeraliQB's original proposal and suggested that we create a family of pipelines, comprising

Comes from #3790

Originally posted by AhdraMeraliQB January 6, 2024

Description

There are several features across the Kedro organisation that could benefit from manual testing on large projects to evaluate performance. The proposal is to create several Kedro projects of varying sizes that can be used for testing and experimentation.

This could be particularly useful for testing Viz features, CC @rashidakanchwala, @NeroOkwa
...
