Add an option to get an example pipeline #3295
Do we need to copy postGen cookiecutter hooks to 4 spaceflights starters to enable |
kedro/framework/cli/starters.py
Outdated
  starter_path = "git+https://github.com/kedro-org/kedro-starters.git"
- if add_ons:
+ if example_pipeline:
This isn't correct. Even if the example pipeline isn't wanted, we might need to fetch from a starter template other than the default. @SajidAlamQB can probably help out here about what the flow should be.
@merelcht, thank you. You're right, I need to consult with @SajidAlamQB about what we should do when Spark and Viz are selected but the example option is not chosen.
I think this doesn't need to be changed, as we already select the right template with examples in them for pyspark, viz and pyspark & viz. These templates already contain the example pipelines, but they are removed in hooks/utils.py `_handle_starter_setup`. So the current default case is `example=no`. I think we just need a check in `utils.py` to skip `_handle_starter_setup` if `example=yes`.
@SajidAlamQB, many thanks, got it
I think only for the default template [1-5], options without pyspark and viz, we don't already have example pipelines, so we might need to change the template to `spaceflights-pandas` in here if `example=yes`.
> I think only for the default template [1-5], options without pyspark and viz, we don't already have example pipelines so we might need to change template to `spaceflights-pandas` in here if `example=yes`.
Yes, it looks like the logic should differ depending on whether pyspark or viz is selected:
- without pyspark or viz: if `--example=yes`, use the `spaceflights-pandas` starter; if `--example=no`, use the default template
- with pyspark, viz, or both: use the starter in either case, but `_handle_starter_setup` needs modifications
Last step: strip all of them with the same algorithm based on add_ons 1-5.
Is that correct?
Yes, this sounds right.
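The flow agreed on in this thread can be summarized in a minimal sketch. The function name `select_starter_directory` and its parameters are hypothetical and do not come from the PR. Only `spaceflights-pandas` and `spaceflights-pyspark-viz` are named in this conversation; the two single add-on directory names are assumptions made for illustration.

```python
from typing import Optional


def select_starter_directory(
    use_pyspark: bool, use_viz: bool, example: bool
) -> Optional[str]:
    """Return the starter directory to fetch, or None for the default template.

    Hypothetical helper: the real selection lives in kedro/framework/cli/starters.py.
    """
    if use_pyspark or use_viz:
        # A starter is fetched whether or not an example was requested;
        # when example is False the hooks strip the example pipelines later.
        if use_pyspark and use_viz:
            return "spaceflights-pyspark-viz"
        # Assumed directory names for the single add-on cases.
        return "spaceflights-pyspark" if use_pyspark else "spaceflights-pandas-viz"
    # Without PySpark or Viz a starter is only needed when an example is wanted.
    return "spaceflights-pandas" if example else None
```

Returning `None` for the default template keeps the "no starter" case explicit, so the caller can fall back to the bundled template path.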
c73d901 to 73d4acb
@DimedS In addition, I wonder is |
Hm, I see that the last version of #3054 with --example on the end: |
I see - I didn't see the latest comment. We should push the project name as the first prompt, but it's out of scope for this PR.
2ecafbf to 01d1c0d
32c4477 to 8c6d5c1
@@ -3,4 +3,4 @@ include LICENSE.md
 include kedro/framework/project/default_logging.yml
 include kedro/ipython/*.png
 include kedro/ipython/*.svg
-recursive-include templates *
+recursive-include kedro/templates *
is this intended?
First of all, thank you for the PR and the detailed description. Having it check out a feature starter branch saved me a lot of time reviewing and testing.
I left some comments, but I think the key questions are:
- Can we simplify `example_pipeline` to a boolean flag from the CLI with just `--example`? If not, we should parse it into a boolean early on to save ourselves from checking whether `example_pipeline` is None at every step, since I think they don't make a difference.
- Validation: the validation function assumes `click` and it also does parsing. I think these should be kept separate.
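One way to read the two points above is a small parser that turns the raw y/n answer into a `bool` once, with no dependency on `click`. This is only a sketch: the name `parse_yes_no`, the accepted spellings, and the error message are assumptions, not the PR's implementation.

```python
from typing import Optional


def parse_yes_no(value: Optional[str], default: bool = False) -> bool:
    """Convert a raw y/n answer into a bool (hypothetical helper).

    None or an empty string falls back to `default`, which mirrors
    'No' being the default selection for the example prompt.
    """
    if value is None or value.strip() == "":
        return default
    normalized = value.strip().lower()
    if normalized in {"y", "yes"}:
        return True
    if normalized in {"n", "no"}:
        return False
    raise ValueError(f"'{value}' is not a valid answer, expected y/n.")
```

A `click` callback could then wrap this function and translate the `ValueError` into `click.BadParameter`, which would keep validation for the CLI separate from parsing.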
kedro/framework/cli/starters.py
Outdated
extra_context["example_pipeline"] = (
    example_pipeline  # type: ignore
    if example_pipeline is not None
    else _validate_yn(None, None, extra_context.get("example_pipeline", None))
`_validate_yn` is expected to be used in `click`. It feels weird to have `_validate_yn(None, None, ...)`. Can we simplify this a little bit? There is a lot of handling of `None`; if we can make sure this is a `bool` early on, we can save ourselves a lot of checking.
 if "Data Structure" not in selected_add_ons_list and example_pipeline != "True":
     _remove_dir(current_dir / "data")

-if "Pyspark" in selected_add_ons_list:
+if "Pyspark" in selected_add_ons_list and example_pipeline != "True":
     _handle_starter_setup(selected_add_ons_list, python_package_name)

-if "Kedro Viz" in selected_add_ons_list:
+if "Kedro Viz" in selected_add_ons_list and example_pipeline != "True":
I think the logic here is correct, but it is hard to follow. I notice that `_handle_starter_setup` is highly coupled with `PySpark` and `Kedro Viz`; in fact, this function only gets called if these plugins are selected.
- The only difference is that if viz is selected, one extra `reporting` pipeline is removed. I think this should be reflected in the arguments of `_handle_starter_setup`: we don't really need `selected_add_ons_list`, a boolean flag is enough.
- Maybe it helps to have a nested `if` here; for me it is easier to map it to the diagram.
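The nested `if` and the boolean flag suggested above might look like the following sketch. It returns a list of planned actions instead of touching the filesystem so that the branching is easy to inspect. The function name `plan_cleanup` and the action strings are hypothetical; only the add-on names are taken from the diff under review.

```python
from typing import List


def plan_cleanup(selected_add_ons: List[str], example: bool) -> List[str]:
    """List the cleanup steps the post-generation hook would take (illustrative only)."""
    actions: List[str] = []
    if example:
        # An example was requested, so nothing is stripped.
        return actions
    if "Data Structure" not in selected_add_ons:
        actions.append("remove data directory")
    uses_viz = "Kedro Viz" in selected_add_ons
    if "Pyspark" in selected_add_ons or uses_viz:
        actions.append("remove example pipelines")
        if uses_viz:
            # Viz starters carry one extra pipeline: reporting.
            actions.append("remove reporting pipeline")
    return actions
```

Passing a single boolean such as `uses_viz` to the removal helper, rather than the whole add-on list, is the simplification the comment asks for.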
I totally agree with @noklam here. I would rename `_handle_starter_setup` to make it clear this is removal of files, and I also think the code inside `_handle_starter_setup` can be more generic. We don't need to check pipeline names and test names, but just remove all files inside the pipelines folder etc.
I updated that part to clarify that it's only about Viz and PySpark. @merelcht, do you think we should remove all files in tests/pipelines except for `__init__.py`?
@noklam, many thanks for your notes.
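A generic removal along the lines discussed here, which does not check individual pipeline or test names, could be sketched as follows. The helper name `remove_pipeline_contents` and its `keep` parameter are assumptions for illustration and are not the code in the starters' hooks.

```python
import shutil
from pathlib import Path
from typing import Tuple


def remove_pipeline_contents(
    pipelines_dir: Path, keep: Tuple[str, ...] = ("__init__.py",)
) -> None:
    """Delete everything directly inside pipelines_dir except the names in `keep`."""
    if not pipelines_dir.is_dir():
        # Nothing to clean up if the folder does not exist.
        return
    for entry in pipelines_dir.iterdir():
        if entry.name in keep:
            continue
        if entry.is_dir():
            shutil.rmtree(entry)  # remove a whole pipeline package
        else:
            entry.unlink()  # remove a stray file
```

The same function could be pointed at both the package's pipelines folder and tests/pipelines, which is one way to keep `__init__.py` while removing everything else.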
Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com>
It appears that I've addressed all the review comments and have also included new tests to cover the |
This part of the code
kedro/kedro/framework/cli/starters.py
Lines 862 to 868 in 8c6d5c1
if template_path == str(TEMPLATE_PATH) or (
    add_ons and ("Pyspark" in add_ons or "Kedro Viz" in add_ons)
):
    if add_ons == "[]":  # TODO: This should be a list
        click.secho("\nYou have selected no add-ons")
    else:
        click.secho(f"\nYou have selected the following add-ons: {add_ons}")
should also be updated. Now you don't see which add-ons you have selected when you use an example.
@DimedS This still needs to be updated. If I choose an example I don't get a message showing me what `add_ons` I chose.
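One possible way to address this is to build the message in a helper that does not depend on which template was chosen, so it can be printed whether or not an example is selected. The sketch below is an assumption about how that could look: `format_add_ons_message` is a hypothetical name, and it takes a real list rather than the string form that the TODO in the quoted code mentions.

```python
from typing import List


def format_add_ons_message(add_ons: List[str]) -> str:
    """Build the add-ons summary shown after project creation (illustrative)."""
    if not add_ons:
        return "\nYou have selected no add-ons"
    return f"\nYou have selected the following add-ons: {', '.join(add_ons)}"
```

The caller would pass the result to `click.secho` outside the `template_path` condition, so the summary also appears when the example starter is used.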
kedro/framework/cli/starters.py
Outdated
cookiecutter_args["directory"] = "spaceflights-pyspark-viz"
cookiecutter_args[
    "checkout"
] = "3076-add-example-pipeline-to-hooks"  # ToDel: temporary for test
Don't forget to change this back before merging, and all other occurrences below.
It looks like I can change it back only after this PR is merged; otherwise some tests will fail.
…-to-get-an-example-pipeline
b9e91b7 to 6c03f9f
Sorry, I missed it. Fixed together with the other recent comments.
Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com>
This looks good to be merged now! Nice work @DimedS 👏
Don't forget to add it to the release notes 😃
Tested this manually and all looks good! Thank you @DimedS, awesome work! 🌟
expected_files += sum(example_files_count)
expected_files += (
    4 if "7" in add_ons_list else 0
)  # add 3 .py and 1 parameters files in reporting for Viz
Thanks for the comments, these tests can be confusing to understand. 😁
Thanks for the PR! This is one of the more complicated pieces.
Don't forget to update RELEASE.md and revert the hardcoded branch.
"checkout"
] = "3076-add-example-pipeline-to-hooks"
Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com>
Add an option to get an example pipeline, fix and add new tests --------- Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com> Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Description
This PR introduces an `--example` flag to the `kedro new` command. Users can specify `--example y/n` directly in the command line to indicate their preference. If not specified, users will be prompted interactively, with 'No' as the default selection.
User flow:
If `example=No` is chosen in this scenario, data and config will be removed via hooks in the selected starter.
If `example=Yes` is chosen, the 'spaceflights-starter' will be provided.
If `example=No` is chosen, a standard template will be provided.
`example=Yes` chosen.
Development notes
Developer Certificate of Origin
We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a `Signed-off-by` line in the commit message. See our wiki for guidance.
If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.
Checklist
RELEASE.md
file