New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[KED-2898] Include nested pipeline configuration in packaging #992
Changes from 2 commits
28ca764
3d44b10
a134ef7
ccbf69d
4258d03
4377a66
c8f36be
0ffcb49
edfe060
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -312,6 +312,52 @@ def test_package_modular_pipeline_with_nested_parameters( | |
assert "retail/config/parameters/retail.yml" in wheel_contents | ||
assert "retail/config/parameters/retail_banking.yml" not in wheel_contents | ||
|
||
def test_package_pipeline_with_deep_nested_parameters( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe worth updating this test (or making a new one) to cover a more deeply nested version, just to check our glob is working as expected? Like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done! |
||
self, fake_repo_path, fake_project_cli, fake_metadata | ||
): | ||
CliRunner().invoke( | ||
fake_project_cli, ["pipeline", "create", "retail"], obj=fake_metadata | ||
) | ||
deep_nested_param_path = Path( | ||
fake_repo_path / "conf" / "base" / "parameters" / "deep" / "retail" | ||
) | ||
deep_nested_param_path.mkdir(parents=True, exist_ok=True) | ||
(deep_nested_param_path / "params1.yml").touch() | ||
|
||
deep_nested_param_path2 = Path( | ||
fake_repo_path / "conf" / "base" / "parameters" / "retail" / "deep" | ||
) | ||
deep_nested_param_path2.mkdir(parents=True, exist_ok=True) | ||
(deep_nested_param_path2 / "params1.yml").touch() | ||
|
||
deep_nested_param_path3 = Path( | ||
fake_repo_path / "conf" / "base" / "parameters" / "deep" | ||
) | ||
deep_nested_param_path3.mkdir(parents=True, exist_ok=True) | ||
(deep_nested_param_path3 / "retail.yml").touch() | ||
|
||
result = CliRunner().invoke( | ||
fake_project_cli, ["pipeline", "package", "retail"], obj=fake_metadata | ||
) | ||
|
||
assert result.exit_code == 0 | ||
assert "Pipeline `retail` packaged!" in result.output | ||
|
||
wheel_location = fake_repo_path / "src" / "dist" | ||
assert f"Location: {wheel_location}" in result.output | ||
|
||
wheel_name = _get_wheel_name(name="retail", version="0.1") | ||
wheel_file = wheel_location / wheel_name | ||
assert wheel_file.is_file() | ||
assert len(list(wheel_location.iterdir())) == 1 | ||
|
||
# pylint: disable=consider-using-with | ||
wheel_contents = set(ZipFile(str(wheel_file)).namelist()) | ||
assert "retail/config/parameters/deep/retail/params1.yml" in wheel_contents | ||
assert "retail/config/parameters/retail/deep/params1.yml" in wheel_contents | ||
assert "retail/config/parameters/retail.yml" in wheel_contents | ||
assert "retail/config/parameters/deep/retail.yml" in wheel_contents | ||
|
||
def test_pipeline_package_version( | ||
self, fake_repo_path, fake_package_path, fake_project_cli, fake_metadata | ||
): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I right in saying if we want to go another level deep we need to do the same thing?
Why not do something dynamic for a bigger spectrum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's correct. This implementation is consistent with how deep the nesting goes for parameters, credentials and catalog files. I guess the question is how deep we need to go and how far do we want to support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
**
already looks arbitrarily deep recursively: https://stackoverflow.com/questions/32604656/what-is-the-glob-character.So actually I'm not quite sure why this doesn't already work as we want 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think we want
f"parameters*/**/{pipeline_name}/**/*"
rather than what we have currently. So basically your change on line 684. That should match something that has a directory calledpipeline_name
somewhere in the directory hierarchy, regardless of how deep it is. So all these should be found:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I made that change as well here: https://github.com/quantumblacklabs/kedro/blob/a134ef72bf8440800bd5e58250a18e02150bd21a/kedro/framework/cli/pipeline.py#L684 @AntonyMilneQB
However, the extra line in
generate_setup_file
is also needed, because otherwise the config will still not be included in the final package.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like actually recursive globs aren't supported by setuptools, so even though we use the
**
syntax it doesn't do anything different from just the single*
. "The patterns here are glob-style patterns: * matches zero or more regular filename characters (on Unix, everything except forward slash; on Windows, everything except backslash and colon);" https://packaging.python.org/guides/using-manifest-in/There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh that is very annoying 😬 I'm a bit surprised our tests were already passing if
**
doesn't work in setup.py at all, but I guess we must have not been testing that case.Looking at the code I think it would be best if we stopped using globs in
_generate_setup_file
altogether and just useconfigs_to_package = _find_config_files
as the single source of truth. Looks likeconf_target
should already have all the right parameters files in it, so all we should need to do is list recursively all (no need for any pattern matching) the files in that directory, pass that into_generate_setup_file
and addREADME.md
to the list?Would be good to get @lorenabalan's thoughts on this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree with the above, if
setuptools
is not doing what we want, let's compute the list of config files ourselves and just pass them to thesetup.py
file.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the implementation as discussed.