New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[KED-2898] Include nested pipeline configuration in packaging #992
Conversation
kedro/framework/cli/pipeline.py
Outdated
@@ -942,6 +942,7 @@ def _generate_setup_file( | |||
"config/**/parameters*", | |||
"config/parameters*/**", | |||
"config/parameters*/**/*", | |||
"config/parameters*/**/**/*", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I right in saying if we want to go another level deep we need to do the same thing?
Why not do something dynamic for a bigger spectrum?
[ f"config/parameters*/{'**/'*x}*" for x in range(0,5)]
>>> ['config/parameters*/*',
>>> 'config/parameters*/**/*',
>>> 'config/parameters*/**/**/*',
>>> 'config/parameters*/**/**/**/*',
>>> 'config/parameters*/**/**/**/**/*']
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's correct. This implementation is consistent with how deep the nesting goes for parameters, credentials and catalog files. I guess the question is how deep we need to go and how far do we want to support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
**
already looks arbitrarily deep recursively: https://stackoverflow.com/questions/32604656/what-is-the-glob-character.
So actually I'm not quite sure why this doesn't already work as we want 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think we want f"parameters*/**/{pipeline_name}/**/*"
rather than what we have currently. So basically your change on line 684. That should match something that has a directory called pipeline_name
somewhere in the directory hierarchy, regardless of how deep it is. So all these should be found:
parameters*/{pipeline_name}/*
parameters*/a/{pipeline_name}/*
parameters*/{pipeline_name}/b/*
parameters*/a/{pipeline_name}/b/*
parameters*/a/b/c/d/{pipeline_name}/e/f/g/h/i/j/*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I made that change as well here: https://github.com/quantumblacklabs/kedro/blob/a134ef72bf8440800bd5e58250a18e02150bd21a/kedro/framework/cli/pipeline.py#L684 @AntonyMilneQB
However, the extra line in generate_setup_file
is also needed, because otherwise the config will still not be included in the final package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like actually recursive globs aren't supported by setuptools, so even though we use the **
syntax it doesn't do anything different from just the single *
. "The patterns here are glob-style patterns: * matches zero or more regular filename characters (on Unix, everything except forward slash; on Windows, everything except backslash and colon);" https://packaging.python.org/guides/using-manifest-in/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh that is very annoying 😬 I'm a bit surprised our tests were already passing if **
doesn't work in setup.py at all, but I guess we must have not been testing that case.
Looking at the code I think it would be best if we stopped using globs in _generate_setup_file
altogether and just use configs_to_package = _find_config_files
as the single source of truth. Looks like conf_target
should already have all the right parameters files in it, so all we should need to do is list recursively all (no need for any pattern matching) the files in that directory, pass that into _generate_setup_file
and add README.md
to the list?
Would be good to get @lorenabalan's thoughts on this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree with the above, if setuptools
is not doing what we want, let's compute the list of config files ourselves and just pass them to the setup.py
file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the implementation as discussed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⭐
@@ -334,6 +334,52 @@ def test_package_modular_pipeline_with_nested_parameters( | |||
assert "retail/config/parameters/retail.yml" in wheel_contents | |||
assert "retail/config/parameters/retail_banking.yml" not in wheel_contents | |||
|
|||
def test_package_pipeline_with_deep_nested_parameters( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth updating this test (or making a new one) to cover a more deeply nested version, just to check our glob is working as expected? Like "conf" / "base" / "parameters" / "a" / "b" / "c"/ "d"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
Looking good! |
…org#992) Signed-off-by: Laurens Vijnck <laurens_vijnck@mckinsey.com>
Description
Addresses the issue found in #914 where nested pipeline configuration files aren't included when the pipeline is packaged.
Development notes
With this change configuration files nested up to two levels deep from
/parameters
will be included. This is consistent with how we handle catalog, credential and parameter files.Checklist
RELEASE.md
fileNotice
I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":
I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.
I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.
I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.