Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parsing: Support dict unpacking in cmd. #7907

Merged
merged 1 commit into from Jul 5, 2022

Conversation

daavoo
Copy link
Contributor

@daavoo daavoo commented Jun 16, 2022

Allow to use dictionaries as values for template interpolation but only inside the cmd key.

Given the following params.yaml and dvc.yaml:

# params.yaml
dict:
  foo: foo
  bar: bar
# dvc.yaml
stages:
  stage1:
    cmd: python script.py ${dict}

The dictionary will be unpacked with the following syntax:

# dvc.yaml
stages:
  stage1:
    cmd: python script.py --foo foo --bar bar

Closes #6107

@daavoo daavoo linked an issue Jun 16, 2022 that may be closed by this pull request
@daavoo daavoo requested a review from dberenbaum June 16, 2022 19:01
@skshetry
Copy link
Member

skshetry commented Jun 17, 2022

Any recent motivations for this? 🙂

@daavoo
Copy link
Contributor Author

daavoo commented Jun 17, 2022

Any recent motivations for this? 🙂

When exploring how to better integrate dvc params with ML frameworks, this showed as low-hanging fruit for some frameworks like Yolov5 (linked issue), HuggingFace , PyTorch Lightning, etc.
In addition, the default syntax is good enough for custom training scripts using simple argparse and having more flexibility of configuring cmd calls of CLI tools without having to edit dvc.yaml for each iteration (adding/removing argument)

@daavoo daavoo self-assigned this Jun 17, 2022
@daavoo daavoo added A: templating Related to the templating feature feature is a feature labels Jun 17, 2022
@dberenbaum

This comment was marked as resolved.

@daavoo daavoo force-pushed the 6107-more-flexible-dvcyaml-parameterisation branch from cc627fb to b9daa31 Compare June 20, 2022 09:08
@daavoo daavoo force-pushed the 6107-more-flexible-dvcyaml-parameterisation branch 3 times, most recently from 6a76372 to a2ea47d Compare June 21, 2022 09:47
@daavoo daavoo marked this pull request as ready for review June 21, 2022 10:49
@daavoo daavoo requested a review from a team as a code owner June 21, 2022 10:49
@daavoo daavoo force-pushed the 6107-more-flexible-dvcyaml-parameterisation branch from a2ea47d to e4af3f4 Compare June 21, 2022 10:49
@daavoo daavoo requested a review from a team as a code owner June 21, 2022 10:49
@daavoo daavoo requested review from karajan1001 and efiop June 21, 2022 10:49
def _(obj: dict):
result = ""
for k, v in flatten(obj).items():
if isinstance(v, bool):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only works for argparse.BooleanOptionalAction not for store_true/false. Maybe need some better way to handle this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same issue about different options, as commented below #7907 (comment)

Which option do you think we should consider as the most appropriate default?

Note that this can be used for things beyond argparse, for example you can use the interpolation to pass flags to arbitrary executables that you call inside cmd

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added config.parsing.list with store_true and boolean_optional .

Not sure about store_false because the interaction with dvc params would look kind of strange, IMO

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also have no idea about it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's more than enough to support for now.

raise ParseError(
f"Cannot interpolate nested iterable in '{k}'"
)
result += f"--{k} {i} "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

> import argparse
> parser = argparse.ArgumentParser()
> parser.add_argument('--foo', nargs='*')
> parser.parse_args('--foo x y'.split())
Namespace(foo=['x', 'y'])

> parser.parse_args('--foo x --foo y'.split())
Namespace(foo=['y'])

Looks like setting arguments for multi times will overwrite the previous one.

Copy link
Contributor Author

@daavoo daavoo Jun 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to cover the action='append' mode:

>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', action='append')
>>> parser.parse_args('--foo 1 --foo 2'.split())
Namespace(foo=['1', '2'])

Maybe it's a matter of clearly stating it in the docs.
Ideally, we should support all options but I am not sure how to do it. We could add a bunch of config options or just allow users to register a custom to_str method for interpolation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's pick one behavior for now. Is it possible to see which one is more widely used in existing ML CLIs? I would guess the nargs --foo 1 2 behavior is more common, even though I find the explicit append behavior easier to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added config.parsing.list with nargs and append options. It doesn't look that bad to have the options.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed this, it's a bad idea to make dvc.yaml parsing conditional, it should be self-contained.

@daavoo daavoo force-pushed the 6107-more-flexible-dvcyaml-parameterisation branch from e4af3f4 to b41e148 Compare June 23, 2022 15:07


@pytest.mark.parametrize(
"bool_config", [None, "store_true", "boolean_optional"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Maybe we only need to test two different conditions? The None can be tested through some other method (Test if the default value is set correctly)?
  2. Because these two configurations are independent, maybe we can test them in an (I'm not sure how to describe it) instead of using a product. ["a1b1", "a2b2"] instead of ["a1", "a2"] x ["b1", "b2"].
    The two suggestions above can reduce the amount of the test from 9 to 2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated it to 3 test cases (["a1b1", "a2b2"]). I still included the default in the same test, not sure if it's worth it to separate it as the body of the test would be the same

@karajan1001
Copy link
Contributor

LGTM for the most part, another concern is that the two new configurations still can't solve all of the problems as two different types of arg might exist in one command.

Allow to use dictionaries as values for template interpolation but only inside the `cmd` key.

See tests/func/parsing/test_interpolated_entry.py::test_cmd_dict for detailed syntax.

Add `config.parsing` section for configuring behavior of ambiguous data types
like booleans and lists.
@daavoo daavoo force-pushed the 6107-more-flexible-dvcyaml-parameterisation branch from b41e148 to 7fac181 Compare July 4, 2022 08:02
@daavoo daavoo requested a review from karajan1001 July 4, 2022 08:04
@daavoo
Copy link
Contributor Author

daavoo commented Jul 4, 2022

another concern is that the two new configurations still can't solve all of the problems as two different types of arg might exist in one command.

Indeed. There are many unsolved issues but the idea for this initial P.R. is to cover basic scenarios to don't overcomplicate the logic until users request for specific issues

@daavoo daavoo merged commit 9099413 into main Jul 5, 2022
@daavoo daavoo deleted the 6107-more-flexible-dvcyaml-parameterisation branch July 5, 2022 07:02
daavoo added a commit to iterative/dvc.org that referenced this pull request Jul 5, 2022
daavoo added a commit to iterative/dvc.org that referenced this pull request Jul 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: templating Related to the templating feature feature is a feature
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

More flexible dvc.yaml parameterisation
4 participants