api: Add params_show. #7613

Merged
merged 4 commits into from Jun 21, 2022

Conversation

daavoo
Contributor

@daavoo daavoo commented Apr 21, 2022

Closes #6507

Uses repo.params.show with a custom error_handler and post-processes the output into a more user-friendly structure.

Extends repo.params.show to accept a stages argument to cover the "params of current stage" use case.

dvc.org PR: iterative/dvc.org#3459


Examples

  • No args.

Will use the current project as repo and retrieve all parameters,
for all stages, for the current revision.

>>> import json
>>> import dvc.api
>>> params = dvc.api.params_show()
>>> print(json.dumps(params, indent=4))
{
    "prepare": {
        "split": 0.2,
        "seed": 20170428
    },
    "featurize": {
        "max_features": 10000,
        "ngrams": 2
    },
    "train": {
        "seed": 20170428,
        "n_est": 50,
        "min_split": 0.01
    }
}
  • Filter with stages.
>>> import json
>>> import dvc.api
>>> params = dvc.api.params_show(stages="prepare")
>>> print(json.dumps(params, indent=4))
{
    "prepare": {
        "split": 0.2,
        "seed": 20170428
    }
}
  • Git URL as repo.
>>> import json
>>> import dvc.api
>>> params = dvc.api.params_show(
...     repo="https://github.com/iterative/demo-fashion-mnist")
>>> print(json.dumps(params, indent=4))
{
    "train": {
        "batch_size": 128,
        "hidden_units": 64,
        "dropout": 0.4,
        "num_epochs": 10,
        "lr": 0.001,
        "conv_activation": "relu"
    }
}
  • Using rev.
>>> import json
>>> import dvc.api
>>> params = dvc.api.params_show(
...     repo="https://github.com/iterative/demo-fashion-mnist",
...     rev="low-lr-experiment")
>>> print(json.dumps(params, indent=4))
{
    "train": {
        "batch_size": 128,
        "hidden_units": 64,
        "dropout": 0.4,
        "num_epochs": 10,
        "lr": 0.001,
        "conv_activation": "relu"
    }
}
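
For reference, a minimal sketch of how a stage script might consume the nested dict shown in these examples. The `params` literal below is copied from the first example's output and only stands in for the return value of `dvc.api.params_show()`; `get_param` is a hypothetical helper, not part of `dvc.api`.

```python
from functools import reduce
from typing import Any, Mapping

# Stand-in for the value returned by dvc.api.params_show() in the first example.
params = {
    "prepare": {"split": 0.2, "seed": 20170428},
    "featurize": {"max_features": 10000, "ngrams": 2},
    "train": {"seed": 20170428, "n_est": 50, "min_split": 0.01},
}


def get_param(tree: Mapping[str, Any], dotted_key: str) -> Any:
    """Look up a nested value with a dotted key such as "train.n_est".

    Hypothetical helper; raises KeyError if any path component is missing.
    """
    return reduce(lambda node, key: node[key], dotted_key.split("."), tree)


n_est = get_param(params, "train.n_est")
split = get_param(params, "prepare.split")
print(n_est, split)
```

Since the result is a plain dict, regular indexing (`params["train"]["n_est"]`) works just as well; the helper only saves some typing for deeply nested sections.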

@daavoo daavoo requested a review from a team as a code owner April 21, 2022 12:00
@daavoo daavoo requested a review from skshetry April 21, 2022 12:00
@daavoo daavoo self-assigned this Apr 21, 2022
@daavoo daavoo added the A: api (Related to the dvc.api) and feature (is a feature) labels Apr 21, 2022
@daavoo daavoo force-pushed the api-read-params branch 3 times, most recently from 104090a to b4766c9 Compare April 21, 2022 12:04
@daavoo daavoo marked this pull request as draft May 5, 2022 19:33
@daavoo daavoo changed the title api: Add get_params. api: Add params_show. Jun 15, 2022
@daavoo daavoo force-pushed the api-read-params branch 2 times, most recently from c59e5dd to 826c935 Compare June 15, 2022 10:42
dvc/api.py Outdated
@@ -92,6 +94,160 @@ def read(path, repo=None, rev=None, remote=None, mode="r", encoding=None):
return fd.read()


def params_show(
Member

Wait, why? It’s a getter. Again, I am confused with porcelain/API discussion.

Member

@skshetry The argument during the last retro discussion was that this is at least following the CLI conventions. Could consider adding get_params as an alias though.

Member

Right, but is dvc.api the place to have these CLI wrappers?

Contributor Author

CLI wrappers?

What do you call CLI wrappers? I think the idea is that we follow CLI conventions regarding names, but this doesn't mean that the API is a CLI wrapper.
This API uses the internal Python API, raises exceptions, and returns a structure meant to be used in Python (instead of the internal structure returned by CLI --json options, for example).
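
To illustrate the difference between the two structures, here is a rough sketch of the kind of post-processing involved. The nested layout of `internal` (revision, then "data", then file name, then "data") is an assumption made for illustration and is not taken from this PR; `postprocess` is a hypothetical name and not the function merged here.

```python
from typing import Any, Dict

# ASSUMED shape of the internal result: revision -> "data" -> file -> "data".
# This layout is an assumption for illustration only.
internal = {
    "workspace": {
        "data": {
            "params.yaml": {
                "data": {
                    "prepare": {"split": 0.2, "seed": 20170428},
                    "train": {"n_est": 50},
                }
            }
        }
    }
}


def postprocess(result: Dict[str, Any]) -> Dict[str, Any]:
    """Merge the parameters of every file in the first revision into one dict."""
    if not result:
        return {}
    first_rev = next(iter(result.values()))
    merged: Dict[str, Any] = {}
    for file_result in first_rev.get("data", {}).values():
        # Later files overwrite earlier ones on key collisions in this sketch.
        merged.update(file_result.get("data", {}))
    return merged


print(postprocess(internal))
```

The caller gets the parameter sections directly, without having to know about revisions or file names.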

Contributor

Personally, my vote would be get_params because the most immediate use case here is to onboard new users who aren't already familiar with the CLI.

However, I would rather get this merged than continue to discuss naming.

@daavoo daavoo requested review from dberenbaum and efiop June 15, 2022 10:44
@jorgeorpinel
Contributor

jorgeorpinel commented Jun 16, 2022

Filter with stages.

1. Could DVC detect a default stage based on the name of the file you're in? Then the no-args run would auto-filter that one stage's params specifically. Add params_show(all=True) to load all.

2. And maybe when loading a single stage's params, the JSON structure shouldn't have its name as key (so the code never needs to know it). E.g.

# prepare.py
import json
import dvc.api

params = dvc.api.params_show()
print(json.dumps(params))
{
  "split": 0.2,
  "seed": 20170428
}
# use params['split'] instead of params[list(params.keys())[0]]['split']

I guess the inconsistent structure for different calls may be a problem though.
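
To make that trade-off concrete, a small sketch of the unwrapping idea. `unwrap_single_section` is a hypothetical helper written for this comment, not something `dvc.api` provides; the sample dicts reuse the values from the PR description.

```python
from typing import Any, Dict


def unwrap_single_section(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return the inner dict when there is exactly one top-level dict section.

    Hypothetical helper: with several sections the input is returned unchanged,
    so the shape of the result depends on how many sections were loaded.
    """
    if len(params) == 1:
        (only_value,) = params.values()
        if isinstance(only_value, dict):
            return only_value
    return params


one_section = {"prepare": {"split": 0.2, "seed": 20170428}}
two_sections = {"prepare": {"split": 0.2}, "train": {"n_est": 50}}

print(unwrap_single_section(one_section))   # flat: keys are param names
print(unwrap_single_section(two_sections))  # nested: keys are section names
```

The same call site would need `params["split"]` in one case and `params["prepare"]["split"]` in the other, which is the inconsistency mentioned above.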

dvc/api.py Outdated
@@ -214,6 +215,268 @@ def read(path, repo=None, rev=None, remote=None, mode="r", encoding=None):
return fd.read()


def params_show(
*targets: str,
Contributor

Is it very relevant to pick by param files? Maybe the natural targets should be param names (return simple dict) or even stage names and this could be a secondary optional arg.

Contributor Author

Is it very relevant to pick by param files?

In all dvc {x} show / dvc {x} diff commands, targets are files, i.e. https://dvc.org/doc/command-reference/metrics/show

Maybe the natural targets should be param names (return simple dict)

Not sure I follow what this would look like.

Contributor

I agree that stage names are likely most useful here, but we are also trying to mimic the CLI. It's a good idea @jorgeorpinel but there has already been a lot of related discussions and I would rather get something merged than try to optimize the order of args.

Contributor

Makes sense to follow the CLI.

@daavoo
Contributor Author

daavoo commented Jun 17, 2022

And maybe when loading a single stage's params, the JSON structure shouldn't have its name as key (so the code never needs to know it)

@jorgeorpinel We don't include the stage name. This is just how the example-get-started params are set up, with sections inside the params file that match the stage names:

https://github.com/iterative/example-get-started/blob/a470fae5c0ccf71641e10d99e69e97534c3d811e/params.yaml#L1-L4

@daavoo
Contributor Author

daavoo commented Jun 17, 2022

Could DVC detect a default stage based on the name of the file you're in? Then the no-args run would auto-filter that one stage's params specifically. Add params_show(all=True) to load all.

This would be quite a big assumption to make, and I'm not sure how common this setup is among users.
The stages argument was added to support the "filter by current stage" use case. I don't think the ambiguity of trying to figure out the current stage is worth it compared to requiring users to "hardcode" the stage name in the call.

Comment on lines +125 to +128
targets=None,
deps=False,
onerror: Callable = None,
stages=None,
Member

@efiop efiop Jun 20, 2022

Hm, why do we need targets and stages? Is it because targets should be paths? But what if you have stages with same names?

Member

Does stages here accept stage at a custom path? It appears it doesn't, unless I'm missing something.

Contributor Author

why do we need targets and stages?

Because they are not mutually exclusive. They have to be decoupled to support several scenarios, for example:

  • A stage (stage_A) uses params from different targets (target_A, target_B).

params_show(target_A, stages=stage_A)

  • Different parts of a single target (target_A) can be used for multiple stages (stage_A, stage_B).

params_show(target_A, stages=stage_B)

But what if you have stages with same names?

I am not sure I follow. The targets and stages filters are applied separately, so it would be up to the user to provide unambiguous arguments.

Does stages here accept stage at a custom path?

What is a stage at a custom path?
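
The two scenarios above can be sketched with a toy model in which each record says "this stage reads these keys from this params file". `ParamDep`, `DEPS` and `select` are hypothetical names used only for illustration; this is not how `repo.params.show` is implemented.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional


class ParamDep(NamedTuple):
    """Toy record: one stage reading some keys from one params file."""

    stage: str
    target: str
    keys: List[str]


DEPS = [
    ParamDep("stage_A", "target_A", ["lr"]),
    ParamDep("stage_A", "target_B", ["seed"]),
    ParamDep("stage_B", "target_A", ["epochs"]),
]


def select(
    deps: Iterable[ParamDep],
    targets: Optional[Iterable[str]] = None,
    stages: Optional[Iterable[str]] = None,
) -> Dict[str, List[str]]:
    """Apply the two filters independently; None means "no filter"."""
    target_set = None if targets is None else set(targets)
    stage_set = None if stages is None else set(stages)
    selected: Dict[str, List[str]] = {}
    for dep in deps:
        if target_set is not None and dep.target not in target_set:
            continue
        if stage_set is not None and dep.stage not in stage_set:
            continue
        selected.setdefault(dep.target, []).extend(dep.keys)
    return selected


# One stage, params spread over two targets.
print(select(DEPS, stages=["stage_A"]))
# One target shared by two stages, narrowed down to a single stage.
print(select(DEPS, targets=["target_A"], stages=["stage_B"]))
```

Neither filter can be expressed in terms of the other in this model, which is why both arguments are kept.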

Member

What is a stage at a custom path?

Like path/to/dvc.yaml:mystage. Because mystage could be defined in multiple dvc.yaml at different levels.

Contributor Author

@daavoo daavoo Jun 21, 2022

Like path/to/dvc.yaml:mystage. Because mystage could be defined in multiple dvc.yaml at different levels.

I updated to filter against stage.addressing instead of stage.name. The former includes the path/to/dvc.yaml: prefix when there are conflicts.

I tested with multiple dvc.yaml files using the same stage name, and it works as expected when the prefix is included in the argument.
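
A rough sketch of why matching on the address rather than the bare name disambiguates the two stages. `ToyStage` and `filter_stages` are hypothetical stand-ins, and the rule used to build the prefix (bare name for the top-level dvc.yaml, prefixed otherwise) is a simplifying assumption, not DVC's actual logic.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ToyStage:
    path: str  # dvc.yaml file that defines the stage
    name: str

    @property
    def addressing(self) -> str:
        # Assumption: stages in the top-level dvc.yaml are addressed by bare
        # name; stages in nested files get a "path/to/dvc.yaml:" prefix.
        if self.path == "dvc.yaml":
            return self.name
        return f"{self.path}:{self.name}"


def filter_stages(stages: Iterable[ToyStage], wanted: Iterable[str]) -> List[ToyStage]:
    """Keep only the stages whose address was explicitly requested."""
    wanted_set = set(wanted)
    return [stage for stage in stages if stage.addressing in wanted_set]


ALL_STAGES = [
    ToyStage("dvc.yaml", "mystage"),
    ToyStage("path/to/dvc.yaml", "mystage"),
]

print(filter_stages(ALL_STAGES, ["mystage"]))
print(filter_stages(ALL_STAGES, ["path/to/dvc.yaml:mystage"]))
```

Filtering on `name` alone would return both stages for either request in this toy setup.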

Defaults to `False`.
If `True`, multiple `outs` sharing a provided `target_path` will not be filtered.
Closes #6507

Uses `repo.params.show` with custom error_handler and postprocess the outputs for more user-friendly structure.

Extend `repo.params.show` to accept `stages` argument to cover the "params of current stage" use case.
Split into sub-modules.

Split tests.
Member

@efiop efiop left a comment

Thank you @daavoo! Let's run with this and change things on top if we need anything.

@efiop efiop merged commit 20f69a1 into main Jun 21, 2022
@efiop efiop deleted the api-read-params branch June 21, 2022 17:52
Successfully merging this pull request may close these issues.

Expose params/ ParamsDependency in the Python API.
7 participants