-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Fix mlflow_list_experiments() #3942
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com>
Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com>
Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution @lorenzwalthert!
Just one request - can you add a test case verifying we can list experiments if there is an experiment with at least 2 tags?
Otherwise looks great!
|
||
# experiment tags are returned if every experiment has tags | ||
mlflow_set_experiment_tag("key1", "value1", experiment_id = ex1) | ||
mlflow_set_experiment_tag("key0", "value0", experiment_id = "0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add multiple tags to one of the experiments to verify that we can have a list of tags per experiment?
Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com>
I added such a test in d24ec71 (and later fixed it in e990889). Another thing I thought about becomes now quite obvious with this new test: The return value is a vector that does not have unique names: c(key = "key1.2", value = "value1.2", key = "key2", value = "value2")
# operating on keys and values is a bit clumsy since `[` and `[[` will return the first key only, not all keys.
c(key = "key1.2", value = "value1.2", key = "key2", value = 'value2')['key']
#> key
"key1.2"
c(key = "key1.2", value = "value1.2", key = "key2", value = 'value2')[['key']]
#> "key1.2"
This will require another transformation to bring the output in a format like this which I believe is much easier to process and more sparse: c(key1.2 = "value1.2", key2 = "value2")
# using names() and unname(), we can now easily operate on keys and values. Since we are breaking the (return value) API with this PR anyways, I wonder if we should - in the same wake - break it again for a better outcome... I did not include this in this PR since I thought it might have been a deliberate design decision to return what we have instead of simple key-value pairs and it would make sense to keep the R and the REST APIs consistent... What's your take on that? |
Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com>
Oh that's a good point. I think in general R API should follow R conventions and do whatever is natural in R, rather than closely following REST. In this particular case, can we do the same as we do for run tags? i.e.
|
Hi @lorenzwalthert, what do you think about the suggestion above? |
Hi Thomas, sorry for the delay. Trying to wrap my head around it, I realised there is another way to access run data that is related:
And as it seems quite an inconsistent way of returning tags. # ----------------------------- generating some tags -------------------------
run_info <- mlflow::mlflow_start_run()
mlflow::mlflow_set_tag('runtag1key', 'v1')
mlflow::mlflow_set_tag('runtag2key', 'v2')
mlflow::mlflow_end_run()
#> # A tibble: 1 x 12
#> run_uuid experiment_id user_id status start_time end_time
#> <chr> <chr> <chr> <chr> <dttm> <dttm>
#> 1 2da8d91… 0 lorenz FINIS… 2021-01-19 20:06:57 2021-01-19 20:06:58
#> # … with 6 more variables: artifact_uri <chr>, lifecycle_stage <chr>,
#> # run_id <chr>, metrics <lgl>, params <lgl>, tags <list>
mlflow::mlflow_set_experiment_tag('exptag1', 'v3')
mlflow::mlflow_set_experiment_tag('exptag2', 'v4')
# ----------------------------- retrieving -------------------------
# get_run
run <- mlflow::mlflow_get_run(run_info$run_id)
run
#> # A tibble: 1 x 12
#> run_uuid experiment_id user_id status start_time end_time
#> <chr> <chr> <chr> <chr> <dttm> <dttm>
#> 1 2da8d91… 0 lorenz FINIS… 2021-01-19 20:06:57 2021-01-19 20:06:58
#> # … with 6 more variables: artifact_uri <chr>, lifecycle_stage <chr>,
#> # run_id <chr>, metrics <lgl>, params <lgl>, tags <list>
run$tags
#> [[1]]
#> # A tibble: 5 x 2
#> key value
#> <chr> <chr>
#> 1 mlflow.user lorenz
#> 2 runtag2key v2
#> 3 mlflow.source.name <console>
#> 4 mlflow.source.type LOCAL
#> 5 runtag1key v1
# list experiments
ls_ex <- mlflow::mlflow_list_experiments()
ls_ex
#> # A tibble: 4 x 5
#> experiment_id name artifact_location lifecycle_stage tags
#> <chr> <chr> <chr> <chr> <chr>
#> 1 0 Defau… /private/var/folders/6g/bg5mcmsj7s… active expt…
#> 2 0 Defau… /private/var/folders/6g/bg5mcmsj7s… active v3
#> 3 0 Defau… /private/var/folders/6g/bg5mcmsj7s… active expt…
#> 4 0 Defau… /private/var/folders/6g/bg5mcmsj7s… active v4
ls_ex$tags
#> key value key value
#> "exptag1" "v3" "exptag2" "v4"
# get expeirment
get_ex <- mlflow::mlflow_get_experiment()
get_ex
#> # A tibble: 1 x 5
#> experiment_id name artifact_location lifecycle_stage tags
#> <chr> <chr> <chr> <chr> <list>
#> 1 0 Defau… /private/var/folders/6g/bg5mcmsj… active <named…
get_ex$tags
#> [[1]]
#> [[1]]$key
#> [1] "exptag1"
#>
#> [[1]]$value
#> [1] "v3"
#>
#>
#> [[2]]
#> [[2]]$key
#> [1] "exptag2"
#>
#> [[2]]$value
#> [1] "v4" Created on 2021-01-19 by the reprex package (v0.3.0) |
To answer your suggestion: I think it's a good idea to re-use code here. I'd only re-use # inside mlflow_list_experiments()
# Return `NULL` if no experiments
if (!length(response)) return(NULL)
purrr::map(response$experiments, function(x) {
x$tags <- parse_run_data(x$tags)
tibble::as_tibble(x)
}) %>%
do.call(rbind, .) If we return a tibble for every experiment (row), we also don't need to handle tidyverse/purrr#397 for which I built a workaround in previous commits and a tibble is nice because you can use # adding on top of the above reprex:
mlflow::mlflow_create_experiment('test')
ex <- mlflow::mlflow_list_experiments()
ex$tags
#> [[1]]
#> # A tibble: 2 x 2
#> key value
#> <chr> <chr>
#> 1 exptag1 v3
#> 2 exptag2 v4
#>
#> [[2]]
#> [1] NA
# unnest
tidyr::unnest(ex, 'tags')
#> # A tibble: 3 x 7
#> experiment_id name artifact_location lifecycle_stage key value tags
#> <chr> <chr> <chr> <chr> <chr> <chr> <lgl>
#> 1 0 Defau… /private/var/folders/6… active expt… v3 NA
#> 2 0 Defau… /private/var/folders/6… active expt… v4 NA
#> 3 1 test /private/var/folders/6… active <NA> <NA> NA
# get key-value representation
tibble::deframe(ex$tags[[1]])
#> exptag1 exptag2
#> "v3" "v4" Created on 2021-01-19 by the reprex package (v0.3.0) Then, we'd have a consistent outcome for |
Hi @lorenzwalthert, thanks for the analysis, your suggestion sounds good to me! And I think we should also adapt mlflow_get_experiment to use the same code as well, but it may be better to do in a followup PR. |
Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com>
…a %.% do.call(rbind, .) Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com>
Cool @tomasatdatabricks, I made the adaption according to your suggestion. Also note that I had to introduce second %>%
f(first, .)
# equivalent to
# f(first, second) We could also do without this feature, but I think going forward, it's good to allow this. If you disagree, I can of course also solve the problem with another assignment instead of the dot feature. |
Thanks @lorenzwalthert! I'll take a look. I think adding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks @lorenzwalthert!
I'll wait a day or two if we can get someone from R studio to review this as well before merging.
Declaring |
To clarify I think the dplyr refactoring should happen in a separate PR after this one has been folded. |
I agree @kevinykuo, I also think it would make sense to have dplyr as a dependency (so we could have |
Thanks for taking a look @kevinykuo! I am gonna merge this now. |
Follow up PR filed for |
* add failing tests to show bugs in mlflow_set_experiment_tag() Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> * make mlflow_list_experiments() work in general Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> * only comparse set, which is less strict Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> * add test for multiple tags per experiment Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> * fix setequal comparison by only comparing experiment 1 Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> * return tibble for tags in list_experiments() if any, NA otherwise Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> * allow global variable . to make magrittr pipe work with dot, e.g. data %.% do.call(rbind, .) Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
* add failing tests to show bugs in mlflow_set_experiment_tag() Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> * make mlflow_list_experiments() work in general Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> * only comparse set, which is less strict Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> * add test for multiple tags per experiment Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> * fix setequal comparison by only comparing experiment 1 Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> * return tibble for tags in list_experiments() if any, NA otherwise Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> * allow global variable . to make magrittr pipe work with dot, e.g. data %.% do.call(rbind, .) Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
What changes are proposed in this pull request?
Closes #3941. Not sure this should be classified user facing because it is a breaking change in the return value structure in some circumstances (as described in #3941), but it's not likely that it will break much code in downstream projects and in principle a minor change.
How is this patch tested?
Unit tests
Release Notes
Is this a user-facing change?
(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts
: Artifact stores and artifact loggingarea/build
: Build and test infrastructure for MLflowarea/docs
: MLflow documentation pagesarea/examples
: Example codearea/model-registry
: Model Registry service, APIs, and the fluent client calls for Model Registryarea/models
: MLmodel format, model serialization/deserialization, flavorsarea/projects
: MLproject format, project running backendsarea/scoring
: Local serving, model deployment tools, spark UDFsarea/server-infra
: MLflow server, JavaScript dev serverarea/tracking
: Tracking Service, tracking client APIs, autologgingInterface
area/uiux
: Front-end, user experience, JavaScript, plottingarea/docker
: Docker use across MLflow's components, such as MLflow Projects and MLflow Modelsarea/sqlalchemy
: Use of SQLAlchemy in the Tracking Service or Model Registryarea/windows
: Windows supportLanguage
language/r
: R APIs and clientslanguage/java
: Java APIs and clientslanguage/new
: Proposals for new client languagesIntegrations
integrations/azure
: Azure and Azure ML integrationsintegrations/sagemaker
: SageMaker integrationsintegrations/databricks
: Databricks integrationsHow should the PR be classified in the release notes? Choose one:
rn/breaking-change
- The PR will be mentioned in the "Breaking Changes" sectionrn/none
- No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" sectionrn/feature
- A new user-facing feature worth mentioning in the release notesrn/bug-fix
- A user-facing bug fix worth mentioning in the release notesrn/documentation
- A user-facing documentation change worth mentioning in the release notes