Fix mlflow_list_experiments() #3942

lorenzwalthert · 2021-01-04T20:50:15Z

What changes are proposed in this pull request?

Closes #3941. Not sure this should be classified user facing because it is a breaking change in the return value structure in some circumstances (as described in #3941), but it's not likely that it will break much code in downstream projects and in principle a minor change.

How is this patch tested?

Unit tests

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

Interface

area/uiux: Front-end, user experience, JavaScript, plotting
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com>

Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com>

tomasatdatabricks

Thanks for the contribution @lorenzwalthert!

Just one request - can you add a test case verifying we can list experiments if there is an experiment with at least 2 tags?

Otherwise looks great!

tomasatdatabricks · 2021-01-11T21:39:08Z

mlflow/R/mlflow/tests/testthat/test-tracking-experiments.R

+
+  # experiment tags are returned if every experiment has tags
+  mlflow_set_experiment_tag("key1", "value1", experiment_id = ex1)
+  mlflow_set_experiment_tag("key0", "value0", experiment_id = "0")


Can we add multiple tags to one of the experiments to verify that we can have a list of tags per experiment?

Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com>

lorenzwalthert · 2021-01-11T22:28:36Z

Just one request - can you add a test case verifying we can list experiments if there is an experiment with at least 2 tags?

I added such a test in d24ec71 (and later fixed it in e990889).

Another thing I thought about becomes now quite obvious with this new test: The return value is a vector that does not have unique names:

c(key = "key1.2", value = "value1.2", key = "key2", value = "value2")
# operating on keys and values is a bit clumsy since `[` and `[[` will return the first key only, not all keys.
c(key = "key1.2", value = "value1.2", key = "key2", value = 'value2')['key']
#>   key 
       "key1.2"
c(key = "key1.2", value = "value1.2", key = "key2", value = 'value2')[['key']]
#> "key1.2"

This will require another transformation to bring the output in a format like this which I believe is much easier to process and more sparse:

c(key1.2 = "value1.2", key2 = "value2")
# using names() and unname(), we can now easily operate on keys and values.

Since we are breaking the (return value) API with this PR anyways, I wonder if we should - in the same wake - break it again for a better outcome... I did not include this in this PR since I thought it might have been a deliberate design decision to return what we have instead of simple key-value pairs and it would make sense to keep the R and the REST APIs consistent... What's your take on that?

Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com>

tomasatdatabricks · 2021-01-11T23:53:49Z

Oh that's a good point. I think in general R API should follow R conventions and do whatever is natural in R, rather than closely following REST. In this particular case, can we do the same as we do for run tags?

i.e.

runs_list <- response$run %>%
    purrr::map(parse_run)
do.call("rbind", runs_list) %||% data.frame()


parse_run <- function(r) {
  info <- parse_run_info(r$info)

  info$metrics <- parse_metric_data(r$data$metrics)
  info$params <- parse_run_data(r$data$params)
  info$tags <- parse_run_data(r$data$tags)

  new_mlflow_run(info)
}

parse_run_data <- function(d) {
  if (is.null(d)) return(NA)
  d %>%
    purrr::transpose() %>%
    purrr::map(unlist) %>%
    tibble::as_tibble() %>%
    list()
}

tomasatdatabricks · 2021-01-19T18:36:19Z

Hi @lorenzwalthert, what do you think about the suggestion above?

lorenzwalthert · 2021-01-19T19:09:39Z

Hi Thomas, sorry for the delay. Trying to wrap my head around it, I realised there is another way to access run data that is related: mlflow_get_experiment(). Just dumping this here for reference, will follow up. As of before this PR (and ignoring the problem I wanted to solve in the first place), we have:

mlflow_get_run(): uses parse_run() (-> new_mlflow_run()) for REST response and then in particular parse_run_data() for tags and parameters.
mlflow_get_experiment(): no post-processing of REST response. -> pass to new_mlflow_experiment().
mlflow_list_experiments(): post-processing for tags similar to parse_run_data() but not exactly.

And as it seems quite an inconsistent way of returning tags.

# ----------------------------- generating some tags -------------------------
run_info <- mlflow::mlflow_start_run()
mlflow::mlflow_set_tag('runtag1key', 'v1')
mlflow::mlflow_set_tag('runtag2key', 'v2')
mlflow::mlflow_end_run()
#> # A tibble: 1 x 12
#>   run_uuid experiment_id user_id status start_time          end_time           
#>   <chr>    <chr>         <chr>   <chr>  <dttm>              <dttm>             
#> 1 2da8d91… 0             lorenz  FINIS… 2021-01-19 20:06:57 2021-01-19 20:06:58
#> # … with 6 more variables: artifact_uri <chr>, lifecycle_stage <chr>,
#> #   run_id <chr>, metrics <lgl>, params <lgl>, tags <list>
mlflow::mlflow_set_experiment_tag('exptag1', 'v3')
mlflow::mlflow_set_experiment_tag('exptag2', 'v4')

# ----------------------------- retrieving  -------------------------
# get_run
run <- mlflow::mlflow_get_run(run_info$run_id)
run
#> # A tibble: 1 x 12
#>   run_uuid experiment_id user_id status start_time          end_time           
#>   <chr>    <chr>         <chr>   <chr>  <dttm>              <dttm>             
#> 1 2da8d91… 0             lorenz  FINIS… 2021-01-19 20:06:57 2021-01-19 20:06:58
#> # … with 6 more variables: artifact_uri <chr>, lifecycle_stage <chr>,
#> #   run_id <chr>, metrics <lgl>, params <lgl>, tags <list>
run$tags
#> [[1]]
#> # A tibble: 5 x 2
#>   key                value    
#>   <chr>              <chr>    
#> 1 mlflow.user        lorenz   
#> 2 runtag2key         v2       
#> 3 mlflow.source.name <console>
#> 4 mlflow.source.type LOCAL    
#> 5 runtag1key         v1

# list experiments
ls_ex <- mlflow::mlflow_list_experiments()
ls_ex
#> # A tibble: 4 x 5
#>   experiment_id name   artifact_location                   lifecycle_stage tags 
#>   <chr>         <chr>  <chr>                               <chr>           <chr>
#> 1 0             Defau… /private/var/folders/6g/bg5mcmsj7s… active          expt…
#> 2 0             Defau… /private/var/folders/6g/bg5mcmsj7s… active          v3   
#> 3 0             Defau… /private/var/folders/6g/bg5mcmsj7s… active          expt…
#> 4 0             Defau… /private/var/folders/6g/bg5mcmsj7s… active          v4
ls_ex$tags
#>       key     value       key     value 
#> "exptag1"      "v3" "exptag2"      "v4"

# get expeirment
get_ex <- mlflow::mlflow_get_experiment()
get_ex
#> # A tibble: 1 x 5
#>   experiment_id name   artifact_location                 lifecycle_stage tags   
#>   <chr>         <chr>  <chr>                             <chr>           <list> 
#> 1 0             Defau… /private/var/folders/6g/bg5mcmsj… active          <named…
get_ex$tags
#> [[1]]
#> [[1]]$key
#> [1] "exptag1"
#> 
#> [[1]]$value
#> [1] "v3"
#> 
#> 
#> [[2]]
#> [[2]]$key
#> [1] "exptag2"
#> 
#> [[2]]$value
#> [1] "v4"

^{Created on 2021-01-19 by the reprex package (v0.3.0)}

lorenzwalthert · 2021-01-19T19:50:00Z

To answer your suggestion: I think it's a good idea to re-use code here. I'd only re-use parse_run_data() here since we don't have metrics and parameters with mlflow_list_experiments(), and the output apart from tags is all scalars. So we could do this

  # inside mlflow_list_experiments()
  # Return `NULL` if no experiments
  if (!length(response)) return(NULL)
  purrr::map(response$experiments, function(x) {
    x$tags <- parse_run_data(x$tags)
    tibble::as_tibble(x)
  }) %>%
    do.call(rbind, .)

If we return a tibble for every experiment (row), we also don't need to handle tidyverse/purrr#397 for which I built a workaround in previous commits and a tibble is nice because you can use tidyr::unnest() if you want to expand parameters or tibble::deframe() to get the representation I suggested above. Like this:

# adding on top of the above reprex:
mlflow::mlflow_create_experiment('test')
ex <- mlflow::mlflow_list_experiments()
ex$tags
#> [[1]]
#> # A tibble: 2 x 2
#>   key     value
#>   <chr>   <chr>
#> 1 exptag1 v3   
#> 2 exptag2 v4   
#> 
#> [[2]]
#> [1] NA

# unnest
tidyr::unnest(ex, 'tags')
#> # A tibble: 3 x 7
#>   experiment_id name   artifact_location       lifecycle_stage key   value tags 
#>   <chr>         <chr>  <chr>                   <chr>           <chr> <chr> <lgl>
#> 1 0             Defau… /private/var/folders/6… active          expt… v3    NA   
#> 2 0             Defau… /private/var/folders/6… active          expt… v4    NA   
#> 3 1             test   /private/var/folders/6… active          <NA>  <NA>  NA

# get key-value representation
tibble::deframe(ex$tags[[1]])
#> exptag1 exptag2 
#>    "v3"    "v4"

^{Created on 2021-01-19 by the reprex package (v0.3.0)}

Then, we'd have a consistent outcome for mlflow_get_run() and mlflow_list_experiments() as far as tags go. What do you think? We could also adapt mlflow_get_experiment() to work the same way (noting that this is all breaking the return API). NA as default for when there is no tags is probably better than NULL for tidyr::unnest() (as it preserves experiments without tags) and consistent with other outcomes from parse_run_data().

tomasatdatabricks · 2021-01-20T00:02:56Z

Hi @lorenzwalthert, thanks for the analysis, your suggestion sounds good to me!

And I think we should also adapt mlflow_get_experiment to use the same code as well, but it may be better to do in a followup PR.

Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com>

…a %.% do.call(rbind, .) Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com>

lorenzwalthert · 2021-01-21T10:43:17Z

Cool @tomasatdatabricks, I made the adaption according to your suggestion. Also note that I had to introduce . as a global variable to use the dot feature of magrittr. This is standard practice in tidyverse packages and allows to pipe an argument into an arbitrary position of the next function call, not just the first place.

second %>%
  f(first, .)
# equivalent to 
# f(first, second)

We could also do without this feature, but I think going forward, it's good to allow this. If you disagree, I can of course also solve the problem with another assignment instead of the dot feature.

tomasatdatabricks · 2021-01-21T17:50:29Z

Thanks @lorenzwalthert!

I'll take a look. I think adding . as a global variable makes sense to me given your explanation but I'll double check with someone who is more proficient in R than myself to be sure :)

tomasatdatabricks

Looks good to me, thanks @lorenzwalthert!

I'll wait a day or two if we can get someone from R studio to review this as well before merging.

kevinykuo · 2021-01-21T21:21:17Z

Declaring . is fine here, though I think we should bring back dplyr as a dependency to simplify/optimize much of the data frame manipulation code. IIRC we removed it a couple years ago due to an unstable dependency tree (c.f. #1351) but now that it's had a 1.0 release I'm more confident with it not causing issues in that respect.

kevinykuo · 2021-01-21T21:22:41Z

To clarify I think the dplyr refactoring should happen in a separate PR after this one has been folded.

lorenzwalthert · 2021-01-21T21:35:59Z

I agree @kevinykuo, I also think it would make sense to have dplyr as a dependency (so we could have map_dfr() im that specific case) but I did not want to start that discussion here😊 also, we should consider that such a refactoring would cause merge conflicts for outstanding PRs, so I think it's best done after we merge the majority of outstanding R PRs before working on that.

tomasatdatabricks · 2021-01-21T21:41:59Z

Thanks for taking a look @kevinykuo! I am gonna merge this now.

lorenzwalthert · 2021-01-23T16:15:09Z

Follow up PR filed for mlflow_get_experiment() in #4017 @tomasatdatabricks.

* add failing tests to show bugs in mlflow_set_experiment_tag() Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> * make mlflow_list_experiments() work in general Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> * only comparse set, which is less strict Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> * add test for multiple tags per experiment Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> * fix setequal comparison by only comparing experiment 1 Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> * return tibble for tags in list_experiments() if any, NA otherwise Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> * allow global variable . to make magrittr pipe work with dot, e.g. data %.% do.call(rbind, .) Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

lorenzwalthert added 2 commits January 4, 2021 21:31

add failing tests to show bugs in mlflow_set_experiment_tag()

539170f

Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com> Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com>

make mlflow_list_experiments() work in general

2c8610b

Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com>

github-actions bot added area/tracking language/r rn/bug-fix labels Jan 4, 2021

lorenzwalthert mentioned this pull request Jan 4, 2021

[BUG] mlflow_list_experiments() bugs with tags #3941

Closed

23 tasks

only comparse set, which is less strict

03f645a

Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com>

tomasatdatabricks suggested changes Jan 11, 2021

View reviewed changes

add test for multiple tags per experiment

d24ec71

Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com>

fix setequal comparison by only comparing experiment 1

e990889

Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com>

lorenzwalthert added 2 commits January 21, 2021 10:10

return tibble for tags in list_experiments() if any, NA otherwise

6fdee05

Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com>

allow global variable . to make magrittr pipe work with dot, e.g. dat…

a372826

…a %.% do.call(rbind, .) Signed-off-by: Lorenz Walthert <lorenz.walthert@icloud.com>

lorenzwalthert requested a review from tomasatdatabricks January 21, 2021 12:16

tomasatdatabricks approved these changes Jan 21, 2021

View reviewed changes

tomasatdatabricks merged commit 9ca61a4 into mlflow:master Jan 21, 2021

lorenzwalthert deleted the issue-3941 branch January 22, 2021 13:04

lorenzwalthert mentioned this pull request Jan 23, 2021

mlflow_get_experiment() should return same tag structure as mlflow_list_experiments() #4017

Merged

27 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix mlflow_list_experiments() #3942

Fix mlflow_list_experiments() #3942

lorenzwalthert commented Jan 4, 2021

tomasatdatabricks left a comment

tomasatdatabricks Jan 11, 2021

lorenzwalthert commented Jan 11, 2021 •

edited

Loading

tomasatdatabricks commented Jan 11, 2021 •

edited

Loading

tomasatdatabricks commented Jan 19, 2021

lorenzwalthert commented Jan 19, 2021 •

edited

Loading

lorenzwalthert commented Jan 19, 2021 •

edited

Loading

tomasatdatabricks commented Jan 20, 2021

lorenzwalthert commented Jan 21, 2021

tomasatdatabricks commented Jan 21, 2021

tomasatdatabricks left a comment

kevinykuo commented Jan 21, 2021

kevinykuo commented Jan 21, 2021

lorenzwalthert commented Jan 21, 2021 •

edited

Loading

tomasatdatabricks commented Jan 21, 2021

lorenzwalthert commented Jan 23, 2021

Fix mlflow_list_experiments() #3942

Fix mlflow_list_experiments() #3942

Conversation

lorenzwalthert commented Jan 4, 2021

What changes are proposed in this pull request?

How is this patch tested?

Release Notes

Is this a user-facing change?

What component(s), interfaces, languages, and integrations does this PR affect?

How should the PR be classified in the release notes? Choose one:

tomasatdatabricks left a comment

Choose a reason for hiding this comment

tomasatdatabricks Jan 11, 2021

Choose a reason for hiding this comment

lorenzwalthert commented Jan 11, 2021 • edited Loading

tomasatdatabricks commented Jan 11, 2021 • edited Loading

tomasatdatabricks commented Jan 19, 2021

lorenzwalthert commented Jan 19, 2021 • edited Loading

lorenzwalthert commented Jan 19, 2021 • edited Loading

tomasatdatabricks commented Jan 20, 2021

lorenzwalthert commented Jan 21, 2021

tomasatdatabricks commented Jan 21, 2021

tomasatdatabricks left a comment

Choose a reason for hiding this comment

kevinykuo commented Jan 21, 2021

kevinykuo commented Jan 21, 2021

lorenzwalthert commented Jan 21, 2021 • edited Loading

tomasatdatabricks commented Jan 21, 2021

lorenzwalthert commented Jan 23, 2021

lorenzwalthert commented Jan 11, 2021 •

edited

Loading

tomasatdatabricks commented Jan 11, 2021 •

edited

Loading

lorenzwalthert commented Jan 19, 2021 •

edited

Loading

lorenzwalthert commented Jan 19, 2021 •

edited

Loading

lorenzwalthert commented Jan 21, 2021 •

edited

Loading