Add optional experiment_id parameter to `mlflow.set_experiment` #5012

dbczumar · 2021-11-05T07:49:31Z

What changes are proposed in this pull request?

Add optional experiment_id parameter to mlflow.set_experiment, allowing users to set an experiment by ID.

How is this patch tested?

Included unit tests

Release Notes

Add an optional experiment_id parameter to mlflow.set_experiment(), enabling users to set an experiment by ID.

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

Interface

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: dbczumar <corey.zumar@databricks.com>

BenWilson2 · 2021-11-05T14:35:46Z

tests/tracking/test_tracking.py

+
+
+def test_set_experiment_parameter_validation():
+    with pytest.raises(MlflowException) as exc:


Perhaps use the optional match= in pytest.raises to do regex matching for the exception error message?

    with pytest.raises(MlflowException, match="Must specify exactly one") as exc:
        mlflow.set_experiment()
    assert exc.value.error_code == ErrorCode.Name(INVALID_PARAMETER_VALUE)

Haru asked that I change a few newer unit tests to use this pattern.

Awesome suggestion! Done!

jinzhang21

Thanks for making the change, Corey! LGTM!

jinzhang21 · 2021-11-05T17:13:01Z

mlflow/tracking/fluent.py

        )
+
+    def verify_experiment_active(experiment):
+        if experiment.lifecycle_stage == LifecycleStage.DELETED:


if experiment.lifecycle_stage != LifecycleStage.ACTIVE:
would be more specific, since the function's name says it verifies that the experiment is active.

Done! Good idea :)
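To illustrate the difference between the two checks, here is a minimal sketch. The `LifecycleStage` constants and the `Experiment` class below are simplified stand-ins rather than MLflow's actual entities, the error type is a plain `ValueError` instead of `MlflowException`, and the message wording is an assumption.

```python
from dataclasses import dataclass


class LifecycleStage:
    # Simplified stand-in for MLflow's LifecycleStage constants.
    ACTIVE = "active"
    DELETED = "deleted"


@dataclass
class Experiment:
    # Simplified stand-in for the MLflow experiment entity.
    experiment_id: str
    name: str
    lifecycle_stage: str


def verify_experiment_active(experiment):
    # Rejects every stage other than ACTIVE, not only DELETED, so any
    # stage added later is refused by default.
    if experiment.lifecycle_stage != LifecycleStage.ACTIVE:
        raise ValueError(
            "Cannot set experiment '%s' because its lifecycle stage is '%s'"
            % (experiment.name, experiment.lifecycle_stage)
        )


# An active experiment passes silently.
verify_experiment_active(Experiment("1", "demo", LifecycleStage.ACTIVE))
```

With the `== LifecycleStage.DELETED` form, a stage that is neither active nor deleted would slip through; the `!= LifecycleStage.ACTIVE` form matches what the function name promises.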

jinzhang21 · 2021-11-05T17:15:23Z

mlflow/tracking/fluent.py

+    if experiment_id is None:
+        experiment = client.get_experiment_by_name(experiment_name)
+        if experiment:
+            verify_experiment_active(experiment)


How about verifying outside of the if-else block?

I think we would need to check that experiment is present before verifying outside of the if block; otherwise, as far as I can tell, we would need to replicate the if experiment check inside the verify_experiment_active function.

The challenge here is that we'd need to re-fetch the experiment in order to perform the activity check in the case where a new experiment was created, since client.create_experiment returns an experiment ID, rather than the whole experiment entity. In the interest of saving a network call, I've embedded verify_experiment_active within a couple interior branches.
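The branching described in this thread can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `FakeClient` is an in-memory stand-in that counts simulated network calls, `resolve_experiment_id` is a hypothetical name, errors are plain `ValueError`s, and the exactly-one-argument validation is left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Experiment:
    # Simplified stand-in for the MLflow experiment entity.
    experiment_id: str
    name: str
    lifecycle_stage: str = "active"


class FakeClient:
    """In-memory stand-in for a tracking client; counts simulated network calls."""

    def __init__(self):
        self.by_id: Dict[str, Experiment] = {}
        self.calls = 0

    def get_experiment_by_name(self, name) -> Optional[Experiment]:
        self.calls += 1
        return next((e for e in self.by_id.values() if e.name == name), None)

    def get_experiment(self, experiment_id) -> Optional[Experiment]:
        self.calls += 1
        return self.by_id.get(experiment_id)

    def create_experiment(self, name) -> str:
        # Returns only the new ID, not the entity, as described in the thread.
        self.calls += 1
        new_id = str(len(self.by_id))
        self.by_id[new_id] = Experiment(new_id, name)
        return new_id


def verify_experiment_active(experiment):
    if experiment.lifecycle_stage != "active":
        raise ValueError("Experiment '%s' is not active" % experiment.name)


def resolve_experiment_id(client, experiment_name=None, experiment_id=None):
    if experiment_id is None:
        experiment = client.get_experiment_by_name(experiment_name)
        if experiment:
            # Existing experiment: the entity is already in hand, so verify here.
            verify_experiment_active(experiment)
            experiment_id = experiment.experiment_id
        else:
            # New experiment: skip the check rather than re-fetch the entity.
            experiment_id = client.create_experiment(experiment_name)
    else:
        experiment = client.get_experiment(experiment_id)
        if experiment is None:
            raise ValueError("Experiment with ID '%s' does not exist" % experiment_id)
        verify_experiment_active(experiment)
    return experiment_id


client = FakeClient()
first_id = resolve_experiment_id(client, experiment_name="demo")  # lookup + create
calls_after_create = client.calls
same_id = resolve_experiment_id(client, experiment_id=first_id)  # single lookup
```

Hoisting the check out of the branches would require an extra `get_experiment` call on the create path, which is the network round trip the author chose to avoid.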

jinzhang21 · 2021-11-05T17:17:55Z

mlflow/tracking/fluent.py

+                "Experiment with name '%s' does not exist. Creating a new experiment.",
+                experiment_name,
+            )
+            experiment_id = client.create_experiment(experiment_name)


There's a potential race condition here between client.get_experiment_by_name() and client.create_experiment(experiment_name). It could cause issues in distributed logging situations. I don't think you need to fix it here, but it might be good to annotate the code to make future life easier.

This is a good call-out. It's worth noting that the MLflow fluent API in general is not meant to be safe across threads or processes. I've added a note here nonetheless.
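For readers unfamiliar with this kind of check-then-create race, the sketch below simulates it and shows one possible mitigation: catching the "already exists" failure and re-fetching. This is not what the PR does (it only adds a note in the code). `FakeStore`, `AlreadyExists`, `get_or_create`, and the `between_check_and_create` hook are all hypothetical names invented for the illustration.

```python
class AlreadyExists(Exception):
    """Hypothetical error raised when an experiment name is already taken."""


class FakeStore:
    """In-memory stand-in for a tracking backend."""

    def __init__(self):
        self.ids_by_name = {}

    def get_id_by_name(self, name):
        return self.ids_by_name.get(name)

    def create(self, name):
        if name in self.ids_by_name:
            raise AlreadyExists(name)
        self.ids_by_name[name] = str(len(self.ids_by_name))
        return self.ids_by_name[name]


def get_or_create(store, name, between_check_and_create=None):
    experiment_id = store.get_id_by_name(name)
    if experiment_id is not None:
        return experiment_id
    # Race window: another worker may create the same experiment right here.
    if between_check_and_create is not None:
        between_check_and_create()
    try:
        return store.create(name)
    except AlreadyExists:
        # Lost the race: fall back to the experiment the other worker created.
        return store.get_id_by_name(name)


store = FakeStore()
# Simulate a second worker creating "demo" inside the race window.
winner_id = get_or_create(store, "demo", lambda: store.create("demo"))
```

Whether such a fallback is appropriate depends on the backend's error semantics, which is one reason the thread settles for annotating the code instead.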

jinzhang21 · 2021-11-05T17:19:05Z

mlflow/tracking/fluent.py

+            )
+            experiment_id = client.create_experiment(experiment_name)
+    else:
+        experiment = client.get_experiment(experiment_id)


Does this function throw if experiment_id doesn't exist?

404 in REST and throws in file_store and sqlalchemy_store

Thanks @BenWilson2 ! Yes, while this throws for MLflow-native stores, it may not always throw for alternative stores. I don't think the None check is particularly problematic here as an extra layer of defense.

jinzhang21 · 2021-11-05T17:21:44Z

mlflow/tracking/fluent.py

-        print("INFO: '{}' does not exist. Creating a new experiment".format(experiment_name))
-        exp_id = client.create_experiment(experiment_name)
-    elif experiment.lifecycle_stage == LifecycleStage.DELETED:
+    if (experiment_name is not None and experiment_id is not None) or (


Is empty string legit for either name or id? I guess not. In that case, it might be better to verify
if not (experiment_name and experiment_id)

I am curious why we need this condition. We just need experiment_name is None and experiment_id is None, right? Let me know if I am missing something.

Wouldn't the guarded exclusivity check protect against conflicting submission behavior? For an invalid ID but a valid name, it would raise an exception based on the ID validation, which could be super confusing for users.

Is empty string legit for either name or id? I guess not. In that case, it might be better to verify
if not (experiment_name and experiment_id)

I would imagine that it's invalid on most if not all backends, but I'm hesitant to enforce this on the off chance that someone's third-party backend has a legitimate use case for this. For example, MLflow's default experiment used to be a falsey integer value (0) before the change to string representations.

Because the same logic applies for experiment names, I've removed the test case asserting that an empty string name is invalid, as that's backend-dependent.
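A minimal sketch of the exclusivity check discussed in this thread follows. The diff excerpt above is truncated after the first clause, so the "neither argument given" clause here is an assumption inferred from the discussion and from the "Must specify exactly one" test message; `validate_exactly_one` is a hypothetical helper name, and a plain `ValueError` stands in for `MlflowException`.

```python
def validate_exactly_one(experiment_name=None, experiment_id=None):
    # Uses `is None` / `is not None` comparisons, so falsy values such as ""
    # or 0 still count as "specified" and are left for the backend to judge.
    both = experiment_name is not None and experiment_id is not None
    neither = experiment_name is None and experiment_id is None  # assumed clause
    if both or neither:
        raise ValueError(
            "Must specify exactly one of: `experiment_id` or `experiment_name`."
        )


validate_exactly_one(experiment_name="demo")  # accepted
validate_exactly_one(experiment_id=0)         # accepted: 0 is falsy but not None
validate_exactly_one(experiment_name="")      # accepted: validity is backend-dependent
```

A truthiness-based check would reject `0` and `""` on the client side, which is the behavior the author wanted to avoid enforcing for third-party backends.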

sunishsheth2009 · 2021-11-05T17:47:30Z

mlflow/tracking/fluent.py

-        print("INFO: '{}' does not exist. Creating a new experiment".format(experiment_name))
-        exp_id = client.create_experiment(experiment_name)
-    elif experiment.lifecycle_stage == LifecycleStage.DELETED:
+    if (experiment_name is not None and experiment_id is not None) or (


I am curious why we need this condition. We just need experiment_name is None and experiment_id is None, right? Let me know if I am missing something.

sunishsheth2009 · 2021-11-05T17:49:43Z

mlflow/tracking/fluent.py

+    if experiment_id is None:
+        experiment = client.get_experiment_by_name(experiment_name)
+        if experiment:
+            verify_experiment_active(experiment)


I think we would need to check that experiment is present before verifying outside of the if block; otherwise, as far as I can tell, we would need to replicate the if experiment check inside the verify_experiment_active function.

Signed-off-by: dbczumar <corey.zumar@databricks.com>

dbczumar

@jinzhang21 @BenWilson2 @sunishsheth2009 Thank you for your thorough reviews! I've addressed your comments. I'm going to push one more change so that set_experiment also returns the experiment entity.

dbczumar · 2021-11-05T20:15:58Z

mlflow/tracking/fluent.py

        )
+
+    def verify_experiment_active(experiment):
+        if experiment.lifecycle_stage == LifecycleStage.DELETED:


Done! Good idea :)

dbczumar · 2021-11-05T20:17:04Z

mlflow/tracking/fluent.py

+    if experiment_id is None:
+        experiment = client.get_experiment_by_name(experiment_name)
+        if experiment:
+            verify_experiment_active(experiment)


The challenge here is that we'd need to re-fetch the experiment in order to perform the activity check in the case where a new experiment was created, since client.create_experiment returns an experiment ID, rather than the whole experiment entity. In the interest of saving a network call, I've embedded verify_experiment_active within a couple interior branches.

dbczumar · 2021-11-05T20:17:52Z

mlflow/tracking/fluent.py

+                "Experiment with name '%s' does not exist. Creating a new experiment.",
+                experiment_name,
+            )
+            experiment_id = client.create_experiment(experiment_name)


This is a good call-out. It's worth noting that the MLflow fluent API in general is not meant to be safe across threads or processes. I've added a note here nonetheless.

dbczumar · 2021-11-05T20:19:40Z

mlflow/tracking/fluent.py

+            )
+            experiment_id = client.create_experiment(experiment_name)
+    else:
+        experiment = client.get_experiment(experiment_id)


Thanks @BenWilson2 ! Yes, while this throws for MLflow-native stores, it may not always throw for alternative stores. I don't think the None check is particularly problematic here as an extra layer of defense.

dbczumar · 2021-11-05T20:19:49Z

tests/tracking/test_tracking.py

+
+
+def test_set_experiment_parameter_validation():
+    with pytest.raises(MlflowException) as exc:


Awesome suggestion! Done!

dbczumar · 2021-11-05T20:22:41Z

mlflow/tracking/fluent.py

-        print("INFO: '{}' does not exist. Creating a new experiment".format(experiment_name))
-        exp_id = client.create_experiment(experiment_name)
-    elif experiment.lifecycle_stage == LifecycleStage.DELETED:
+    if (experiment_name is not None and experiment_id is not None) or (


Because the same logic applies for experiment names, I've removed the test case asserting that an empty string name is invalid, as that's backend-dependent.

Signed-off-by: dbczumar <corey.zumar@databricks.com>

BenWilson2 · 2021-11-05T21:16:40Z

mlflow/tracking/fluent.py


    global _active_experiment_id
-    _active_experiment_id = experiment_id
+    _active_experiment_id = experiment.experiment_id
+    return experiment


This actually simplifies some MLOps pipeline flows! A bonus feature is always a great thing.

BenWilson2

LGTM!

dbczumar added 3 commits November 5, 2021 00:24

Impl

11283dc

Signed-off-by: dbczumar <corey.zumar@databricks.com>

PR

bb3ee5b

Signed-off-by: dbczumar <corey.zumar@databricks.com>

Format

614a0f8

Signed-off-by: dbczumar <corey.zumar@databricks.com>

dbczumar requested a review from jinzhang21 November 5, 2021 07:49

github-actions bot added the area/tracking (Tracking service, tracking client APIs, autologging) and rn/feature (Mention under Features in Changelogs) labels Nov 5, 2021

BenWilson2 reviewed Nov 5, 2021

jinzhang21 approved these changes Nov 5, 2021

sunishsheth2009 approved these changes Nov 5, 2021

Address comments

fee73f9

Signed-off-by: dbczumar <corey.zumar@databricks.com>

dbczumar commented Nov 5, 2021

Return experiment

34ad0a0

Signed-off-by: dbczumar <corey.zumar@databricks.com>

BenWilson2 reviewed Nov 5, 2021

BenWilson2 approved these changes Nov 5, 2021

dbczumar merged commit 0f9d1e0 into mlflow:master Nov 5, 2021

himkt mentioned this pull request Nov 30, 2021

Avoid installing the latest MLfow to prevent doctests from failing optuna/optuna#3135

Merged
