New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add optional experiment_id parameter to mlflow.set_experiment
#5012
Changes from 3 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,6 +14,10 @@ | |
from mlflow.entities import Experiment, Run, RunInfo, RunStatus, Param, RunTag, Metric, ViewType | ||
from mlflow.entities.lifecycle_stage import LifecycleStage | ||
from mlflow.exceptions import MlflowException | ||
from mlflow.protos.databricks_pb2 import ( | ||
INVALID_PARAMETER_VALUE, | ||
RESOURCE_DOES_NOT_EXIST, | ||
) | ||
from mlflow.tracking.client import MlflowClient | ||
from mlflow.tracking import artifact_utils, _get_store | ||
from mlflow.tracking.context import registry as context_registry | ||
|
@@ -50,12 +54,17 @@ | |
_logger = logging.getLogger(__name__) | ||
|
||
|
||
def set_experiment(experiment_name: str) -> None: | ||
def set_experiment(experiment_name: str = None, experiment_id: str = None) -> None: | ||
""" | ||
Set given experiment as active experiment. If experiment does not exist, create an experiment | ||
with provided name. | ||
Set the given experiment as the active experiment. The experiment must either be specified by | ||
name via `experiment_name` or by ID via `experiment_id`. The experiment name and ID cannot | ||
both be specified. | ||
|
||
:param experiment_name: Case sensitive name of an experiment to be activated. | ||
:param experiment_name: Case sensitive name of the experiment to be activated. If an experiment | ||
with this name does not exist, a new experiment wth this name is | ||
created. | ||
:param experiment_id: ID of the experiment to be activated. If an experiment with this ID | ||
does not exist, an exception is thrown. | ||
|
||
.. code-block:: python | ||
:caption: Example | ||
|
@@ -80,20 +89,48 @@ def set_experiment(experiment_name: str) -> None: | |
Tags: {} | ||
Lifecycle_stage: active | ||
""" | ||
client = MlflowClient() | ||
experiment = client.get_experiment_by_name(experiment_name) | ||
exp_id = experiment.experiment_id if experiment else None | ||
if exp_id is None: # id can be 0 | ||
print("INFO: '{}' does not exist. Creating a new experiment".format(experiment_name)) | ||
exp_id = client.create_experiment(experiment_name) | ||
elif experiment.lifecycle_stage == LifecycleStage.DELETED: | ||
if (experiment_name is not None and experiment_id is not None) or ( | ||
experiment_name is None and experiment_id is None | ||
): | ||
raise MlflowException( | ||
"Cannot set a deleted experiment '%s' as the active experiment." | ||
" You can restore the experiment, or permanently delete the " | ||
" experiment to create a new one." % experiment.name | ||
message="Must specify exactly one of: `experiment_id` or `experiment_name`.", | ||
error_code=INVALID_PARAMETER_VALUE, | ||
) | ||
|
||
def verify_experiment_active(experiment): | ||
if experiment.lifecycle_stage == LifecycleStage.DELETED: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if experiment.lifecycle_stage != LifecycleStage.ACTIVE: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done! Good idea :) |
||
raise MlflowException( | ||
message=( | ||
"Cannot set a deleted experiment '%s' as the active experiment." | ||
" You can restore the experiment, or permanently delete the " | ||
" experiment to create a new one." % experiment.name | ||
), | ||
error_code=INVALID_PARAMETER_VALUE, | ||
) | ||
|
||
client = MlflowClient() | ||
if experiment_id is None: | ||
experiment = client.get_experiment_by_name(experiment_name) | ||
if experiment: | ||
verify_experiment_active(experiment) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How about verify outside of the if-else block? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we might need to check if experiment is present to check outside of the if block or we need to replicate that if experiment check in the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The challenge here is that we'd need to re-fetch the experiment in order to perform the activity check in the case where a new experiment was created, since |
||
experiment_id = experiment.experiment_id | ||
else: | ||
_logger.info( | ||
"Experiment with name '%s' does not exist. Creating a new experiment.", | ||
experiment_name, | ||
) | ||
experiment_id = client.create_experiment(experiment_name) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There's a potential race condition here between client.get_experiment_by_name() and client.create_experiment(experiment_name). It could cause issues in distributed logging situation. I don't think you need to fix it here, but might be good to annotate the code to make future life easier. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a good call-out. It's worth noting that the MLflow fluent API in general is not meant to be safe across threads or processes. I've added a note here nonetheless. |
||
else: | ||
experiment = client.get_experiment(experiment_id) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does this function throw if experiment_id doesn't exist? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 404 in REST and throws in file_store and sqlalchemy_store There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks @BenWilson2 ! Yes, while this throws for MLflow-native stores, it may not always throw for alternative stores. I don't think the |
||
if experiment is None: | ||
raise MlflowException( | ||
message=f"Experiment with ID '{experiment_id}' does not exist.", | ||
error_code=RESOURCE_DOES_NOT_EXIST, | ||
) | ||
verify_experiment_active(experiment) | ||
|
||
global _active_experiment_id | ||
_active_experiment_id = exp_id | ||
_active_experiment_id = experiment_id | ||
|
||
|
||
class ActiveRun(Run): # pylint: disable=W0223 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,7 +16,7 @@ | |
from mlflow.entities import RunStatus, LifecycleStage, Metric, Param, RunTag, ViewType | ||
from mlflow.exceptions import MlflowException | ||
from mlflow.store.tracking.file_store import FileStore | ||
from mlflow.protos.databricks_pb2 import ErrorCode, INVALID_PARAMETER_VALUE | ||
from mlflow.protos.databricks_pb2 import ErrorCode, INVALID_PARAMETER_VALUE, RESOURCE_DOES_NOT_EXIST | ||
from mlflow.tracking.client import MlflowClient | ||
from mlflow.tracking.fluent import start_run | ||
from mlflow.utils.file_utils import local_file_uri_to_path | ||
|
@@ -77,16 +77,7 @@ def test_create_experiments_with_bad_name_types(name): | |
|
||
|
||
@pytest.mark.usefixtures("reset_active_experiment") | ||
def test_set_experiment(): | ||
with pytest.raises(TypeError): | ||
mlflow.set_experiment() # pylint: disable=no-value-for-parameter | ||
|
||
with pytest.raises(Exception): | ||
mlflow.set_experiment(None) | ||
|
||
with pytest.raises(Exception): | ||
mlflow.set_experiment("") | ||
|
||
def test_set_experiment_by_name(): | ||
name = "random_exp" | ||
exp_id = mlflow.create_experiment(name) | ||
mlflow.set_experiment(name) | ||
|
@@ -100,16 +91,66 @@ def test_set_experiment(): | |
assert another_run.info.experiment_id == exp_id2.experiment_id | ||
|
||
|
||
def test_set_experiment_with_deleted_experiment_name(): | ||
@pytest.mark.usefixtures("reset_active_experiment") | ||
def test_set_experiment_by_id(): | ||
name = "random_exp" | ||
exp_id = mlflow.create_experiment(name) | ||
mlflow.set_experiment(experiment_id=exp_id) | ||
with start_run() as run: | ||
assert run.info.experiment_id == exp_id | ||
|
||
nonexistent_id = "-1337" | ||
with pytest.raises(MlflowException) as exc: | ||
mlflow.set_experiment(experiment_id=nonexistent_id) | ||
assert exc.value.error_code == ErrorCode.Name(RESOURCE_DOES_NOT_EXIST) | ||
with start_run() as run: | ||
assert run.info.experiment_id == exp_id | ||
|
||
|
||
def test_set_experiment_parameter_validation(): | ||
with pytest.raises(MlflowException) as exc: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps use the optional match= in with pytest.raises.(MLflowException, match="Must specify exactly one") as exc:
mlflow.set_experiment()
assert exc.value.error_Code == ErrorCode.Name(INVALID_PARAMETER_VALUE) Haru asked that I change a few newer unit tests to use this pattern. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Awesome suggestion! Done! |
||
mlflow.set_experiment() | ||
assert exc.value.error_code == ErrorCode.Name(INVALID_PARAMETER_VALUE) | ||
assert "Must specify exactly one" in str(exc.value) | ||
|
||
with pytest.raises(MlflowException) as exc: | ||
mlflow.set_experiment(None) | ||
assert exc.value.error_code == ErrorCode.Name(INVALID_PARAMETER_VALUE) | ||
assert "Must specify exactly one" in str(exc.value) | ||
|
||
with pytest.raises(MlflowException) as exc: | ||
mlflow.set_experiment(None, None) | ||
assert exc.value.error_code == ErrorCode.Name(INVALID_PARAMETER_VALUE) | ||
assert "Must specify exactly one" in str(exc.value) | ||
|
||
with pytest.raises(MlflowException) as exc: | ||
mlflow.set_experiment("name", "id") | ||
assert exc.value.error_code == ErrorCode.Name(INVALID_PARAMETER_VALUE) | ||
assert "Must specify exactly one" in str(exc.value) | ||
|
||
with pytest.raises(MlflowException) as exc: | ||
mlflow.set_experiment("") | ||
assert exc.value.error_code == ErrorCode.Name(INVALID_PARAMETER_VALUE) | ||
assert "Invalid experiment name" in str(exc.value) | ||
|
||
|
||
def test_set_experiment_with_deleted_experiment(): | ||
name = "dead_exp" | ||
mlflow.set_experiment(name) | ||
with start_run() as run: | ||
exp_id = run.info.experiment_id | ||
|
||
tracking.MlflowClient().delete_experiment(exp_id) | ||
|
||
with pytest.raises(MlflowException): | ||
with pytest.raises(MlflowException) as exc: | ||
mlflow.set_experiment(name) | ||
assert exc.value.error_code == ErrorCode.Name(INVALID_PARAMETER_VALUE) | ||
assert "Cannot set a deleted experiment" in str(exc.value) | ||
|
||
with pytest.raises(MlflowException) as exc: | ||
mlflow.set_experiment(experiment_id=exp_id) | ||
assert exc.value.error_code == ErrorCode.Name(INVALID_PARAMETER_VALUE) | ||
assert "Cannot set a deleted experiment" in str(exc.value) | ||
|
||
|
||
def test_list_experiments(): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is empty string legit for either name or id? I guess not. In that case, it might be better to verify
if not (experiment_name and experiment_id)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious on why do we need this condition? We just need
experiment_name is None and experiment_id is None
right? Let me know if I am missing something.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't the guarded exclusivity check protect against conflicting submission behavior. For an invalid id but a valid name it would raise an Exception based on the id validation. Could be super confusing for users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would imagine that it's invalid on most if not all backends, but I'm hesitant to enforce this on the off chance that someone's third-party backend has a legitimate use case for this. For example, MLflow's default experiment used to be a falsey integer value (
0
) before the change to string representations.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the same logic applies for experiment names, I've removed the test case asserting that an empty string name is invalid, as that's backend-dependent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM.