Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional experiment_id parameter to mlflow.set_experiment #5012

Merged
merged 5 commits into from Nov 5, 2021

Conversation

dbczumar
Copy link
Collaborator

@dbczumar dbczumar commented Nov 5, 2021

What changes are proposed in this pull request?

Add optional experiment_id parameter to mlflow.set_experiment, allowing users to set an experiment by ID.

How is this patch tested?

Included unit tests

Release Notes

Add an optional experiment_id parameter to mlflow.set_experiment(), enabling users to set an experiment by ID.

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Signed-off-by: dbczumar <corey.zumar@databricks.com>
Signed-off-by: dbczumar <corey.zumar@databricks.com>
@github-actions github-actions bot added area/tracking Tracking service, tracking client APIs, autologging rn/feature Mention under Features in Changelogs. labels Nov 5, 2021


def test_set_experiment_parameter_validation():
with pytest.raises(MlflowException) as exc:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps use the optional match= in pytest.raises to do regex matching for the exception error message?

with pytest.raises.(MLflowException, match="Must specify exactly one") as exc:
    mlflow.set_experiment()
assert exc.value.error_Code == ErrorCode.Name(INVALID_PARAMETER_VALUE)

Haru asked that I change a few newer unit tests to use this pattern.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome suggestion! Done!

Copy link
Collaborator

@jinzhang21 jinzhang21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the change, Corey! LGTM!

)

def verify_experiment_active(experiment):
if experiment.lifecycle_stage == LifecycleStage.DELETED:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if experiment.lifecycle_stage != LifecycleStage.ACTIVE:
to be more specific, since the name of the function is to verify it's active or not?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Good idea :)

if experiment_id is None:
experiment = client.get_experiment_by_name(experiment_name)
if experiment:
verify_experiment_active(experiment)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about verify outside of the if-else block?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might need to check if experiment is present to check outside of the if block or we need to replicate that if experiment check in the verify_experiment_active function as far as I can tell.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The challenge here is that we'd need to re-fetch the experiment in order to perform the activity check in the case where a new experiment was created, since client.create_experiment returns an experiment ID, rather than the whole experiment entity. In the interest of saving a network call, I've embedded verify_experiment_active within a couple interior branches.

"Experiment with name '%s' does not exist. Creating a new experiment.",
experiment_name,
)
experiment_id = client.create_experiment(experiment_name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a potential race condition here between client.get_experiment_by_name() and client.create_experiment(experiment_name). It could cause issues in distributed logging situation. I don't think you need to fix it here, but might be good to annotate the code to make future life easier.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good call-out. It's worth noting that the MLflow fluent API in general is not meant to be safe across threads or processes. I've added a note here nonetheless.

)
experiment_id = client.create_experiment(experiment_name)
else:
experiment = client.get_experiment(experiment_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this function throw if experiment_id doesn't exist?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

404 in REST and throws in file_store and sqlalchemy_store

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @BenWilson2 ! Yes, while this throws for MLflow-native stores, it may not always throw for alternative stores. I don't think the None check is particularly problematic here as an extra layer of defense.

print("INFO: '{}' does not exist. Creating a new experiment".format(experiment_name))
exp_id = client.create_experiment(experiment_name)
elif experiment.lifecycle_stage == LifecycleStage.DELETED:
if (experiment_name is not None and experiment_id is not None) or (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is empty string legit for either name or id? I guess not. In that case, it might be better to verify
if not (experiment_name and experiment_id)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious on why do we need this condition? We just need experiment_name is None and experiment_id is None right? Let me know if I am missing something.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't the guarded exclusivity check protect against conflicting submission behavior. For an invalid id but a valid name it would raise an Exception based on the id validation. Could be super confusing for users.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is empty string legit for either name or id? I guess not. In that case, it might be better to verify
if not (experiment_name and experiment_id)

I would imagine that it's invalid on most if not all backends, but I'm hesitant to enforce this on the off chance that someone's third-party backend has a legitimate use case for this. For example, MLflow's default experiment used to be a falsey integer value (0) before the change to string representations.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the same logic applies for experiment names, I've removed the test case asserting that an empty string name is invalid, as that's backend-dependent.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM.

print("INFO: '{}' does not exist. Creating a new experiment".format(experiment_name))
exp_id = client.create_experiment(experiment_name)
elif experiment.lifecycle_stage == LifecycleStage.DELETED:
if (experiment_name is not None and experiment_id is not None) or (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious on why do we need this condition? We just need experiment_name is None and experiment_id is None right? Let me know if I am missing something.

if experiment_id is None:
experiment = client.get_experiment_by_name(experiment_name)
if experiment:
verify_experiment_active(experiment)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might need to check if experiment is present to check outside of the if block or we need to replicate that if experiment check in the verify_experiment_active function as far as I can tell.

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Copy link
Collaborator Author

@dbczumar dbczumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jinzhang21 @BenWilson2 @sunishsheth2009 Thank you for your thorough reviews! I've addressed your comments. I'm going to push one more change so that set_experiment also returns the experiment entity.

)

def verify_experiment_active(experiment):
if experiment.lifecycle_stage == LifecycleStage.DELETED:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Good idea :)

if experiment_id is None:
experiment = client.get_experiment_by_name(experiment_name)
if experiment:
verify_experiment_active(experiment)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The challenge here is that we'd need to re-fetch the experiment in order to perform the activity check in the case where a new experiment was created, since client.create_experiment returns an experiment ID, rather than the whole experiment entity. In the interest of saving a network call, I've embedded verify_experiment_active within a couple interior branches.

"Experiment with name '%s' does not exist. Creating a new experiment.",
experiment_name,
)
experiment_id = client.create_experiment(experiment_name)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good call-out. It's worth noting that the MLflow fluent API in general is not meant to be safe across threads or processes. I've added a note here nonetheless.

)
experiment_id = client.create_experiment(experiment_name)
else:
experiment = client.get_experiment(experiment_id)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @BenWilson2 ! Yes, while this throws for MLflow-native stores, it may not always throw for alternative stores. I don't think the None check is particularly problematic here as an extra layer of defense.



def test_set_experiment_parameter_validation():
with pytest.raises(MlflowException) as exc:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome suggestion! Done!

print("INFO: '{}' does not exist. Creating a new experiment".format(experiment_name))
exp_id = client.create_experiment(experiment_name)
elif experiment.lifecycle_stage == LifecycleStage.DELETED:
if (experiment_name is not None and experiment_id is not None) or (
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the same logic applies for experiment names, I've removed the test case asserting that an empty string name is invalid, as that's backend-dependent.

Signed-off-by: dbczumar <corey.zumar@databricks.com>

global _active_experiment_id
_active_experiment_id = experiment_id
_active_experiment_id = experiment.experiment_id
return experiment
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually simplifies some MLOps pipeline flows! A bonus feature is always a great thing

Copy link
Member

@BenWilson2 BenWilson2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/tracking Tracking service, tracking client APIs, autologging rn/feature Mention under Features in Changelogs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants