Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only generate model uuid when logging model #5167

Merged
merged 6 commits into from Dec 16, 2021

Conversation

WeichenXu123
Copy link
Collaborator

@WeichenXu123 WeichenXu123 commented Dec 15, 2021

Signed-off-by: Weichen Xu weichen.xu@databricks.com

What changes are proposed in this pull request?

Only generate model uuid when logging model.

Before:
When loading model from a old version mlflow model, a new model uuid will be generated. Loading multiple times will generate multiple different model uuid.

After:
When loading model from a old version mlflow model, the model uuid attribute will be None.

How is this patch tested?

Unit tests.

Does this PR change the documentation?

  • No. You can skip the rest of this section.
  • Yes. Make sure the changed pages / sections render correctly by following the steps below.
  1. Check the status of the ci/circleci: build_doc check. If it's successful, proceed to the
    next step, otherwise fix it.
  2. Click Details on the right to open the job page of CircleCI.
  3. Click the Artifacts tab.
  4. Click docs/build/html/index.html.
  5. Find the changed pages / sections and make sure they render correctly.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
@WeichenXu123 WeichenXu123 mentioned this pull request Dec 15, 2021
29 tasks
@github-actions github-actions bot added rn/bug-fix Mention under Bug Fixes in Changelogs. area/artifacts Artifact stores and artifact logging labels Dec 15, 2021
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
@@ -102,13 +112,17 @@ def test_model_save_load(sklearn_knn_model, iris_data, tmpdir, model_path):

reloaded_model_config = Model.load(os.path.join(model_path, "MLmodel"))
assert model_config.__dict__ == reloaded_model_config.__dict__
assert model_config.model_uuid is not None and _is_valid_uuid(model_config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert model_config.model_uuid is not None and _is_valid_uuid(model_config)
assert model_config.model_uuid is not None
assert _is_valid_uuid(model_config.model_uuid)

@harupy
Copy link
Member

harupy commented Dec 15, 2021

Can you add a test that model_uuid is None when we load Model from MLmodel file without model_uuid?

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Comment on lines 189 to 190
model_uuid = uuid.uuid4().hex
mlflow_model = cls(artifact_path=artifact_path, run_id=run_id, model_uuid=model_uuid)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not set uuid here. Otherwise there're many other place we need also set uuid.
e.g.

mlflow_model = Model()

mlflow_model = Model()

etc.

I move the set uuid code into Model constructor again.

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
@@ -42,7 +42,6 @@ def __init__(
flavors=None,
signature=None, # ModelSignature
saved_input_example_info: Dict[str, Any] = None,
model_uuid=None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of removing model_uuid from the constructor and modifying the attribute during load(), can we define a third type of value for model_uuid?

  1. By default, model_uuid should be a function that generates a UUID, e.g. lambda: uuid.uuid4().hex
  2. Model uuid could be None, indicating that the model has no ID
  3. Model uuid could be a string, indicating that the model already has a UUID

In the constructor, we can check if the input is a function and, if it is, call it to generate an ID. Otherwise, set self.model_uuid = model_uuid.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good

@@ -54,7 +53,7 @@ def __init__(
self.flavors = flavors if flavors is not None else {}
self.signature = signature
self.saved_input_example_info = saved_input_example_info
self.model_uuid = uuid.uuid4().hex if model_uuid is None else model_uuid
self.model_uuid = uuid.uuid4().hex
Copy link
Member

@harupy harupy Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This basically means every Model instance has a different model_uuid. Is this really the desired behavior?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every new constructed model has a different ID, this is desired.
We only need keep the rule that the model load back from saved model should load old ID back or set it None if saved model don't have ID

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Comment on lines 57 to 60
if callable(model_uuid):
self.model_uuid = model_uuid()
else:
self.model_uuid = model_uuid
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if callable(model_uuid):
self.model_uuid = model_uuid()
else:
self.model_uuid = model_uuid
self.model_uuid = model_uuid() if callable(model_uuid) else model_uuid

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Copy link
Collaborator

@dbczumar dbczumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @WeichenXu123 !

@WeichenXu123 WeichenXu123 merged commit f9046f9 into mlflow:master Dec 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/artifacts Artifact stores and artifact logging rn/bug-fix Mention under Bug Fixes in Changelogs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants