New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only generate model uuid when logging model #5167
Conversation
@@ -102,13 +112,17 @@ def test_model_save_load(sklearn_knn_model, iris_data, tmpdir, model_path): | |||
|
|||
reloaded_model_config = Model.load(os.path.join(model_path, "MLmodel")) | |||
assert model_config.__dict__ == reloaded_model_config.__dict__ | |||
assert model_config.model_uuid is not None and _is_valid_uuid(model_config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert model_config.model_uuid is not None and _is_valid_uuid(model_config) | |
assert model_config.model_uuid is not None | |
assert _is_valid_uuid(model_config.model_uuid) |
Can you add a test that |
mlflow/models/model.py
Outdated
model_uuid = uuid.uuid4().hex | ||
mlflow_model = cls(artifact_path=artifact_path, run_id=run_id, model_uuid=model_uuid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not set uuid here. Otherwise there're many other place we need also set uuid.
e.g.
mlflow/mlflow/pyfunc/__init__.py
Line 1072 in f9b5745
mlflow_model = Model() |
Line 122 in f9b5745
mlflow_model = Model() |
etc.
I move the set uuid code into Model constructor again.
@@ -42,7 +42,6 @@ def __init__( | |||
flavors=None, | |||
signature=None, # ModelSignature | |||
saved_input_example_info: Dict[str, Any] = None, | |||
model_uuid=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of removing model_uuid
from the constructor and modifying the attribute during load()
, can we define a third type of value for model_uuid
?
- By default,
model_uuid
should be a function that generates a UUID, e.g.lambda: uuid.uuid4().hex
- Model uuid could be
None
, indicating that the model has no ID - Model uuid could be a string, indicating that the model already has a UUID
In the constructor, we can check if the input is a function and, if it is, call it to generate an ID. Otherwise, set self.model_uuid = model_uuid
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good
mlflow/models/model.py
Outdated
@@ -54,7 +53,7 @@ def __init__( | |||
self.flavors = flavors if flavors is not None else {} | |||
self.signature = signature | |||
self.saved_input_example_info = saved_input_example_info | |||
self.model_uuid = uuid.uuid4().hex if model_uuid is None else model_uuid | |||
self.model_uuid = uuid.uuid4().hex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This basically means every Model
instance has a different model_uuid
. Is this really the desired behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every new constructed model has a different ID, this is desired.
We only need keep the rule that the model load back from saved model should load old ID back or set it None if saved model don't have ID
mlflow/models/model.py
Outdated
if callable(model_uuid): | ||
self.model_uuid = model_uuid() | ||
else: | ||
self.model_uuid = model_uuid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if callable(model_uuid): | |
self.model_uuid = model_uuid() | |
else: | |
self.model_uuid = model_uuid | |
self.model_uuid = model_uuid() if callable(model_uuid) else model_uuid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @WeichenXu123 !
Signed-off-by: Weichen Xu weichen.xu@databricks.com
What changes are proposed in this pull request?
Only generate model uuid when logging model.
Before:
When loading model from a old version mlflow model, a new model uuid will be generated. Loading multiple times will generate multiple different model uuid.
After:
When loading model from a old version mlflow model, the model uuid attribute will be None.
How is this patch tested?
Unit tests.
Does this PR change the documentation?
ci/circleci: build_doc
check. If it's successful, proceed to thenext step, otherwise fix it.
Details
on the right to open the job page of CircleCI.Artifacts
tab.docs/build/html/index.html
.Release Notes
Is this a user-facing change?
(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts
: Artifact stores and artifact loggingarea/build
: Build and test infrastructure for MLflowarea/docs
: MLflow documentation pagesarea/examples
: Example codearea/model-registry
: Model Registry service, APIs, and the fluent client calls for Model Registryarea/models
: MLmodel format, model serialization/deserialization, flavorsarea/projects
: MLproject format, project running backendsarea/scoring
: MLflow Model server, model deployment tools, Spark UDFsarea/server-infra
: MLflow Tracking server backendarea/tracking
: Tracking Service, tracking client APIs, autologgingInterface
area/uiux
: Front-end, user experience, plotting, JavaScript, JavaScript dev serverarea/docker
: Docker use across MLflow's components, such as MLflow Projects and MLflow Modelsarea/sqlalchemy
: Use of SQLAlchemy in the Tracking Service or Model Registryarea/windows
: Windows supportLanguage
language/r
: R APIs and clientslanguage/java
: Java APIs and clientslanguage/new
: Proposals for new client languagesIntegrations
integrations/azure
: Azure and Azure ML integrationsintegrations/sagemaker
: SageMaker integrationsintegrations/databricks
: Databricks integrationsHow should the PR be classified in the release notes? Choose one:
rn/breaking-change
- The PR will be mentioned in the "Breaking Changes" sectionrn/none
- No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" sectionrn/feature
- A new user-facing feature worth mentioning in the release notesrn/bug-fix
- A user-facing bug fix worth mentioning in the release notesrn/documentation
- A user-facing documentation change worth mentioning in the release notes