
Evaluation Default evaluator #5092

Merged: 128 commits merged into mlflow:master on Jan 10, 2022

Conversation

@WeichenXu123 (Collaborator) commented Nov 22, 2021

What changes are proposed in this pull request?

Prototype Evaluation Default evaluator

Test code:

import numpy as np
import shap
import xgboost

import mlflow
from mlflow.models.evaluation import evaluate, EvaluationDataset
from mlflow.tracking.artifact_utils import get_artifact_uri

# Train an XGBoost model on the adult dataset
X, y = shap.datasets.adult()

# Lengthen the feature names to exercise the evaluator's long-feature-name handling
new_feature_names_map = {f: f + '12345678901234' for f in X.columns}
X = X.rename(columns=new_feature_names_map)

model = xgboost.XGBClassifier().fit(X, y)

# XGBClassifier follows the sklearn estimator API, so it can be logged via the sklearn flavor
with mlflow.start_run() as run:
    mlflow.sklearn.log_model(model, 'xgb_model')

model_uri = get_artifact_uri(run.info.run_id, 'xgb_model')


# Monkey-patch PyFuncModel to expose predict_proba (the pyfunc interface only exposes predict)
def predict_proba(self, data):
    return self._model_impl.predict_proba(data)


mlflow.pyfunc.PyFuncModel.predict_proba = predict_proba

y = y.astype(np.int32)


X['label'] = y

dataset = EvaluationDataset(data=X, labels='label', name='adult')

"""
        evaluator_config={'default': {
            'explainality.algorithm': 'partition',
            'explainality.nsamples': 1000
        }}
"""

with mlflow.start_run() as run:
    result = evaluate(
        model_uri,
        model_type='classifier',
        dataset=dataset,
        evaluators=['default'],
    )

print(f'run_id={run.info.run_id}\n')
client = mlflow.tracking.MlflowClient()
artifacts = [f.path for f in client.list_artifacts(run.info.run_id)]
print(f'artifacts={artifacts}')

How is this patch tested?

(Details)

Does this PR change the documentation?

  • No. You can skip the rest of this section.
  • Yes. Make sure the changed pages / sections render correctly by following the steps below.
  1. Check the status of the ci/circleci: build_doc check. If it's successful, proceed to the
    next step, otherwise fix it.
  2. Click Details on the right to open the job page of CircleCI.
  3. Click the Artifacts tab.
  4. Click docs/build/html/index.html.
  5. Find the changed pages / sections and make sure they render correctly.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Implement builtin default evaluator for model evaluation.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
@WeichenXu123 marked this pull request as draft on November 22, 2021 16:17
if not _matplotlib_initialized:
    import matplotlib.pyplot as pyplot

    pyplot.rcParams["figure.dpi"] = 144
Member:

I don't think we should mutate rcParams. Can we just specify dpi and figsize when saving plots?

WeichenXu123 (Author):

It seems we can't. The dpi has to be set when the figure is created, so we would need to modify the shap source code.

Member:

Maybe we could try this: https://stackoverflow.com/a/35394637

WeichenXu123 (Author):

Yes, I just found that as well. We can wrap the plotting code in with matplotlib.rc_context(...).
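For illustration, a minimal sketch of the rc_context approach being discussed; the helper name and arguments are hypothetical, not the PR's actual code:

import matplotlib
import matplotlib.pyplot as pyplot

def _save_plot(do_plot, artifact_file_local_path, dpi=288):
    # Apply the dpi override only while the figure is created and saved,
    # so the caller's global rcParams are never mutated.
    with matplotlib.rc_context({"figure.dpi": dpi}):
        pyplot.clf()
        do_plot()  # e.g. a shap plotting call that draws onto the current figure
        pyplot.savefig(artifact_file_local_path, bbox_inches="tight")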

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
_last_failed_evaluator = None


def _get_last_failed_evaluator():
Member:

where do we use this function?

WeichenXu123 (Author):

In the usage logs we want to record which evaluator an error was raised from (TODO), for debugging purposes, so I added this.

Member:

why do we only store the last failed evaluator?

WeichenXu123 (Author):

The evaluate API raises as soon as it hits the first failed evaluator (the API is designed that way): if any evaluator fails, evaluate raises; otherwise it returns all of the merged successful results.
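Roughly, the pattern looks like the following sketch; the loop and names are illustrative, assuming each evaluator either returns a result or raises:

_last_failed_evaluator = None

def _get_last_failed_evaluator():
    # Name of the evaluator that most recently raised, for usage-log debugging.
    return _last_failed_evaluator

def _run_evaluators(evaluators, model, dataset):
    global _last_failed_evaluator
    _last_failed_evaluator = None
    merged_result = {}
    for evaluator_name, evaluator in evaluators.items():
        # Record the evaluator about to run; if it raises, the exception
        # propagates out of evaluate() and this marker tells us which one failed.
        _last_failed_evaluator = evaluator_name
        merged_result[evaluator_name] = evaluator.evaluate(model, dataset)
    # Every evaluator succeeded, so clear the marker and return the merged results.
    _last_failed_evaluator = None
    return merged_result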

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
sampled_X = sampled_X.rename(columns=truncated_feature_name_map, copy=False)

if algorithm:
    supported_algos = ['exact', 'permutation', 'partition']
WeichenXu123 (Author):

Note: in shap 0.40, the shap.Explainer algorithm argument works reliably only with 'exact', 'permutation', and 'partition'; the remaining algorithms listed in the shap docs were all buggy in my testing, so I limit the accepted values to these three.
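For context, a hedged sketch of that restriction; the helper name and surrounding code are illustrative, and only the shap.Explainer call and the three accepted values come from the discussion:

import shap
from mlflow.exceptions import MlflowError

_SUPPORTED_SHAP_ALGOS = ["exact", "permutation", "partition"]

def _build_explainer(predict_fn, sampled_X, algorithm=None):
    if algorithm:
        if algorithm not in _SUPPORTED_SHAP_ALGOS:
            # The other algorithms listed in the shap 0.40 docs were unreliable in testing.
            raise MlflowError(
                f"Unsupported explainability algorithm '{algorithm}'. "
                f"Supported values are {_SUPPORTED_SHAP_ALGOS}."
            )
        return shap.Explainer(predict_fn, sampled_X, algorithm=algorithm)
    # With no algorithm specified, let shap choose an explainer automatically.
    return shap.Explainer(predict_fn, sampled_X)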

try:
    pyplot.clf()
    do_plot()
    pyplot.savefig(artifact_file_local_path)
Member:

Can we set the dpi value to something higher than the default of 100? The images look really blurry and out of place in the tracking UI. 300 might be a better option.

pyplot.savefig(artifact_file_local_path, dpi=300)

Member:

Or perhaps save in .svg format so the images scale cleanly at different browser zoom levels?

WeichenXu123 (Author):

I increased the dpi to 288 (previously, in Databricks notebooks, it was 72).

Saving the figure as SVG is not a good choice: the Pillow package does not support SVG, and we need to return the artifact as a Pillow Image instance.
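In other words, the save-and-return path looks roughly like this sketch; the function and path names are illustrative:

import matplotlib.pyplot as pyplot
from PIL import Image

def _save_and_load_image_artifact(do_plot, artifact_file_local_path):
    pyplot.clf()
    do_plot()
    # dpi=288 keeps the plot crisp in the tracking UI (the old notebook default was 72).
    pyplot.savefig(artifact_file_local_path, dpi=288)
    # The artifact is handed back as a PIL Image, which is why SVG output is not an
    # option here: Pillow cannot read SVG files.
    return Image.open(artifact_file_local_path)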

sk_metrics.ConfusionMatrixDisplay(
    confusion_matrix=confusion_matrix,
    display_labels=self.label_list,
).plot(cmap="Blues")
Member:

We need to scale these plots for class counts > 6. The multiclass example with 10 classes is illegible because the numeric annotations wrap and overwrite one another. Setting a figsize override on the canvas and changing the dpi at figure-save time would make the visualization much clearer.

WeichenXu123 (Author):

The autologging code also generates a confusion-matrix plot for the training dataset and has a similar issue. Let's do a follow-up PR to improve both cases.

WeichenXu123 (Author):

Created a follow-up ticket for this: https://databricks.atlassian.net/browse/ML-19817
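As one possible direction for that follow-up (not part of this PR), the confusion-matrix canvas could be scaled with the class count before saving; the numbers below are illustrative:

import matplotlib.pyplot as pyplot
import sklearn.metrics as sk_metrics

def _plot_confusion_matrix(confusion_matrix, label_list, artifact_file_local_path):
    # Grow the figure with the number of classes so cell annotations stay legible.
    side = max(6, 0.8 * len(label_list))
    fig, ax = pyplot.subplots(figsize=(side, side))
    sk_metrics.ConfusionMatrixDisplay(
        confusion_matrix=confusion_matrix,
        display_labels=label_list,
    ).plot(cmap="Blues", ax=ax)
    fig.savefig(artifact_file_local_path, dpi=288, bbox_inches="tight")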

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
@jinzhang21 (Collaborator) left a comment:

LGTM. Nice implementation! Sorry for the delayed review, Weichen.

mlflow/models/evaluation/artifacts.py (review thread resolved)
def evaluate(
    *,
Collaborator:

What does the "*" mean here? Do you intend to make it a nameless positional arg like *args? It may be worth spelling it out even if it's not used, and putting it at the end of the argument list, e.g.
def _evaluate(model, model_type, dataset, actual_run_id, evaluator_name_list, evaluator_name_to_conf_map, *unused_args):

WeichenXu123 (Author):

No. The * is Python grammar, not an argument; it means all of the following arguments must be keyword-only. https://www.python.org/dev/peps/pep-3102/

Marking all arguments as keyword-only was proposed by @dbczumar, so that in the future, if MLflow containerized models provide a "model_type" property, we can remove the model_type argument here while keeping backwards compatibility.
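As a minimal standalone illustration of the bare * (not the actual evaluate signature), every parameter after it must be passed by keyword, so positional callers cannot exist and the parameter list can evolve later without silently breaking them:

def evaluate(*, model, model_type, dataset, evaluators=None, evaluator_config=None):
    # The bare '*' makes every parameter keyword-only (PEP 3102).
    return model, model_type, dataset, evaluators, evaluator_config

# evaluate("m", "classifier", "d")  # TypeError: evaluate() takes 0 positional arguments
evaluate(model="m", model_type="classifier", dataset="d")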

Member:

Wait, removing the model_type argument is a breaking change, right?

WeichenXu123 (Author):

In the future, we can change the API to:

def evaluate(
    *, model, dataset, run_id, evaluators=None, evaluator_config=None, **kwargs
):
    if "model_type" in kwargs:
        model_type = kwargs["model_type"]
        logger.warning("deprecation: ....")
    else:
        model_type = model.model_type
    ...

Then we can keep backwards compatibility.

Collaborator:

I didn't know about this usage pattern. Interesting!
Another way to keep backward compatibility is to use a keyword argument for model_type, e.g.

def evaluate(
    model, dataset, run_id, model_type=None, evaluators=None, evaluator_config=None, **kwargs
):

What's the downside of going this more "conventional" way of specifying optional arguments?

WeichenXu123 (Author):

In the current evaluate implementation, the model_type argument is required. In the future we will make it optional and deprecated, and then remove it.

@@ -496,33 +685,72 @@ def evaluate(
a nested dictionary whose key is the evaluator name.
:return: An :py:class:`mlflow.models.evaluation.EvaluationDataset` instance containing
evaluation results.

Collaborator:

Nice! I really like that the documentation here spells out the expected outcomes and limitations. Hopefully we'll make it shorter and easier to navigate over time.

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
@WeichenXu123 marked this pull request as ready for review on January 10, 2022 10:02
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
@WeichenXu123 merged commit 964f5ab into mlflow:master on Jan 10, 2022
Labels
area/tracking: Tracking service, tracking client APIs, autologging
rn/feature: Mention under Features in Changelogs.

5 participants