doc(monitoring): monitoring guide #3300

Merged
merged 20 commits into from Dec 6, 2022
Changes from 8 commits
Binary file added docs/source/_static/img/arize_alerting.png
1 change: 1 addition & 0 deletions docs/source/guides/index.rst
@@ -20,6 +20,7 @@ into this part of the documentation.
graph
logging
metrics
monitoring
performance
grpc
gpu
237 changes: 237 additions & 0 deletions docs/source/guides/monitoring.rst
@@ -0,0 +1,237 @@
============================================
Inference Data Collection & Model Monitoring
============================================

Data-Centric Artificial Intelligence is a popular paradigm that is changing the way we build AI systems. Rather than centering on model code, it is based on the idea that data is the most important
component of an AI system.

BentoML embraces this new paradigm by providing APIs that make a data-centric workflow easy to implement.

.. image:: ../_static/img/monitoring_workflow.png
    :width: 600px
    :align: center

In this guide, we focus on online data collection and model monitoring, for which BentoML provides a unified interface.

The benefits of having a data collection and model monitoring workflow include:

- Monitoring key statistical business metrics
- Identifying data drift early, to help decide whether the model needs to be retrained
- Enabling QA for previously untracked metrics, such as data and model performance
- Reusing the same data collection and model monitoring workflow for training and model iteration


Build an ML Application with the monitoring API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following examples are excerpted from :github:`bentoml/BentoML/tree/main/examples/quickstart`.

Given the following definition of the bento service in :code:`service.py`:

.. code:: python
    :caption: `service.py`

    import numpy as np
    import bentoml
    from bentoml.io import Text
    from bentoml.io import NumpyNdarray

    CLASS_NAMES = ["setosa", "versicolor", "virginica"]

    iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()

    svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

    @svc.api(
        input=NumpyNdarray.from_sample(np.array([4.9, 3.0, 1.4, 0.2], dtype=np.double)),
        output=Text(),
    )
    async def classify(features: np.ndarray) -> str:
        results = await iris_clf_runner.predict.async_run([features])
        result = results[0]
        category = CLASS_NAMES[result]
        return category
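
The service loads an ``iris_clf`` model from the local model store. If you do not have one yet, a minimal training-and-saving sketch (adapted from the quickstart example; the script name and classifier choice here are illustrative) looks like this:

.. code:: python
    :caption: `train.py` (illustrative)

    from sklearn import datasets, svm

    import bentoml

    # Train a simple classifier on the iris dataset and save it to the local
    # BentoML model store, so that ``iris_clf:latest`` can be resolved by the service.
    iris = datasets.load_iris()
    clf = svm.SVC(gamma="scale")
    clf.fit(iris.data, iris.target)

    saved_model = bentoml.sklearn.save_model("iris_clf", clf)
    print(f"Model saved: {saved_model.tag}")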

Before going to production, we only need one more step to add monitoring:


.. code:: python
    :caption: `service.py`

    import numpy as np

    import bentoml
    from bentoml.io import Text
    from bentoml.io import NumpyNdarray

    CLASS_NAMES = ["setosa", "versicolor", "virginica"]

    iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
    svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

    @svc.api(
        input=NumpyNdarray.from_sample(np.array([4.9, 3.0, 1.4, 0.2], dtype=np.double)),
        output=Text(),
    )
    async def classify(features: np.ndarray) -> str:
        with bentoml.monitor("iris_classifier_prediction") as mon:
            mon.log(features[0], name="sepal length", role="feature", data_type="numerical")
            mon.log(features[1], name="sepal width", role="feature", data_type="numerical")
            mon.log(features[2], name="petal length", role="feature", data_type="numerical")
            mon.log(features[3], name="petal width", role="feature", data_type="numerical")

            results = await iris_clf_runner.predict.async_run([features])
            result = results[0]
            category = CLASS_NAMES[result]

            mon.log(category, name="pred", role="prediction", data_type="categorical")
        return category


Here we log the request features and the prediction with the ``bentoml.monitor`` API. Refer to the API reference for more details.

Then we can continue with the usual steps and build a bento (a BentoML application) for the above service.

.. code:: bash

$ bentoml build
██████╗░███████╗███╗░░██╗████████╗░█████╗░███╗░░░███╗██╗░░░░░
██╔══██╗██╔════╝████╗░██║╚══██╔══╝██╔══██╗████╗░████║██║░░░░░
██████╦╝█████╗░░██╔██╗██║░░░██║░░░██║░░██║██╔████╔██║██║░░░░░
██╔══██╗██╔══╝░░██║╚████║░░░██║░░░██║░░██║██║╚██╔╝██║██║░░░░░
██████╦╝███████╗██║░╚███║░░░██║░░░╚█████╔╝██║░╚═╝░██║███████╗
╚═════╝░╚══════╝╚═╝░░╚══╝░░░╚═╝░░░░╚════╝░╚═╝░░░░░╚═╝╚══════╝

Successfully built Bento(tag="iris_classifier:6aqnksdbuouf2usu").

Deploy the service and collect monitoring data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With BentoML, once we have the bento, it is easy to deploy the ML application to any target. See https://docs.bentoml.org/en/latest/concepts/deploy.html for the available deployment options.

Use ``serve --production`` to start the bento in production mode as a standalone server:

.. code:: bash

$ bentoml serve iris_classifier --production

Then we can send requests to the server to get predictions. BentoML will log the request features and predictions to the configured destination.
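
For example, a minimal request sketch in Python (assuming the server is running locally on the default port 3000, with the ``classify`` endpoint defined above):

.. code:: python

    import requests

    # Send one prediction request to the locally served bento.
    # The endpoint path matches the ``classify`` API function in service.py.
    response = requests.post(
        "http://localhost:3000/classify",
        headers={"content-type": "application/json"},
        data="[5.9, 3.0, 5.1, 1.8]",
    )
    print(response.text)  # e.g. "virginica"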

By default, BentoML exports the data to the ``monitoring/<your_monitor_name>`` directory. To preview the collected data:

.. code:: bash

$ tail -f monitoring/iris_classifier_prediction/data/*.log
==> monitoring/iris_classifier_prediction/data/data.1.log <==
{"sepal length": 6.3, "sepal width": 2.3, "petal length": 4.4, "petal width": 1.3, "pred": "versicolor", "timestamp": "2022-11-09T15:31:26.781914", "request_id": "10655923893485958044"}
{"sepal length": 4.9, "sepal width": 3.6, "petal length": 1.4, "petal width": 0.1, "pred": "setosa", "timestamp": "2022-11-09T15:31:26.786670", "request_id": "16263733333988780524"}
{"sepal length": 7.7, "sepal width": 3.0, "petal length": 6.1, "petal width": 2.3, "pred": "virginica", "timestamp": "2022-11-09T15:31:26.788535", "request_id": "9077185615468445403"}
{"sepal length": 7.4, "sepal width": 2.8, "petal length": 6.1, "petal width": 1.9, "pred": "virginica", "timestamp": "2022-11-09T15:31:26.795290", "request_id": "1949956912055125154"}
{"sepal length": 5.0, "sepal width": 2.3, "petal length": 3.3, "petal width": 1.0, "pred": "versicolor", "timestamp": "2022-11-09T15:31:26.797957", "request_id": "5892192931675972870"}
{"sepal length": 5.1, "sepal width": 3.5, "petal length": 1.4, "petal width": 0.3, "pred": "setosa", "timestamp": "2022-11-09T15:31:26.801006", "request_id": "11124174524929195678"}
{"sepal length": 5.4, "sepal width": 3.4, "petal length": 1.7, "petal width": 0.2, "pred": "setosa", "timestamp": "2022-11-09T15:31:26.805018", "request_id": "1977947867380701804"}
{"sepal length": 5.4, "sepal width": 3.0, "petal length": 4.5, "petal width": 1.5, "pred": "versicolor", "timestamp": "2022-11-09T15:31:26.809391", "request_id": "5170522495321543267"}
{"sepal length": 6.5, "sepal width": 3.2, "petal length": 5.1, "petal width": 2.0, "pred": "virginica", "timestamp": "2022-11-09T15:31:26.813963", "request_id": "746111233619919779"}
{"sepal length": 5.4, "sepal width": 3.7, "petal length": 1.5, "petal width": 0.2, "pred": "setosa", "timestamp": "2022-11-09T15:31:26.816515", "request_id": "10451493838968794158"}

==> monitoring/iris_classifier_prediction/data/data.2.log <==
{"sepal length": 5.6, "sepal width": 2.5, "petal length": 3.9, "petal width": 1.1, "pred": "versicolor", "timestamp": "2022-11-09T15:31:26.768545", "request_id": "12581333339958540887"}
{"sepal length": 4.5, "sepal width": 2.3, "petal length": 1.3, "petal width": 0.3, "pred": "setosa", "timestamp": "2022-11-09T15:31:26.770188", "request_id": "14803218836235991321"}
{"sepal length": 6.3, "sepal width": 2.9, "petal length": 5.6, "petal width": 1.8, "pred": "virginica", "timestamp": "2022-11-09T15:31:26.771554", "request_id": "3898998431725264845"}
{"sepal length": 4.7, "sepal width": 3.2, "petal length": 1.6, "petal width": 0.2, "pred": "setosa", "timestamp": "2022-11-09T15:31:26.775306", "request_id": "16171654492399963820"}
{"sepal length": 4.9, "sepal width": 3.0, "petal length": 1.4, "petal width": 0.2, "pred": "setosa", "timestamp": "2022-11-09T15:31:26.778971", "request_id": "12433921846139166785"}
{"sepal length": 6.9, "sepal width": 3.1, "petal length": 5.4, "petal width": 2.1, "pred": "virginica", "timestamp": "2022-11-09T15:31:26.783441", "request_id": "3868728687839356795"}
{"sepal length": 5.1, "sepal width": 3.4, "petal length": 1.5, "petal width": 0.2, "pred": "setosa", "timestamp": "2022-11-09T15:31:26.803871", "request_id": "4920762203256166127"}
{"sepal length": 4.5, "sepal width": 2.3, "petal length": 1.3, "petal width": 0.3, "pred": "setosa", "timestamp": "2022-11-09T15:31:26.807770", "request_id": "562712759995883379"}
{"sepal length": 5.1, "sepal width": 3.8, "petal length": 1.6, "petal width": 0.2, "pred": "setosa", "timestamp": "2022-11-09T15:31:26.810136", "request_id": "15755243536090754018"}
{"sepal length": 6.4, "sepal width": 3.1, "petal length": 5.5, "petal width": 1.8, "pred": "virginica", "timestamp": "2022-11-09T15:31:26.812188", "request_id": "15915060852312696387"}


Shipping the collected data
~~~~~~~~~~~~~~~~~~~~~~~~~~~

BentoML provides a general monitoring data collection API, which makes it possible to ship the collected data anywhere without code changes: for example, to a data warehouse, a data analysis pipeline, or a monitoring & drift detection solution.

To achieve this, we just need to provide a deployment configuration to BentoML.

Built-in Monitoring Data Collectors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1. Through log files

The most common way to collect monitoring data is to write it to log files. Many tools, such as Fluent Bit, Filebeat, and Logstash, can collect log files and ship them to a data warehouse or a monitoring system.
This is also the default way BentoML exports monitoring data:

.. code:: yaml
    :caption: `deployment_configuration.yaml`

    monitoring:
      enabled: true
      type: default
      options:
        log_path: path/to/log/file

For Docker deployments, users can mount the log directory as a volume to persist the log files.
For Kubernetes deployments, users can mount the log directory and deploy a Fluent Bit DaemonSet or sidecar container to ship the log files to the target destination.


2. Through an OTLP endpoint

.. code:: yaml
    :caption: `deployment_configuration.yaml`

    monitoring:
      enabled: true
      type: otlp
      options:
        endpoint: http://localhost:5000
        insecure: true
        credentials: null
        headers: null
        timeout: 10
        compression: null
        meta_sample_rate: 1.0

For some deployment platforms, it is not easy to collect log files; AWS Lambda, for example, does not provide persistent local storage for log files. In such cases, BentoML can export monitoring data to an OTLP endpoint.
Some log collectors, such as Fluent Bit, also support OTLP input.


Plugins and Third-party Monitoring Data Collectors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

BentoML also supports plugins and third-party monitoring data collectors. Users can write a custom monitoring data collector and publish it as a Python package.
While the built-in collectors are protocol-oriented and aimed at general use cases, plugins can be more platform specific.

.. note::
    To use a plugin, you need to install it and include it in the dependencies section of the bentofile.
    For example, add ``bentoml-plugins-arize`` to ``python:packages`` to use the Arize plugin.

1. Ship to Arize AI

For an end-to-end data/model monitoring solution, BentoML collaborates with Arize AI to provide an Arize plugin.
If you don't want to deploy and maintain a monitoring pipeline yourself but still need data and model monitoring for the business, Arize AI is a good choice.

Arize AI provides a unified platform for data scientists, data engineers, and ML engineers to monitor, analyze, and debug ML models in production,
and the ``bentoml-plugins-arize`` plugin makes it easy to use with BentoML.


.. code:: yaml
    :caption: `deployment_configuration.yaml`

    monitoring:
      enabled: true
      type: bentoml_plugins.arize.ArizeMonitor
      options:
        space_key: <your_space_key>
        api_key: <your_api_key>
        # ... more arize options
        # see https://docs.arize.com/arize/data-ingestion/api-reference/python-sdk/arize.init#keyword-arguments
        # and https://docs.arize.com/arize/sending-data-to-arize/data-ingestion-methods/sdk-reference/python-sdk/arize.log



You can also join the BentoML community and develop your own monitoring data collector by implementing the ``bentoml.monitoring.MonitorBase`` interface.
41 changes: 41 additions & 0 deletions docs/source/integrations/arize.rst
@@ -0,0 +1,41 @@
========
Arize AI
========

Arize AI provides a unified platform for data scientists, data engineers, and ML engineers to monitor, analyze, and debug ML models in production.
If you don't want to deploy and maintain a monitoring pipeline yourself but still need data and model monitoring for the business, Arize AI is a good choice.

For an end-to-end data/model monitoring solution, BentoML collaborates with Arize AI to make it easy to work with all BentoML deployments.


.. seealso::

    How to use:
    :ref:`Inference Data Collection & Model Monitoring <guides/monitoring:Plugins and Third-party Monitoring Data Collectors>`.


Preview
~~~~~~~

* Drift Detection & Monitoring

.. image:: ../_static/img/arize_drift_monitoring.png
    :width: 100%
    :alt: Arize Drift Monitoring

* Data Quality Monitoring

.. image:: ../_static/img/arize_data_quality_monitoring.png
    :width: 100%
    :alt: Arize Data Quality Monitoring

* Model Explainability

.. image:: ../_static/img/arize_model_explainability.png
    :width: 100%
    :alt: Arize Model Explainability

* Alerting & Notification

.. image:: ../_static/img/arize_alerting.png
    :width: 100%
    :alt: Arize Alerting
1 change: 1 addition & 0 deletions docs/source/integrations/index.rst
@@ -10,3 +10,4 @@ Integrations
airflow
flink
mlflow
arize