docs: metrics (#3121)
* docs: metrics

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* --wip--

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* chore: use NLTK as example

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* Update docs/source/guides/metrics.rst

* Update docs/source/guides/snippets/metrics/metric_defs.py

* Update docs/source/guides/snippets/metrics/metric_defs.py

* Update docs/source/guides/metrics.rst

* Update docs/source/guides/metrics.rst

* Update docs/source/guides/metrics.rst

* Update docs/source/guides/metrics.rst

* Update docs/source/guides/metrics.rst

* Update docs/source/guides/metrics.rst

* Update docs/source/guides/metrics.rst

* Update docs/source/guides/metrics.rst

* Update docs/source/guides/metrics.rst

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: Sean Sheng <s3sheng@gmail.com>
aarnphm and ssheng committed Oct 28, 2022
1 parent 9888d6b commit 60537e6
Showing 24 changed files with 982 additions and 103 deletions.
Binary file added docs/source/_static/img/prometheus-metrics.png
4 changes: 2 additions & 2 deletions docs/source/guides/grpc.rst
@@ -303,7 +303,7 @@ gRPC server:
 The following ``build.gradle`` should be able to help you get started:

 .. literalinclude:: ../../../grpc-client/java/build.gradle
-   :language: groovy
+   :language: text
    :caption: build.gradle

 To build the client, run:
@@ -386,7 +386,7 @@ gRPC server:
 The following ``build.gradle.kts`` should be able to help you get started:

 .. literalinclude:: ../../../grpc-client/kotlin/build.gradle.kts
-   :language: groovy
+   :language: text
    :caption: build.gradle.kts

 To build the client, run:
1 change: 1 addition & 0 deletions docs/source/guides/index.rst
@@ -16,6 +16,7 @@ into this part of the documentation.
 grpc
 configuration
 containerization
+metrics
 gpu
 logging
 monitoring
167 changes: 167 additions & 0 deletions docs/source/guides/metrics.rst
@@ -0,0 +1,167 @@
=======
Metrics
=======

Metrics are statistical measurements of your service that provide insight into the usage and performance of your Bentos in production.

BentoML allows users to define custom metrics with `Prometheus <https://prometheus.io/docs/introduction/overview/>`_ to easily enable monitoring for their Bentos.

This guide covers how to add custom metrics to monitor your BentoService, and how to incorporate
those metrics into either a :ref:`concepts/runner:Custom Runner` or your :ref:`Service <concepts/service:Service and APIs>`.

Having a `Prometheus server <https://prometheus.io/docs/prometheus/latest/getting_started/>`_ available will help you visualize the examples in this guide.

.. note::

   This article assumes that you have a basic understanding of a BentoService. If you
   are new to BentoML, please start with :ref:`the quickstart tutorial <tutorial:Tutorial: Intro to BentoML>`.

.. seealso::

   All `metric types <https://prometheus.io/docs/concepts/metric_types/>`_ supported by Prometheus are supported in BentoML. See the :ref:`Metrics API <reference/metrics:Metrics API>` for more information on ``bentoml.metrics``.


Using Metrics in a BentoService
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We will build a custom histogram to track the latency of our :ref:`pretrained NLTK runner <concepts/runner:Custom Runner>`, and a custom
counter to measure the total number of times our endpoint is invoked.

.. note::

   The source code for this custom runner is :github:`available on GitHub <bentoml/BentoML/tree/main/examples/custom_runner/nltk_pretrained_model>`.

Initialize our metrics as follows:

.. literalinclude:: ./snippets/metrics/metric_defs.py
   :language: python
   :caption: `service.py`

``inference_duration`` is a :meth:`bentoml.metrics.Histogram` that tracks how long it
takes for our model to run inference.

The :attr:`bentoml.metrics.Histogram.buckets` argument determines the granularity of the
histogram. The buckets should cover the full range of values the histogram is expected to
track, and more buckets mean finer-grained tracking. The last bucket boundary must always
be positive infinity. See the Prometheus documentation on `Histograms <https://prometheus.io/docs/practices/histograms/>`_ for more details.
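
As a minimal sketch (the metric name and bucket boundaries below are illustrative, not part of the example service), each observation falls into every cumulative bucket whose upper bound it does not exceed:

.. code-block:: python

   import bentoml

   request_duration = bentoml.metrics.Histogram(
       name="request_duration_seconds",
       documentation="Duration of a request",
       buckets=(0.1, 0.5, 1.0, float("inf")),  # the last bucket must be +Inf
   )

   # An observation of 0.3s increments the cumulative 0.5, 1.0, and +Inf buckets.
   request_duration.observe(0.3)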

``polarity_counter`` is a :meth:`bentoml.metrics.Counter` that tracks the total number
of analyses by polarity score.

.. epigraph::

   :bdg-info:`Note:` This also applies to any other metric type, including :meth:`bentoml.metrics.Gauge` and :meth:`bentoml.metrics.Summary`.
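
For instance, here is a minimal sketch of the other two metric types (the metric names here are hypothetical, not part of the example service):

.. code-block:: python

   import bentoml

   # Hypothetical Gauge: a value that can go up and down.
   in_progress = bentoml.metrics.Gauge(
       name="inference_in_progress",
       documentation="Number of in-flight inference requests",
   )

   # Hypothetical Summary: tracks the count and sum of observed values.
   payload_size = bentoml.metrics.Summary(
       name="request_payload_bytes",
       documentation="Size of request payloads in bytes",
   )

   in_progress.inc()          # a request starts
   payload_size.observe(512)  # record its payload size
   in_progress.dec()          # the request finishes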

Create our NLTK custom runner:

.. literalinclude:: ./snippets/metrics/runner_impl.py
   :language: python
   :caption: `service.py`

This runnable implementation creates a custom NLTK runner that uses the ``inference_duration``
histogram to track the latency of computing polarity scores for a given sentence.

Initialize our NLTK runner, and add it to the service:

.. code-block:: python

   import typing as t

   nltk_runner = t.cast(
       "RunnerImpl", bentoml.Runner(NLTKSentimentAnalysisRunnable, name="nltk_sentiment")
   )

   svc = bentoml.Service("sentiment_analyzer", runners=[nltk_runner])

   @svc.api(input=bentoml.io.Text(), output=bentoml.io.JSON())
   async def analysis(input_text: str) -> dict[str, bool]:
       is_positive = await nltk_runner.is_positive.async_run(input_text)
       polarity_counter.labels(polarity=is_positive).inc()
       return {"is_positive": is_positive}

Our endpoint ``analysis`` uses the ``polarity_counter`` to track the total number of
invocations of ``analysis`` by polarity score.
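
Once requests start flowing, this counter can be queried in Prometheus; for example, the per-polarity request rate over the last five minutes:

.. code-block:: text

   sum by (polarity) (rate(polarity_total[5m]))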

.. tab-set::

   .. tab-item:: HTTP
      :sync: http

      Serve our service:

      .. code-block:: bash

         » bentoml serve-http --production

      Use the following ``prometheus.yml`` config:

      .. literalinclude:: ../../../examples/custom_runner/nltk_pretrained_model/prometheus/prometheus.http.yml
         :language: yaml
         :caption: `prometheus.yml`

      Start your Prometheus server in a different terminal session:

      .. code-block:: bash

         » prometheus --config.file=prometheus.yml

      In a different terminal, send a request to our service:

      .. code-block:: bash

         » curl -X POST -H "Content-Type: text/plain" \
              -d "BentoML is an amazing framework" \
              http://0.0.0.0:3000/analysis

   .. tab-item:: gRPC
      :sync: grpc

      Serve our service:

      .. code-block:: bash

         » bentoml serve-grpc --production --enable-reflection

      Use the following ``prometheus.yml`` config:

      .. literalinclude:: ../../../examples/custom_runner/nltk_pretrained_model/prometheus/prometheus.grpc.yml
         :language: yaml
         :caption: `prometheus.yml`

      Start your Prometheus server in a different terminal session:

      .. code-block:: bash

         » prometheus --config.file=prometheus.yml

      In a different terminal, send a request to our service:

      .. code-block:: bash

         » grpcurl -d @ -plaintext 0.0.0.0:3000 bentoml.grpc.v1alpha1.BentoService/Call <<EOT
         {
           "apiName": "analysis",
           "serializedBytes": "..."
         }
         EOT
Visit `http://localhost:9090/graph <http://localhost:9090/graph>`_ and use the following query for the 95th percentile inference latency:

.. code-block:: text

   histogram_quantile(0.95, rate(inference_duration_bucket[1m]))

.. image:: ../_static/img/prometheus-metrics.png

.. TODO::

   * Grafana dashboard
.. admonition:: Help us improve the project!

   Found an issue or a TODO item? You're always welcome to make contributions to the
   project and its documentation. Check out the
   `BentoML development guide <https://github.com/bentoml/BentoML/blob/main/DEVELOPMENT.md>`_
   and `documentation guide <https://github.com/bentoml/BentoML/blob/main/docs/README.md>`_
   to get started.
16 changes: 16 additions & 0 deletions docs/source/guides/snippets/metrics/metric_defs.py
@@ -0,0 +1,16 @@
from __future__ import annotations

import bentoml

inference_duration = bentoml.metrics.Histogram(
    name="inference_duration",
    documentation="Duration of inference",
    labelnames=["nltk_version", "sentiment_cls"],
    # Buckets cover the expected range of inference latencies;
    # the last bucket must always be +Inf.
    buckets=(
        0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5,
        0.75, 1.0, 2.5, 5.0, 7.5, 10.0, float("inf"),
    ),
)

polarity_counter = bentoml.metrics.Counter(
    name="polarity_total",
    documentation="Count total number of analyses by polarity score",
    labelnames=["polarity"],
)
18 changes: 18 additions & 0 deletions docs/source/guides/snippets/metrics/runner_impl.py
@@ -0,0 +1,18 @@
from __future__ import annotations

import time
from statistics import mean

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

import bentoml

# ``inference_duration`` is the Histogram defined in metric_defs.py above.
from metric_defs import inference_duration


class NLTKSentimentAnalysisRunnable(bentoml.Runnable):
    SUPPORTED_RESOURCES = ("cpu",)
    SUPPORTS_CPU_MULTI_THREADING = False

    def __init__(self):
        self.sia = SentimentIntensityAnalyzer()

    @bentoml.Runnable.method(batchable=False)
    def is_positive(self, input_text: str) -> bool:
        start = time.perf_counter()
        scores = [
            self.sia.polarity_scores(sentence)["compound"]
            for sentence in nltk.sent_tokenize(input_text)
        ]
        # Record how long scoring took, labeled by NLTK version and analyzer class.
        inference_duration.labels(
            nltk_version=nltk.__version__, sentiment_cls=self.sia.__class__.__name__
        ).observe(time.perf_counter() - start)
        return mean(scores) > 0
1 change: 1 addition & 0 deletions docs/source/reference/index.rst
@@ -11,6 +11,7 @@ BentoML APIs and learn about all the options they provide.
 core
 stores
 api_io_descriptors
+metrics
 frameworks/index
 cli

83 changes: 83 additions & 0 deletions docs/source/reference/metrics.rst
@@ -0,0 +1,83 @@
===========
Metrics API
===========

BentoML provides a metrics API that uses `Prometheus <https://prometheus.io/>`_ under the hood.

BentoML's ``bentoml.metrics`` is a drop-in replacement for ``prometheus_client`` and should be used inside BentoML services:

.. code-block:: diff

   diff --git a/service.py b/service.py
   index acd8467e..0f3e6e77 100644
   --- a/service.py
   +++ b/service.py
   @@ -1,11 +1,10 @@
   -from prometheus_client import Summary
   +from bentoml.metrics import Summary
    import random
    import time

    REQUEST_TIME = Summary("request_processing_seconds", "Time spent processing request")

    @REQUEST_TIME.time()
    def process_request(t):
        """A function that takes some time."""
While ``bentoml.metrics`` contains all of the APIs offered by ``prometheus_client``,
you should always use ``bentoml.metrics`` instead of ``prometheus_client`` in your service definition.

The reason is that ``bentoml.metrics`` constructs metrics lazily and
ensures that `multiprocess mode <https://github.com/prometheus/client_python#multiprocess-mode-eg-gunicorn>`_ is correctly configured.

.. note::

   ``prometheus_client`` should not be imported in BentoML services; doing so will
   break multiprocess mode.

.. note::

   All metrics from ``bentoml.metrics`` will set up a ``registry`` to handle multiprocess
   mode, which means you **SHOULD NOT** pass a ``registry`` argument when initializing
   metrics:

   .. code-block:: python
      :caption: service.py

      # THIS WILL NOT WORK
      from bentoml.metrics import Summary, CollectorRegistry
      from bentoml.metrics import multiprocess

      registry = CollectorRegistry()
      multiprocess.MultiProcessCollector(registry)

      REQUEST_TIME = Summary(
          "request_processing_seconds", "Time spent processing request", registry=registry
      )

   Instead:

   .. code-block:: python
      :caption: service.py

      # THIS WILL WORK
      from bentoml.metrics import Summary

      REQUEST_TIME = Summary("request_processing_seconds", "Time spent processing request")

----

The following section goes over the most commonly used APIs in
``bentoml.metrics``:

.. currentmodule:: bentoml._internal.server.metrics

.. autofunction:: bentoml.metrics.generate_latest

.. autofunction:: bentoml.metrics.text_string_to_metric_families

.. autofunction:: bentoml.metrics.Histogram

.. autofunction:: bentoml.metrics.Counter

.. autofunction:: bentoml.metrics.Summary

.. autofunction:: bentoml.metrics.Gauge
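
For example, a minimal sketch of rendering the current metrics (assuming ``generate_latest`` mirrors ``prometheus_client`` in returning the exposition text as bytes):

.. code-block:: python

   import bentoml

   exposition = bentoml.metrics.generate_latest()
   print(exposition.decode("utf-8"))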
1 change: 1 addition & 0 deletions examples/custom_model_runner/.gitignore
@@ -1,2 +1,3 @@
 data
 mnist_png/
+mnist_png.tar.gz
