docs: metrics (#3121)
* docs: metrics

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* --wip--

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* chore: use NLTK as example

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* Update docs/source/guides/metrics.rst

* Update docs/source/guides/snippets/metrics/metric_defs.py

* Update docs/source/guides/snippets/metrics/metric_defs.py

* Update docs/source/guides/metrics.rst

* Update docs/source/guides/metrics.rst

* Update docs/source/guides/metrics.rst

* Update docs/source/guides/metrics.rst

* Update docs/source/guides/metrics.rst

* Update docs/source/guides/metrics.rst

* Update docs/source/guides/metrics.rst

* Update docs/source/guides/metrics.rst

* Update docs/source/guides/metrics.rst

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: Sean Sheng <s3sheng@gmail.com>
aarnphm and ssheng committed Oct 28, 2022
1 parent 9888d6b commit 60537e6
Showing 24 changed files with 982 additions and 103 deletions.
Binary file added docs/source/_static/img/prometheus-metrics.png
4 changes: 2 additions & 2 deletions docs/source/guides/grpc.rst
@@ -303,7 +303,7 @@ gRPC server:
 The following ``build.gradle`` should be able to help you get started:

 .. literalinclude:: ../../../grpc-client/java/build.gradle
-   :language: groovy
+   :language: text
    :caption: build.gradle

 To build the client, run:
@@ -386,7 +386,7 @@ gRPC server:
 The following ``build.gradle.kts`` should be able to help you get started:

 .. literalinclude:: ../../../grpc-client/kotlin/build.gradle.kts
-   :language: groovy
+   :language: text
    :caption: build.gradle.kts

 To build the client, run:
1 change: 1 addition & 0 deletions docs/source/guides/index.rst
@@ -16,6 +16,7 @@ into this part of the documentation.
 grpc
 configuration
 containerization
+metrics
 gpu
 logging
 monitoring
167 changes: 167 additions & 0 deletions docs/source/guides/metrics.rst
@@ -0,0 +1,167 @@
=======
Metrics
=======

Metrics are statistical measurements of your service that provide insight into the usage and performance of your Bentos in production.

BentoML allows users to define custom metrics with `Prometheus <https://prometheus.io/docs/introduction/overview/>`_ to easily enable monitoring for their Bentos.

This guide covers how to add custom metrics to monitor your BentoService, and how to incorporate
those metrics into either a :ref:`concepts/runner:Custom Runner` or your :ref:`Service <concepts/service:Service and APIs>`.

Having a `Prometheus server <https://prometheus.io/docs/prometheus/latest/getting_started/>`_ available will help you visualize the examples in this guide.

.. note::

   This article assumes that you have a basic understanding of a BentoService. If you
   are new to BentoML, please start with :ref:`the quickstart tutorial <tutorial:Tutorial: Intro to BentoML>`.

.. seealso::

   All `metric types <https://prometheus.io/docs/concepts/metric_types/>`_ supported by Prometheus are supported in BentoML. See the :ref:`Metrics API <reference/metrics:Metrics API>` for more information on ``bentoml.metrics``.


Using Metrics in a BentoService
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We will build a custom histogram to track the latency of our :ref:`pretrained NLTK runner <concepts/runner:Custom Runner>`, and a custom
counter to measure the total number of times our endpoint is invoked.

.. note::

   The source code for this custom runner is :github:`available on GitHub <bentoml/BentoML/tree/main/examples/custom_runner/nltk_pretrained_model>`.

Initialize our metrics as follows:

.. literalinclude:: ./snippets/metrics/metric_defs.py
   :language: python
   :caption: `service.py`

``inference_duration`` is a :meth:`bentoml.metrics.Histogram` that tracks how long it
takes for our model to run inference.

The :attr:`bentoml.metrics.Histogram.buckets` argument determines the granularity of the
histogram. The buckets should cover the full range of values the histogram is expected to
track, and more buckets mean finer-grained tracking. The last bucket boundary must always
be positive infinity. See the Prometheus documentation on `Histograms <https://prometheus.io/docs/practices/histograms/>`_ for more details.
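
As a minimal sketch (the metric name and bucket boundaries below are illustrative, not part of the example service), each observation falls into every cumulative bucket whose upper bound it does not exceed:

.. code-block:: python

   import bentoml

   request_duration = bentoml.metrics.Histogram(
       name="request_duration_seconds",
       documentation="Duration of a request",
       buckets=(0.1, 0.5, 1.0, float("inf")),  # the last bucket must be +Inf
   )

   # An observation of 0.3s increments the cumulative 0.5, 1.0, and +Inf buckets.
   request_duration.observe(0.3)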

``polarity_counter`` is a :meth:`bentoml.metrics.Counter` that tracks the total number
of analyses by polarity score.

.. epigraph::

   :bdg-info:`Note:` This also applies to any other metric type, including :meth:`bentoml.metrics.Gauge` and :meth:`bentoml.metrics.Summary`.
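
For instance, here is a minimal sketch of the other two metric types (the metric names here are hypothetical, not part of the example service):

.. code-block:: python

   import bentoml

   # Hypothetical Gauge: a value that can go up and down.
   in_progress = bentoml.metrics.Gauge(
       name="inference_in_progress",
       documentation="Number of in-flight inference requests",
   )

   # Hypothetical Summary: tracks the count and sum of observed values.
   payload_size = bentoml.metrics.Summary(
       name="request_payload_bytes",
       documentation="Size of request payloads in bytes",
   )

   in_progress.inc()          # a request starts
   payload_size.observe(512)  # record its payload size
   in_progress.dec()          # the request finishes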

Create our NLTK custom runner:

.. literalinclude:: ./snippets/metrics/runner_impl.py
   :language: python
   :caption: `service.py`

This runnable implementation creates a custom NLTK runner that uses the ``inference_duration``
histogram to track the latency of computing polarity scores for a given sentence.

Initialize our NLTK runner, and add it to the service:

.. code-block:: python

   import typing as t

   nltk_runner = t.cast(
       "RunnerImpl", bentoml.Runner(NLTKSentimentAnalysisRunnable, name="nltk_sentiment")
   )

   svc = bentoml.Service("sentiment_analyzer", runners=[nltk_runner])

   @svc.api(input=bentoml.io.Text(), output=bentoml.io.JSON())
   async def analysis(input_text: str) -> dict[str, bool]:
       is_positive = await nltk_runner.is_positive.async_run(input_text)
       polarity_counter.labels(polarity=is_positive).inc()
       return {"is_positive": is_positive}

Our endpoint ``analysis`` uses the ``polarity_counter`` to track the total number of
invocations of ``analysis`` by polarity score.
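
Once requests start flowing, this counter can be queried in Prometheus; for example, the per-polarity request rate over the last five minutes:

.. code-block:: text

   sum by (polarity) (rate(polarity_total[5m]))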

.. tab-set::

   .. tab-item:: HTTP
      :sync: http

      Serve our service:

      .. code-block:: bash

         » bentoml serve-http --production

      Use the following ``prometheus.yml`` config:

      .. literalinclude:: ../../../examples/custom_runner/nltk_pretrained_model/prometheus/prometheus.http.yml
         :language: yaml
         :caption: `prometheus.yml`

      Start your Prometheus server in a different terminal session:

      .. code-block:: bash

         » prometheus --config.file=prometheus.yml

      In a different terminal, send a request to our service:

      .. code-block:: bash

         » curl -X POST -H "Content-Type: text/plain" \
              -d "BentoML is an amazing framework" \
              http://0.0.0.0:3000/analysis

   .. tab-item:: gRPC
      :sync: grpc

      Serve our service:

      .. code-block:: bash

         » bentoml serve-grpc --production --enable-reflection

      Use the following ``prometheus.yml`` config:

      .. literalinclude:: ../../../examples/custom_runner/nltk_pretrained_model/prometheus/prometheus.grpc.yml
         :language: yaml
         :caption: `prometheus.yml`

      Start your Prometheus server in a different terminal session:

      .. code-block:: bash

         » prometheus --config.file=prometheus.yml

      In a different terminal, send a request to our service:

      .. code-block:: bash

         » grpcurl -d @ -plaintext 0.0.0.0:3000 bentoml.grpc.v1alpha1.BentoService/Call <<EOT
         {
           "apiName": "analysis",
           "serializedBytes": "..."
         }
         EOT
Visit `http://localhost:9090/graph <http://localhost:9090/graph>`_ and use the following query for the 95th percentile inference latency:

.. code-block:: text

   histogram_quantile(0.95, rate(inference_duration_bucket[1m]))

.. image:: ../_static/img/prometheus-metrics.png

.. TODO::

   * Grafana dashboard
.. admonition:: Help us improve the project!

   Found an issue or a TODO item? You're always welcome to make contributions to the
   project and its documentation. Check out the
   `BentoML development guide <https://github.com/bentoml/BentoML/blob/main/DEVELOPMENT.md>`_
   and `documentation guide <https://github.com/bentoml/BentoML/blob/main/docs/README.md>`_
   to get started.
16 changes: 16 additions & 0 deletions docs/source/guides/snippets/metrics/metric_defs.py
@@ -0,0 +1,16 @@
from __future__ import annotations

import bentoml

inference_duration = bentoml.metrics.Histogram(
    name="inference_duration",
    documentation="Duration of inference",
    labelnames=["nltk_version", "sentiment_cls"],
    # Buckets cover the expected range of inference latencies;
    # the last bucket must always be +Inf.
    buckets=(
        0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5,
        0.75, 1.0, 2.5, 5.0, 7.5, 10.0, float("inf"),
    ),
)

polarity_counter = bentoml.metrics.Counter(
    name="polarity_total",
    documentation="Count total number of analyses by polarity score",
    labelnames=["polarity"],
)
18 changes: 18 additions & 0 deletions docs/source/guides/snippets/metrics/runner_impl.py
@@ -0,0 +1,18 @@
from __future__ import annotations

import time
from statistics import mean

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

import bentoml

# ``inference_duration`` is the Histogram defined in metric_defs.py above.
from metric_defs import inference_duration


class NLTKSentimentAnalysisRunnable(bentoml.Runnable):
    SUPPORTED_RESOURCES = ("cpu",)
    SUPPORTS_CPU_MULTI_THREADING = False

    def __init__(self):
        self.sia = SentimentIntensityAnalyzer()

    @bentoml.Runnable.method(batchable=False)
    def is_positive(self, input_text: str) -> bool:
        start = time.perf_counter()
        scores = [
            self.sia.polarity_scores(sentence)["compound"]
            for sentence in nltk.sent_tokenize(input_text)
        ]
        # Record how long scoring took, labeled by NLTK version and analyzer class.
        inference_duration.labels(
            nltk_version=nltk.__version__, sentiment_cls=self.sia.__class__.__name__
        ).observe(time.perf_counter() - start)
        return mean(scores) > 0
1 change: 1 addition & 0 deletions docs/source/reference/index.rst
@@ -11,6 +11,7 @@ BentoML APIs and learn about all the options they provide.
 core
 stores
 api_io_descriptors
+metrics
 frameworks/index
 cli

83 changes: 83 additions & 0 deletions docs/source/reference/metrics.rst
@@ -0,0 +1,83 @@
===========
Metrics API
===========

BentoML provides a metrics API that uses `Prometheus <https://prometheus.io/>`_ under the hood.

BentoML's ``bentoml.metrics`` is a drop-in replacement for ``prometheus_client`` and should be used inside BentoML services:

.. code-block:: diff

   diff --git a/service.py b/service.py
   index acd8467e..0f3e6e77 100644
   --- a/service.py
   +++ b/service.py
   @@ -1,11 +1,10 @@
   -from prometheus_client import Summary
   +from bentoml.metrics import Summary
    import random
    import time

    REQUEST_TIME = Summary("request_processing_seconds", "Time spent processing request")

    @REQUEST_TIME.time()
    def process_request(t):
        """A function that takes some time."""
While ``bentoml.metrics`` contains all of the APIs offered by ``prometheus_client``,
you should always use ``bentoml.metrics`` instead of ``prometheus_client`` in your service definition.

The reason is that ``bentoml.metrics`` constructs metrics lazily and
ensures that `multiprocess mode <https://github.com/prometheus/client_python#multiprocess-mode-eg-gunicorn>`_ is correctly configured.

.. note::

   ``prometheus_client`` should not be imported in BentoML services; doing so will
   break multiprocess mode.

.. note::

   All metrics from ``bentoml.metrics`` will set up a ``registry`` to handle multiprocess
   mode, which means you **SHOULD NOT** pass a ``registry`` argument when initializing
   metrics:

   .. code-block:: python
      :caption: service.py

      # THIS WILL NOT WORK
      from bentoml.metrics import Summary, CollectorRegistry
      from bentoml.metrics import multiprocess

      registry = CollectorRegistry()
      multiprocess.MultiProcessCollector(registry)

      REQUEST_TIME = Summary(
          "request_processing_seconds", "Time spent processing request", registry=registry
      )

   Instead:

   .. code-block:: python
      :caption: service.py

      # THIS WILL WORK
      from bentoml.metrics import Summary

      REQUEST_TIME = Summary("request_processing_seconds", "Time spent processing request")

----

The following section goes over the most commonly used APIs in
``bentoml.metrics``:

.. currentmodule:: bentoml._internal.server.metrics

.. autofunction:: bentoml.metrics.generate_latest

.. autofunction:: bentoml.metrics.text_string_to_metric_families

.. autofunction:: bentoml.metrics.Histogram

.. autofunction:: bentoml.metrics.Counter

.. autofunction:: bentoml.metrics.Summary

.. autofunction:: bentoml.metrics.Gauge
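
For example, a minimal sketch of rendering the current metrics (assuming ``generate_latest`` mirrors ``prometheus_client`` in returning the exposition text as bytes):

.. code-block:: python

   import bentoml

   exposition = bentoml.metrics.generate_latest()
   print(exposition.decode("utf-8"))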
1 change: 1 addition & 0 deletions examples/custom_model_runner/.gitignore
@@ -1,2 +1,3 @@
 data
 mnist_png/
+mnist_png.tar.gz
