Commit: docs: metrics

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: Sean Sheng <s3sheng@gmail.com>

Showing 24 changed files with 982 additions and 103 deletions.

----

docs/source/guides/metrics.rst
@@ -0,0 +1,167 @@
=======
Metrics
=======

Metrics are measurements of statistics about your service, which can provide information about the usage and performance of your bentos in production.

BentoML allows users to define custom metrics with `Prometheus <https://prometheus.io/docs/introduction/overview/>`_ to easily enable monitoring for their Bentos.

This article will dive into how to add custom metrics to monitor your BentoService and how you can incorporate custom metrics into
either a :ref:`concepts/runner:Custom Runner` or your :ref:`Service <concepts/service:Service and APIs>`.

Having a `Prometheus server <https://prometheus.io/docs/prometheus/latest/getting_started/>`_ available will help visualize the examples in this guide.

.. note::

   This article assumes that you have a basic understanding of a BentoService. If you
   are new to BentoML, please start with :ref:`the quickstart tutorial <tutorial:Tutorial: Intro to BentoML>`.

.. seealso::

   All `metric types <https://prometheus.io/docs/concepts/metric_types/>`_ supported by Prometheus are supported in BentoML. See :ref:`Metrics API <reference/metrics:Metrics API>` for more information on ``bentoml.metrics``.
Using Metrics in a BentoService
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We will build a custom histogram to track the latency of our :ref:`pretrained NLTK runner <concepts/runner:Custom Runner>` and a custom
counter to measure the total number of times our endpoint is invoked.

.. note::

   The source code for this custom runner is :github:`available on GitHub <bentoml/BentoML/tree/main/examples/custom_runner/nltk_pretrained_model>`.

Initialize our metrics as follows:

.. literalinclude:: ./snippets/metrics/metric_defs.py
   :language: python
   :caption: `service.py`
``inference_duration`` is a :meth:`bentoml.metrics.Histogram`, which tracks how long it
takes for our model to run inference.
The :attr:`bentoml.metrics.Histogram.buckets` argument determines the granularity of histogram tracking. The bucket boundaries should cover the range of values the histogram is expected to track: more buckets mean finer-grained tracking, and the last bucket boundary should always be positive infinity. See the Prometheus documentation on `Histograms <https://prometheus.io/docs/practices/histograms/>`_ for more details.
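Prometheus histograms are cumulative: each bucket counts every observation less than or equal to its upper bound, and the final ``+Inf`` bucket therefore always equals the total observation count. A minimal sketch of this bucketing in plain Python (illustrative only, not part of the BentoML API; the bucket bounds below are a subset of the ones used above):

```python
import math

# Illustrative bucket upper bounds (a subset of the buckets defined above).
BUCKETS = (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, math.inf)

def cumulative_counts(observations, buckets=BUCKETS):
    """Cumulative count per bucket, mirroring Prometheus "le" semantics:
    each bucket counts all observations <= its upper bound."""
    return [sum(1 for o in observations if o <= le) for le in buckets]

# Five observed latencies (seconds); the last one exceeds every finite bucket.
counts = cumulative_counts([0.003, 0.02, 0.02, 0.3, 4.2])
```

Quantile estimates such as ``histogram_quantile`` interpolate within these cumulative counts, which is why bucket boundaries should bracket the latencies you expect to see.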
``polarity_counter`` is a :meth:`bentoml.metrics.Counter`, which tracks the total number
of analyses by polarity score.

.. epigraph::

   :bdg-info:`Note:` This also applies to any other metric type, including :meth:`bentoml.metrics.Gauge` and :meth:`bentoml.metrics.Summary`.

Create our NLTK custom runner:
.. literalinclude:: ./snippets/metrics/runner_impl.py
   :language: python
   :caption: `service.py`

This runnable implementation creates a custom NLTK runner that uses the ``inference_duration``
histogram to track the latency of polarity scoring for a given sentence.

Initialize our NLTK runner, and add it to the service:

.. code-block:: python

   nltk_runner = t.cast(
       "RunnerImpl", bentoml.Runner(NLTKSentimentAnalysisRunnable, name="nltk_sentiment")
   )

   svc = bentoml.Service("sentiment_analyzer", runners=[nltk_runner])

   @svc.api(input=bentoml.io.Text(), output=bentoml.io.JSON())
   async def analysis(input_text: str) -> dict[str, bool]:
       is_positive = await nltk_runner.is_positive.async_run(input_text)
       polarity_counter.labels(polarity=is_positive).inc()
       return {"is_positive": is_positive}

Our endpoint ``analysis`` uses the ``polarity_counter`` to track the total number of
invocations of ``analysis`` by polarity score.
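Under the hood, a labeled counter keeps one monotonically increasing count per combination of label values, which is what ``.labels(polarity=...).inc()`` relies on. A rough sketch of that pattern in plain Python (a toy illustration, not the BentoML or Prometheus client implementation):

```python
from collections import defaultdict

class LabeledCounter:
    """Toy stand-in for a labeled counter: one monotonically
    increasing count per label-value combination."""

    def __init__(self, labelnames):
        self.labelnames = tuple(labelnames)
        self.values = defaultdict(float)

    def labels(self, **labelvalues):
        # Build a stable key from the label values, in declared order.
        key = tuple(labelvalues[name] for name in self.labelnames)
        parent = self

        class _Child:
            def inc(self, amount=1.0):
                parent.values[key] += amount

        return _Child()

polarity_counter = LabeledCounter(labelnames=["polarity"])
polarity_counter.labels(polarity=True).inc()
polarity_counter.labels(polarity=True).inc()
polarity_counter.labels(polarity=False).inc()
```

Each distinct label value produces its own time series in Prometheus, so labels should stay low-cardinality (here, just ``True``/``False``).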
.. tab-set::

   .. tab-item:: HTTP
      :sync: http

      Serve our service:

      .. code-block:: bash

         » bentoml serve-http --production

      Use the following ``prometheus.yml`` config:

      .. literalinclude:: ../../../examples/custom_runner/nltk_pretrained_model/prometheus/prometheus.http.yml
         :language: yaml
         :caption: `prometheus.yml`

      Start your Prometheus server in a different terminal session:

      .. code-block:: bash

         » prometheus --config.file=prometheus.yml

      In a different terminal, send a request to our service:

      .. code-block:: bash

         » curl -X POST -H "Content-Type: text/plain" \
             -d "BentoML is a great tool" http://0.0.0.0:3000/analysis
   .. tab-item:: gRPC
      :sync: grpc

      Serve our service:

      .. code-block:: bash

         » bentoml serve-grpc --production --enable-reflection

      Use the following ``prometheus.yml`` config:

      .. literalinclude:: ../../../examples/custom_runner/nltk_pretrained_model/prometheus/prometheus.grpc.yml
         :language: yaml
         :caption: `prometheus.yml`

      Start your Prometheus server in a different terminal session:

      .. code-block:: bash

         » prometheus --config.file=prometheus.yml

      In a different terminal, send a request to our service:

      .. code-block:: bash

         » grpcurl -d @ -plaintext 0.0.0.0:3000 bentoml.grpc.v1alpha1.BentoService/Call <<EOT
         {
           "apiName": "analysis",
           "serializedBytes": "..."
         }
         EOT
Visit `http://localhost:9090/graph <http://localhost:9090/graph>`_ and use the following query to see the 95th percentile inference latency:

.. code-block:: text

   histogram_quantile(0.95, rate(inference_duration_bucket[1m]))
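A couple of related queries that may also be useful here (standard PromQL patterns; the metric names follow the definitions shown earlier in this guide):

```text
# Average inference latency over the last minute
rate(inference_duration_sum[1m]) / rate(inference_duration_count[1m])

# Invocation rate per polarity label
rate(polarity_total[1m])
```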
.. image:: ../_static/img/prometheus-metrics.png

.. todo::

   * Grafana dashboard
.. admonition:: Help us improve the project!

   Found an issue or a TODO item? You're always welcome to make contributions to the
   project and its documentation. Check out the
   `BentoML development guide <https://github.com/bentoml/BentoML/blob/main/DEVELOPMENT.md>`_
   and `documentation guide <https://github.com/bentoml/BentoML/blob/main/docs/README.md>`_
   to get started.
----

docs/source/guides/snippets/metrics/metric_defs.py
@@ -0,0 +1,16 @@
from __future__ import annotations

import bentoml

inference_duration = bentoml.metrics.Histogram(
    name="inference_duration",
    documentation="Duration of inference",
    labelnames=["nltk_version", "sentiment_cls"],
    buckets=(0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0, float("inf")),
)

polarity_counter = bentoml.metrics.Counter(
    name="polarity_total",
    documentation="Count total number of analysis by polarity scores",
    labelnames=["polarity"],
)
----

docs/source/guides/snippets/metrics/runner_impl.py
@@ -0,0 +1,18 @@
# Imports added for completeness; in the guide, this snippet is part of the
# same service.py as the metric definitions above, so `inference_duration`
# is the Histogram defined in the metric_defs.py snippet.
import time
from statistics import mean

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

import bentoml


class NLTKSentimentAnalysisRunnable(bentoml.Runnable):
    SUPPORTED_RESOURCES = ("cpu",)
    SUPPORTS_CPU_MULTI_THREADING = False

    def __init__(self):
        self.sia = SentimentIntensityAnalyzer()

    @bentoml.Runnable.method(batchable=False)
    def is_positive(self, input_text: str) -> bool:
        start = time.perf_counter()
        scores = [
            self.sia.polarity_scores(sentence)["compound"]
            for sentence in nltk.sent_tokenize(input_text)
        ]
        inference_duration.labels(
            nltk_version=nltk.__version__, sentiment_cls=self.sia.__class__.__name__
        ).observe(time.perf_counter() - start)
        return mean(scores) > 0
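The timing pattern above (measure with ``time.perf_counter``, then ``observe`` the elapsed seconds) can be sketched independently of NLTK. The stub histogram below is illustrative only, standing in for ``bentoml.metrics.Histogram``, and the workload is a placeholder for real model inference:

```python
import time

class StubHistogram:
    """Illustrative stand-in for a metrics Histogram: records every
    observed value instead of bucketing it."""

    def __init__(self):
        self.observations = []

    def observe(self, value):
        self.observations.append(value)

inference_duration = StubHistogram()

def run_inference(text):
    start = time.perf_counter()
    result = len(text.split()) > 0  # stand-in for actual model inference
    inference_duration.observe(time.perf_counter() - start)
    return result

run_inference("BentoML makes model serving easy.")
```

Using ``time.perf_counter`` (rather than ``time.time``) is the usual choice here because it is a monotonic, high-resolution clock suited to measuring short durations.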
----

docs/source/reference/metrics.rst
@@ -0,0 +1,83 @@
===========
Metrics API
===========

BentoML provides a metrics API that uses `Prometheus <https://prometheus.io/>`_ under the hood.

BentoML's ``bentoml.metrics`` is a drop-in replacement for ``prometheus_client`` that should be used in BentoML services:
.. code-block:: diff

   diff --git a/service.py b/service.py
   index acd8467e..0f3e6e77 100644
   --- a/service.py
   +++ b/service.py
   @@ -1,11 +1,10 @@
   -from prometheus_client import Summary
   +from bentoml.metrics import Summary
    import random
    import time

    REQUEST_TIME = Summary("request_processing_seconds", "Time spent processing request")

    @REQUEST_TIME.time()
    def process_request(t):
        """A function that takes some time."""
While ``bentoml.metrics`` contains all of the APIs offered by ``prometheus_client``,
you should always use ``bentoml.metrics`` instead of ``prometheus_client`` in your service definition.
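The ``@REQUEST_TIME.time()`` decorator in the diff above works by timing each call and reporting the elapsed seconds to the metric. A minimal sketch of that mechanism in plain Python (``observe`` here is a hypothetical callback standing in for the metric's observe method, not a BentoML API):

```python
import time
from functools import wraps

def timed(observe):
    """Decorator factory sketching what a Summary's .time() does:
    measure the wall time of each call and report it via `observe`."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Report the duration even if the call raised.
                observe(time.perf_counter() - start)
        return wrapper
    return decorator

durations = []

@timed(durations.append)
def process_request():
    sum(i * i for i in range(1000))  # stand-in for real work

process_request()
```

The ``try``/``finally`` matters: failed requests are still observed, so latency metrics do not silently exclude errors.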
The reason is that ``bentoml.metrics`` constructs metrics lazily and
ensures that `multiprocess mode <https://github.com/prometheus/client_python#multiprocess-mode-eg-gunicorn>`_ is correctly configured.
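Lazy construction can be sketched as a small proxy that defers creating the real metric until first use, so process-level setup (such as multiprocess mode) can happen first. This is an illustration of the idea only, not BentoML's actual implementation:

```python
class LazyMetric:
    """Proxy that defers constructing the underlying metric until it is
    first used (illustrative sketch, not BentoML internals)."""

    def __init__(self, factory):
        self._factory = factory
        self._metric = None

    def __getattr__(self, name):
        # Called only for attributes not found on the proxy itself,
        # i.e. anything belonging to the real metric.
        if self._metric is None:
            self._metric = self._factory()
        return getattr(self._metric, name)

created = []

class FakeCounter:
    """Hypothetical metric class used to observe when construction happens."""
    def __init__(self):
        created.append(self)
        self.value = 0

    def inc(self):
        self.value += 1

counter = LazyMetric(FakeCounter)
assert not created  # nothing constructed yet
counter.inc()       # first use triggers construction
```

Because the metric is not created at import time, the registry can be switched into multiprocess mode before any time series exist.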
.. note::

   ``prometheus_client`` should not be imported in BentoML services; doing so will
   break multiprocess mode.

.. note::

   All metrics from ``bentoml.metrics`` set up a ``registry`` to handle multiprocess mode,
   which means you **SHOULD NOT** pass a ``registry`` argument when initializing metrics:
   .. code-block:: python
      :caption: service.py

      # THIS WILL NOT WORK
      from bentoml.metrics import Summary, CollectorRegistry
      from bentoml.metrics import multiprocess

      registry = CollectorRegistry()
      multiprocess.MultiProcessCollector(registry)

      REQUEST_TIME = Summary(
          "request_processing_seconds", "Time spent processing request", registry=registry
      )

   Instead:

   .. code-block:: python
      :caption: service.py

      # THIS WILL WORK
      from bentoml.metrics import Summary

      REQUEST_TIME = Summary("request_processing_seconds", "Time spent processing request")

----
The following section goes over the most commonly used metrics APIs in
``bentoml.metrics``:

.. currentmodule:: bentoml._internal.server.metrics

.. autofunction:: bentoml.metrics.generate_latest

.. autofunction:: bentoml.metrics.text_string_to_metric_families

.. autofunction:: bentoml.metrics.Histogram

.. autofunction:: bentoml.metrics.Counter

.. autofunction:: bentoml.metrics.Summary

.. autofunction:: bentoml.metrics.Gauge
----

@@ -1,2 +1,3 @@
data
mnist_png/
mnist_png.tar.gz