
fix(prometheus): ensure metrics are lazily loaded #3089

Merged
18 commits merged on Oct 28, 2022
1 change: 1 addition & 0 deletions docs/source/conf.py
@@ -167,5 +167,6 @@
     "transformers.file_utils",
     "xgboost",
     "catboost",
+    "prometheus_client",
     "bentoml._internal.models.model.ModelSignatureDict",
 ]
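The hunk above adds ``prometheus_client`` to the docs build's mock-import list so Sphinx autodoc can import BentoML modules without the package being installed. A minimal sketch of how such a list appears in a Sphinx ``conf.py`` (the surrounding entries here are illustrative, taken from the hunk's context lines):

```python
# docs/source/conf.py (illustrative fragment)
# Sphinx's autodoc replaces each listed module with a mock during the
# docs build, so documented code that imports them can be loaded even
# when the real packages are absent from the build environment.
autodoc_mock_imports = [
    "xgboost",
    "catboost",
    "prometheus_client",
]
```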
6 changes: 3 additions & 3 deletions docs/source/guides/metrics.rst
@@ -9,7 +9,7 @@ BentoML allows users to define custom metrics with `Prometheus <https://promethe
 This article will dive into how to add custom metrics to monitor your BentoService and how you can incorporate custom metrics into
 either a :ref:`concepts/runner:Custom Runner` or your :ref:`Service <concepts/service:Service and APIs>`.

-Having a `Prometheus server <https://prometheus.io/docs/prometheus/latest/getting_started/>` available will help visualize the examples in this guide.
+Having a `Prometheus server <https://prometheus.io/docs/prometheus/latest/getting_started/>`_ available will help visualize the examples in this guide.

.. note::

@@ -18,11 +18,11 @@ Having a `Prometheus server <https://prometheus.io/docs/prometheus/latest/gettin

 .. seealso::

-   All `metrics types <https://prometheus.io/docs/concepts/metric_types/>`_ supported by Prometheus are supported in BentoML. See :ref:`Metrics API <reference/metrics:Metrics API>` for more information on ``bentoml.metrics``.
+   All `metrics types <https://prometheus.io/docs/concepts/metric_types/>`_ supported by Prometheus are supported in BentoML. See :ref:`reference/metrics:Metrics API` for more information on ``bentoml.metrics``.


 Using Metrics in a BentoService
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 We will build a custom histogram to track the latency of our :ref:`pretrained NLTK runner <concepts/runner:Custom Runner>` and a custom
 counter to measure the total number of times our endpoint is invoked.
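The guide text above pairs a histogram (for runner latency) with a counter (for invocation totals). To illustrate what those two metric types track, here is a stdlib-only toy sketch of the pattern — these classes are stand-ins for illustration only, not the BentoML or Prometheus client API:

```python
import time

class ToyCounter:
    """Stand-in for a Prometheus Counter: a monotonically increasing count."""
    def __init__(self):
        self.value = 0

    def inc(self, amount=1):
        self.value += amount

class ToyHistogram:
    """Stand-in for a Prometheus Histogram: cumulative observation buckets."""
    def __init__(self, buckets=(0.1, 0.5, 1.0, float("inf"))):
        self.buckets = buckets
        self.counts = [0] * len(buckets)   # counts[i]: observations <= buckets[i]
        self.sum = 0.0

    def observe(self, value):
        self.sum += value
        for i, bound in enumerate(self.buckets):
            if value <= bound:
                self.counts[i] += 1

request_total = ToyCounter()
request_latency = ToyHistogram()

def endpoint(payload):
    # Time the "inference" call and record both metrics, mirroring the
    # histogram-plus-counter setup described in the guide.
    start = time.perf_counter()
    result = payload.upper()               # stand-in for the real model call
    request_latency.observe(time.perf_counter() - start)
    request_total.inc()
    return result

endpoint("hello")
```

In the real guide the same shape would use the histogram's ``observe`` and the counter's ``inc`` from ``bentoml.metrics``, with proper names and label values.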
2 changes: 1 addition & 1 deletion docs/source/reference/metrics.rst
@@ -68,7 +68,7 @@ ensure `multiprocessing mode <https://github.com/prometheus/client_python#multip
 The following section will go over the most commonly used metrics API in
 ``bentoml.metrics``:

-.. currentmodule:: bentoml._internal.server.metrics
+.. currentmodule:: bentoml.metrics

 .. autofunction:: bentoml.metrics.generate_latest

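The PR's stated goal, "ensure metrics are lazily loaded," is commonly achieved with PEP 562 module-level ``__getattr__``, which defers the backend import until an attribute is first accessed. A stdlib-only sketch of the pattern (this is an illustration, not BentoML's actual implementation; ``json`` stands in for the heavy backend):

```python
import importlib
import sys
import types

def make_lazy_module(name, backend_name, exported):
    """Build a module whose attributes import `backend_name` on first access."""
    mod = types.ModuleType(name)
    state = {"backend": None}

    def __getattr__(attr):
        if attr in exported:
            if state["backend"] is None:
                # Heavy import happens here, on first attribute access,
                # not when the lazy module itself is imported.
                state["backend"] = importlib.import_module(backend_name)
            return getattr(state["backend"], attr)
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    # PEP 562: a module-level __getattr__ is consulted for missing attributes.
    mod.__getattr__ = __getattr__
    sys.modules[name] = mod
    return mod

# Stand-in: expose json.dumps/loads lazily, the way a metrics facade
# might expose Counter/Histogram without importing the client up front.
lazy = make_lazy_module("lazy_json", "json", {"dumps", "loads"})
```

Importing the facade module is then nearly free; only the first access to an exported name (e.g. ``lazy.dumps``) pays the import cost.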
4 changes: 3 additions & 1 deletion requirements/tests-requirements.txt
@@ -17,4 +17,6 @@ imageio==2.22.1
 pyarrow==9.0.0
 build[virtualenv]==0.8.0
 protobuf==3.19.6
-grpcio-tools>=1.41.0,<1.49.0,!=1.48.2
+grpcio>=1.41.0, <1.49, !=1.48.2
+grpcio-health-checking>=1.41.0, <1.49, !=1.48.2
+opentelemetry-instrumentation-grpc==0.34b0
2 changes: 2 additions & 0 deletions src/bentoml/_internal/runner/runner_handle/remote.py
@@ -138,6 +138,8 @@ async def async_run_method(
     *args: P.args,
     **kwargs: P.kwargs,
 ) -> R | tuple[R, ...]:
+    import aiohttp
+
     from ...runner.container import AutoContainer

     inp_batch_dim = __bentoml_method.config.batch_dim[0]
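The change above moves ``import aiohttp`` from module scope into the function body, so the module can be imported without pulling in aiohttp until a remote runner call is actually made. The general deferred-import pattern, with ``json`` as a stand-in for the heavy optional dependency:

```python
def send_payload(data):
    # Deferred import: the dependency is loaded only when this function
    # first runs, keeping module import time low and making the
    # dependency optional for callers that never invoke this code path.
    import json

    return json.dumps(data)
```

Python caches modules in ``sys.modules``, so repeated calls pay only a cheap dictionary lookup after the first import.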