
docs: Update advanced guides format #3154

Merged
Merged 5 commits on Oct 28, 2022
4 changes: 2 additions & 2 deletions docs/source/concepts/runner.rst
@@ -299,15 +299,15 @@ Runner Definition
Runner Configuration
--------------------

-Runner behaviors and resource allocation can be specified via BentoML :ref:`configuration <guides/configuration:Configuring BentoML>`.
+Runner behaviors and resource allocation can be specified via BentoML :ref:`configuration <guides/configuration:Configuration>`.
Runners can be configured either individually or in aggregate under the ``runners`` configuration key. To configure a specific runner, specify its name
under the ``runners`` configuration key; otherwise, the configuration applies to all runners. The examples below demonstrate both
the aggregate configuration for all runners and the configuration for an individual runner (``iris_clf``).

Adaptive Batching
^^^^^^^^^^^^^^^^^

-If a model or custom runner supports batching, the :ref:`adaptive batching <guides/configuration:Configuring BentoML>` mechanism is enabled by default.
+If a model or custom runner supports batching, the :ref:`adaptive batching <guides/configuration:Configuration>` mechanism is enabled by default.
To explicitly disable or control adaptive batching behaviors at runtime, configuration can be specified under the ``batching`` key.

.. tab-set::
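To make the ``batching`` key concrete, here is a minimal sketch of what such a ``bentoml_configuration.yaml`` fragment might look like. The key names follow BentoML's runner configuration schema, but the values are purely illustrative, not defaults:

```yaml
runners:
  # Applied to all runners unless overridden per runner.
  batching:
    enabled: true
    max_batch_size: 100
    max_latency_ms: 500
  # Per-runner override, scoped under the runner's name.
  iris_clf:
    batching:
      enabled: false
```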
2 changes: 1 addition & 1 deletion docs/source/frameworks/catboost.rst
@@ -138,7 +138,7 @@ Using GPU

CatBoost Runners will automatically use ``task_type=GPU`` if a GPU is detected.

-This behavior can be disabled using the :ref:`BentoML configuration file<guides/configuration:Configuring BentoML>`:
+This behavior can be disabled using the :ref:`BentoML configuration file<guides/configuration:Configuration>`:


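The elided YAML presumably disables GPU access by allocating zero GPUs to runners. A sketch under that assumption (``nvidia.com/gpu`` is the resource key BentoML uses for GPU counts):

```yaml
runners:
  resources:
    nvidia.com/gpu: 0
```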
2 changes: 1 addition & 1 deletion docs/source/frameworks/xgboost.rst
@@ -145,7 +145,7 @@ GPU Inference

If there is a GPU available, the XGBoost Runner will automatically use ``gpu_predictor`` by default.
This can be disabled by using the
-:ref:`BentoML configuration file <guides/configuration:Configuring BentoML>` to disable Runner GPU
+:ref:`BentoML configuration file <guides/configuration:Configuration>` to disable Runner GPU
access:

.. code-block:: yaml
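Scoped to a single runner, the same resource key nests under the runner's name. A sketch, where ``my_xgboost_model`` is a hypothetical runner name:

```yaml
runners:
  my_xgboost_model:
    resources:
      nvidia.com/gpu: 0
```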
4 changes: 2 additions & 2 deletions docs/source/guides/client.rst
@@ -1,6 +1,6 @@
-========================
+============
 Bento Client
-========================
+============

BentoML provides a client implementation that can be used to make requests to a BentoML server.

6 changes: 3 additions & 3 deletions docs/source/guides/configuration.rst
@@ -1,6 +1,6 @@
-===================
-Configuring BentoML
-===================
+=============
+Configuration
+=============

BentoML starts with an out-of-the-box configuration that works for common use cases. For advanced users, many
features can be customized through configuration. Both BentoML CLI and Python APIs can be customized
2 changes: 1 addition & 1 deletion docs/source/guides/grpc.rst
@@ -1410,7 +1410,7 @@ faster go-to-market strategy.
Performance tuning
~~~~~~~~~~~~~~~~~~

-BentoML allows users to tune the performance of gRPC via the ``api_server.grpc`` section of :ref:`bentoml_configuration.yaml <guides/configuration:Configuring BentoML>`.
+BentoML allows users to tune the performance of gRPC via the ``api_server.grpc`` section of :ref:`bentoml_configuration.yaml <guides/configuration:Configuration>`.

A quick overview of the available configuration for gRPC:

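As a rough sketch of what such tuning might look like (key names under ``api_server.grpc`` vary by BentoML version; treat these as illustrative and verify against the configuration schema):

```yaml
api_server:
  grpc:
    host: 0.0.0.0
    port: 3000
    max_concurrent_streams: 128
```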
11 changes: 5 additions & 6 deletions docs/source/guides/index.rst
@@ -13,16 +13,15 @@ into this part of the documentation.
:titlesonly:

    batching
-   containerization
    client
-   grpc
-   server
    configuration
+   containerization
-   metrics
-   gpu
    logging
-   monitoring
+   metrics
    performance
+   server
+   grpc
+   gpu
    security
    tracing
    migration
6 changes: 3 additions & 3 deletions docs/source/guides/logging.rst
@@ -1,6 +1,6 @@
-=================
-Customize Logging
-=================
+=======
+Logging
+=======

Server Logging
--------------
21 changes: 0 additions & 21 deletions docs/source/guides/monitoring.rst

This file was deleted.

11 changes: 11 additions & 0 deletions docs/source/guides/security.rst
@@ -38,6 +38,17 @@ Here's an example with starlette-authlib:
svc.add_asgi_middleware(SessionMiddleware, secret_key='you_secret')


+Certificates
+^^^^^^^^^^^^
+
+BentoML supports HTTPS with self-signed certificates. To enable HTTPS, provide the SSL certificate and key files as arguments
+to the :code:`bentoml serve` command. Use :code:`bentoml serve --help` to see the full list of options.
+
+.. code::
+
+   bentoml serve iris_classifier:latest --ssl-certfile /path/to/cert.pem --ssl-keyfile /path/to/key.pem
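For local testing, a self-signed certificate pair can be generated with ``openssl``. The file names and ``CN=localhost`` subject below are placeholders for illustration, not anything BentoML requires:

```shell
# Create a 2048-bit RSA key and a self-signed certificate valid for 365 days.
# -nodes leaves the key unencrypted so the server can read it without a passphrase.
openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout key.pem -out cert.pem \
    -days 365 -subj "/CN=localhost"
```

The resulting files can then be passed to ``bentoml serve`` via ``--ssl-certfile cert.pem --ssl-keyfile key.pem``.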


Reverse Proxy
^^^^^^^^^^^^^

6 changes: 3 additions & 3 deletions docs/source/guides/server.rst
@@ -1,6 +1,6 @@
-=====================
-Customize BentoServer
-=====================
+============
+Bento Server
+============

BentoML Server runs the Service API in an `ASGI <https://asgi.readthedocs.io/en/latest/>`_
web serving layer and puts Runners in a separate worker process pool managed by BentoML. The ASGI web
18 changes: 17 additions & 1 deletion docs/source/guides/snippets/metrics/metric_defs.py
@@ -6,7 +6,23 @@
     name="inference_duration",
     documentation="Duration of inference",
     labelnames=["nltk_version", "sentiment_cls"],
-    buckets=(0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0, float("inf")),
+    buckets=(
+        0.005,
+        0.01,
+        0.025,
+        0.05,
+        0.075,
+        0.1,
+        0.25,
+        0.5,
+        0.75,
+        1.0,
+        2.5,
+        5.0,
+        7.5,
+        10.0,
+        float("inf"),
+    ),
 )

polarity_counter = bentoml.metrics.Counter(
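As an aside on how bucket bounds like these behave: a Prometheus-style histogram is cumulative, meaning each observation increments every bucket whose upper bound is at least the observed value. A minimal hand-rolled sketch (illustration only; not BentoML or Prometheus client code):

```python
import bisect

BUCKETS = (0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5,
           0.75, 1.0, 2.5, 5.0, 7.5, 10.0, float("inf"))

def cumulative_counts(observations, buckets=BUCKETS):
    """Return cumulative bucket counts: counts[i] is the number of
    observations less than or equal to buckets[i]."""
    counts = [0] * len(buckets)
    for value in observations:
        # First bucket whose upper bound is >= value (bounds are inclusive).
        first = bisect.bisect_left(buckets, value)
        for i in range(first, len(buckets)):
            counts[i] += 1
    return counts

counts = cumulative_counts([0.003, 0.02, 0.3, 42.0])
# The final +Inf bucket always counts every observation.
```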
28 changes: 22 additions & 6 deletions examples/custom_runner/nltk_pretrained_model/service.py
@@ -24,13 +24,29 @@ class RunnerImpl(bentoml.Runner):
     name="inference_duration",
     documentation="Duration of inference",
     labelnames=["nltk_version", "sentiment_cls"],
-    buckets=exponential_buckets(0.001, 1.5, 10.0),
+    buckets=(
+        0.005,
+        0.01,
+        0.025,
+        0.05,
+        0.075,
+        0.1,
+        0.25,
+        0.5,
+        0.75,
+        1.0,
+        2.5,
+        5.0,
+        7.5,
+        10.0,
+        float("inf"),
+    ),
 )
 
-num_invocation = bentoml.metrics.Counter(
-    name="num_invocation",
-    documentation="Count total number of invocation for a given endpoint",
-    labelnames=["endpoint"],
+polarity_counter = bentoml.metrics.Counter(
+    name="polarity_total",
+    documentation="Count total number of analysis by polarity scores",
+    labelnames=["polarity"],
 )


@@ -63,6 +79,6 @@ def is_positive(self, input_text: str) -> bool:

 @svc.api(input=Text(), output=JSON())
 async def analysis(input_text: str) -> dict[str, bool]:
-    num_invocation.labels(endpoint="analysis").inc()
     is_positive = await nltk_runner.is_positive.async_run(input_text)
+    polarity_counter.labels(polarity=is_positive).inc()
     return {"is_positive": is_positive}
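The old code's ``exponential_buckets(0.001, 1.5, 10.0)`` produced geometrically spaced bounds. Its exact signature is internal to BentoML; assuming ``(start, factor, end)`` semantics, which is an assumption rather than the documented API, a hypothetical reimplementation would be:

```python
def exponential_buckets(start: float, factor: float, end: float) -> tuple:
    """Hypothetical sketch: geometric bounds start, start*factor, ...,
    capped at end, followed by +Inf (assumed semantics)."""
    assert start > 0 and factor > 1 and end > start
    bounds = []
    bound = start
    while bound < end:
        bounds.append(bound)
        bound *= factor
    bounds.append(end)
    bounds.append(float("inf"))
    return tuple(bounds)
```

The explicit tuple in the new code swaps this for the Prometheus-style default bounds, which makes the bucket edges obvious at a glance.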
2 changes: 1 addition & 1 deletion src/bentoml/_internal/server/metrics/prometheus.py
@@ -215,7 +215,7 @@ def create_response(request):
...

The default buckets are intended to cover a typical web/rpc request from milliseconds to seconds.
-See the :ref:`configuration guide <guides/configuration:Configuring BentoML>` for how to customize the buckets.
+See the :ref:`configuration guide <guides/configuration:Configuration>` for how to customize the buckets.

Args:
name (str): The name of the metric.