docs: Update advanced guides format (#3154)
ssheng committed Oct 28, 2022
1 parent b4ffc31 commit 78d3e00
Showing 14 changed files with 72 additions and 51 deletions.
4 changes: 2 additions & 2 deletions docs/source/concepts/runner.rst
@@ -299,15 +299,15 @@ Runner Definition
Runner Configuration
--------------------

Runner behaviors and resource allocation can be specified via BentoML :ref:`configuration <guides/configuration:Configuring BentoML>`.
Runner behaviors and resource allocation can be specified via BentoML :ref:`configuration <guides/configuration:Configuration>`.
Runners can be configured either individually or in aggregate under the ``runners`` configuration key. To configure a specific runner, specify its name
under the ``runners`` configuration key. Otherwise, the configuration applies to all runners. The examples below demonstrate both
the configuration for all runners in aggregate and for an individual runner (``iris_clf``).

Adaptive Batching
^^^^^^^^^^^^^^^^^

If a model or custom runner supports batching, the :ref:`adaptive batching <guides/configuration:Configuring BentoML>` mechanism is enabled by default.
If a model or custom runner supports batching, the :ref:`adaptive batching <guides/configuration:Configuration>` mechanism is enabled by default.
To explicitly disable or control adaptive batching behaviors at runtime, configuration can be specified under the ``batching`` key.
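The concrete keys are collapsed in this view; as a rough sketch (key names assumed from the BentoML configuration schema, not visible in this diff), tuning adaptive batching for all runners while disabling it for ``iris_clf`` might look like:

```yaml
# Hypothetical bentoml_configuration.yaml sketch -- key names are assumptions.
runners:
  batching:
    enabled: true          # applies to all runners unless overridden
    max_batch_size: 100    # cap on the number of requests fused into one batch
    max_latency_ms: 500    # upper bound on how long a request waits to be batched
  iris_clf:
    batching:
      enabled: false       # disable adaptive batching for this runner only
```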

.. tab-set::
2 changes: 1 addition & 1 deletion docs/source/frameworks/catboost.rst
@@ -138,7 +138,7 @@ Using GPU

CatBoost Runners will automatically use ``task_type=GPU`` if a GPU is detected.

This behavior can be disabled using the :ref:`BentoML configuration file<guides/configuration:Configuring BentoML>`:
This behavior can be disabled using the :ref:`BentoML configuration file<guides/configuration:Configuration>`:

access:

2 changes: 1 addition & 1 deletion docs/source/frameworks/xgboost.rst
@@ -145,7 +145,7 @@ GPU Inference

If there is a GPU available, the XGBoost Runner will automatically use ``gpu_predictor`` by default.
This can be disabled by using the
:ref:`BentoML configuration file <guides/configuration:Configuring BentoML>` to disable Runner GPU
:ref:`BentoML configuration file <guides/configuration:Configuration>` to disable Runner GPU
access:

.. code-block:: yaml
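The body of that YAML block is collapsed above; one plausible sketch (the ``nvidia.com/gpu`` resource key is an assumption, not shown in this diff) that denies the Runner GPU access:

```yaml
# Hypothetical sketch -- the resource key name is an assumption.
runners:
  resources:
    nvidia.com/gpu: 0   # allocate zero GPUs so the runner falls back to CPU
```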
4 changes: 2 additions & 2 deletions docs/source/guides/client.rst
@@ -1,6 +1,6 @@
========================
============
Bento Client
========================
============

BentoML provides a client implementation that can be used to make requests to a BentoML server.
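At the HTTP level, such a client boils down to POSTing the API input to the service endpoint. A minimal stdlib-only sketch, assuming a hypothetical ``classify`` endpoint on a local server (an illustration of the mechanism, not the Bento Client API itself):

```python
import json
from urllib import request


def build_request(base_url: str, endpoint: str, payload) -> request.Request:
    """Build a POST request carrying a JSON payload for one service endpoint."""
    return request.Request(
        f"{base_url}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_endpoint(base_url: str, endpoint: str, payload) -> bytes:
    """Send the request to a running BentoML server and return the raw body."""
    with request.urlopen(build_request(base_url, endpoint, payload)) as resp:
        return resp.read()


# Example (requires a running server):
# call_endpoint("http://localhost:3000", "classify", [[5.1, 3.5, 1.4, 0.2]])
```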

6 changes: 3 additions & 3 deletions docs/source/guides/configuration.rst
@@ -1,6 +1,6 @@
===================
Configuring BentoML
===================
=============
Configuration
=============

BentoML starts with an out-of-the-box configuration that works for common use cases. For advanced users, many
features can be customized through configuration. Both BentoML CLI and Python APIs can be customized
2 changes: 1 addition & 1 deletion docs/source/guides/grpc.rst
@@ -1410,7 +1410,7 @@ faster go-to-market strategy.
Performance tuning
~~~~~~~~~~~~~~~~~~

BentoML allows users to tune the performance of gRPC via :ref:`bentoml_configuration.yaml <guides/configuration:Configuring BentoML>` under ``api_server.grpc``.
BentoML allows users to tune the performance of gRPC via :ref:`bentoml_configuration.yaml <guides/configuration:Configuration>` under ``api_server.grpc``.

A quick overview of the available configuration for gRPC:
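The overview itself is collapsed in this view; a rough sketch of the shape of that configuration (key names under ``api_server.grpc`` are assumptions based on common gRPC server options, not confirmed by this diff):

```yaml
# Hypothetical sketch -- key names are assumptions.
api_server:
  grpc:
    host: 0.0.0.0
    port: 3000
    max_concurrent_streams: 128   # HTTP/2 streams allowed per connection
    max_message_length: -1        # -1 for unlimited gRPC message size
```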

11 changes: 5 additions & 6 deletions docs/source/guides/index.rst
@@ -13,16 +13,15 @@ into this part of the documentation.
:titlesonly:

batching
containerization
client
grpc
server
configuration
containerization
metrics
gpu
logging
monitoring
metrics
performance
server
grpc
gpu
security
tracing
migration
6 changes: 3 additions & 3 deletions docs/source/guides/logging.rst
@@ -1,6 +1,6 @@
=================
Customize Logging
=================
=======
Logging
=======

Server Logging
--------------
21 changes: 0 additions & 21 deletions docs/source/guides/monitoring.rst

This file was deleted.

11 changes: 11 additions & 0 deletions docs/source/guides/security.rst
@@ -38,6 +38,17 @@ Here's an example with starlette-authlib:
svc.add_asgi_middleware(SessionMiddleware, secret_key='your_secret')

Certificates
^^^^^^^^^^^^

BentoML supports HTTPS with self-signed certificates. To enable HTTPS, you can provide SSL certificate and key files as arguments
to the :code:`bentoml serve` command. Use :code:`bentoml serve --help` to see the full list of options.

.. code:: bash

    bentoml serve iris_classifier:latest --ssl-certfile /path/to/cert.pem --ssl-keyfile /path/to/key.pem

Reverse Proxy
^^^^^^^^^^^^^

6 changes: 3 additions & 3 deletions docs/source/guides/server.rst
@@ -1,6 +1,6 @@
=====================
Customize BentoServer
=====================
============
Bento Server
============

BentoML Server runs the Service API in an `ASGI <https://asgi.readthedocs.io/en/latest/>`_
web serving layer and puts Runners in a separate worker process pool managed by BentoML. The ASGI web
18 changes: 17 additions & 1 deletion docs/source/guides/snippets/metrics/metric_defs.py
@@ -6,7 +6,23 @@
name="inference_duration",
documentation="Duration of inference",
labelnames=["nltk_version", "sentiment_cls"],
buckets=(0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0, float("inf")),
buckets=(
0.005,
0.01,
0.025,
0.05,
0.075,
0.1,
0.25,
0.5,
0.75,
1.0,
2.5,
5.0,
7.5,
10.0,
float("inf"),
),
)

polarity_counter = bentoml.metrics.Counter(
28 changes: 22 additions & 6 deletions examples/custom_runner/nltk_pretrained_model/service.py
@@ -24,13 +24,29 @@ class RunnerImpl(bentoml.Runner):
name="inference_duration",
documentation="Duration of inference",
labelnames=["nltk_version", "sentiment_cls"],
buckets=exponential_buckets(0.001, 1.5, 10.0),
buckets=(
0.005,
0.01,
0.025,
0.05,
0.075,
0.1,
0.25,
0.5,
0.75,
1.0,
2.5,
5.0,
7.5,
10.0,
float("inf"),
),
)

num_invocation = bentoml.metrics.Counter(
name="num_invocation",
documentation="Count total number of invocation for a given endpoint",
labelnames=["endpoint"],
polarity_counter = bentoml.metrics.Counter(
name="polarity_total",
documentation="Count total number of analysis by polarity scores",
labelnames=["polarity"],
)


@@ -63,6 +79,6 @@ def is_positive(self, input_text: str) -> bool:

@svc.api(input=Text(), output=JSON())
async def analysis(input_text: str) -> dict[str, bool]:
num_invocation.labels(endpoint="analysis").inc()
is_positive = await nltk_runner.is_positive.async_run(input_text)
polarity_counter.labels(polarity=is_positive).inc()
return {"is_positive": is_positive}
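The hunk above replaces a call to ``exponential_buckets(0.001, 1.5, 10.0)`` with an explicit tuple of the Prometheus default buckets. For reference, a helper of that shape can be sketched as follows (a stdlib re-implementation for illustration, not BentoML's own ``exponential_buckets``):

```python
def exponential_buckets(start: float, factor: float, end: float) -> tuple:
    """Geometrically spaced histogram bucket bounds from start up to end,
    capped with +Inf as Prometheus histograms require."""
    buckets = []
    bound = start
    while bound < end:
        buckets.append(bound)
        bound *= factor
    return tuple(buckets) + (float("inf"),)
```

For example, ``exponential_buckets(1.0, 10.0, 1000.0)`` yields ``(1.0, 10.0, 100.0, inf)``.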
2 changes: 1 addition & 1 deletion src/bentoml/_internal/server/metrics/prometheus.py
@@ -215,7 +215,7 @@ def create_response(request):
...
The default buckets are intended to cover a typical web/rpc request from milliseconds to seconds.
See :ref:`configuration guides <guides/configuration:Configuring BentoML>` to see how to customize the buckets.
See :ref:`configuration guides <guides/configuration:Configuration>` to see how to customize the buckets.
Args:
name (str): The name of the metric.
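Prometheus histogram buckets are cumulative: an observation is counted in every bucket whose upper bound is at least the observed value. A small sketch of that semantics against the standard default buckets (the ``smallest_bucket`` helper is illustrative, not part of this module):

```python
import bisect

# The standard Prometheus default buckets (milliseconds to seconds).
DEFAULT_BUCKETS = (
    0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5,
    0.75, 1.0, 2.5, 5.0, 7.5, 10.0, float("inf"),
)


def smallest_bucket(value: float, buckets: tuple = DEFAULT_BUCKETS) -> float:
    """Return the smallest bucket upper bound that counts the observed value."""
    return buckets[bisect.bisect_left(buckets, value)]
```

A 0.3 s request, for instance, lands in the ``le=0.5`` bucket (and, cumulatively, every larger one).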
