
docs: Update advanced guides format #3154

Merged
Merged 5 commits on Oct 28, 2022
4 changes: 2 additions & 2 deletions docs/source/concepts/runner.rst
@@ -299,15 +299,15 @@ Runner Definition
Runner Configuration
--------------------

-Runner behaviors and resource allocation can be specified via BentoML :ref:`configuration <guides/configuration:Configuring BentoML>`.
+Runner behaviors and resource allocation can be specified via BentoML :ref:`configuration <guides/configuration:Configuration>`.
Runners can be configured either individually or in aggregate under the ``runners`` configuration key. To configure a specific runner, specify its name
under the ``runners`` configuration key; otherwise, the configuration applies to all runners. The examples below demonstrate both
the aggregate configuration for all runners and the configuration for an individual runner (``iris_clf``).

Adaptive Batching
^^^^^^^^^^^^^^^^^

-If a model or custom runner supports batching, the :ref:`adaptive batching <guides/configuration:Configuring BentoML>` mechanism is enabled by default.
+If a model or custom runner supports batching, the :ref:`adaptive batching <guides/configuration:Configuration>` mechanism is enabled by default.
To explicitly disable or control adaptive batching behaviors at runtime, configuration can be specified under the ``batching`` key.

.. tab-set::
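To make the ``batching`` key concrete, here is a minimal sketch of what such a ``bentoml_configuration.yaml`` fragment might look like. The key names follow BentoML's runner configuration schema, but the values are purely illustrative, not defaults:

```yaml
runners:
  # Applied to all runners unless overridden per runner.
  batching:
    enabled: true
    max_batch_size: 100
    max_latency_ms: 500
  # Per-runner override, scoped under the runner's name.
  iris_clf:
    batching:
      enabled: false
```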
2 changes: 1 addition & 1 deletion docs/source/frameworks/catboost.rst
@@ -138,7 +138,7 @@ Using GPU

CatBoost Runners will automatically use ``task_type=GPU`` if a GPU is detected.

-This behavior can be disabled using the :ref:`BentoML configuration file<guides/configuration:Configuring BentoML>`:
+This behavior can be disabled using the :ref:`BentoML configuration file<guides/configuration:Configuration>`:


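The elided YAML presumably disables GPU access by allocating zero GPUs to runners. A sketch under that assumption (``nvidia.com/gpu`` is the resource key BentoML uses for GPU counts):

```yaml
runners:
  resources:
    nvidia.com/gpu: 0
```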
2 changes: 1 addition & 1 deletion docs/source/frameworks/xgboost.rst
@@ -145,7 +145,7 @@ GPU Inference

If there is a GPU available, the XGBoost Runner will automatically use ``gpu_predictor`` by default.
This can be disabled by using the
-:ref:`BentoML configuration file <guides/configuration:Configuring BentoML>` to disable Runner GPU
+:ref:`BentoML configuration file <guides/configuration:Configuration>` to disable Runner GPU
access:

.. code-block:: yaml
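Scoped to a single runner, the same resource key nests under the runner's name. A sketch, where ``my_xgboost_model`` is a hypothetical runner name:

```yaml
runners:
  my_xgboost_model:
    resources:
      nvidia.com/gpu: 0
```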
4 changes: 2 additions & 2 deletions docs/source/guides/client.rst
@@ -1,6 +1,6 @@
-========================
+============
 Bento Client
-========================
+============

BentoML provides a client implementation that can be used to make requests to a BentoML server.

6 changes: 3 additions & 3 deletions docs/source/guides/configuration.rst
@@ -1,6 +1,6 @@
-===================
-Configuring BentoML
-===================
+=============
+Configuration
+=============

BentoML starts with an out-of-the-box configuration that works for common use cases. For advanced users, many
features can be customized through configuration. Both BentoML CLI and Python APIs can be customized
2 changes: 1 addition & 1 deletion docs/source/guides/grpc.rst
@@ -1410,7 +1410,7 @@ faster go-to-market strategy.
Performance tuning
~~~~~~~~~~~~~~~~~~

-BentoML allows users to tune the performance of gRPC via the ``api_server.grpc`` section of :ref:`bentoml_configuration.yaml <guides/configuration:Configuring BentoML>`.
+BentoML allows users to tune the performance of gRPC via the ``api_server.grpc`` section of :ref:`bentoml_configuration.yaml <guides/configuration:Configuration>`.

A quick overview of the available configuration for gRPC:

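As a rough sketch of what such tuning might look like (key names under ``api_server.grpc`` vary by BentoML version; treat these as illustrative and verify against the configuration schema):

```yaml
api_server:
  grpc:
    host: 0.0.0.0
    port: 3000
    max_concurrent_streams: 128
```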
11 changes: 5 additions & 6 deletions docs/source/guides/index.rst
@@ -13,16 +13,15 @@ into this part of the documentation.
:titlesonly:

    batching
-   containerization
    client
-   grpc
-   server
    configuration
+   containerization
-   metrics
-   gpu
    logging
-   monitoring
+   metrics
    performance
+   server
+   grpc
+   gpu
    security
    tracing
    migration
6 changes: 3 additions & 3 deletions docs/source/guides/logging.rst
@@ -1,6 +1,6 @@
-=================
-Customize Logging
-=================
+=======
+Logging
+=======

Server Logging
--------------
21 changes: 0 additions & 21 deletions docs/source/guides/monitoring.rst

This file was deleted.

11 changes: 11 additions & 0 deletions docs/source/guides/security.rst
@@ -38,6 +38,17 @@ Here's an example with starlette-authlib:
svc.add_asgi_middleware(SessionMiddleware, secret_key='you_secret')


+Certificates
+^^^^^^^^^^^^
+
+BentoML supports HTTPS with self-signed certificates. To enable HTTPS, provide the SSL certificate and key files as arguments
+to the :code:`bentoml serve` command. Use :code:`bentoml serve --help` to see the full list of options.
+
+.. code::
+
+   bentoml serve iris_classifier:latest --ssl-certfile /path/to/cert.pem --ssl-keyfile /path/to/key.pem
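For local testing, a self-signed certificate pair can be generated with ``openssl``. The file names and ``CN=localhost`` subject below are placeholders for illustration, not anything BentoML requires:

```shell
# Create a 2048-bit RSA key and a self-signed certificate valid for 365 days.
# -nodes leaves the key unencrypted so the server can read it without a passphrase.
openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout key.pem -out cert.pem \
    -days 365 -subj "/CN=localhost"
```

The resulting files can then be passed to ``bentoml serve`` via ``--ssl-certfile cert.pem --ssl-keyfile key.pem``.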


Reverse Proxy
^^^^^^^^^^^^^

6 changes: 3 additions & 3 deletions docs/source/guides/server.rst
@@ -1,6 +1,6 @@
-=====================
-Customize BentoServer
-=====================
+============
+Bento Server
+============

BentoML Server runs the Service API in an `ASGI <https://asgi.readthedocs.io/en/latest/>`_
web serving layer and puts Runners in a separate worker process pool managed by BentoML. The ASGI web
18 changes: 17 additions & 1 deletion docs/source/guides/snippets/metrics/metric_defs.py
@@ -6,7 +6,23 @@
     name="inference_duration",
     documentation="Duration of inference",
     labelnames=["nltk_version", "sentiment_cls"],
-    buckets=(0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0, float("inf")),
+    buckets=(
+        0.005,
+        0.01,
+        0.025,
+        0.05,
+        0.075,
+        0.1,
+        0.25,
+        0.5,
+        0.75,
+        1.0,
+        2.5,
+        5.0,
+        7.5,
+        10.0,
+        float("inf"),
+    ),
 )

polarity_counter = bentoml.metrics.Counter(
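As an aside on how bucket bounds like these behave: a Prometheus-style histogram is cumulative, meaning each observation increments every bucket whose upper bound is at least the observed value. A minimal hand-rolled sketch (illustration only; not BentoML or Prometheus client code):

```python
import bisect

BUCKETS = (0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5,
           0.75, 1.0, 2.5, 5.0, 7.5, 10.0, float("inf"))

def cumulative_counts(observations, buckets=BUCKETS):
    """Return cumulative bucket counts: counts[i] is the number of
    observations less than or equal to buckets[i]."""
    counts = [0] * len(buckets)
    for value in observations:
        # First bucket whose upper bound is >= value (bounds are inclusive).
        first = bisect.bisect_left(buckets, value)
        for i in range(first, len(buckets)):
            counts[i] += 1
    return counts

counts = cumulative_counts([0.003, 0.02, 0.3, 42.0])
# The final +Inf bucket always counts every observation.
```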
28 changes: 22 additions & 6 deletions examples/custom_runner/nltk_pretrained_model/service.py
@@ -24,13 +24,29 @@ class RunnerImpl(bentoml.Runner):
     name="inference_duration",
     documentation="Duration of inference",
     labelnames=["nltk_version", "sentiment_cls"],
-    buckets=exponential_buckets(0.001, 1.5, 10.0),
+    buckets=(
+        0.005,
+        0.01,
+        0.025,
+        0.05,
+        0.075,
+        0.1,
+        0.25,
+        0.5,
+        0.75,
+        1.0,
+        2.5,
+        5.0,
+        7.5,
+        10.0,
+        float("inf"),
+    ),
 )
 
-num_invocation = bentoml.metrics.Counter(
-    name="num_invocation",
-    documentation="Count total number of invocation for a given endpoint",
-    labelnames=["endpoint"],
+polarity_counter = bentoml.metrics.Counter(
+    name="polarity_total",
+    documentation="Count total number of analysis by polarity scores",
+    labelnames=["polarity"],
 )


@@ -63,6 +79,6 @@ def is_positive(self, input_text: str) -> bool:

 @svc.api(input=Text(), output=JSON())
 async def analysis(input_text: str) -> dict[str, bool]:
-    num_invocation.labels(endpoint="analysis").inc()
     is_positive = await nltk_runner.is_positive.async_run(input_text)
+    polarity_counter.labels(polarity=is_positive).inc()
     return {"is_positive": is_positive}
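The old code's ``exponential_buckets(0.001, 1.5, 10.0)`` produced geometrically spaced bounds. Its exact signature is internal to BentoML; assuming ``(start, factor, end)`` semantics, which is an assumption rather than the documented API, a hypothetical reimplementation would be:

```python
def exponential_buckets(start: float, factor: float, end: float) -> tuple:
    """Hypothetical sketch: geometric bounds start, start*factor, ...,
    capped at end, followed by +Inf (assumed semantics)."""
    assert start > 0 and factor > 1 and end > start
    bounds = []
    bound = start
    while bound < end:
        bounds.append(bound)
        bound *= factor
    bounds.append(end)
    bounds.append(float("inf"))
    return tuple(bounds)
```

The explicit tuple in the new code swaps this for the Prometheus-style default bounds, which makes the bucket edges obvious at a glance.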
2 changes: 1 addition & 1 deletion src/bentoml/_internal/server/metrics/prometheus.py
@@ -215,7 +215,7 @@ def create_response(request):
...

The default buckets are intended to cover a typical web/rpc request from milliseconds to seconds.
-See the :ref:`configuration guide <guides/configuration:Configuring BentoML>` for how to customize the buckets.
+See the :ref:`configuration guide <guides/configuration:Configuration>` for how to customize the buckets.

Args:
name (str): The name of the metric.