docs: tracing and configuration
depends on bentoml#3052

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
aarnphm committed Oct 5, 2022
1 parent 1533da5 commit c4c69a8
Showing 5 changed files with 227 additions and 35 deletions.
150 changes: 115 additions & 35 deletions docs/source/guides/configuration.rst
@@ -2,36 +2,60 @@
Configuring BentoML
===================

BentoML provides a configuration interface that allows you to customize the runtime
behaviour of your BentoService. This article highlights and consolidates the configuration
field definitions, as well as some recommendations for best practices when configuring
BentoML.

Configuration is best used for scenarios where the customizations can be specified once
and applied anywhere across your organization using BentoML.

BentoML comes with an out-of-the-box configuration that should work for most use cases.

However, advanced users who want to fine-tune the features BentoML has to offer can configure
such runtime variables and settings via a configuration file, commonly referred to as
``bentoml_configuration.yaml``.

.. note::

    This is not to be **confused** with ``bentofile.yaml``, which is used to define and
    package your :ref:`Bento 🍱 <concepts/bento:What is a Bento?>`.

    This configuration file is for BentoML runtime configuration.
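For contrast, a minimal ``bentofile.yaml`` might look like the sketch below; it describes how a Bento is
*built*, while ``bentoml_configuration.yaml`` controls how the server *runs*. The service path and packages
here are hypothetical:

.. code-block:: yaml
    :caption: `bentofile.yaml`

    # Build-time definition of a Bento (hypothetical example)
    service: "service:svc"
    include:
      - "*.py"
    python:
      packages:
        - scikit-learn
        - pandas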

Providing configuration during serve runtime
--------------------------------------------

BentoML configuration is a :wiki:`YAML` file whose location is specified via the ``BENTOML_CONFIG`` environment variable.

For example, given the following ``bentoml_configuration.yaml``, which specifies that the
server should use only 4 workers:

.. code-block:: yaml
    :caption: `~/bentoml_configuration.yaml`

    version: 2
    api_server:
      workers: 4

This configuration can then be passed to :ref:`bentoml serve <reference/cli:serve>` as shown
below:

.. code-block:: bash

    » BENTOML_CONFIG=~/bentoml_configuration.yaml bentoml serve iris_classifier:latest --production

.. note::

    Users only have to specify a partial configuration with the properties they wish to customize.
    BentoML will then fill in the rest of the configuration with the default values.

    In the example above, the number of API workers is overridden to 4.
    The remaining properties will take their default values.

.. seealso::

    :ref:`guides/configuration:Configuration fields`
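A partial configuration is not limited to a single field. For instance, a sketch that also raises the
request timeout and disables access logging could look like the following; the values are illustrative
and the field names follow the version 2 defaults shown later in this guide:

.. code-block:: yaml
    :caption: `~/bentoml_configuration.yaml`

    version: 2
    api_server:
      workers: 4
      timeout: 120        # allow slower requests (illustrative value)
      logging:
        access:
          enabled: false  # turn off access logging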


Overriding configuration with environment variables
@@ -63,25 +87,81 @@ Which the override configuration will be interpreted as:
:alt: Configuration override environment variable


Mounting configuration to containerized Bento
---------------------------------------------

To mount a configuration file to a containerized BentoService, you can use the
|volume_mount|_ option to mount the configuration file into the container and the
|env_flag|_ option to set the ``BENTOML_CONFIG`` environment variable:

.. code-block:: bash

    $ docker run --rm -v /path/to/configuration.yml:/home/bentoml/configuration.yml \
                 -e BENTOML_CONFIG=/home/bentoml/configuration.yml \
                 iris_classifier:6otbsmxzq6lwbgxi serve --production

Voila! You have successfully mounted a configuration file to your containerized BentoService.
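If you prefer Docker Compose, the same volume mount and environment variable can be expressed
declaratively. The sketch below makes a few assumptions: the service name, host port mapping, and the
local path to the configuration file are placeholders, while the image tag and command follow the
``docker run`` example above:

.. code-block:: yaml
    :caption: `docker-compose.yaml`

    version: "3.8"
    services:
      iris_classifier:
        image: iris_classifier:6otbsmxzq6lwbgxi
        command: serve --production
        ports:
          - "3000:3000"   # default HTTP port of the BentoML API server
        volumes:
          - ./bentoml_configuration.yaml:/home/bentoml/configuration.yml:ro
        environment:
          BENTOML_CONFIG: /home/bentoml/configuration.yml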

.. _env_flag: https://docs.docker.com/engine/reference/commandline/run/#set-environment-variables--e---env---env-file

.. |env_flag| replace:: ``-e``

.. _volume_mount: https://docs.docker.com/storage/volumes/#choose-the--v-or---mount-flag

.. |volume_mount| replace:: ``-v``


Configuration fields
--------------------

This section defines the configuration specs for BentoML.

BentoML configuration provides a versioned specification, which enables users to easily specify
and upgrade their configuration file as BentoML evolves. One can specify the version of
the configuration file by adding a top-level ``version`` field to ``bentoml_configuration.yaml``:

.. code-block:: yaml
    :caption: `~/bentoml_configuration.yaml`

    version: 2

.. epigraph::

    Note that ``version`` is not a required field, and BentoML will default to version 1 if
    it is not specified. This is mainly for backward compatibility with older configurations.
    However, we encourage users to always use the latest version of BentoML to ensure the best experience.

At the top level, BentoML configuration is split into two sections (a combined skeleton is sketched after this list):

* ``api_server``: Configuration for the BentoML API server.

* ``runners``: Configuration for BentoService runners.
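A minimal skeleton combining both sections follows. The ``batching`` field is shown only to illustrate
where runner options live and is an assumption on our part; refer to the defaults below for the
authoritative list of fields:

.. code-block:: yaml
    :caption: `~/bentoml_configuration.yaml`

    version: 2
    api_server:          # options for the API server go under this key
      workers: 4
    runners:             # options for runners go under this key
      batching:
        enabled: true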

.. tab-set::

    .. tab-item:: version 2
        :sync: v2

        .. include:: ./snippets/configuration/v2.rst

    .. tab-item:: version 1
        :sync: v1

        .. include:: ./snippets/configuration/v1.rst

.. dropdown:: `Expand for default configuration`
    :icon: code

    .. tab-set::

        .. tab-item:: version 2
            :sync: v2

            .. literalinclude:: ../../../bentoml/_internal/configuration/v2/defaults.yaml
                :language: yaml

        .. tab-item:: version 1
            :sync: v1

            .. literalinclude:: ../../../bentoml/_internal/configuration/v1/defaults.yaml
                :language: yaml
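The defaults above also cover the tracing exporters (Zipkin, Jaeger, and OTLP) documented in this change.
As a sketch, a partial configuration enabling the OTLP exporter could look like the following; the
endpoint and sample rate are placeholder values, and the nesting assumes the ``api_server.tracing``
layout from the version 2 defaults:

.. code-block:: yaml
    :caption: `~/bentoml_configuration.yaml`

    version: 2
    api_server:
      tracing:
        exporter_type: otlp       # one of the exporters listed in the defaults
        sample_rate: 1.0          # trace every request (placeholder value)
        otlp:
          protocol: grpc
          endpoint: http://localhost:4317  # assumed local OTLP collector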
3 changes: 3 additions & 0 deletions docs/source/guides/grpc.rst
@@ -1342,6 +1342,7 @@ A quick overview of the available configuration for gRPC:
``max_concurrent_streams``
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. epigraph::

    :bdg-info:`Definition:` Maximum number of concurrent incoming streams to allow on an HTTP/2 connection.

By default we don't set a limit. HTTP/2 connections typically have a limit of `maximum concurrent streams <httpwg.org/specs/rfc7540.html#rfc.section.5.1.2>`_
@@ -1370,6 +1371,7 @@ on a connection at one time.
``maximum_concurrent_rpcs``
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. epigraph::

    :bdg-info:`Definition:` The maximum number of concurrent RPCs this server will service before returning a ``RESOURCE_EXHAUSTED`` status.

By default we set this to ``None`` to indicate no limit, and let gRPC decide the limit.
@@ -1379,6 +1381,7 @@
``max_message_length``
^^^^^^^^^^^^^^^^^^^^^^

.. epigraph::

    :bdg-info:`Definition:` The maximum message length in bytes that can be received by or sent to the server.

By default we set this to ``-1`` to indicate no limit.
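Taken together, these limits can be tuned from the gRPC block of the configuration file. The sketch
below uses illustrative values only, not recommendations:

.. code-block:: yaml
    :caption: `~/bentoml_configuration.yaml`

    api_server:
      grpc:
        max_concurrent_streams: 128      # cap concurrent HTTP/2 streams
        maximum_concurrent_rpcs: 32      # cap in-flight RPCs
        max_message_length: 10485760     # ~10 MiB per message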
Empty file.
11 changes: 11 additions & 0 deletions docs/source/guides/snippets/configuration/v2.rst
@@ -0,0 +1,11 @@
``api_server``
^^^^^^^^^^^^^^

The following options are available for the ``api_server`` section:

+-------------+--------------------------------+----------------------------------------+
| Option      | Description                    | Default                                |
+-------------+--------------------------------+----------------------------------------+
| ``workers`` | Number of API workers to spawn | ``None`` (determined automatically by  |
|             |                                | BentoML)                               |
+-------------+--------------------------------+----------------------------------------+

``timeout``
98 changes: 98 additions & 0 deletions docs/source/guides/snippets/configuration/v2/api_server.yaml
@@ -0,0 +1,98 @@
api_server:
  workers: ~ # cpu_count() will be used when null
  timeout: 60
  backlog: 2048
  metrics:
    enabled: true
    namespace: bentoml_api_server
    duration:
      # https://github.com/prometheus/client_python/blob/f17a8361ad3ed5bc47f193ac03b00911120a8d81/prometheus_client/metrics.py#L544
      buckets:
        [
          0.005,
          0.01,
          0.025,
          0.05,
          0.075,
          0.1,
          0.25,
          0.5,
          0.75,
          1.0,
          2.5,
          5.0,
          7.5,
          10.0,
        ]
      min: ~
      max: ~
      factor: ~
  logging:
    access:
      enabled: true
      request_content_length: true
      request_content_type: true
      response_content_length: true
      response_content_type: true
      format:
        trace_id: 032x
        span_id: 016x
  ssl:
    enabled: false
    certfile: ~
    keyfile: ~
    keyfile_password: ~
    ca_certs: ~
    version: 17 # ssl.PROTOCOL_TLS_SERVER
    cert_reqs: 0 # ssl.CERT_NONE
    ciphers: TLSv1 # default ciphers
  http:
    host: 0.0.0.0
    port: 3000
    cors:
      enabled: false
      allow_origin: ~
      allow_credentials: ~
      allow_methods: ~
      allow_headers: ~
      allow_origin_regex: ~
      max_age: ~
      expose_headers: ~
  grpc:
    host: 0.0.0.0
    port: 3000
    max_concurrent_streams: ~
    maximum_concurrent_rpcs: ~
    max_message_length: -1
    reflection:
      enabled: false
    metrics:
      host: 0.0.0.0
      port: 3001
  tracing:
    exporter_type: ~
    sample_rate: ~
    excluded_urls: ~
    timeout: ~
    max_tag_value_length: ~
    zipkin:
      endpoint: ~
    jaeger:
      protocol: thrift
      collector_endpoint: ~
      thrift:
        agent_host_name: ~
        agent_port: ~
        udp_split_oversized_batches: ~
      grpc:
        insecure: ~
    otlp:
      protocol: ~
      endpoint: ~
      compression: ~
      http:
        certificate_file: ~
        headers: ~
      grpc:
        headers: ~
        insecure: ~
