This guide demonstrates advanced gRPC features that BentoML offers:
- First-class support for custom gRPC servicers <guides/grpc:Mounting Servicer>, custom interceptors <guides/grpc:Mounting gRPC Interceptors>, and handlers.
- Seamlessly adding gRPC support to an existing Bento.
This guide also walks through the trade-offs of serving with gRPC, as well as recommendations on scenarios where gRPC might be a good fit.
Requirements:
This guide assumes that you have basic knowledge of gRPC and protobuf. If you aren't familiar with gRPC, you can start with the gRPC quick start guide.
For a quick introduction to serving with gRPC, see Intro to BentoML <tutorial:Tutorial: Intro to BentoML>
We will be using the example from the quickstart <tutorial:Tutorial: Intro to BentoML>
to demonstrate BentoML's capabilities with gRPC.
Support for gRPC was introduced in BentoML version 1.0.6.
Install BentoML with gRPC support with pip
:
» pip install -U "bentoml[grpc]"
That's it! You can now serve your Bento with gRPC via bentoml serve-grpc <reference/cli:serve-grpc>
without having to modify your current service definition 😃.
» bentoml serve-grpc iris_classifier:latest --production
There are two ways to interact with your gRPC BentoService:
- Use tools such as fullstorydev/grpcurl or fullstorydev/grpcui: the server requires reflection <grpc/grpc/blob/master/doc/server-reflection.md> to be enabled for these tools to work. Pass in --enable-reflection to enable it:
» bentoml serve-grpc iris_classifier:latest --production --enable-reflection
Open a different terminal and use one of the following:
- Use one of the client implementations <guides/grpc:Client Implementation> below to send test requests to your BentoService.
Note
All of the following client implementations are available on GitHub <bentoml/BentoML/tree/main/grpc-client/>
.
From another terminal, use one of the following client implementations to send a request to the gRPC server:
Note
gRPC comes with support for multiple languages. In the upcoming sections we will demonstrate two workflows for generating stubs and implementing clients:
- Using bazel to manage and isolate dependencies (recommended)
- A manual approach using protoc and its language-specific plugins
Python
We will create our Python client in the directory ~/workspace/iris_python_client/
:
» mkdir -p ~/workspace/iris_python_client
» cd ~/workspace/iris_python_client
Create a client.py
file with the following content:
../../../grpc-client/python/client.py
Go
Requirements:
Make sure to install the prerequisites before using Go.
We will create our Golang client in the directory ~/workspace/iris_go_client/
:
» mkdir -p ~/workspace/iris_go_client
» cd ~/workspace/iris_go_client
Using bazel (recommended)
Define a WORKSPACE file:
WORKSPACE
./snippets/grpc/go/WORKSPACE.snippet.bzl
Followed by defining a BUILD file:
BUILD
./snippets/grpc/go/BUILD.snippet.bzl
Using protoc and language-specific plugins
Create a Go module:
» go mod init iris_go_client && go mod tidy
Add the following lines to ~/workspace/iris_go_client/go.mod
:
require github.com/bentoml/bentoml/grpc/v1alpha1 v0.0.0-unpublished
replace github.com/bentoml/bentoml/grpc/v1alpha1 v0.0.0-unpublished => ./github.com/bentoml/bentoml/grpc/v1alpha1
By using the replace directive, we ensure that Go knows where to import our generated stubs from (since we don't host the generated gRPC stubs on pkg.go.dev 😄).
Here is the protoc
command to generate the gRPC Go stubs:
» protoc -I. -I thirdparty/protobuf/src \
--go_out=. --go_opt=paths=import \
--go-grpc_out=. --go-grpc_opt=paths=import \
bentoml/grpc/v1alpha1/service.proto
Then run the following to make sure the generated stubs are importable:
» pushd github.com/bentoml/bentoml/grpc/v1alpha1
» go mod init v1alpha1 && go mod tidy
» popd
Create a client.go
file with the following content:
../../../grpc-client/go/client.go
C++
Requirements:
Make sure to follow the instructions to install gRPC and Protobuf locally.
We will create our C++ client in the directory ~/workspace/iris_cc_client/
:
» mkdir -p ~/workspace/iris_cc_client
» cd ~/workspace/iris_cc_client
Using bazel (recommended)
Define a WORKSPACE file:
WORKSPACE
./snippets/grpc/cpp/WORKSPACE.snippet.bzl
Followed by defining a BUILD file:
BUILD
./snippets/grpc/cpp/BUILD.snippet.bzl
Using protoc and language-specific plugins
Here is the protoc
command to generate the gRPC C++ stubs:
» protoc -I . -I ./thirdparty/protobuf/src \
--cpp_out=. --grpc_out=. \
--plugin=protoc-gen-grpc=$(which grpc_cpp_plugin) \
bentoml/grpc/v1alpha1/service.proto
Create a client.cpp
file with the following content:
../../../grpc-client/cpp/client.cc
Java
Requirements:
Make sure to have JDK>=7.
Optional:
follow the instructions <grpc/grpc-java/tree/master/compiler>
to install the protoc plugin for gRPC Java if you plan to use protoc standalone.
Note
Feel free to use any Java build tool of your choice (Maven, Gradle, Bazel, etc.) to build and run the client.
In this tutorial we will be using bazel.
We will create our Java client in the directory ~/workspace/iris_java_client/
:
» mkdir -p ~/workspace/iris_java_client
» cd ~/workspace/iris_java_client
Create the client Java package (com.client.BentoServiceClient
):
» mkdir -p src/main/java/com/client
Using bazel (recommended)
Define a WORKSPACE file:
WORKSPACE
./snippets/grpc/java/WORKSPACE.snippet.bzl
Followed by defining a BUILD file:
BUILD
./snippets/grpc/java/BUILD.snippet.bzl
Using other build systems
One simply can't manually run javac to compile the Java classes, since there are far too many dependencies to resolve.
Provided below is an example of how one can use gradle to build the Java client.
» gradle init --project-dir .
The following build.gradle
should be able to help you get started:
../../../grpc-client/java/build.gradle
To build the client, run:
» ./gradlew build
Proceed to create a src/main/java/com/client/BentoServiceClient.java
file with the following content:
../../../grpc-client/java/src/main/java/com/client/BentoServiceClient.java
On running protoc
standalone (optional)
Here is the protoc
command to generate the gRPC Java stubs if you need to use protoc
standalone:
» protoc -I . \
-I ./thirdparty/protobuf/src \
--java_out=./src/main/java \
--grpc-java_out=./src/main/java \
bentoml/grpc/v1alpha1/service.proto
Kotlin
Requirements:
Make sure to have the prerequisites to get started with grpc/grpc-kotlin.
Optional:
feel free to install the Kotlin gRPC codegen <grpc/grpc-kotlin/blob/master/compiler/README.md>
in order to generate gRPC stubs if you plan to use protoc standalone.
To bootstrap the Kotlin client, feel free to use either gradle or maven to build and run the following client code.
In this example, we will use bazel to build and run the client.
We will create our Kotlin client in the directory ~/workspace/iris_kotlin_client/
, followed by creating the client directory structure:
» mkdir -p ~/workspace/iris_kotlin_client
» cd ~/workspace/iris_kotlin_client
» mkdir -p src/main/kotlin/com/client
Using bazel (recommended)
Define a WORKSPACE file:
WORKSPACE
./snippets/grpc/kotlin/WORKSPACE.snippet.bzl
Followed by defining a BUILD file:
BUILD
./snippets/grpc/kotlin/BUILD.snippet.bzl
Using other build systems
One simply can't manually compile all the Kotlin files, since there are far too many dependencies to resolve.
Provided below is an example of how one can use gradle to build the Kotlin client.
» gradle init --project-dir .
The following build.gradle.kts
should be able to help you get started:
../../../grpc-client/kotlin/build.gradle.kts
To build the client, run:
» ./gradlew build
Proceed to create a src/main/kotlin/com/client/BentoServiceClient.kt
file with the following content:
../../../grpc-client/kotlin/src/main/kotlin/com/client/BentoServiceClient.kt
On running protoc
standalone (optional)
Here is the protoc
command to generate the gRPC Kotlin stubs if you need to use protoc
standalone:
» protoc -I. -I ./thirdparty/protobuf/src \
--kotlin_out ./kotlin/src/main/kotlin/ \
--grpc-kotlin_out ./kotlin/src/main/kotlin \
--plugin=protoc-gen-grpc-kotlin=$(which protoc-gen-grpc-kotlin) \
bentoml/grpc/v1alpha1/service.proto
Node.js
Requirements:
Make sure to have Node.js installed in your system.
We will create our Node.js client in the directory ~/workspace/iris_node_client/
:
» mkdir -p ~/workspace/iris_node_client
» cd ~/workspace/iris_node_client
Initialize the project and use the following package.json
:
../../../grpc-client/node/package.json
Install the dependencies with either npm
or yarn
:
» yarn install --add-devs
Note
If you are using an Apple M1 machine, you might also have to prepend npm_config_target_arch=x64
to the yarn command:
» npm_config_target_arch=x64 yarn install --add-devs
Here is the protoc
command to generate the gRPC Javascript stubs:
» $(npm bin)/grpc_tools_node_protoc \
-I . -I ./thirdparty/protobuf/src \
--js_out=import_style=commonjs,binary:. \
--grpc_out=grpc_js:js \
bentoml/grpc/v1alpha1/service.proto
Proceed to create a client.js
file with the following content:
../../../grpc-client/node/client.js
Swift
Requirements:
Make sure to have the prerequisites <grpc/grpc-swift/blob/main/docs/quick-start.md#prerequisites>
to get started with grpc/grpc-swift.
We will create our Swift client in the directory ~/workspace/iris_swift_client/
:
» mkdir -p ~/workspace/iris_swift_client
» cd ~/workspace/iris_swift_client
We will use Swift Package Manager to build and run the client.
» swift package init --type executable
Initialize the project and use the following Package.swift
:
../../../grpc-client/swift/Package.swift
Here is the protoc
command to generate the gRPC Swift stubs:
» protoc -I. -I ./thirdparty/protobuf/src \
--swift_out=Sources --swift_opt=Visibility=Public \
--grpc-swift_out=Sources --grpc-swift_opt=Visibility=Public \
--plugin=protoc-gen-grpc-swift=$(which protoc-gen-grpc-swift) \
bentoml/grpc/v1alpha1/service.proto
Proceed to create a Sources/BentoServiceClient/main.swift
file with the following content:
../../../grpc-client/swift/Sources/BentoServiceClient/main.swift
PHP
Requirements:
Make sure to follow the instructions <grpc/grpc/blob/master/src/php/README.md>
to install grpc
via either pecl or from source.
Note
You will also have to symlink the built C++ extension to the PHP extension directory for it to be loaded by PHP.
We will then use bazel and composer to build and run the client.
We will create our PHP client in the directory ~/workspace/iris_php_client/
:
» mkdir -p ~/workspace/iris_php_client
» cd ~/workspace/iris_php_client
Create a new PHP package:
» composer init
An example composer.json
for the client:
../../../grpc-client/php/composer.json
Here is the protoc
command to generate the gRPC PHP stubs:
» protoc -I . -I ./thirdparty/protobuf/src \
--php_out=. \
--grpc_out=. \
--plugin=protoc-gen-grpc=$(which grpc_php_plugin) \
bentoml/grpc/v1alpha1/service.proto
Proceed to create a BentoServiceClient.php
file with the following content:
../../../grpc-client/php/BentoServiceClient.php
Then you can proceed to run the client scripts:
Python
» python -m client
Go
Using bazel (recommended)
» bazel run //:client_go
Using protoc and language-specific plugins
» go run ./client.go
C++
Using bazel (recommended)
» bazel run :client_cc
Using protoc and language-specific plugins
Refer to grpc/grpc
for instructions on using CMake and other similar build tools.
Note
See the instructions on GitHub <bentoml/BentoML/tree/main/grpc-client/README.md>
for a working C++ client.
Java
Using bazel (recommended)
» bazel run :client_java
Using others build system
We will use gradlew
to build the client and run it:
» ./gradlew build && \
./build/tmp/scripts/bentoServiceClient/bento-service-client
Note
See the instructions on GitHub <bentoml/BentoML/tree/main/grpc-client/README.md>
for a working Java client.
Kotlin
Using bazel (recommended)
» bazel run :client_kt
Using others build system
We will use gradlew
to build the client and run it:
» ./gradlew build && \
./build/tmp/scripts/bentoServiceClient/bento-service-client
Note
See the instructions on GitHub <bentoml/BentoML/tree/main/grpc-client/README.md>
for a working Kotlin client.
Node.js
» node client.js
Swift
» swift run BentoServiceClient
PHP
» php -d extension=/path/to/grpc.so -d max_execution_time=300 BentoServiceClient.php
Additional language support for client implementation
Ruby
Note:
Please check out the gRPC Ruby documentation <grpc/grpc/blob/master/src/ruby/README.md#grpc-ruby>
for how to install from source, and the examples folder <grpc/grpc/blob/master/examples/ruby/README.md#prerequisites>
for a Ruby client implementation.
.NET
Note:
Please check out the gRPC .NET <grpc/grpc-dotnet/tree/master/examples>
examples folder for a grpc/grpc-dotnet client implementation.
Dart
Note:
Please check out the gRPC Dart <grpc/grpc-dart/tree/master/examples>
examples folder for a grpc/grpc-dart client implementation.
Rust
Note:
Currently there is no official gRPC Rust client implementation. Please check out tikv/grpc-rs
as one of the unofficial implementations.
After successfully running the client, proceed to build the Bento as usual:
» bentoml build
To containerize the Bento with gRPC features, pass --enable-features=grpc
to bentoml containerize <reference/cli:containerize>
to add the additional gRPC dependencies to your Bento:
» bentoml containerize iris_classifier:latest --enable-features=grpc
--enable-features
allows users to containerize any of the existing Bentos with additional features <concepts/bento:Enable features for your Bento>
that BentoML provides without having to rebuild the Bento.
Note
--enable-features
accepts a comma-separated list of features or multiple arguments.
After containerization, your Bento container can now be used with gRPC:
» docker run -it --rm \
-p 3000:3000 -p 3001:3001 \
iris_classifier:6otbsmxzq6lwbgxi serve-grpc --production
Congratulations! You have successfully served, containerized and tested your BentoService with gRPC.
We will dive into some of the details of how gRPC is implemented in BentoML.
Let's take a quick look at the protobuf definition of the BentoService:
service BentoService {
rpc Call(Request) returns (Response) {}
}
Expand for the current protobuf definition.
v1alpha1
../../../src/bentoml/grpc/v1alpha1/service.proto
As you can see, BentoService defines a simple rpc Call
that sends a Request
message and returns a Response
message.
A Request
message takes in:
- `api_name`: the name of the API function defined inside your BentoService.
- oneof `content`: the field can be one of the following types:
| Protobuf definition | IO Descriptor |
| --- | --- |
| ``NDArray`` <guides/grpc:Array representation via ``NDArray``> | bentoml.io.NumpyNdarray <reference/api_io_descriptors:NumPy \`\`ndarray\`\`> |
| ``DataFrame`` <guides/grpc:Tabular data representation via ``DataFrame``> | bentoml.io.PandasDataFrame <reference/api_io_descriptors:Tabular Data with Pandas> |
| ``Series`` <guides/grpc:Series representation via ``Series``> | bentoml.io.PandasSeries <reference/api_io_descriptors:Tabular Data with Pandas> |
| ``File`` <guides/grpc:File-like object via ``File``> | bentoml.io.File <reference/api_io_descriptors:Files> |
| google.protobuf.StringValue | bentoml.io.Text <reference/api_io_descriptors:Texts> |
| google.protobuf.Value | bentoml.io.JSON <reference/api_io_descriptors:Structured Data with JSON> |
| ``Multipart`` <guides/grpc:Complex payload via ``Multipart``> | bentoml.io.Multipart <reference/api_io_descriptors:Multipart Payloads> |
| ``serialized_bytes`` <guides/grpc:Compact data format via ``serialized_bytes``> | (See below) |
Note
Series
is not yet supported.
The Response
message will then return one of the aforementioned types as result.
Example:
In the quickstart guide<tutorial:Creating a Service>
, we defined a classify
API that takes in a bentoml.io.NumpyNdarray <reference/api_io_descriptors:NumPy \`\`ndarray\`\`>
.
Therefore, our Request
message would have the following structure:
Python
./snippets/grpc/python/request.py
Go
./snippets/grpc/go/request.go
C++
./snippets/grpc/cpp/request.cc
Java
./snippets/grpc/java/Request.java
Kotlin
./snippets/grpc/kotlin/Request.kt
Node.js
./snippets/grpc/node/request.js
Swift
./snippets/grpc/swift/Request.swift
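To make the structure concrete, here is a plain-Python sketch of the same request as a dictionary. This only emulates the generated stubs: a real client would construct pb.Request and pb.NDArray from the bentoml.grpc.v1alpha1 modules, which are assumed to be unavailable in this standalone snippet.

```python
import numpy as np

# The classify API from the quickstart expects a 2-D float array.
input_array = np.array([[5.4, 3.4, 1.5, 0.4]], dtype=np.float32)

# Plain-dict stand-in for pb.Request; field names mirror service.proto.
request = {
    "api_name": "classify",  # name of the API defined in the BentoService
    "ndarray": {
        "dtype": "DTYPE_FLOAT",            # enum value 1 in the DType table
        "shape": list(input_array.shape),  # [1, 4]
        # the payload is always sent flattened; the server reshapes it
        "float_values": input_array.flatten().tolist(),
    },
}
```

The shape field is what lets the server reconstruct the original [1, 4] array from the flattened float_values list.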
Description:
NDArray
represents a flattened n-dimensional array of arbitrary type. It accepts the following fields:
dtype
The data type of the given input. This is an Enum field that provides a 1-1 mapping between Protobuf data types and NumPy data types:

| pb.NDArray.DType | numpy.dtype | Enum value |
| --- | --- | --- |
| DTYPE_UNSPECIFIED | None | 0 |
| DTYPE_FLOAT | np.float | 1 |
| DTYPE_DOUBLE | np.double | 2 |
| DTYPE_BOOL | np.bool_ | 3 |
| DTYPE_INT32 | np.int32 | 4 |
| DTYPE_INT64 | np.int64 | 5 |
| DTYPE_UINT32 | np.uint32 | 6 |
| DTYPE_UINT64 | np.uint64 | 7 |
| DTYPE_STRING | np.str_ | 8 |

shape
A list of int32 that represents the shape of the flattened array. The bentoml.io.NumpyNdarray <reference/api_io_descriptors:NumPy \`\`ndarray\`\`> descriptor will then reshape the given payload into the expected shape. Note that this value always takes precedence over the shape field in the bentoml.io.NumpyNdarray <reference/api_io_descriptors:NumPy \`\`ndarray\`\`> descriptor, meaning the array will be reshaped to this value first if given. Refer to bentoml.io.NumpyNdarray.from_proto for implementation details.
string_values, float_values, double_values, bool_values, int32_values, int64_values, uint32_values, uint64_values
Each of these fields is a list of the corresponding data type. The list is a flattened array, and will be reconstructed together with the shape field into the original payload. Per request sent, one message should contain only ONE of the aforementioned fields.
The interaction between the above fields and dtype is as follows:
- If dtype is not present in the message:
  - If all of the fields are empty, we return np.empty.
  - Otherwise, we loop through all of the provided fields and allow only one field per message. If more than one field is set (e.g. both string_values and float_values), we raise an error, since we don't know how to deserialize the data.
- If dtype is present:
  - We use the provided dtype-to-field map to get the data from the given message.
| DType | Field |
| --- | --- |
| DTYPE_BOOL | bool_values |
| DTYPE_DOUBLE | double_values |
| DTYPE_FLOAT | float_values |
| DTYPE_INT32 | int32_values |
| DTYPE_INT64 | int64_values |
| DTYPE_STRING | string_values |
| DTYPE_UINT32 | uint32_values |
| DTYPE_UINT64 | uint64_values |
For example, if dtype is DTYPE_FLOAT, then the payload is expected to have the float_values field set.
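The rules above can be sketched in plain Python. This is a simplified stand-in for bentoml.io.NumpyNdarray.from_proto, with the message emulated as a dict; the real implementation handles more cases.

```python
import numpy as np

# Dtype-to-field map mirroring the table above.
DTYPE_TO_FIELD = {
    "DTYPE_BOOL": "bool_values",
    "DTYPE_DOUBLE": "double_values",
    "DTYPE_FLOAT": "float_values",
    "DTYPE_INT32": "int32_values",
    "DTYPE_INT64": "int64_values",
    "DTYPE_STRING": "string_values",
    "DTYPE_UINT32": "uint32_values",
    "DTYPE_UINT64": "uint64_values",
}

def ndarray_from_message(msg: dict) -> np.ndarray:
    """Simplified sketch of NDArray deserialization (not the real code)."""
    populated = [f for f in DTYPE_TO_FIELD.values() if msg.get(f)]
    dtype = msg.get("dtype")
    if dtype is None:
        if not populated:
            return np.empty(0)
        if len(populated) > 1:
            # e.g. both string_values and float_values were set
            raise ValueError(f"ambiguous payload, got fields: {populated}")
        arr = np.asarray(msg[populated[0]])
    else:
        # dtype present: pick the field the dtype-to-field map points at
        arr = np.asarray(msg.get(DTYPE_TO_FIELD[dtype], []))
    shape = msg.get("shape")
    # the message's shape field takes precedence: reshape the flat values
    return arr.reshape(shape) if shape else arr

result = ndarray_from_message(
    {"dtype": "DTYPE_FLOAT", "shape": [1, 4], "float_values": [5.4, 3.4, 1.5, 0.4]}
)
```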
Python API
NumpyNdarray.from_sample(
    np.array([[5.4, 3.4, 1.5, 0.4]])
)
pb.NDArray
ndarray {
  dtype: DTYPE_FLOAT
  shape: 1
  shape: 4
  float_values: 5.4
  float_values: 3.4
  float_values: 1.5
  float_values: 0.4
}
API reference:
bentoml.io.NumpyNdarray.from_proto
Description:
DataFrame
represents any tabular data type. Currently we only support the columns orientation, since it best preserves input order.
It accepts the following fields:
column_names
A list of string that represents the column names of the given tabular data.
column_values
A list of Series messages, where each Series represents a column of arbitrary data type. The allowed fields for Series are similar to the ones in `NDArray`:
- one of [string_values, float_values, double_values, bool_values, int32_values, int64_values, uint32_values, uint64_values]
Python API
PandasDataFrame.from_sample(
    pd.DataFrame({
        "age": [3, 29],
        "height": [94, 170],
        "weight": [31, 115],
    }),
    orient="columns",
)
pb.DataFrame
dataframe {
  column_names: "age"
  column_names: "height"
  column_names: "weight"
  columns {
    int32_values: 3
    int32_values: 29
  }
  columns {
    int32_values: 94
    int32_values: 170
  }
  columns {
    int32_values: 31
    int32_values: 115
  }
}
API reference:
bentoml.io.PandasDataFrame.from_proto
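To illustrate the columns orientation, here is a plain-Python sketch of turning such a message (again emulated as a dict) back into a pd.DataFrame; the real conversion happens inside bentoml.io.PandasDataFrame.from_proto.

```python
import pandas as pd

# Emulated pb.DataFrame message in columns orientation (sketch only).
message = {
    "column_names": ["age", "height", "weight"],
    "columns": [
        {"int32_values": [3, 29]},
        {"int32_values": [94, 170]},
        {"int32_values": [31, 115]},
    ],
}

# Each Series message carries exactly one populated value field; pair it
# with the column name at the same index to rebuild the frame.
data = {
    name: next(iter(series.values()))
    for name, series in zip(message["column_names"], message["columns"])
}
df = pd.DataFrame(data)
```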
Description:
Series
portrays a series of values. This can be used for representing Series types in tabular data.
It accepts the following fields:
string_values, float_values, double_values, bool_values, int32_values, int64_values
Similar to NumpyNdarray, each of these fields is a list of the corresponding data type. The list is a 1-D array, and will then be passed to pd.Series. Each request should contain only ONE of the aforementioned fields.
The interaction between the above fields and dtype from PandasSeries is as follows:
- If dtype is not present in the descriptor:
  - If all of the fields are empty, we return an empty pd.Series.
  - Otherwise, we loop through all of the provided fields and allow only one field per message. If more than one field is set (e.g. both string_values and float_values), we raise an error, since we don't know how to deserialize the data.
- If dtype is present:
  - We use the provided dtype-to-field map to get the data from the given message.
Python API
PandasSeries.from_sample([5.4, 3.4, 1.5, 0.4])
pb.Series
series {
  float_values: 5.4
  float_values: 3.4
  float_values: 1.5
  float_values: 0.4
}
API reference:
bentoml.io.PandasSeries.from_proto
Description:
File
represents any arbitrary file type. This can be used to send any file type, including images, videos, audio, etc.
Note
Currently both bentoml.io.File
and bentoml.io.Image
use pb.File.
It accepts the following fields:
content
A bytes field that represents the content of the file.
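As a sketch, filling in content is just a matter of reading the file in binary mode; the dict below emulates pb.File, which a real client would construct from the generated stubs.

```python
import os
import tempfile

# Create a small stand-in "image" on disk for the example.
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as f:
    f.write(b"\x89PNG fake image bytes")
    path = f.name

# Read the file in binary mode and place the bytes in the content field.
with open(path, "rb") as fh:
    file_message = {"content": fh.read()}  # emulated pb.File

os.unlink(path)  # clean up the temporary file
```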
Description:
Multipart
represents a complex payload that can contain multiple different fields. It takes a fields argument, which is a dictionary mapping input names to their corresponding bentoml.io.IODescriptor.
Python API
Multipart(
    meta=Text(),
    arr=NumpyNdarray(
        dtype=np.float16,
        shape=[2, 2],
    ),
)
pb.Multipart
multipart {
  fields {
    key: "arr"
    value {
      ndarray {
        dtype: DTYPE_FLOAT
        shape: 2
        shape: 2
        float_values: 1.0
        float_values: 2.0
        float_values: 3.0
        float_values: 4.0
      }
    }
  }
  fields {
    key: "meta"
    value {
      text {
        value: "nlp"
      }
    }
  }
}
API reference:
bentoml.io.Multipart.from_proto
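A plain-Python sketch of reading a Multipart message back out (the dict emulates the pb.Multipart example above; the real decoding is done by bentoml.io.Multipart.from_proto):

```python
# Emulated pb.Multipart message; each entry in `fields` wraps exactly one
# payload keyed by the oneof that is set ("text", "ndarray", ...).
message = {
    "fields": {
        "meta": {"text": {"value": "nlp"}},
        "arr": {
            "ndarray": {
                "dtype": "DTYPE_FLOAT",
                "shape": [2, 2],
                "float_values": [1.0, 2.0, 3.0, 4.0],
            }
        },
    }
}

# Resolve each field to (kind, payload) by unwrapping the single set oneof.
decoded = {
    name: next(iter(value.items())) for name, value in message["fields"].items()
}
kind, payload = decoded["meta"]
```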
The serialized_bytes
field in both Request
and Response
is reserved for pre-established protocol encoding between client and server.
BentoML leverages this field to improve serialization performance between the BentoML client and server; thus, the field is not recommended for direct use.
gRPC service multiplexing <guides/grpc:Demystifying the misconception of gRPC vs. REST>
enables us to mount additional custom servicers alongside the BentoService and serve them under the same port.
import bentoml

import route_guide_pb2
import route_guide_pb2_grpc
from servicer_impl import RouteGuideServicer

# iris_clf_runner is defined as in the quickstart guide.
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

services_name = [
    v.full_name for v in route_guide_pb2.DESCRIPTOR.services_by_name.values()
]
svc.mount_grpc_servicer(
    RouteGuideServicer,
    add_servicer_fn=route_guide_pb2_grpc.add_RouteGuideServicer_to_server,
    service_names=services_name,
)
Serve your service with bentoml serve-grpc <reference/cli:serve-grpc>
command:
» bentoml serve-grpc service.py:svc --reload --enable-reflection
Now your RouteGuide
service can also be accessed through localhost:3000
.
Note
service_names
is REQUIRED here, as this will be used for server reflection <grpc/grpc/blob/master/doc/server-reflection.md>
when --enable-reflection
is passed to bentoml serve-grpc
.
Interceptors are a component of gRPC that allows us to intercept and interact with the proto message and service context either before or after the actual RPC call is sent/received by the client/server.
Interceptors are to gRPC what middleware is to HTTP. The most common use cases for interceptors are authentication, tracing <guides/tracing:Tracing>, access logs, and more.
BentoML comes with a set of built-in async interceptors to provide support for access logs, OpenTelemetry, and Prometheus.
The following diagram demonstrates the flow of a gRPC request from client to server:
Since interceptors are executed in the order they are added, user-defined interceptors will be executed after the built-in interceptors.
User-defined interceptors shouldn't modify the existing headers and data of the incoming Request.
BentoML currently only supports async interceptors (grpc.aio.ServerInterceptor, as opposed to grpc.ServerInterceptor), because the BentoML gRPC server is an async implementation of a gRPC server.
Note
If you are using grpc.ServerInterceptor
, you will need to migrate it over to use the new grpc.aio.ServerInterceptor
in order to use this feature.
Feel free to reach out to us at #support on Slack.
A toy implementation AppendMetadataInterceptor
from __future__ import annotations

import typing as t
import functools
import dataclasses
from typing import TYPE_CHECKING

from grpc import aio

if TYPE_CHECKING:
    from bentoml.grpc.types import Request
    from bentoml.grpc.types import Response
    from bentoml.grpc.types import RpcMethodHandler
    from bentoml.grpc.types import AsyncHandlerMethod
    from bentoml.grpc.types import HandlerCallDetails
    from bentoml.grpc.types import BentoServicerContext


@dataclasses.dataclass
class Context:
    usage: str
    accuracy_score: float


class AppendMetadataInterceptor(aio.ServerInterceptor):
    def __init__(self, *, usage: str, accuracy_score: float) -> None:
        self.context = Context(usage=usage, accuracy_score=accuracy_score)
        self._record: set[str] = set()

    async def intercept_service(
        self,
        continuation: t.Callable[[HandlerCallDetails], t.Awaitable[RpcMethodHandler]],
        handler_call_details: HandlerCallDetails,
    ) -> RpcMethodHandler:
        from bentoml.grpc.utils import wrap_rpc_handler

        handler = await continuation(handler_call_details)
        if handler and (handler.response_streaming or handler.request_streaming):
            return handler

        def wrapper(behaviour: AsyncHandlerMethod[Response]):
            @functools.wraps(behaviour)
            async def new_behaviour(
                request: Request, context: BentoServicerContext
            ) -> Response | t.Awaitable[Response]:
                self._record.update(
                    {f"{self.context.usage}:{self.context.accuracy_score}"}
                )
                resp = await behaviour(request, context)
                context.set_trailing_metadata(
                    tuple(
                        [
                            (k, str(v).encode("utf-8"))
                            for k, v in dataclasses.asdict(self.context).items()
                        ]
                    )
                )
                return resp

            return new_behaviour

        return wrap_rpc_handler(wrapper, handler)
To add your interceptors to an existing BentoService, use svc.add_grpc_interceptor
:
from custom_interceptor import CustomInterceptor
svc.add_grpc_interceptor(CustomInterceptor)
Note
add_grpc_interceptor
also supports partial classes as well as interceptors with multiple arguments:
multiple arguments
from metadata_interceptor import AppendMetadataInterceptor
svc.add_grpc_interceptor(AppendMetadataInterceptor, usage="NLP", accuracy_score=0.867)
partial method
from functools import partial
from metadata_interceptor import AppendMetadataInterceptor
svc.add_grpc_interceptor(partial(AppendMetadataInterceptor, usage="NLP", accuracy_score=0.867))
gRPC is designed to be a high-performance framework for inter-service communication, which makes it a natural fit for building microservices. The following are some recommendations we have for using gRPC for model serving:
You might stumble upon articles comparing gRPC to REST and get the impression that gRPC is always a better choice when building services. This is not entirely true.
gRPC is built on top of HTTP/2, and it addresses some of the shortcomings of HTTP/1.1, such as head-of-line blocking <Head-of-line_blocking> and HTTP pipelining <HTTP_pipelining>. However, gRPC is not a drop-in replacement for REST, nor is it always a better fit for model serving. gRPC comes with its own set of trade-offs, such as:
- Limited browser support: it is impossible to call a gRPC service directly from a browser. You will end up using tools such as gRPCUI <fullstorydev/grpcui> to interact with your service, or have to go through the hassle of implementing a gRPC client in your language of choice.
- Binary protocol format: while Protobuf <protocolbuffers/protobuf> is efficient to send and receive over the wire, it is not human-readable. This means additional tooling is required for debugging and analyzing protobuf messages.
- Knowledge gap: gRPC comes with its own concepts and learning curve, and teams must invest time in closing that knowledge gap to use gRPC effectively. This often adds friction and can reduce development agility.
- Lack of support for additional content types: since gRPC depends on protobuf, its content types are restrictive compared to the out-of-the-box support of HTTP+REST.
gRPC on HTTP/2 dives into how gRPC is built on top of HTTP/2, and this article goes into more detail on how HTTP/2 addresses the problems of HTTP/1.1.
For the HTTP/2 specification, see RFC 7540.
Yes and no.
If your organization is already using gRPC for inter-service communication, serving your Bento over gRPC is a no-brainer. You will be able to seamlessly integrate your Bento with your existing gRPC services without having to worry about the overhead of implementing grpc-gateway <grpc-ecosystem/grpc-gateway>.
However, if your organization is not using gRPC, we recommend keeping REST for model serving. REST is a well-known and well-understood protocol, meaning there is no knowledge gap for your team, which increases developer agility and enables a faster go-to-market strategy.
BentoML allows users to tune the performance of gRPC via bentoml_configuration.yaml <guides/configuration:Configuration> under api_server.grpc.
A quick overview of the available configuration for gRPC:
api_server:
grpc:
host: 0.0.0.0
port: 3000
max_concurrent_streams: ~
maximum_concurrent_rpcs: ~
max_message_length: -1
reflection:
enabled: false
metrics:
host: 0.0.0.0
port: 3001
Definition:
The maximum number of concurrent incoming streams to allow on an HTTP/2 connection.
By default we don't set a limit cap. HTTP/2 connections typically limit the number of concurrent streams on a connection at one time.
Some notes about fine-tuning max_concurrent_streams
Note that a gRPC channel uses a single HTTP/2 connection, and concurrent calls are multiplexed on that connection. When the number of active calls reaches the connection's stream limit, any additional calls are queued on the client. Queued calls then wait for active calls to complete before being sent. This means that applications with higher load, or with long-running streams, could see performance degradation caused by this queuing.
Setting a limit cap on the number of concurrent streams will prevent this from happening, but it also means that you need to tune the limit cap to the right number.
- If the limit cap is too low, you will sooner or later run into the issue mentioned above.
- Not setting a limit cap is also NOT RECOMMENDED. Too many streams on a single HTTP/2 connection introduces thread contention between streams trying to write to the connection, and packet loss can cause all calls to be blocked.
Remarks:
We recommend experimenting with the limit cap, starting at 100 and increasing as needed.
Definition:
The maximum number of concurrent RPCs this server will service before returning a RESOURCE_EXHAUSTED status.
By default we set this to None to indicate no limit, and let gRPC decide the limit.
Definition:
The maximum message length in bytes that can be received by or sent from the server.
By default we set this to -1 to indicate no limit. Message size limits via this option are a way to prevent gRPC from consuming excessive resources. By default, gRPC uses per-message limits to manage inbound and outbound messages.
Some notes about fine-tuning max_message_length
This option sets two values: grpc.max_receive_message_length <grpc/grpc/blob/e8df8185e521b518a8f608b8a5cf98571e2d0925/include/grpc/impl/codegen/grpc_types.h#L153>
and grpc.max_send_message_length <grpc/grpc/blob/e8df8185e521b518a8f608b8a5cf98571e2d0925/include/grpc/impl/codegen/grpc_types.h#L159>.
#define GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH "grpc.max_receive_message_length"
#define GRPC_ARG_MAX_SEND_MESSAGE_LENGTH "grpc.max_send_message_length"
By default, gRPC limits incoming messages to 4 MB and places no limit on outgoing messages. We recommend setting this option only if you want to limit the size of outgoing messages; otherwise, let gRPC determine the limit.
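On the client side, the same limits are expressed as standard gRPC channel arguments (the two keys shown in the #define lines above). The list below is a sketch of the options a client would pass when creating a channel, e.g. grpc.insecure_channel(address, options=channel_options); grpcio itself is not imported here.

```python
# -1 mirrors api_server.grpc.max_message_length's "no limit" default.
MAX_MESSAGE_LENGTH = -1

# Standard gRPC channel arguments controlling per-message size limits.
channel_options = [
    ("grpc.max_receive_message_length", MAX_MESSAGE_LENGTH),
    ("grpc.max_send_message_length", MAX_MESSAGE_LENGTH),
]
```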
We also recommend checking out the gRPC performance best practices to learn more about tuning gRPC.