feat: Add gRPC v1alpha1 streaming support, client SDK, and benchmarks #5377
This commit introduces gRPC streaming capabilities to BentoML using a new `v1alpha1` protocol version.

Key changes include:

gRPC Service Definition (`.proto`):
- `src/bentoml/grpc/v1alpha1/bentoml_service_v1alpha1.proto`, defining a `BentoService` with a server-streaming `CallStream` RPC method.

Server Implementation:
- `BentoService` implemented in `src/bentoml/grpc/v1alpha1/server.py`.
- `v1alpha1` server logic integrated into the existing gRPC server infrastructure by modifying `src/bentoml/_internal/service/service.py` and `src/bentoml/_internal/server/grpc_app.py` to handle the new protocol version.

Client SDK:
- `src/bentoml/grpc/v1alpha1/client.py` with `BentoMlGrpcClient` for easy interaction with the `CallStream` method. The client supports asynchronous streaming.

CLI Enhancements:
- The `bentoml serve-grpc` command supports the `v1alpha1` protocol via the `--protocol-version` flag.
- `bentoml call-grpc-stream` (implemented in `src/bentoml_cli/call_grpc_stream.py`) to invoke the streaming service from the CLI.

Benchmarking:
- `tests/benchmark/benchmark_streaming.py` to compare the performance of gRPC streaming (`v1alpha1`) against a conceptual REST streaming equivalent. The script allows for configurable payload sizes and stream lengths.

Documentation and Examples:
- `docs/source/guides/grpc_streaming.md` covering the definition, implementation, and usage of gRPC streaming.
- `examples/grpc_streaming/` demonstrating how to build and use a custom gRPC streaming service with BentoML, including its own `.proto` file, service implementation, and client example.
- `docs/source/index.rst` updated to include the new documentation.

This feature lets you use gRPC for efficient streaming communication with your BentoML services (the `CallStream` RPC added here is server-streaming), providing an alternative to traditional REST APIs for scenarios requiring low-latency, high-throughput streaming.
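To make the server-streaming shape and the benchmark knobs concrete, here is a minimal stdlib-only sketch. It is not the code from this PR and does not use gRPC: `call_stream` and `measure` are hypothetical names, and an async generator stands in for the `CallStream` RPC (one request, many responses). It only illustrates how a client might consume a stream with `async for` while varying payload size and stream length, as the benchmark script is described as doing.

```python
import asyncio
import time
from typing import AsyncIterator, Optional


async def call_stream(payload_size: int, stream_length: int) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a server-streaming RPC: one call, many responses."""
    chunk = b"x" * payload_size
    for _ in range(stream_length):
        # Yield control to the event loop, roughly where a network read would block.
        await asyncio.sleep(0)
        yield chunk


async def measure(payload_size: int, stream_length: int) -> dict:
    """Consume the stream asynchronously and record simple timing figures."""
    start = time.perf_counter()
    time_to_first: Optional[float] = None
    messages = 0
    total_bytes = 0

    async for chunk in call_stream(payload_size, stream_length):
        if time_to_first is None:
            time_to_first = time.perf_counter() - start
        messages += 1
        total_bytes += len(chunk)

    return {
        "payload_size": payload_size,
        "stream_length": stream_length,
        "messages": messages,
        "total_bytes": total_bytes,
        "time_to_first_message_s": time_to_first,
        "elapsed_s": time.perf_counter() - start,
    }


if __name__ == "__main__":
    # Assumed example configurations; the real script's defaults are not shown here.
    for size, length in [(1024, 100), (65536, 10)]:
        print(asyncio.run(measure(size, length)))
```

Swapping the body of `call_stream` for a real gRPC or HTTP streaming call would leave `measure` unchanged, which is one way to keep the two sides of a comparison on equal footing.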
What does this PR address?
This PR introduces comprehensive gRPC streaming capabilities to BentoML, addressing the need for high-performance, low-latency streaming communication in machine learning services. The implementation provides:
- `v1alpha1` protocol version

The feature is particularly valuable for use cases requiring:
Before submitting:
- `pre-commit run -a` script has passed (instructions)?