feat: Add gRPC v1alpha1 streaming support, client SDK, and benchmarks #5377

Open: alvidofaisal wants to merge 2 commits into main

alvidofaisal

This commit introduces gRPC streaming capabilities to BentoML using a new v1alpha1 protocol version.

Key changes include:

  • gRPC Service Definition (.proto):

    • Added src/bentoml/grpc/v1alpha1/bentoml_service_v1alpha1.proto defining a BentoService with a server-streaming CallStream RPC method.
  • Server Implementation:

    • Implemented the BentoService in src/bentoml/grpc/v1alpha1/server.py.
    • Integrated the v1alpha1 server logic into the existing gRPC server infrastructure by modifying src/bentoml/_internal/service/service.py and src/bentoml/_internal/server/grpc_app.py to handle the new protocol version.
    • Generated necessary gRPC stubs.
  • Client SDK:

    • Created src/bentoml/grpc/v1alpha1/client.py with BentoMlGrpcClient for easy interaction with the CallStream method. The client supports asynchronous streaming.
  • CLI Enhancements:

    • Verified that the existing bentoml serve-grpc command supports the v1alpha1 protocol via the --protocol-version flag.
    • Added a new command bentoml call-grpc-stream (implemented in src/bentoml_cli/call_grpc_stream.py) to invoke the streaming service from the CLI.
  • Benchmarking:

    • Introduced tests/benchmark/benchmark_streaming.py to compare the performance of gRPC streaming (v1alpha1) against a conceptual REST streaming equivalent. The script allows for configurable payload sizes and stream lengths.
  • Documentation and Examples:

    • Added a new documentation page docs/source/guides/grpc_streaming.md covering the definition, implementation, and usage of gRPC streaming.
    • Created a new example project examples/grpc_streaming/ demonstrating how to build and use a custom gRPC streaming service with BentoML, including its own .proto file, service implementation, and client example.
    • Updated docs/source/index.rst to include the new documentation.

This feature lets you use gRPC server streaming with your BentoML services, as an alternative to traditional REST APIs for scenarios that need low-latency, high-throughput streaming. The `CallStream` RPC added here is server-streaming only; bi-directional streaming is not part of this change.
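
To make the server-streaming shape concrete, here is a minimal stdlib-only sketch. It is not the code in this PR: `StreamRequest`, `StreamResponse`, `call_stream`, and `collect` are hypothetical names chosen for illustration, and the real message types would come from the stubs generated for `bentoml_service_v1alpha1.proto`. It only shows the pattern of one request going in and several responses being yielded back over time.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class StreamRequest:
    # Hypothetical request message; the real one is defined in the .proto file.
    prompt: str
    num_chunks: int


@dataclass
class StreamResponse:
    # Hypothetical response message carrying one chunk of the result.
    index: int
    data: str


async def call_stream(request: StreamRequest) -> AsyncIterator[StreamResponse]:
    """Server-streaming handler shape: one request in, many responses out."""
    for i in range(request.num_chunks):
        # Yield control to the event loop, standing in for real per-chunk work.
        await asyncio.sleep(0)
        yield StreamResponse(index=i, data=f"{request.prompt}-{i}")


async def collect(request: StreamRequest) -> List[StreamResponse]:
    """Client side: consume responses as they arrive with `async for`."""
    return [response async for response in call_stream(request)]


if __name__ == "__main__":
    for item in asyncio.run(collect(StreamRequest(prompt="tok", num_chunks=3))):
        print(item.index, item.data)
```

In a real gRPC service the async generator would be a servicer method and the client would iterate over the call object returned by the stub, but the consumption pattern (`async for` over responses) is the same idea.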

What does this PR address?

This PR adds gRPC streaming to BentoML for machine learning services that need high-performance, low-latency streaming communication. The implementation provides:

  1. Server-side streaming support through a new v1alpha1 protocol version
  2. A client SDK for integrating with streaming services
  3. A CLI command for testing and interacting with streaming endpoints
  4. A benchmark script for comparing gRPC streaming with a REST alternative
  5. Documentation and an example project to guide users in implementing streaming services
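
As a rough illustration of what a streaming benchmark with configurable payload size and stream length might measure, the sketch below times an in-process stand-in stream. It is an assumption-laden example, not the contents of `tests/benchmark/benchmark_streaming.py`: `fake_stream` and `measure` are hypothetical names, and no real gRPC or REST call is made.

```python
import asyncio
import time
from typing import AsyncIterator, Dict


async def fake_stream(payload_size: int, stream_length: int) -> AsyncIterator[bytes]:
    """Stand-in for a streaming call; yields `stream_length` payloads."""
    payload = b"x" * payload_size
    for _ in range(stream_length):
        await asyncio.sleep(0)  # placeholder for network and server latency
        yield payload


async def measure(payload_size: int, stream_length: int) -> Dict[str, float]:
    """Record message count, bytes received, time to first message, and total time."""
    start = time.perf_counter()
    first = None
    count = 0
    total_bytes = 0
    async for chunk in fake_stream(payload_size, stream_length):
        if first is None:
            first = time.perf_counter() - start
        count += 1
        total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    return {
        "messages": count,
        "bytes": total_bytes,
        "time_to_first_s": first if first is not None else float("nan"),
        "total_s": elapsed,
    }


if __name__ == "__main__":
    for size in (64, 4096):
        print(size, asyncio.run(measure(payload_size=size, stream_length=100)))
```

Swapping `fake_stream` for a real gRPC call and a real REST streaming call would let the same `measure` loop compare the two on equal terms, with time to first message and total time as the two headline numbers.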

The feature is particularly valuable for use cases requiring:

  • Real-time inference with streaming inputs/outputs
  • Low-latency communication for interactive applications
  • High-throughput scenarios where gRPC's efficiency provides significant performance benefits
  • Streaming communication patterns not easily achievable with REST APIs

alvidofaisal requested a review from a team as a code owner (May 31, 2025 14:23)
alvidofaisal requested a review from parano and removed the request for a team (May 31, 2025 14:23)

hyperlint-ai bot commented May 31, 2025

PR Change Summary

Introduced gRPC v1alpha1 streaming support in BentoML, enhancing communication capabilities for machine learning services.

  • Added gRPC service definition for server-streaming support.
  • Implemented server-side logic and integrated with existing infrastructure.
  • Created a client SDK for easy interaction with streaming services.
  • Introduced CLI commands for testing and interacting with gRPC streaming.

Modified Files

  • docs/source/index.rst

Added Files

  • docs/source/guides/grpc_streaming.md
