making llm_max_batch_size and mllm_max_batch_size configurable #4077

Open
SushantGautam opened this issue May 4, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@SushantGautam

Describe the feature
Currently, llm_max_batch_size and mllm_max_batch_size are hardcoded as class variables in InferEngine. This limits flexibility and makes it difficult to adapt the engine to different model configurations or hardware constraints.

Proposed change:
Allow llm_max_batch_size and mllm_max_batch_size to be passed as optional arguments from CLI, with default values falling back to the current hardcoded ones if not provided.

and passed through to the constructor, for example:
engine = InferEngine(..., llm_max_batch_size=2 * 1024 * 1024, mllm_max_batch_size=2048)

This change supports adaptive resource usage and better integration with different backend systems or use cases.

This makes inference easier to configure for different deployment scenarios.
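
A minimal sketch of how the override could look, assuming the current hardcoded values stay in place as class-level defaults. The default values and the signature shown here are illustrative, not the actual ones used by InferEngine:

from typing import Optional

class InferEngine:
    # current hardcoded defaults (values here are placeholders)
    llm_max_batch_size = 1024 * 1024
    mllm_max_batch_size = 1024

    def __init__(self, *args,
                 llm_max_batch_size: Optional[int] = None,
                 mllm_max_batch_size: Optional[int] = None,
                 **kwargs):
        # fall back to the class-level defaults when no override is given
        if llm_max_batch_size is not None:
            self.llm_max_batch_size = llm_max_batch_size
        if mllm_max_batch_size is not None:
            self.mllm_max_batch_size = mllm_max_batch_size

# per-deployment override, matching the call shown above
engine = InferEngine(llm_max_batch_size=2 * 1024 * 1024, mllm_max_batch_size=2048)

The same two parameters could then be exposed as optional CLI arguments that simply forward to the constructor.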

Additional context
This change supports better scalability and tuning for multi-modal LLM inference, especially useful in environments with memory constraints or throughput requirements.

@Jintao-Huang Jintao-Huang added the bug Something isn't working label May 4, 2025