Describe the feature
Currently, llm_max_batch_size and mllm_max_batch_size are hardcoded as class variables in InferEngine. This limits flexibility and makes it difficult to adapt the engine to different model configurations or hardware constraints.
Proposed change:
Allow llm_max_batch_size and mllm_max_batch_size to be passed as optional arguments from the CLI, falling back to the current hardcoded defaults when they are not provided, and forwarded through the engine constructor:
engine = InferEngine(..., llm_max_batch_size=2 * 1024 * 1024, mllm_max_batch_size=2048)
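A minimal sketch of how the constructor could accept these overrides, keeping the existing class-level values as fallbacks. This is a hypothetical illustration of the request, not the library's current API, and the default values shown are placeholders:

# Hypothetical sketch: optional constructor overrides for the batch-size limits,
# falling back to the existing class-level defaults when not provided.
from typing import Optional

class InferEngine:
    # Current hardcoded class-level defaults (values here are illustrative).
    llm_max_batch_size = 1024 * 1024
    mllm_max_batch_size = 1024

    def __init__(self,
                 *args,
                 llm_max_batch_size: Optional[int] = None,
                 mllm_max_batch_size: Optional[int] = None,
                 **kwargs):
        # Override the class defaults only when a value is explicitly passed.
        if llm_max_batch_size is not None:
            self.llm_max_batch_size = llm_max_batch_size
        if mllm_max_batch_size is not None:
            self.mllm_max_batch_size = mllm_max_batch_size
        # ... existing initialization would continue here ...

# Usage: tune batch sizes per deployment without editing library code.
engine = InferEngine(llm_max_batch_size=2 * 1024 * 1024, mllm_max_batch_size=2048)

A corresponding CLI option could then simply forward these values into the constructor.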
This change supports adaptive resource usage and better integration with different backend systems and use cases, and makes inference easier to tune for different deployment scenarios.
Additional context
This change enables better scalability and tuning for multimodal LLM inference, which is especially useful in environments with memory constraints or strict throughput requirements.