Describe the feature
Currently, llm_max_batch_size and mllm_max_batch_size are hardcoded as class variables in InferEngine. This limits flexibility and makes it difficult to adapt the engine to different model configurations or hardware constraints.
Proposed change:
Allow llm_max_batch_size and mllm_max_batch_size to be passed as optional arguments from the CLI, falling back to the current hardcoded defaults when they are not provided, and forwarded through the engine constructor:
engine = InferEngine(..., llm_max_batch_size=2 * 1024 * 1024, mllm_max_batch_size=2048)
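A minimal sketch of how the constructor could accept these overrides, keeping the existing class-level values as fallbacks. This is a hypothetical illustration of the request, not the library's current API, and the default values shown are placeholders:

# Hypothetical sketch: optional constructor overrides for the batch-size limits,
# falling back to the existing class-level defaults when not provided.
from typing import Optional

class InferEngine:
    # Current hardcoded class-level defaults (values here are illustrative).
    llm_max_batch_size = 1024 * 1024
    mllm_max_batch_size = 1024

    def __init__(self,
                 *args,
                 llm_max_batch_size: Optional[int] = None,
                 mllm_max_batch_size: Optional[int] = None,
                 **kwargs):
        # Override the class defaults only when a value is explicitly passed.
        if llm_max_batch_size is not None:
            self.llm_max_batch_size = llm_max_batch_size
        if mllm_max_batch_size is not None:
            self.mllm_max_batch_size = mllm_max_batch_size
        # ... existing initialization would continue here ...

# Usage: tune batch sizes per deployment without editing library code.
engine = InferEngine(llm_max_batch_size=2 * 1024 * 1024, mllm_max_batch_size=2048)

A corresponding CLI option could then simply forward these values into the constructor.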
This change supports adaptive resource usage and better integration with different backend systems and use cases, and makes inference easier to tune for different deployment scenarios.
Additional context
This change enables better scalability and tuning for multimodal LLM inference, which is especially useful in environments with memory constraints or strict throughput requirements.