[MLI-4966] Launch support multiple routes passthrough #722

meher-m · 2025-10-14T13:59:24Z

Pull Request Summary

What is this PR changing? Why is this change being made? Any caveats you'd like to highlight? Link any relevant documents, links, or screenshots here if applicable.

Test Plan and Usage Guide

How did you validate that your PR works correctly? How do you run or demo the code? Provide enough detail so a reviewer can reasonably reproduce the testing procedure. Paste example command line invocations if applicable.

Manual Test

# in one tmux, start server
python examples/multi_route_fastapi_server.py

# in another tmux
python3 -c "
# Test the new route fields
from llmengine.data_types.model_endpoints import CreateLLMEndpointRequest
from llmengine.data_types.core import LLMSource, LLMInferenceFramework

# Create request with new multi-route fields
request = CreateLLMEndpointRequest(
    name='test-multi-route',
    model_name='llama-2-7b',
    metadata={},
    min_workers=1,
    max_workers=2,
    per_worker=10,
    labels={'test': 'multi-route'},
    routes=['/v1/chat/completions', '/analyze'],
    forwarder_type='passthrough'
)

print('✅ New route fields work!')
print(f'Routes: {request.routes}')
print(f'Forwarder type: {request.forwarder_type}')
"
A newer version (0.0.0b45) of 'scale-llm-engine' is available. Please upgrade!
To upgrade, run: pip install --upgrade scale-llm-engine
Don't want to see this message? Set the environment variable 'LLM_ENGINE_DISABLE_VERSION_CHECK' to 'true'.
✅ New route fields work!
Routes: ['/v1/chat/completions', '/analyze']
Forwarder type: passthrough

Bigger Test

# start the server
python examples/multi_route_fastapi_server.py

# test that you can hit it
curl -s http://localhost:5005/health
{"status":"healthy","routes":["/predict","/v1/chat/completions","/v1/completions","/analyze","/custom/endpoint"]}

# start the forwarder
GIT_TAG=test python model_engine_server/inference/forwarding/http_forwarder.py \
    --config model_engine_server/inference/configs/service--http_forwarder.yaml \
    --port 5001 \
    --num-workers 1 \
    --set "forwarder.sync.extra_routes=['/v1/chat/completions','/v1/completions','/analyze','/custom/endpoint']" \
    --set "forwarder.stream.extra_routes=['/v1/chat/completions','/v1/completions','/analyze','/custom/endpoint']" \
    --set "forwarder.sync.healthcheck_route=/health" \
    --set "forwarder.stream.healthcheck_route=/health"
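The four `--set` overrides above repeat the same route list for the sync and stream forwarders. As a rough sketch only, the flags could be generated from a single list so the two modes cannot drift apart. The helper name `extra_route_flags` is hypothetical and not part of this PR; the key names and list formatting are copied from the command above.

```python
import shlex


def extra_route_flags(routes, healthcheck_route="/health"):
    """Build the --set arguments shown above for both forwarder modes.

    Hypothetical helper for illustration; not part of the PR.
    """
    # Match the formatting used in the command above:
    # single-quoted entries, comma-separated, no spaces.
    route_list = "[" + ",".join(f"'{route}'" for route in routes) + "]"
    flags = []
    for mode in ("sync", "stream"):
        flags += ["--set", f"forwarder.{mode}.extra_routes={route_list}"]
    for mode in ("sync", "stream"):
        flags += ["--set", f"forwarder.{mode}.healthcheck_route={healthcheck_route}"]
    return flags


routes = ["/v1/chat/completions", "/v1/completions", "/analyze", "/custom/endpoint"]
flags = extra_route_flags(routes)

# shlex.join quotes each argument so the result can be pasted into a shell.
print(shlex.join(flags))
```

The printed string can be appended to the `http_forwarder.py` invocation in place of the hand-written `--set` lines.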

# Test a few of the routes

curl -X POST localhost:5001/predict -H "Content-Type: application/json" -d '{"args": {"text": "Hello world", "model": "test"}}'
{"result":"{\"result\": \"Processed text: Hello world\", \"model\": \"test\", \"route\": \"/predict\"}"}
curl -X POST localhost:5001/v1/chat/completions -H "Content-Type: application/json" -d '{"args": {"messages": [{"role": "user", "content": "Hello, how are you?"}], "model": "gpt-3.5-turbo"}}'
{"result":"{\"choices\": [{\"message\": {\"role\": \"assistant\", \"content\": \"Echo: Hello, how are you?\"}, \"finish_reason\": \"stop\", \"index\": 0}], \"model\": \"gpt-3.5-turbo\", \"usage\": {\"prompt_tokens\": 4, \"completion_tokens\": 5, \"total_tokens\": 9}}"}
curl -X POST localhost:5001/analyze -H "Content-Type: application/json" -d '{"args": {"text": "This is a good example of multi-route functionality working perfectly!"}}'
{"result":"{\"analysis\": {\"word_count\": 10, \"char_count\": 70, \"sentiment\": \"positive\"}, \"text\": \"This is a good example of multi-route functionality working perfectly!\", \"route\": \"/analyze\"}"}
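In the responses above, the forwarder returns the upstream body as a JSON string nested under a top-level `result` key, and each request wraps its payload under `args`. A minimal sketch of handling both sides on the client follows. It assumes the envelope shape seen in these three responses holds for other routes too, which this test run does not confirm. The helper names `wrap_args` and `unwrap_forwarder_response` are hypothetical.

```python
import json


def wrap_args(payload: dict) -> str:
    """Wrap a payload under "args", as the curl requests above do."""
    return json.dumps({"args": payload})


def unwrap_forwarder_response(body: str) -> dict:
    """Decode the outer envelope, then the JSON string stored in "result".

    Assumes the double-encoded shape observed in the responses above.
    """
    outer = json.loads(body)
    return json.loads(outer["result"])


# Response body copied from the /predict call above.
sample = r'{"result":"{\"result\": \"Processed text: Hello world\", \"model\": \"test\", \"route\": \"/predict\"}"}'

request_body = wrap_args({"text": "Hello world", "model": "test"})
inner = unwrap_forwarder_response(sample)
print(request_body)
print(inner["route"], inner["result"])
```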

Linear Ticket

https://linear.app/scale-epd/issue/MLI-4966/support-multiple-routes-passthrough-on-launch

initial code, cursor

b1dee6b

meher-m self-assigned this Oct 14, 2025

reformat

e9de35b

meher-m changed the title ~~Launch support multiple routes passthrough~~ [MLI-4966] Launch support multiple routes passthrough Oct 14, 2025

meher-m requested a review from dmchoiboi October 14, 2025 15:47

reformat isort

91b2006
