
@meher-m commented Oct 14, 2025

Pull Request Summary

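Adds support for exposing multiple routes through a passthrough forwarder on Launch (MLI-4966). CreateLLMEndpointRequest gets two new fields: routes, the list of paths the endpoint should expose, and forwarder_type (for example 'passthrough'). The end-to-end test below runs a multi-route example server behind the HTTP forwarder, with the extra routes configured through forwarder.sync.extra_routes and forwarder.stream.extra_routes.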

Test Plan and Usage Guide

Validated manually in two steps: first by constructing a CreateLLMEndpointRequest with the new routes and forwarder_type fields, then by running an example multi-route FastAPI server behind the HTTP forwarder and sending requests to several of its routes.

Manual Test

# in one tmux pane, start the server
python examples/multi_route_fastapi_server.py

# in another tmux pane
python3 -c "
# Test the new route fields
from llmengine.data_types.model_endpoints import CreateLLMEndpointRequest
from llmengine.data_types.core import LLMSource, LLMInferenceFramework

# Create request with new multi-route fields
request = CreateLLMEndpointRequest(
    name='test-multi-route',
    model_name='llama-2-7b',
    metadata={},
    min_workers=1,
    max_workers=2,
    per_worker=10,
    labels={'test': 'multi-route'},
    routes=['/v1/chat/completions', '/analyze'],
    forwarder_type='passthrough'
)

print('✅ New route fields work!')
print(f'Routes: {request.routes}')
print(f'Forwarder type: {request.forwarder_type}')
"
✅ New route fields work!
Routes: ['/v1/chat/completions', '/analyze']
Forwarder type: passthrough

Bigger Test

# start the server
python examples/multi_route_fastapi_server.py

# test that you can hit it
curl -s http://localhost:5005/health
{"status":"healthy","routes":["/predict","/v1/chat/completions","/v1/completions","/analyze","/custom/endpoint"]}
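Before starting the forwarder, it can help to confirm that every route you plan to expose is actually served upstream. A minimal sketch, assuming the /health payload has the shape shown above (a `routes` list); `missing_routes` and `fetch_health` are hypothetical helpers, not part of llm-engine:

```python
import json
import urllib.request


def missing_routes(health_body: str, requested: list[str]) -> list[str]:
    """Return the requested routes that the upstream /health payload does not list.

    Assumes a payload like {"status": ..., "routes": [...]}, as returned by
    examples/multi_route_fastapi_server.py in the test above.
    """
    served = set(json.loads(health_body).get("routes", []))
    return [route for route in requested if route not in served]


def fetch_health(base_url: str = "http://localhost:5005") -> str:
    """Hypothetical wrapper around the same GET /health request as the curl above."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Uses the literal response from above so no server is needed.
    body = (
        '{"status":"healthy","routes":["/predict","/v1/chat/completions",'
        '"/v1/completions","/analyze","/custom/endpoint"]}'
    )
    print(missing_routes(body, ["/v1/chat/completions", "/analyze", "/embeddings"]))
```

Against the response above, this prints ['/embeddings'], flagging the one route that the example server does not serve.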

# start the forwarder
GIT_TAG=test python model_engine_server/inference/forwarding/http_forwarder.py \
    --config model_engine_server/inference/configs/service--http_forwarder.yaml \
    --port 5001 \
    --num-workers 1 \
    --set "forwarder.sync.extra_routes=['/v1/chat/completions','/v1/completions','/analyze','/custom/endpoint']" \
    --set "forwarder.stream.extra_routes=['/v1/chat/completions','/v1/completions','/analyze','/custom/endpoint']" \
    --set "forwarder.sync.healthcheck_route=/health" \
    --set "forwarder.stream.healthcheck_route=/health"
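The sync and stream overrides in the command above repeat the same route list, which is easy to get out of sync by hand. A minimal sketch that builds the same arguments from a single list, assuming the forwarder accepts the bracketed, single-quoted list syntax used above; `forwarder_route_overrides` and `build_forwarder_command` are hypothetical helpers:

```python
def forwarder_route_overrides(routes: list[str], healthcheck_route: str = "/health") -> list[str]:
    """Build the --set overrides from the command above for both sync and stream."""
    # Same list syntax as the command above: ['/a','/b'] with no spaces.
    route_list = "[" + ",".join(f"'{route}'" for route in routes) + "]"
    args: list[str] = []
    for mode in ("sync", "stream"):
        args += ["--set", f"forwarder.{mode}.extra_routes={route_list}"]
    for mode in ("sync", "stream"):
        args += ["--set", f"forwarder.{mode}.healthcheck_route={healthcheck_route}"]
    return args


def build_forwarder_command(routes: list[str], port: int = 5001) -> list[str]:
    """Assemble an argv list equivalent to the forwarder invocation above."""
    return [
        "python",
        "model_engine_server/inference/forwarding/http_forwarder.py",
        "--config",
        "model_engine_server/inference/configs/service--http_forwarder.yaml",
        "--port",
        str(port),
        "--num-workers",
        "1",
        *forwarder_route_overrides(routes),
    ]


if __name__ == "__main__":
    routes = ["/v1/chat/completions", "/v1/completions", "/analyze", "/custom/endpoint"]
    # To launch, pass this list to subprocess.run(cmd, env={**os.environ, "GIT_TAG": "test"})
    # from the repo root; an argv list avoids shell-quoting the brackets and quotes.
    print(build_forwarder_command(routes))
```

Passing an argv list instead of a shell string sidesteps the nested quoting in the --set values.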

# Test a few of the routes

curl -X POST localhost:5001/predict -H "Content-Type: application/json" -d '{"args": {"text": "Hello world", "model": "test"}}'
{"result":"{\"result\": \"Processed text: Hello world\", \"model\": \"test\", \"route\": \"/predict\"}"}
curl -X POST localhost:5001/v1/chat/completions -H "Content-Type: application/json" -d '{"args": {"messages": [{"role": "user", "content": "Hello, how are you?"}], "model": "gpt-3.5-turbo"}}'
{"result":"{\"choices\": [{\"message\": {\"role\": \"assistant\", \"content\": \"Echo: Hello, how are you?\"}, \"finish_reason\": \"stop\", \"index\": 0}], \"model\": \"gpt-3.5-turbo\", \"usage\": {\"prompt_tokens\": 4, \"completion_tokens\": 5, \"total_tokens\": 9}}"}
curl -X POST localhost:5001/analyze -H "Content-Type: application/json" -d '{"args": {"text": "This is a good example of multi-route functionality working perfectly!"}}'
{"result":"{\"analysis\": {\"word_count\": 10, \"char_count\": 70, \"sentiment\": \"positive\"}, \"text\": \"This is a good example of multi-route functionality working perfectly!\", \"route\": \"/analyze\"}"}
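The responses above show that, through this forwarder, the request body is wrapped in an `args` object and the upstream response comes back as a JSON-encoded string under `result`. A minimal client sketch, assuming that envelope holds for every extra route; `wrap_args`, `unwrap_result` and `call_route` are hypothetical helpers:

```python
import json
import urllib.request
from typing import Any


def wrap_args(payload: dict[str, Any]) -> bytes:
    """Encode a payload in the {"args": ...} envelope used in the curl calls above."""
    return json.dumps({"args": payload}).encode("utf-8")


def unwrap_result(body: str) -> Any:
    """Decode a forwarder response whose "result" holds the upstream JSON as a string."""
    result = json.loads(body)["result"]
    if isinstance(result, str):
        try:
            return json.loads(result)
        except json.JSONDecodeError:
            return result  # upstream returned plain text rather than JSON
    return result


def call_route(route: str, payload: dict[str, Any], base_url: str = "http://localhost:5001") -> Any:
    """POST a payload to one forwarder route and return the decoded upstream response."""
    req = urllib.request.Request(
        f"{base_url}{route}",
        data=wrap_args(payload),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return unwrap_result(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # The /predict response from above, decoded without a running server.
    sample = r'{"result":"{\"result\": \"Processed text: Hello world\", \"model\": \"test\", \"route\": \"/predict\"}"}'
    print(unwrap_result(sample)["route"])
```

With the forwarder running, call_route("/analyze", {"text": "..."})["analysis"]["sentiment"] would read the sentiment field directly instead of decoding the nested string by hand.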

Linear Ticket

https://linear.app/scale-epd/issue/MLI-4966/support-multiple-routes-passthrough-on-launch

@meher-m self-assigned this Oct 14, 2025
@meher-m changed the title from "Launch support multiple routes passthrough" to "[MLI-4966] Launch support multiple routes passthrough" Oct 14, 2025
@meher-m requested a review from dmchoiboi October 14, 2025 15:47