@@ -11,9 +11,19 @@ A sophisticated load balancing and failover plugin for OptiLLM that distributes
 - 📊 **Performance Tracking**: Monitor latency and errors per provider
 - 🗺️ **Model Mapping**: Map model names to provider-specific deployments

+## Installation
+
+```bash
+# Install OptiLLM via pip
+pip install optillm
+
+# Verify installation
+optillm --version
+```
+
 ## Quick Start

-### 1. Basic Setup
+### 1. Create Configuration

 Create `~/.optillm/proxy_config.yaml`:

@@ -23,27 +33,69 @@ providers:
     base_url: https://api.openai.com/v1
     api_key: ${OPENAI_API_KEY}
     weight: 2
+    model_map:
+      gpt-4: gpt-4-turbo-preview  # Optional: map model names

   - name: backup
     base_url: https://api.openai.com/v1
     api_key: ${OPENAI_API_KEY_BACKUP}
     weight: 1

 routing:
-  strategy: weighted
+  strategy: weighted  # Options: weighted, round_robin, failover
+```
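+
+Before starting the server, export the API keys the config references (a sketch; the variable names must match the `${...}` placeholders above):
+
+```bash
+export OPENAI_API_KEY="sk-..."           # primary provider key
+export OPENAI_API_KEY_BACKUP="sk-..."    # backup provider key
+```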
+
+### 2. Start OptiLLM Server
+
+```bash
+# Option A: Use proxy as default for ALL requests (recommended)
+optillm --approach proxy
+
+# Option B: Start server normally (requires model prefix or extra_body)
+optillm
+
+# With custom port
+optillm --approach proxy --port 8000
 ```

-### 2. Usage Examples
+### 3. Usage Examples
+
+#### When using `--approach proxy` (Recommended)
+```bash
+# No need for the "proxy-" prefix! The proxy handles all requests automatically
+curl -X POST http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gpt-4",
+    "messages": [{"role": "user", "content": "Hello"}]
+  }'
+
+# The proxy will:
+# 1. Route to one of your configured providers
+# 2. Apply model mapping if configured
+# 3. Handle failover automatically
+```

-#### Standalone Proxy
+#### Without the `--approach proxy` flag
 ```bash
-# Route requests through proxy
+# Method 1: Use the model prefix
 curl -X POST http://localhost:8000/v1/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
     "model": "proxy-gpt-4",
     "messages": [{"role": "user", "content": "Hello"}]
   }'
+
+# Method 2: Use extra_body
+curl -X POST http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gpt-4",
+    "messages": [{"role": "user", "content": "Hello"}],
+    "extra_body": {
+      "optillm_approach": "proxy"
+    }
+  }'
 ```

 #### Proxy with Approach/Plugin
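+
+A curl sketch of this pattern, assuming the same `proxy_wrap` field used in the Python SDK example later in this document:
+
+```bash
+# Proxy routes the requests made by the wrapped approach (here: moa)
+curl -X POST http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gpt-4",
+    "messages": [{"role": "user", "content": "Hello"}],
+    "extra_body": {
+      "optillm_approach": "proxy",
+      "proxy_wrap": "moa"
+    }
+  }'
+```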
@@ -151,20 +203,29 @@ providers:

 ### Model-Specific Routing

-Different providers may use different model names:
+When using `--approach proxy`, the proxy automatically maps model names to provider-specific deployments:

 ```yaml
 providers:
   - name: azure
     base_url: ${AZURE_ENDPOINT}
     api_key: ${AZURE_KEY}
     model_map:
-      # Request -> Provider mapping
+      # Request model -> Provider deployment name
       gpt-4: gpt-4-deployment-001
       gpt-4-turbo: gpt-4-turbo-latest
       gpt-3.5-turbo: gpt-35-turbo-deployment
+
+  - name: openai
+    base_url: https://api.openai.com/v1
+    api_key: ${OPENAI_API_KEY}
+    # No model_map needed - uses model names as-is
 ```

+With this configuration and `optillm --approach proxy`:
+- Request for "gpt-4" → Azure uses "gpt-4-deployment-001", OpenAI uses "gpt-4"
+- Request for "gpt-3.5-turbo" → Azure uses "gpt-35-turbo-deployment", OpenAI uses "gpt-3.5-turbo"
+
 ### Failover Configuration

 Set up primary and backup providers:
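+
+A minimal sketch, assuming the provider schema from Quick Start and the `failover` strategy listed under routing options (its exact semantics are an assumption here):
+
+```yaml
+providers:
+  - name: primary
+    base_url: https://api.openai.com/v1
+    api_key: ${OPENAI_API_KEY}
+
+  - name: backup
+    base_url: https://api.openai.com/v1
+    api_key: ${OPENAI_API_KEY_BACKUP}
+
+routing:
+  strategy: failover  # Assumed: try providers in listed order, fall back on error
+```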
@@ -294,16 +355,22 @@ from openai import OpenAI

 client = OpenAI(
     base_url="http://localhost:8000/v1",
-    api_key="dummy"
+    api_key="dummy"  # Can be any string when using proxy
+)
+
+# If server started with --approach proxy:
+response = client.chat.completions.create(
+    model="gpt-4",  # No "proxy-" prefix needed!
+    messages=[{"role": "user", "content": "Hello"}]
 )

-# Proxy wrapping MOA approach
+# Or explicitly use proxy with another approach:
 response = client.chat.completions.create(
     model="gpt-4",
     messages=[{"role": "user", "content": "Hello"}],
     extra_body={
         "optillm_approach": "proxy",
-        "proxy_wrap": "moa"
+        "proxy_wrap": "moa"  # Proxy will route MOA's requests
     }
 )
 ```
@@ -312,9 +379,10 @@ response = client.chat.completions.create(
 ```python
 from langchain.llms import OpenAI

+# If server started with --approach proxy:
 llm = OpenAI(
     openai_api_base="http://localhost:8000/v1",
-    model_name="proxy-gpt-4"
+    model_name="gpt-4"  # Proxy handles routing automatically
 )

 response = llm("What is the meaning of life?")