Wrap Qwen Code as an OpenAI-compatible API service, allowing you to enjoy the free Qwen3 Coder model through API!
- ✅ 2,000 requests/day
- ✅ 60 requests/minute rate limit
- ✅ Zero cost for individual users
- 🔌 OpenAI API Compatible: Implements the `/v1/chat/completions` endpoint
- 🚀 Quick Setup: Zero-config run with `uvx`
- ⚡ High Performance: Built on FastAPI + asyncio with concurrent request support
- Install uv

  `uv` is an extremely fast Python package installer and resolver, written in Rust.

  ```shell
  pip install uv
  ```
- Install dependencies

  Clone this repository and run:

  ```shell
  uv pip install -e .
  ```
Follow the installation guide from Qwen Code's official repository.
The first time you run qwen, it will guide you through an authentication process using the OAuth 2.0 device flow. This is a one-time setup.
- Browser-Based Login: The application will automatically open a new tab in your web browser, directing you to the Qwen login page.
- Authorization: Log in to your Qwen account in the browser.
After successful authorization, the application will securely store the authentication tokens in ~/.qwen/oauth_creds.json. This allows the proxy to access your Qwen account without requiring you to log in again.
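To check whether the one-time login has already been completed, you can test for the credentials file at the path mentioned above. This is a minimal sketch; the helper function name is illustrative, and nothing is assumed about the file's internal structure:

```python
from pathlib import Path

def qwen_creds_path() -> Path:
    """Location where Qwen Code stores OAuth tokens after login."""
    return Path.home() / ".qwen" / "oauth_creds.json"

if qwen_creds_path().is_file():
    print("Qwen credentials found; the proxy can authenticate.")
else:
    print("No credentials yet; run `qwen` once to complete the OAuth flow.")
```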
Run the following command:
```shell
uv run qwen-code-proxy
```

Qwen Code Proxy listens on port 8765 by default. You can customize the port with the `--port` parameter.
After startup, test the service with curl:
```shell
curl http://localhost:8765/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy-key" \
  -d '{
    "model": "qwen3-coder-plus",
    "messages": [{"role": "user", "content": "Hello! Can you introduce yourself?"}]
  }'
```

You can also use the OpenAI Python SDK:

```python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:8765/v1',
    api_key='dummy-key'  # Any string works
)

response = client.chat.completions.create(
    model='qwen3-coder-plus',
    messages=[
        {'role': 'user', 'content': 'Hello! Can you introduce yourself?'}
    ],
)

print(response.choices[0].message.content)
```

Add a Model Provider in Kilo Code settings:
- API Provider: OpenAI Compatible
- API Host: `http://localhost:8765/v1`
- API Key: Any string works
- Model Name: `qwen3-coder-plus`
- Uncheck "Enable Streaming"
- Uncheck "Image Support"
- Set the "Rate limit" to "1s", because Qwen Code's rate limit is currently 60 requests per minute.
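Because the upstream quota is 60 requests per minute (and 2,000 per day), a client hitting the proxy directly may want to retry with exponential backoff instead of failing on a rate-limit response. A minimal sketch; the wrapper function and retry parameters are illustrative and not part of this project:

```python
import time

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call` with exponential backoff, e.g. on HTTP 429 rate-limit errors.

    In practice you would catch a narrower exception such as
    openai.RateLimitError rather than Exception.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            time.sleep(base_delay * (2 ** attempt))
```

Usage: `with_backoff(lambda: client.chat.completions.create(...))`.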
View command line parameters:
```shell
qwen-code-proxy --help
```

Available options:
- `--host`: Server host address (default: `127.0.0.1`)
- `--port`: Server port (default: `8765`)
- `--rate-limit`: Max requests per minute (default: `60`)
- `--max-concurrency`: Max concurrent subprocesses (default: `4`)
- `--timeout`: Qwen Code Proxy command timeout in seconds (default: `30.0`)
- `--debug`: Enable debug mode (enables debug logging and file watching)
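For example, the options above can be combined at startup (the values here are purely illustrative) to expose the proxy on all interfaces with a lower rate limit and a longer timeout:

```shell
qwen-code-proxy --host 0.0.0.0 --port 9000 --rate-limit 30 --timeout 60
```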
MIT License
Issues and Pull Requests are welcome!
This project is a fork and adaptation of gemini-cli-proxy, originally created by William Liu.
The original tool provided an OpenAI-compatible API layer for Gemini CLI. This version has been modified to support Qwen Code instead.