A "drop-in" replacement for OpenAI's client that offers a unified interface for interacting with large language models from various providers, with support for hundreds of models, built-in fallback mechanisms, and enhanced reliability features.
Important
More stars → more visibility → more contributors → better features → more robust tool for everyone 🎉
Thank you for your support! 🙏
- Overview
- Getting Started
- Key Features
- Supported Providers
- Architecture
- API Design
- Advanced Features
- Migration from OpenAI
- Model Naming Convention
- Configuration
- Test Coverage
- Documentation
- Call for Contributions
- License
- Acknowledgements
OneLLM is a lightweight, provider-agnostic Python library that offers a unified interface for interacting with large language models (LLMs) from various providers. It simplifies the integration of LLMs into applications by providing a consistent API while abstracting away provider-specific implementation details.
The library follows the OpenAI client API design pattern, making it familiar to developers already using OpenAI and enabling easy migration for existing applications. Simply change your import statements and instantly gain access to hundreds of models across dozens of providers while maintaining your existing code structure.
With support for 19 implemented providers (and more planned), OneLLM gives you access to more than 300 unique language models through a single, consistent interface - from the latest proprietary models to open-source alternatives, all accessible through familiar OpenAI-compatible patterns.
Note
Ready for Use: OneLLM now supports 19 providers with 300+ models! From cloud APIs to local models, you can access them all through a single, unified interface. Contributions are welcome to help add even more providers!
# Basic installation (includes OpenAI compatibility and download utility)
pip install OneLLM
# For all providers (includes dependencies for future provider support)
pip install "OneLLM[all]"
OneLLM includes a built-in utility for downloading GGUF models:
# Download a model from HuggingFace (saves to ~/llama_models by default)
onellm download --repo-id "TheBloke/Llama-2-7B-GGUF" --filename "llama-2-7b.Q4_K_M.gguf"
# Download to a custom directory
onellm download -r "microsoft/Phi-3-mini-4k-instruct-gguf" -f "Phi-3-mini-4k-instruct-q4.gguf" -o /path/to/models
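Once downloaded, a GGUF file can be run through the llama.cpp provider using the filename-based naming described in the Model Naming Convention section below. A minimal sketch, assuming the model was saved to the default ~/llama_models directory and is resolved by filename:
from onellm import ChatCompletion
response = ChatCompletion.create(
    model="llama_cpp/llama-2-7b.Q4_K_M.gguf",  # filename from the download above (assumed to resolve from the default directory)
    messages=[{"role": "user", "content": "Say hello in one sentence."}]
)
print(response.choices[0].message["content"])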
# Basic usage with OpenAI-compatible syntax
from onellm import ChatCompletion
response = ChatCompletion.create(
model="openai/gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello, how are you?"}
]
)
print(response.choices[0].message["content"])
# Output: I'm doing well, thank you for asking! I'm here and ready to help you...
For more detailed examples, check out the examples directory.
Feature | Description |
---|---|
📦 Drop-in replacement | Use your existing OpenAI code with minimal changes |
🔄 Provider-agnostic | Support for 300+ models across 19 implemented providers |
🔁 Automatic fallback | Seamlessly switch to alternative models when needed |
🔄 Auto-retry mechanism | Retry the same model multiple times before failing |
🧩 OpenAI-compatible | Familiar interface for developers used to OpenAI |
📺 Streaming support | Real-time streaming responses from supported providers |
🖼️ Multi-modal capabilities | Support for text, images, audio across compatible models |
🏠 Local LLM support | Run models locally via Ollama and llama.cpp |
⬇️ Model downloads | Built-in CLI to download GGUF models from HuggingFace |
🧹 Unicode artifact cleaning | Automatic removal of invisible characters to prevent AI detection |
🏷️ Consistent naming | Clear provider/model-name format for attribution |
🧪 Comprehensive tests | 96% test coverage ensuring reliability |
📄 Apache-2.0 license | Open-source license that protects contributions |
OneLLM currently supports 19 providers with more on the way:
- Anthropic - Claude family of models
- Anyscale - Configurable AI platform
- AWS Bedrock - Access to multiple model families
- Azure OpenAI - Microsoft-hosted OpenAI models
- Cohere - Command models with RAG
- DeepSeek - Chinese LLM provider
- Fireworks - Fast inference platform
- Moonshot - Kimi models with long-context capabilities
- Google AI Studio - Gemini models via API key
- Groq - Ultra-fast inference for Llama, Mixtral
- Mistral - Mistral Large, Medium, Small
- OpenAI - GPT-4o, o3-mini, DALL-E, Whisper, etc.
- OpenRouter - Gateway to 100+ models
- Perplexity - Search-augmented models
- Together AI - Open-source model hosting
- Vertex AI - Google Cloud's enterprise Gemini
- X.AI - Grok models
- Ollama - Run models locally with easy management
- llama.cpp - Direct GGUF model execution
Through these providers, you gain access to hundreds of models, including:
Model Family | Notable Models |
---|---|
OpenAI Family | GPT-4o, GPT-4 Turbo, o3 |
Claude Family | Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku |
Llama Family | Llama 3 70B, Llama 3 8B, Code Llama |
Mistral Family | Mistral Large, Mistral 7B, Mixtral |
Gemini Family | Gemini Pro, Gemini Ultra, Gemini Flash |
Embeddings | Ada-002, text-embedding-3-small/large, Cohere embeddings |
Multimodal | GPT-4 Vision, Claude 3 Vision, Gemini Pro Vision |
OneLLM follows a modular architecture with clear separation of concerns:
---
config:
look: handDrawn
theme: mc
themeVariables:
background: 'transparent'
primaryColor: '#fff0'
secondaryColor: 'transparent'
tertiaryColor: 'transparent'
mainBkg: 'transparent'
flowchart:
layout: fixed
---
flowchart TD
%% User API Layer
User(User Application) --> ChatCompletion
User --> Completion
User --> Embedding
User --> OpenAIClient["OpenAI Client Interface"]
subgraph API["Public API Layer"]
ChatCompletion["ChatCompletion\n.create() / .acreate()"]
Completion["Completion\n.create() / .acreate()"]
Embedding["Embedding\n.create() / .acreate()"]
OpenAIClient
end
%% Core logic
subgraph Core["Core Logic"]
Router["Provider Router"]
Config["Configuration\nEnvironment Variables\nAPI Keys"]
FallbackManager["Fallback Manager"]
RetryManager["Retry Manager"]
end
%% Provider Layer
BaseProvider["Provider Interface<br>(Base Class)"]
subgraph Implementations["Provider Implementations"]
OpenAI["OpenAI"]
Anthropic["Anthropic"]
GoogleProvider["Google"]
Groq["Groq"]
Ollama["Local LLMs"]
OtherProviders["20+ Others"]
end
%% Utilities
subgraph Utilities["Utilities"]
Streaming["Streaming<br>Handlers"]
TokenCounting["Token<br>Counter"]
ErrorHandling["Error<br>Handling"]
Types["Type<br>Definitions"]
Models["Response<br>Models"]
end
%% External services
OpenAIAPI["OpenAI API"]
AnthropicAPI["Anthropic API"]
GoogleAPI["Google API"]
GroqAPI["Groq API"]
LocalModels["Ollama/llama.cpp"]
OtherAPIs["..."]
%% Connections
ChatCompletion --> Router
Completion --> Router
Embedding --> Router
OpenAIClient --> Router
Router --> Config
Router --> FallbackManager
FallbackManager --> RetryManager
RetryManager --> BaseProvider
BaseProvider --> OpenAI
BaseProvider --> Anthropic
BaseProvider --> GoogleProvider
BaseProvider --> Groq
BaseProvider --> Ollama
BaseProvider --> OtherProviders
BaseProvider --> Streaming
BaseProvider --> TokenCounting
BaseProvider --> ErrorHandling
OpenAI --> OpenAIAPI
Anthropic --> AnthropicAPI
GoogleProvider --> GoogleAPI
Groq --> GroqAPI
Ollama --> LocalModels
OtherProviders --> OtherAPIs
- Public Interface Classes
  - ChatCompletion - For chat-based interactions
  - Completion - For text completion
  - Embedding - For generating embeddings
- Provider Interface
  - Abstract base class defining required methods
  - Provider-specific implementations
- Configuration System
  - Environment variable support
  - Runtime configuration options
- Error Handling
  - Standardized error types
  - Provider-specific error mapping
- Fallback System
  - Automatic retries with alternative models
  - Configurable fallback chains
  - Graceful degradation options
OneLLM mirrors the OpenAI Python client library API for familiarity:
from onellm import ChatCompletion
# Basic usage (identical to OpenAI's client)
response = ChatCompletion.create(
model="openai/gpt-4",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello, how are you?"}
]
)
# Streaming
for chunk in ChatCompletion.create(
model="anthropic/claude-3-opus",
messages=[{"role": "user", "content": "Write a poem about AI"}],
stream=True
):
print(chunk.choices[0].delta.content, end="", flush=True)
# With fallback options
response = ChatCompletion.create(
model=["openai/gpt-4", "anthropic/claude-3-opus", "mistral/mistral-large"],
messages=[{"role": "user", "content": "Explain quantum computing"}]
)
# With retries and fallbacks for enhanced reliability
response = ChatCompletion.create(
model="openai/gpt-4",
messages=[{"role": "user", "content": "Explain quantum computing"}],
retries=3, # Try gpt-4 up to 3 additional times before failing or using fallbacks
fallback_models=["anthropic/claude-3-opus", "mistral/mistral-large"]
)
# JSON mode for structured outputs
response = ChatCompletion.create(
model="openai/gpt-4o",
messages=[
{"role": "user", "content": "List the top 3 planets by size"}
],
response_format={"type": "json_object"} # Request structured JSON output
)
print(response.choices[0].message["content"]) # Outputs valid, parseable JSON
# Multi-modal with images
response = ChatCompletion.create(
model="openai/gpt-4-vision",
messages=[
{"role": "user", "content": [
{"type": "text", "text": "What's in this image?"},
{"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
]}
]
)
# Multi-modal with audio
response = ChatCompletion.create(
model="anthropic/claude-3-sonnet",
messages=[
{"role": "user", "content": [
{"type": "text", "text": "Transcribe and analyze this audio clip"},
{"type": "audio_url", "audio_url": {"url": "https://example.com/audio.mp3"}}
]}
]
)
from onellm import Completion
response = Completion.create(
model="groq/llama3-70b",
prompt="Once upon a time",
max_tokens=100
)
from onellm import Embedding
response = Embedding.create(
model="openai/text-embedding-3-small",
input="The quick brown fox jumps over the lazy dog"
)
OneLLM includes built-in fallback support to handle API errors gracefully:
response = ChatCompletion.create(
model="openai/gpt-4",
messages=[{"role": "user", "content": "Hello, how are you?"}],
fallback_models=[
"anthropic/claude-3-haiku", # Try Claude if GPT-4 fails
"openai/gpt-3.5-turbo" # Try GPT-3.5 if Claude fails
]
)
If the primary model fails due to service unavailability, rate limiting, or other retriable errors, OneLLM automatically tries the fallback models in sequence.
For transient errors, you can configure OneLLM to retry the same model multiple times before falling back to alternatives:
response = ChatCompletion.create(
model="openai/gpt-4",
messages=[{"role": "user", "content": "Hello, how are you?"}],
retries=3, # Will try the same model up to 3 additional times if it fails
fallback_models=["anthropic/claude-3-haiku", "openai/gpt-3.5-turbo"]
)
This is implemented using the fallback mechanism under the hood, making it both powerful and efficient.
---
config:
look: handDrawn
theme: mc
themeVariables:
background: 'transparent'
primaryColor: '#fff0'
secondaryColor: 'transparent'
tertiaryColor: 'transparent'
mainBkg: 'transparent'
flowchart:
layout: fixed
---
flowchart TD
START(["Client Request"]) --> REQUEST["Chat/Completion Request"]
REQUEST --> PRIMARY["Primary Model<br>e.g., openai/gpt-4"]
PRIMARY --> API_CHECK{"API<br>Available?"}
API_CHECK -->|Yes| MODEL_CHECK{"Model<br>Available?"}
MODEL_CHECK -->|Yes| QUOTA_CHECK{"Quota/Rate<br>Limits OK?"}
QUOTA_CHECK -->|Yes| SUCCESS["Successful Response"]
SUCCESS --> RESPONSE(["Return to Client"])
API_CHECK -->|No| RETRY_DECISION{"Retry<br>Count < Max?"}
MODEL_CHECK -->|No| RETRY_DECISION
QUOTA_CHECK -->|No| RETRY_DECISION
RETRY_DECISION -->|Yes| RETRY["Retry with Delay<br>(Same Model)"]
RETRY --> PRIMARY
RETRY_DECISION -->|No| FALLBACK_CHECK{"Fallbacks<br>Available?"}
FALLBACK_CHECK -->|Yes| FALLBACK_MODEL["Next Fallback Model<br>e.g., anthropic/claude-3-haiku"]
FALLBACK_MODEL --> FALLBACK_TRY["Try Fallback"]
FALLBACK_TRY --> FALLBACK_API_CHECK{"API<br>Available?"}
FALLBACK_API_CHECK -->|Yes| FALLBACK_SUCCESS["Successful Response"]
FALLBACK_SUCCESS --> RESPONSE
FALLBACK_API_CHECK -->|No| NEXT_FALLBACK{"More<br>Fallbacks?"}
NEXT_FALLBACK -->|Yes| FALLBACK_MODEL
NEXT_FALLBACK -->|No| ERROR["Error Response"]
FALLBACK_CHECK -->|No| ERROR
ERROR --> RESPONSE
For applications that require structured data, OneLLM supports JSON mode with compatible providers:
response = ChatCompletion.create(
model="openai/gpt-4o",
messages=[
{"role": "user", "content": "List the top 3 programming languages with their key features"}
],
response_format={"type": "json_object"} # Request JSON output
)
# The response contains valid, parseable JSON
json_response = response.choices[0].message["content"]
print(json_response) # Structured JSON data
# Parse it with standard libraries
import json
structured_data = json.loads(json_response)
When using providers that don't natively support JSON mode, OneLLM automatically adds system instructions requesting JSON-formatted responses.
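Because that emulated JSON mode relies on prompt instructions rather than a provider-side guarantee, it is worth parsing defensively when targeting such providers. A sketch (the chosen model is only a hypothetical example of a provider whose JSON mode may be emulated):
import json
from onellm import ChatCompletion
response = ChatCompletion.create(
    model="anthropic/claude-3-haiku",  # hypothetical example of an emulated-JSON provider
    messages=[{"role": "user", "content": "List the top 3 planets by size"}],
    response_format={"type": "json_object"}
)
try:
    data = json.loads(response.choices[0].message["content"])
except json.JSONDecodeError:
    data = None  # fall back to treating the reply as plain text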
Both synchronous and asynchronous APIs are available:
# Synchronous
response = ChatCompletion.create(
model="openai/gpt-4",
messages=[{"role": "user", "content": "Hello"}]
)
# Asynchronous with fallbacks and retries
response = await ChatCompletion.acreate(
model="openai/gpt-4",
messages=[{"role": "user", "content": "Hello"}],
retries=2,
fallback_models=["anthropic/claude-3-opus"]
)
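The await call above must run inside an event loop; from a plain script it can be wrapped like this:
import asyncio
from onellm import ChatCompletion

async def main():
    # acreate is the async counterpart of create
    response = await ChatCompletion.acreate(
        model="openai/gpt-4",
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(response.choices[0].message["content"])

asyncio.run(main())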
OneLLM provides multiple ways to migrate from the OpenAI client, including a fully compatible client interface:
OneLLM is a complete drop-in replacement for the OpenAI client with the OpenAI library included as a default dependency:
# Before
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": "Hello world"}]
)
# After - 100% identical client interface
from onellm import OpenAI # or Client
client = OpenAI() # completely compatible with OpenAI's client
response = client.chat.completions.create(
model="gpt-4", # automatically adds "openai/" prefix when needed
messages=[{"role": "user", "content": "Hello world"}]
)
# Before
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": "Hello world"}]
)
# After - more concise
from onellm import ChatCompletion
response = ChatCompletion.create(
model="openai/gpt-4", # explicitly using provider prefix
messages=[{"role": "user", "content": "Hello world"}]
)
# Adding fallback options with ChatCompletion
from onellm import ChatCompletion
response = ChatCompletion.create(
model="openai/gpt-4",
messages=[{"role": "user", "content": "Hello world"}],
fallback_models=[
"anthropic/claude-3-haiku",
"openai/gpt-3.5-turbo"
],
# optional config
fallback_config={
"log_fallbacks": True
}
)
# Using retries with fallbacks for enhanced reliability
from onellm import ChatCompletion
response = ChatCompletion.create(
model="openai/gpt-4",
messages=[{"role": "user", "content": "Hello world"}],
retries=3, # Will retry gpt-4 up to 3 additional times before using fallbacks
fallback_models=[
"anthropic/claude-3-haiku",
"openai/gpt-3.5-turbo"
]
)
# Using fallback with the client interface
from onellm import OpenAI
client = OpenAI()
response = client.chat.completions.create(
model="openai/gpt-4",
messages=[{"role": "user", "content": "Hello world"}],
fallback_models=[
"anthropic/claude-3-opus",
"groq/llama3-70b"
]
)
Models are specified using a provider prefix to clearly identify the source:
Provider | Format | Example |
---|---|---|
OpenAI | openai/{model} | openai/gpt-4 |
Google | google/{model} | google/gemini-pro |
Anthropic | anthropic/{model} | anthropic/claude-3-opus |
Groq | groq/{model} | groq/llama3-70b |
Mistral | mistral/{model} | mistral/mistral-large |
Ollama | ollama/{model}@host:port | ollama/llama3:8b@localhost:11434 |
llama.cpp | llama_cpp/{model.gguf} | llama_cpp/llama-3-8b-q4_K_M.gguf |
XAI (Grok) | xai/{model} | xai/grok-beta |
Cohere | cohere/{model} | cohere/command-r-plus |
AWS Bedrock | bedrock/{model} | bedrock/claude-3-5-sonnet |
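For example, the Ollama format makes the target host explicit, which is useful when the server is not running on the default port. A sketch assuming a local Ollama server with the llama3:8b model already pulled:
from onellm import ChatCompletion
response = ChatCompletion.create(
    model="ollama/llama3:8b@localhost:11434",  # provider/model@host:port per the table above
    messages=[{"role": "user", "content": "Explain the provider prefix convention in one sentence."}]
)
print(response.choices[0].message["content"])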
OneLLM can be configured through environment variables or at runtime:
# Environment variables
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-...
# Runtime configuration
import onellm
onellm.openai_api_key = "sk-..." # OpenAI API key
onellm.anthropic_api_key = "sk-..." # Anthropic API key
# Configure fallback behavior
onellm.config.fallback = {
"enabled": True,
"default_chains": {
"chat": ["openai/gpt-4", "anthropic/claude-3-opus", "groq/llama3-70b"],
"embedding": ["openai/text-embedding-3-small", "cohere/embed-english"]
},
"retry_delay": 1.0,
"max_retries": 3
}
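API keys can also be supplied through the environment, either exported in the shell or set programmatically before making requests; a sketch (exactly when the keys are read is an implementation detail):
import os

# Equivalent to exporting the variables in your shell
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["ANTHROPIC_API_KEY"] = "sk-..."

import onellm  # the keys above are picked up from the environment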
OneLLM maintains comprehensive test coverage to ensure reliability and compatibility:
Coverage Metric | Value |
---|---|
Overall Package Coverage | 96% (357 passing tests) |
Modules with 100% Coverage | 16 modules |
Minimum Module Coverage | 90% |
Provider Implementations | 93-95% coverage |
This extensive test coverage ensures reliable operation across all supported providers and models.
OneLLM uses a code-first documentation approach:
- Examples Directory: The examples/ directory contains well-documented example scripts that demonstrate all key features of the library:
  - Each example includes detailed frontmatter explaining its purpose and relationship to the codebase
  - Examples range from basic usage to advanced features like fallbacks, retries, and multi-modal capabilities
  - Running the examples is the fastest way to understand the library's capabilities
- Code Docstrings: All public APIs, classes, and methods have comprehensive docstrings:
  - Detailed parameter descriptions
  - Return value documentation
  - Exception information
  - Usage examples
- Type Annotations: The codebase uses Python type annotations throughout:
  - Provides IDE autocompletion support
  - Makes argument requirements clear
  - Enables static type checking
This approach keeps documentation tightly coupled with code, ensuring it stays up-to-date as the library evolves. To get started, we recommend examining the examples that match your use case.
We're building something amazing with OneLLM, and we'd love your help to make it even better! There are many ways to contribute:
- Code contributions: Add new providers, enhance existing ones, or improve core functionality
- Bug reports: Help us identify and fix issues
- Documentation: Improve examples, clarify API usage, or fix typos
- Feature requests: Share your ideas for making OneLLM more powerful
- Testing: Help ensure reliability across different environments and use cases
Getting started is easy:
- Check out our open issues for good first contributions
- Fork the repository and create a feature branch
- Make your improvements and run tests
- Submit a pull request with a clear description of your changes
For complete details on contribution guidelines, code style, provider development, testing standards, and our contributor license agreement, please read our CONTRIBUTING.md.
OneLLM is licensed under the Apache-2.0 license.
I chose the Apache 2.0 license to make OneLLM easy to adopt, integrate, and build on. This license:
- Allows you to freely use, modify, and distribute the library in both open-source and proprietary software
- Encourages wide adoption by individuals, startups, and enterprises alike
- Includes a clear patent grant for legal peace of mind
- Enables flexible usage without the complexity of copyleft restrictions
Whether you’re building internal tools or commercial applications, Apache 2.0 gives you the freedom to use OneLLM however you need – no strings attached.
This project stands on the shoulders of many excellent open-source projects and would not be possible without the collaborative spirit of the developer community.
Special thanks to all the LLM providers whose APIs this library integrates with, and to the early adopters who tested the library and provided crucial feedback.
Thank you for trying out OneLLM! Your interest and support mean a lot to this project. Whether you're using it in your applications, experimenting with different LLM providers, or just exploring the capabilities, your participation helps drive this project forward.
If you find OneLLM useful in your work:
- Consider starring the repository on GitHub
- Share your experiences or use cases with the community
- Let us know how we can make it better for your needs
Thank you for your support!
~ Ran Aroussi
𝕏 / @aroussi