RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite.
- 🧠 Choose any LLM provider with LiteLLM, including local llama-cpp-python models
- 💾 Choose either PostgreSQL or SQLite as a keyword & vector search database
- 🥇 Choose any reranker with rerankers, including multilingual FlashRank as the default
- ❤️ Only lightweight and permissive open source dependencies (e.g., no PyTorch or LangChain)
- 🚀 Acceleration with Metal on macOS, and CUDA on Linux and Windows
- 📖 PDF to Markdown conversion on top of pdftext and pypdfium2
- 🧬 Multi-vector chunk embedding with late chunking and contextual chunk headings
- ✂️ Optimal level 4 semantic chunking by solving a binary integer programming problem
- 🔍 Hybrid search with the database's native keyword & vector search (tsvector+pgvector, FTS5+sqlite-vec¹)
- 🌀 Optimal closed-form linear query adapter by solving an orthogonal Procrustes problem
- 💬 Optional customizable ChatGPT-like frontend for web, Slack, and Teams with Chainlit
- ✍️ Optional conversion of any input document to Markdown with Pandoc
- ✅ Optional evaluation of retrieval and generation performance with Ragas
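Hybrid search merges two ranked result lists, one from keyword search and one from vector search, into a single ranking. A common way to do this is reciprocal rank fusion (RRF). The sketch below assumes RRF with the conventional constant k = 60 and uses hypothetical chunk ids. It only illustrates the general technique; RAGLite's own hybrid_search may fuse or weight results differently.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(
    rankings: List[List[str]], k: int = 60
) -> List[Tuple[str, float]]:
    """Fuse several ranked lists of ids with reciprocal rank fusion (RRF).

    Each id receives the sum of 1 / (k + rank) over every list it appears in,
    where rank is 1-based. Ids that rank highly in several lists win.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from a vector search and a keyword search:
vector_ids = ["c3", "c1", "c7"]
keyword_ids = ["c1", "c9", "c3"]
fused = reciprocal_rank_fusion([vector_ids, keyword_ids])
print([chunk_id for chunk_id, _ in fused])  # ['c1', 'c3', 'c9', 'c7']
```

Because RRF only looks at ranks, it needs no calibration between keyword scores and vector similarities, which live on different scales.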
First, install spaCy's multilingual sentence model:
# Install spaCy's xx_sent_ud_sm:
pip install https://github.com/explosion/spacy-models/releases/download/xx_sent_ud_sm-3.7.0/xx_sent_ud_sm-3.7.0-py3-none-any.whl
Next, it is optional but recommended to install an accelerated llama-cpp-python precompiled binary:
# Configure which llama-cpp-python precompiled binary to install (⚠️ only v0.2.88 is supported right now):
LLAMA_CPP_PYTHON_VERSION=0.2.88
PYTHON_VERSION=310  # Your Python version without the dot, e.g., 310 for Python 3.10
ACCELERATOR=metal  # One of: metal, cu121, cu122, cu123, cu124
PLATFORM=macosx_11_0_arm64  # One of: macosx_11_0_arm64, linux_x86_64, win_amd64
# Install llama-cpp-python:
pip install "https://github.com/abetlen/llama-cpp-python/releases/download/v$LLAMA_CPP_PYTHON_VERSION-$ACCELERATOR/llama_cpp_python-$LLAMA_CPP_PYTHON_VERSION-cp$PYTHON_VERSION-cp$PYTHON_VERSION-$PLATFORM.whl"
Finally, install RAGLite with:
pip install raglite
To add support for a customizable ChatGPT-like frontend, use the chainlit extra:
pip install "raglite[chainlit]"
To add support for filetypes other than PDF, use the pandoc extra:
pip install "raglite[pandoc]"
To add support for evaluation, use the ragas extra:
pip install "raglite[ragas]"
- Configuring RAGLite
- Inserting documents
- Searching and Retrieval-Augmented Generation (RAG)
- Computing and using an optimal query adapter
- Evaluation of retrieval and generation
- Serving a customizable ChatGPT-like frontend
Tip
🧠 RAGLite extends LiteLLM with support for llama.cpp models using llama-cpp-python. To select a llama.cpp model (e.g., from bartowski's collection), use a model identifier of the form "llama-cpp-python/<hugging_face_repo_id>/<filename>@<n_ctx>", where n_ctx is an optional parameter that specifies the context size of the model.
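To make the identifier format in the tip above concrete, here is a minimal sketch of a parser for it. The helper parse_llama_cpp_model_id and the LlamaCppModelId class are hypothetical and not part of RAGLite's API. The sketch assumes the Hugging Face repo id always has exactly two segments (owner/name) and that the filename, which may be a glob such as *Q4_K_M.gguf, contains no further slashes.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch: not RAGLite's actual parser.
PREFIX = "llama-cpp-python/"


@dataclass
class LlamaCppModelId:
    repo_id: str          # Hugging Face repo id, assumed to be "owner/name"
    filename: str         # File name or glob pattern inside the repo
    n_ctx: Optional[int]  # Optional context size; None when "@<n_ctx>" is omitted


def parse_llama_cpp_model_id(model: str) -> LlamaCppModelId:
    """Parse "llama-cpp-python/<hugging_face_repo_id>/<filename>@<n_ctx>"."""
    if not model.startswith(PREFIX):
        raise ValueError(f"not a llama-cpp-python model identifier: {model!r}")
    rest = model[len(PREFIX):]
    # Split off the optional "@<n_ctx>" suffix from the right.
    path, sep, n_ctx_str = rest.rpartition("@")
    if sep:
        n_ctx: Optional[int] = int(n_ctx_str)
    else:
        path, n_ctx = rest, None
    # Assumption: owner/name form the repo id, everything after is the filename.
    owner, name, filename = path.split("/", 2)
    return LlamaCppModelId(repo_id=f"{owner}/{name}", filename=filename, n_ctx=n_ctx)


model_id = parse_llama_cpp_model_id(
    "llama-cpp-python/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/*Q4_K_M.gguf@8192"
)
print(model_id)
```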
Tip
💾 You can create a PostgreSQL database in a few clicks at neon.tech.
First, configure RAGLite with your preferred PostgreSQL or SQLite database and any LLM supported by LiteLLM:
from raglite import RAGLiteConfig
# Example 'remote' config with a PostgreSQL database and an OpenAI LLM:
my_config = RAGLiteConfig(
    db_url="postgresql://my_username:my_password@my_host:5432/my_database",
    llm="gpt-4o-mini",  # Or any LLM supported by LiteLLM.
    embedder="text-embedding-3-large",  # Or any embedder supported by LiteLLM.
)
# Example 'local' config with a SQLite database and a llama.cpp LLM:
my_config = RAGLiteConfig(
    db_url="sqlite:///raglite.sqlite",
    llm="llama-cpp-python/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/*Q4_K_M.gguf@8192",
    embedder="llama-cpp-python/lm-kit/bge-m3-gguf/*F16.gguf",
)
You can also configure any reranker supported by rerankers:
import os
from raglite import RAGLiteConfig
from rerankers import Reranker
# Example remote API-based reranker (reads the Cohere API key from the environment):
my_config = RAGLiteConfig(
    db_url="postgresql://my_username:my_password@my_host:5432/my_database",
    reranker=Reranker("cohere", lang="en", api_key=os.environ["COHERE_API_KEY"]),
)
# Example local cross-encoder reranker per language (this is the default):
my_config = RAGLiteConfig(
    db_url="sqlite:///raglite.sqlite",
    reranker=(
        ("en", Reranker("ms-marco-MiniLM-L-12-v2", model_type="flashrank")),  # English
        ("other", Reranker("ms-marco-MultiBERT-L-12", model_type="flashrank")),  # Other languages
    ),
)
Tip
✍️ To insert documents other than PDF, install the pandoc extra with pip install raglite[pandoc].
Next, insert some documents into the database. RAGLite will take care of the conversion to Markdown, optimal level 4 semantic chunking, and multi-vector embedding with late chunking:
# Insert documents:
from pathlib import Path
from raglite import insert_document
insert_document(Path("On the Measure of Intelligence.pdf"), config=my_config)
insert_document(Path("Special Relativity.pdf"), config=my_config)
Now you can search for chunks with vector search, keyword search, or a hybrid of the two, and rerank the search results with the configured reranker. You can also answer questions with RAG using any search method of your choice (hybrid_search is the default) together with reranking:
# Search for chunks:
from raglite import hybrid_search, keyword_search, vector_search
prompt = "How is intelligence measured?"
chunk_ids_vector, _ = vector_search(prompt, num_results=20, config=my_config)
chunk_ids_keyword, _ = keyword_search(prompt, num_results=20, config=my_config)
chunk_ids_hybrid, _ = hybrid_search(prompt, num_results=20, config=my_config)
# Retrieve chunks:
from raglite import retrieve_chunks
chunks_hybrid = retrieve_chunks(chunk_ids_hybrid, config=my_config)
# Rerank chunks:
from raglite import rerank_chunks
chunks_reranked = rerank_chunks(prompt, chunks_hybrid, config=my_config)
# Answer questions with RAG:
from raglite import rag
prompt = "What does it mean for two events to be simultaneous?"
stream = rag(prompt, config=my_config)
for update in stream:
    print(update, end="")
# You can also pass a search method or search results directly:
stream = rag(prompt, search=hybrid_search, config=my_config)
stream = rag(prompt, search=chunks_reranked, config=my_config)
RAGLite can compute and apply an optimal closed-form query adapter to the prompt embedding to improve the output quality of RAG. To benefit from this, first generate a set of evals with insert_evals, and then compute and store the optimal query adapter with update_query_adapter:
# Improve RAG with an optimal query adapter:
from raglite import insert_evals, update_query_adapter
insert_evals(num_evals=100, config=my_config)
update_query_adapter(config=my_config)  # From here, simply call vector_search to use the query adapter.
If you installed the ragas extra, you can use RAGLite to answer the evals and then evaluate the quality of both the retrieval and generation steps of RAG using Ragas:
# Evaluate retrieval and generation:
from raglite import answer_evals, evaluate, insert_evals
insert_evals(num_evals=100, config=my_config)
answered_evals_df = answer_evals(num_evals=10, config=my_config)
evaluation_df = evaluate(answered_evals_df, config=my_config)
If you installed the chainlit extra, you can serve a customizable ChatGPT-like frontend with:
raglite chainlit
The application is also deployable to web, Slack, and Teams.
You can specify the database URL, LLM, and embedder directly in the Chainlit frontend, or with the CLI as follows:
raglite chainlit \
    --db_url sqlite:///raglite.sqlite \
    --llm "llama-cpp-python/bartowski/Llama-3.2-3B-Instruct-GGUF/*Q4_K_M.gguf@4096" \
    --embedder "llama-cpp-python/lm-kit/bge-m3-gguf/*F16.gguf"
To use an API-based LLM, make sure to include your credentials in a .env file or supply them inline:
OPENAI_API_KEY=sk-... raglite chainlit --llm gpt-4o-mini --embedder text-embedding-3-large
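The optimal closed-form query adapter listed among the features is based on an orthogonal Procrustes problem: find the orthogonal matrix W that best maps one set of embeddings onto another in the least-squares sense. Its textbook closed-form solution is W = UVᵀ, where UΣVᵀ is the SVD of the cross-covariance matrix. The sketch below shows only that generic problem with NumPy on synthetic data (NumPy and the synthetic setup are assumptions of this example); how update_query_adapter builds its source and target embeddings from the evals is not shown.

```python
import numpy as np


def orthogonal_procrustes(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Return the orthogonal W minimizing ||source @ W - target||_F.

    Closed form: if source.T @ target = U @ diag(S) @ Vt, then W = U @ Vt.
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


rng = np.random.default_rng(0)
n, d = 200, 16
# Synthetic stand-ins for query embeddings and their "ideal" counterparts.
query_embeddings = rng.normal(size=(n, d))
hidden_rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))  # Random orthogonal matrix
target_embeddings = query_embeddings @ hidden_rotation

adapter = orthogonal_procrustes(query_embeddings, target_embeddings)
print(np.allclose(adapter, hidden_rotation))  # True: the rotation is recovered
print(np.allclose(adapter.T @ adapter, np.eye(d)))  # True: the adapter is orthogonal
```

Because W is orthogonal, applying it to a query embedding rotates the embedding without changing its norm.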
Prerequisites
1. Set up Git to use SSH
- Generate an SSH key and add the SSH key to your GitHub account.
- Configure SSH to automatically load your SSH keys:
cat << EOF >> ~/.ssh/config
Host *
  AddKeysToAgent yes
  IgnoreUnknown UseKeychain
  UseKeychain yes
  ForwardAgent yes
EOF
2. Install Docker
- Install Docker Desktop.
- Linux only:
- Export your user's user id and group id so that files created in the Dev Container are owned by your user:
cat << EOF >> ~/.bashrc
export UID=$(id --user)
export GID=$(id --group)
EOF
3. Install VS Code or PyCharm
- Install VS Code and VS Code's Dev Containers extension. Alternatively, install PyCharm.
- Optional: install a Nerd Font such as FiraCode Nerd Font and configure VS Code or configure PyCharm to use it.
Development environments
The following development environments are supported:
- ⭐️ GitHub Codespaces: click on Code and select Create codespace to start a Dev Container with GitHub Codespaces.
- ⭐️ Dev Container (with container volume): click on Open in Dev Containers to clone this repository in a container volume and create a Dev Container with VS Code.
- Dev Container: clone this repository, open it with VS Code, and run Ctrl/⌘ + ⇧ + P → Dev Containers: Reopen in Container.
- PyCharm: clone this repository, open it with PyCharm, and configure Docker Compose as a remote interpreter with the dev service.
- Terminal: clone this repository, open it with your terminal, and run docker compose up --detach dev to start a Dev Container in the background, and then run docker compose exec dev zsh to open a shell prompt in the Dev Container.
Developing
- This project follows the Conventional Commits standard to automate Semantic Versioning and Keep A Changelog with Commitizen.
- Run poe from within the development environment to print a list of Poe the Poet tasks available to run on this project.
- Run poetry add {package} from within the development environment to install a runtime dependency and add it to pyproject.toml and poetry.lock. Add --group test or --group dev to install a CI or development dependency, respectively.
- Run poetry update from within the development environment to upgrade all dependencies to the latest versions allowed by pyproject.toml.
- Run cz bump to bump the package's version, update the CHANGELOG.md, and create a git tag.
Footnotes
1. We use PyNNDescent until sqlite-vec is more mature.