A comprehensive course on building and improving Retrieval-Augmented Generation (RAG) systems through systematic evaluation and optimization. This repository contains course materials that supplement the popular Maven course. For additional free materials, visit improvingrag.com.
This course teaches you to move beyond trial-and-error RAG development through a data-driven approach. You'll learn to:
- Build robust evaluation frameworks to measure RAG performance objectively
- Fine-tune embedding models for 15-30% performance improvements
- Understand user query patterns through topic modeling and classification
- Enhance retrieval with structured metadata and SQL integration
- Implement sophisticated tool selection and orchestration
RAG systems often fail to meet user needs because developers lack systematic approaches to improvement. This course provides:
- Objective Measurement: Learn to distinguish real improvements from random variation
- Targeted Optimization: Identify exactly where your system fails and why
- Production-Ready Techniques: Apply methods proven in real-world applications
- End-to-End Coverage: From basic retrieval to complex multi-tool orchestration
Learn the fundamental tools for the course: Jupyter Notebooks, LanceDB for vector search, and Pydantic Evals for systematic evaluation.
Key Skills: Vector databases, hybrid search, evaluation frameworks
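As a quick preview, a minimal LanceDB round trip looks like the sketch below; the database path, table name, and records are illustrative rather than taken from the course notebooks:

```python
import lancedb

# Connect to a local LanceDB database (the directory is created if missing)
db = lancedb.connect("data/lancedb")

# Create a table from plain dicts; "vector" holds the embedding
table = db.create_table(
    "docs",
    data=[
        {"vector": [0.1, 0.2, 0.3], "text": "How do refunds work?"},
        {"vector": [0.9, 0.1, 0.0], "text": "Shipping takes 3-5 days."},
    ],
)

# Nearest-neighbor search against a query embedding
results = table.search([0.1, 0.2, 0.25]).limit(1).to_pandas()
print(results["text"].iloc[0])
```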
Build a comprehensive evaluation framework using synthetic data generation, retrieval metrics, and statistical validation.
Key Skills: Synthetic question generation, recall@k, MRR@k, bootstrapping, statistical significance testing
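To make the metrics concrete, here is a minimal sketch of recall@k, MRR@k, and a bootstrapped confidence interval for a mean score; the function names are our own, not an API from the course:

```python
import random

def recall_at_k(relevant_id, retrieved_ids, k):
    """1.0 if the relevant chunk appears in the top-k results, else 0.0."""
    return 1.0 if relevant_id in retrieved_ids[:k] else 0.0

def mrr_at_k(relevant_id, retrieved_ids, k):
    """Reciprocal rank of the relevant chunk within the top-k, else 0.0."""
    for rank, doc_id in enumerate(retrieved_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0

def bootstrap_ci(scores, n_resamples=1000, seed=42):
    """95% bootstrap confidence interval for the mean of per-query scores."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(scores, k=len(scores))) / len(scores)
        for _ in range(n_resamples)
    )
    return means[int(0.025 * n_resamples)], means[int(0.975 * n_resamples)]
```

Comparing the confidence intervals of two retrieval configurations is what lets you tell a real improvement from random variation.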
Fine-tune embedding models using both managed services (Cohere) and open-source approaches (sentence-transformers) for significant performance gains.
Key Skills: Hard negative mining, triplet loss training, model deployment
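For the open-source path, a fine-tuning loop with the classic sentence-transformers `model.fit` API might look like the sketch below; the base model and the (anchor, positive, hard-negative) triplets are placeholders for your own data:

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

# Placeholder triplets: (query, relevant chunk, hard negative)
triplets = [
    ("reset my password", "How to reset a forgotten password", "Password policy rules"),
    # ... mined from your evaluation data
]

model = SentenceTransformer("all-MiniLM-L6-v2")
train_examples = [InputExample(texts=[a, p, n]) for a, p, n in triplets]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.TripletLoss(model)

# A single epoch is enough to smoke-test the pipeline; tune for real gains
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
model.save("models/finetuned-minilm")
```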
Apply topic modeling to discover user query patterns and build classification systems for ongoing monitoring.
Key Skills: BERTopic, query classification, pattern discovery, satisfaction analysis
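A minimal BERTopic run over logged user queries looks like the sketch below; the queries are placeholders, and in practice BERTopic needs hundreds of documents to form stable topics:

```python
from bertopic import BERTopic

# Placeholder queries; substitute a large sample from production logs
queries = [
    "How do I return an item?",
    "Where is my refund?",
    "Track my order status",
    # ... many more
]

topic_model = BERTopic(min_topic_size=10)
topics, probs = topic_model.fit_transform(queries)

# Inspect discovered topics and their top keywords
print(topic_model.get_topic_info())
```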
Enhance RAG with structured metadata filtering, SQL integration, and PDF parsing for handling complex queries.
Key Skills: Metadata extraction, hybrid retrieval, Text-to-SQL, document parsing
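As one example of hybrid retrieval, LanceDB can combine vector similarity with a SQL-style predicate over metadata; the `products` table, `category` column, and `embed` helper below are assumptions for illustration:

```python
import lancedb

db = lancedb.connect("data/lancedb")
table = db.open_table("products")  # assumed table with a "category" column

query_embedding = embed("red summer dress")  # assumed embedding helper

# Combine vector similarity with a structured metadata filter
results = (
    table.search(query_embedding)
    .where("category = 'dresses'")  # SQL-style metadata predicate
    .limit(5)
    .to_pandas()
)
```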
Evaluate and improve tool selection in multi-tool RAG systems through systematic testing and prompting strategies.
Key Skills: Tool orchestration, precision/recall for tools, few-shot prompting
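A straightforward way to score tool selection is set-based precision and recall over the tools chosen for each query; this helper is a sketch, not code from the course:

```python
def tool_precision_recall(expected, predicted):
    """Precision/recall for one query's tool selection.

    expected:  set of tool names the query actually requires
    predicted: set of tool names the model selected
    """
    true_positives = len(expected & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall

# The model picked one correct tool and one spurious one
p, r = tool_precision_recall({"search_docs", "query_sql"}, {"search_docs", "web_search"})
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.50
```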
Before starting, please read:
- Notebook Versions Guide - Explains the different notebook versions (standard, logfire, modal)
- Setup Verification - Run this first to verify your environment
- Python 3.9 (required for BERTopic dependency)
- Basic knowledge of Python, machine learning concepts, and APIs
- API keys for various services (see Environment Setup)
First, install uv if you haven't already:
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
Then create a virtual environment and install dependencies:
```bash
# Create a virtual environment with Python 3.9
uv venv --python 3.9

# Activate the virtual environment
source .venv/bin/activate  # On macOS/Linux
# or
.venv\Scripts\activate     # On Windows

# Install all dependencies
uv sync

# Install the course package in editable mode
pip install -e .
```
Copy `.env.example` to `.env` and add your API keys:

```bash
cp .env.example .env
```
Required API keys:
- `COHERE_API_KEY`: Production key (not trial) from Cohere
- `OPENAI_API_KEY`: From OpenAI
- `HF_TOKEN`: Write-enabled token from Hugging Face
- `LOGFIRE_TOKEN`: From Pydantic Logfire
- `BRAINTRUST_API_KEY`: From Braintrust
Load environment variables in notebooks:
```python
from dotenv import find_dotenv, load_dotenv

load_dotenv(find_dotenv())
```
By completing this course, you'll be able to:
- Measure What Matters: Build evaluation frameworks that objectively measure RAG performance
- Improve Systematically: Apply data-driven techniques instead of random experimentation
- Handle Complex Queries: Support queries requiring metadata filtering, SQL access, and multi-tool coordination
- Deploy with Confidence: Verify improvements are statistically significant before production
- Scale Effectively: Apply these techniques to any RAG application or domain
Each week contains 2-4 Jupyter notebooks with hands-on exercises. Notebooks include:
- Detailed explanations of concepts
- Working code examples
- Visualization of results
- Best practices and tips
- Bird-Bench Text-to-SQL dataset for evaluation
- Synthetic transaction data for embedding fine-tuning
- Klarna FAQ pages for query understanding
- Clothing dataset for metadata extraction
- 70+ commands for tool selection evaluation
The `office_hours/` directory contains transcripts and summaries from live sessions, providing additional insights and Q&A content.
For contributors, install pre-commit hooks:
```bash
pip install pre-commit
pre-commit install
```
These hooks ensure code quality through:
- Black formatting
- Ruff linting with auto-fixes
- YAML validation
- Large file prevention
When running code in Python files instead of notebooks, wrap async calls with `asyncio.run()`:

```python
import asyncio

asyncio.run(main())  # main() is your async entry point
```
Notebooks include built-in visualizations. If running outside Jupyter, you may need to call `plt.show()` explicitly for matplotlib plots.
- Report issues at GitHub Issues
- Visit improvingrag.com for additional resources
- Join the Maven course for live instruction and community access
Special thanks to Dmitry Labazkin for frequent feedback and contributions to improve the notebooks.
Note: This is an advanced course assuming familiarity with LLMs and basic RAG concepts. For beginners, we recommend starting with introductory materials on vector databases and semantic search before diving into systematic improvement techniques.