nandigama/rag-zero-to-hero-guide

 
 


👩🏼‍💻 RAG Zero to Hero Guide

This repository is a comprehensive guide to learning Retrieval-Augmented Generation (RAG), from the basics to advanced topics.


Quick links

🧱 RAG Basics Course 🚀 RAG Toolkit 🩸 RAG Survey Papers

RAG Basics Course

| Topic | Description | Link |
| --- | --- | --- |
| What is RAG? | Explains RAG with a simple example. | Link |
| Why RAG? | Explains the drawbacks of LLMs and how RAG addresses them. | Link |
| How does RAG work? | Explains the different steps in RAG: Indexing, Retrieval, Augmentation and Generation. | Link |
| RAG Benefits and Challenges | Discusses the benefits and challenges of RAG. | Link |
| RAG Must Know Terms | Definitions of must-know RAG terms. | Link |
| RAG Roadmap | Detailed roadmap to learn RAG from basics to advanced. | Link |
| RAG Developer's Stack | Covers the various libraries used to build RAG systems. | Link |
| RAG from Scratch | RAG implementation from scratch without any frameworks. | Link |
| RAG with LangChain | RAG implementation using the LangChain framework. | Link |
| Website RAG | RAG over a website, implemented using the LangChain framework. | Link |
| YouTube Video RAG | RAG over a YouTube video transcript, implemented using the LangChain framework. | Link |
| Agentic RAG | Agentic RAG system implemented using the CrewAI framework. | Link |
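
The four steps named in the course (Indexing, Retrieval, Augmentation and Generation) can be sketched without any framework. This is a minimal illustration under stated assumptions, not the code from the linked notebooks: the bag-of-words `embed` function stands in for a real embedding model, `generate` is a hypothetical placeholder for an LLM call, and every function name here is invented for the example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase term counts (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Indexing: store each document next to its vector."""
    return [(doc, embed(doc)) for doc in documents]


def retrieve(index, query, k=2):
    """Retrieval: return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def augment(query, contexts):
    """Augmentation: place the retrieved text into the prompt."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_block}\n\nQuestion: {query}"
    )


def generate(prompt):
    """Generation: hypothetical placeholder; a real system would call an LLM here."""
    return f"[LLM response to a {len(prompt)}-character prompt]"


docs = [
    "RAG combines retrieval with text generation.",
    "Vector databases store embeddings for similarity search.",
    "Bananas are rich in potassium.",
]
index = build_index(docs)
query = "How does RAG use retrieval?"
top_docs = retrieve(index, query, k=1)
prompt = augment(query, top_docs)
answer = generate(prompt)
print(prompt)
print(answer)
```

Swapping `embed` for a learned embedding model and `generate` for an actual LLM call keeps the same four-step shape.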

RAG Toolkit

🔴Frameworks🔴

| Library | Description | Link |
| --- | --- | --- |
| LangChain | LangChain is a framework for developing applications powered by large language models (LLMs). | Link |
| LlamaIndex | LlamaIndex is a data framework for your LLM applications. | Link |
| Haystack | Haystack is an end-to-end LLM framework that allows you to build applications powered by LLMs, Transformer models, vector search and more. | Link |
| fastRAG | Research framework for efficient and optimized retrieval augmented generative pipelines, incorporating state-of-the-art LLMs and Information Retrieval. | Link |
| Llmware | Unified framework for building enterprise RAG pipelines with small, specialized models. | Link |

🟠Research🟠

| Library | Description | Link |
| --- | --- | --- |
| FlashRAG | A Python Toolkit for Efficient RAG Research. This toolkit includes 36 pre-processed benchmark RAG datasets and 16 state-of-the-art RAG algorithms. | Link |

🟡Data Extraction - Web Scraping🟡

| Library | Description | Link |
| --- | --- | --- |
| Crawl4AI (Web Scraping) | Open-source, LLM-friendly web crawler and scraper. | Link |
| ScrapeGraphAI (Web & Document) | A web scraping Python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). | Link |
| Crawlee (Web Scraping) | A web scraping and browser automation library. | Link |

🟢Data Extraction - Documents🟢

| Library | Description | Link |
| --- | --- | --- |
| Docling (Document) | Docling parses documents and exports them to the desired format with ease and speed. | Link |
| Llama Parse (Document) | GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). | Link |
| PyMuPDF4LLM (Document) | PyMuPDF4LLM makes it easier to extract PDF content in the format you need for LLM & RAG environments. | Link |
| MegaParse (Document) | Parser for every type of document. | Link |
| ExtractThinker (Document) | Document Intelligence library for LLMs. | Link |

🔵Vector Database🔵

| Library | Description | Link |
| --- | --- | --- |
| SQLite-Vec | A vector search SQLite extension that runs anywhere! | Link |
| FAISS | A library for efficient similarity search and clustering of dense vectors. | Link |
| PGVector | Open-source vector similarity search for Postgres. | Link |
| Chroma | The AI-native open-source embedding database. The fastest way to build Python or JavaScript LLM apps with memory! | Link |
| Qdrant | High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. | Link |
| Pinecone | The vector database for machine learning applications. | Link |
| Weaviate | Weaviate is a cloud-native, open source vector database that is robust, fast, and scalable. | Link |
| Milvus | Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search. | Link |
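
To make "vector similarity search" concrete, here is a small sketch of what a vector store does at its core: keep vectors and return the ones closest to a query. It is an exhaustive (brute-force) search written for illustration only; the libraries above typically rely on optimized or approximate nearest-neighbor indexes, and the `TinyVectorStore` class is a hypothetical name, not the API of any of them.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store that scans every vector on each query."""

    def __init__(self):
        self._items = []  # list of (item_id, vector) pairs

    def add(self, item_id, vector):
        self._items.append((item_id, list(vector)))

    def search(self, query, k=3):
        """Return the k (item_id, score) pairs with the highest cosine similarity."""
        scored = [
            (item_id, cosine_similarity(query, vector))
            for item_id, vector in self._items
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = TinyVectorStore()
store.add("doc-a", [1.0, 0.0, 0.0])
store.add("doc-b", [0.0, 1.0, 0.0])
store.add("doc-c", [0.9, 0.1, 0.0])

results = store.search([1.0, 0.0, 0.0], k=2)
for item_id, score in results:
    print(item_id, round(score, 3))
```

The scan costs time proportional to the number of stored vectors per query, which is the main reason dedicated vector databases build indexes instead.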

🟣Chunking🟣

| Library | Description | Link |
| --- | --- | --- |
| Chonkie | The no-nonsense RAG chunking library: lightweight, lightning-fast, and easy to use. It supports seven different chunking strategies. | Link |
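
As an illustration of one common chunking strategy, the sketch below splits text into fixed-size word windows that overlap, so that a sentence cut at a boundary still appears in the next chunk. It is a plain-Python example under assumed defaults, not Chonkie's API; `chunk_words`, `chunk_size` and `overlap` are names chosen for this example.

```python
def chunk_words(text, chunk_size=5, overlap=2):
    """Split text into chunks of up to chunk_size words, sharing `overlap` words between neighbors."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = "one two three four five six seven eight"
chunks = chunk_words(sample, chunk_size=5, overlap=2)
for chunk in chunks:
    print(chunk)
```

Larger overlaps reduce the chance of splitting related sentences apart, at the cost of storing and embedding more redundant text.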

🟤Rerankers🟤

| Library | Description | Link |
| --- | --- | --- |
| Rerankers | A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. Any new reranking models can be added with very little knowledge of the codebase. | Link |

🟠Agentic RAG🟠

| Library | Description | Link |
| --- | --- | --- |
| CrewAI | Framework for orchestrating role-playing, autonomous AI agents. | Link |
| Agno | Build AI Agents with memory, knowledge, tools and reasoning. Chat with them using a beautiful Agent UI. | Link |
| LangGraph | Build resilient language agents as graphs. | Link |
| AutoGen | An open-source framework for building AI agent systems. | Link |
| R2R | Agentic Retrieval-Augmented Generation (RAG) with a RESTful API. R2R offers multimodal content ingestion, hybrid search functionality, knowledge graphs, and comprehensive user and document management. | Link |
| Vectara | Build Agentic RAG applications. | Link |

🟢Graph RAG🟢

| Library | Description | Link |
| --- | --- | --- |
| GraphRAG | A modular graph-based Retrieval-Augmented Generation (RAG) system. | Link |
| Nano GraphRAG | A simple, easy-to-hack GraphRAG implementation. | Link |
| Fast GraphRAG | Streamlined and promptable Fast GraphRAG framework designed for interpretable, high-precision, agent-driven retrieval workflows. | Link |

🔴Evaluation🔴

| Library | Description | Link |
| --- | --- | --- |
| RAGChecker | A Fine-grained Framework For Diagnosing RAG. | Link |
| BeyondLLM | Beyond LLM offers an all-in-one toolkit for experimentation, evaluation, and deployment of Retrieval-Augmented Generation (RAG) systems. | Link |
| RAGAS | Ragas is your ultimate toolkit for evaluating and optimizing Large Language Model (LLM) applications. | Link |
| Giskard | Open-Source Evaluation & Testing for ML & LLM systems. | Link |
| DeepEval | The LLM (RAG) Evaluation Framework. | Link |

RAG Survey Papers

| Paper | Category | Link |
| --- | --- | --- |
| Retrieval-Augmented Generation for Large Language Models: A Survey | General | Link |
| Retrieval-Augmented Generation for Natural Language Processing: A Survey | General | Link |
| A Comprehensive Survey of Retrieval-Augmented Generation (RAG): Evolution, Current Landscape and Future Directions | General | Link |
| Retrieval-Augmented Generation for AI-Generated Content: A Survey | General | Link |
| A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models | General | Link |
| A Survey on Retrieval-Augmented Text Generation for Large Language Models | General | Link |
| Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely | General | Link |
| Graph Retrieval-Augmented Generation: A Survey | Graph RAG | Link |
| Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG | Agentic RAG | Link |
| Evaluation of Retrieval-Augmented Generation: A Survey | Evaluation | Link |
| Searching for Best Practices in Retrieval-Augmented Generation | RAG Best Practices | Link |
