This project is a local Rust-powered AI assistant that uses embedded documentation, agent tools, and a local LLM to help answer Rust programming questions — with context-aware code generation and crate discovery.
- RAG (Retrieval-Augmented Generation) with ChromaDB
- 🕵️♂️ Agents for:
  - Crate docs (`docs.rs`)
  - GitHub repo search + README parsing
  - Standard library and Rust Book content
- 🤖 Local LLM (LLaMA3 / Mistral / OpenHermes) integration
- 🧠 Chat interface with memory + tool calling
- 📁 Modular source loading from local docs (`rustup`, crates, scraped pages)
```bash
git clone https://github.com/bradleyd/rustopedia.git
cd rustopedia
cd rag
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

You can load any Rust content (books, docs, READMEs) into ChromaDB.
- Local docs via `rustup` (HTML extraction + chunking)
- Crate docs from `docs.rs` (automated fetching)
- Rust Book or stdlib
- GitHub crate READMEs
Option 1: Load Rust core docs (Book + Stdlib)

```bash
cd rag
chmod +x rustup_docs.sh
./rustup_docs.sh
```

Option 2: Load popular crate documentation

```bash
cd rag
chmod +x crate_docs.sh
./crate_docs.sh all   # Fetches serde, tokio, clap, etc.
```

Option 3: Load specific crate docs

```bash
./crate_docs.sh fetch tokio   # Fetch specific crate
./crate_docs.sh embed         # Embed all fetched crates
```

Place `.md` or `.txt` files into `sample_docs/` under collection folders:

```
sample_docs/rust-book/
sample_docs/rust-docs/
sample_docs/crates/
```

Then run:

```bash
python embed_docs.py --dir ../sample_docs/your-collection --collection your-name --chunk-size 800 --overlap 100
```

All files will be embedded with intelligent chunking and stored in `./chroma_db`.
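For reference, the sketch below shows roughly what a chunk-and-embed step looks like with ChromaDB's Python client. It is an illustration only, assuming character-based chunking and ChromaDB's default embedding function; the real `embed_docs.py` may handle chunk boundaries and metadata differently.

```python
# Minimal chunk-and-embed sketch (illustration only; not the real embed_docs.py).
import pathlib
import chromadb

def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping character chunks."""
    pieces, start = [], 0
    while start < len(text):
        pieces.append(text[start:start + size])
        start += size - overlap
    return pieces

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("rust-book")

for path in pathlib.Path("../sample_docs/rust-book").glob("**/*.md"):
    for i, piece in enumerate(chunk(path.read_text(encoding="utf-8"))):
        # ChromaDB embeds each chunk with its default embedding function.
        collection.add(ids=[f"{path.name}-{i}"], documents=[piece])
```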
You can manually query docs using:

```bash
python query_docs.py "How do I deserialize JSON into an enum?"
```

This returns the most relevant chunks across all embedded docs.
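Under the hood this is a similarity search against the persisted collections. A minimal sketch with the ChromaDB client (not the actual `query_docs.py`, which may add filtering or nicer formatting):

```python
# Minimal similarity-query sketch against the persisted ChromaDB store.
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("rust-book")

results = collection.query(
    query_texts=["How do I deserialize JSON into an enum?"],
    n_results=3,
)
for doc, distance in zip(results["documents"][0], results["distances"][0]):
    print(f"[distance {distance:.3f}] {doc[:120]}...")
```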
Launch the LLM shell with agent orchestration:
```bash
cargo run --bin llm_shell
```

You can now ask things like:

- "What’s the best way to build a CLI app in Rust?"
- "How do I initialize a `Vec` with a capacity?"
- "Give me an example using `serde` to serialize structs"
The system will:
- Search ChromaDB
- Route queries to agents (GitHub, crate, docs)
- Return helpful, idiomatic answers using a local LLM
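Conceptually this is a retrieve-then-generate loop. The sketch below illustrates it in Python, assuming an Ollama-style local endpoint and a `llama3` model; the actual `llm_shell` is a Rust binary and may drive the model differently.

```python
# Retrieve-then-generate sketch (assumes an Ollama-style endpoint on localhost:11434).
import chromadb
import requests

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("rust-book")

question = "What's the best way to build a CLI app in Rust?"

# 1. Retrieve the closest chunks from the vector store.
hits = collection.query(query_texts=[question], n_results=3)
context = "\n\n".join(hits["documents"][0])

# 2. Ask the local LLM, grounding it in the retrieved context.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": f"Use this context to answer:\n{context}\n\nQuestion: {question}",
        "stream": False,
    },
)
print(resp.json()["response"])
```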
GitHub agent:

- Classifies queries into topics (e.g. `cli`, `web`)
- Uses GitHub API to find top crates
- Pulls README and returns as context

Docs agent:

- Fetches documentation from `docs.rs` or stdlib/book
- Extracts relevant examples
- Adds results to Chroma
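The real GitHub agent lives in `agents/github_agent` and is written in Rust; the Python sketch below only illustrates the classify, search, and README-fetch flow. The topic keywords are hypothetical and no GitHub authentication is used.

```python
# Illustrative sketch of the GitHub agent's flow (hypothetical keywords, no auth).
import requests

TOPIC_KEYWORDS = {"cli": ["cli", "argument", "terminal"], "web": ["http", "server", "web"]}

def classify(query: str) -> str:
    """Pick a topic by simple keyword matching."""
    q = query.lower()
    for topic, words in TOPIC_KEYWORDS.items():
        if any(w in q for w in words):
            return topic
    return "general"

def top_crate_readme(topic: str) -> str:
    """Find the most-starred Rust repo for the topic and pull its README."""
    search = requests.get(
        "https://api.github.com/search/repositories",
        params={"q": f"{topic} language:rust", "sort": "stars", "order": "desc"},
    ).json()
    full_name = search["items"][0]["full_name"]
    readme = requests.get(
        f"https://api.github.com/repos/{full_name}/readme",
        headers={"Accept": "application/vnd.github.raw"},
    )
    return readme.text

print(top_crate_readme(classify("How do I parse CLI arguments?")))
```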
```
rustopedia/
├── llm_shell/           # Main chat interface with agent orchestration
├── rag_engine/          # Core RAG functionality
├── rag_server/          # HTTP server for RAG queries
├── agents/
│   ├── github_agent/    # GitHub repository discovery
│   ├── docs_agent/      # Documentation fetching
│   └── crate_agent/     # Crates.io integration
├── rag/
│   ├── embed_docs.py    # Document embedding with chunking
│   ├── query_docs.py    # Manual query interface
│   ├── rustup_docs.sh   # Automated Rust docs extraction
│   ├── crate_docs.sh    # Crate documentation manager
│   └── sample_docs/     # Document storage (gitignored)
└── chroma_db/           # Persistent vector database
```
```bash
# Load Rust core documentation
cd rag
./rustup_docs.sh

# Load popular crate docs
./crate_docs.sh all

# Query manually
python query_docs.py "How do I create a VecDeque?"

# Chat with LLM
cd ../
cargo run
```

Planned improvements:

- Allow agent results to include multiple sources
- Stream LLM responses with token buffering
- Add RAG-based fallback when LLM confidence is low
- Use embeddings to determine best agent (vs. keyword match)
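The last item is not implemented yet; the sketch below shows one way embedding-based routing could work, using ChromaDB's default embedding function and cosine similarity. The agent descriptions are placeholders, not existing code.

```python
# Sketch of a possible embedding-based agent router (planned, not implemented).
import numpy as np
from chromadb.utils import embedding_functions

AGENTS = {
    "github_agent": "find popular crates and repositories on GitHub",
    "docs_agent": "fetch documentation from docs.rs, the stdlib, and the Rust Book",
    "crate_agent": "look up crate metadata on crates.io",
}

ef = embedding_functions.DefaultEmbeddingFunction()

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def route(query: str) -> str:
    """Return the agent whose description is closest to the query in embedding space."""
    q_vec = np.array(ef([query])[0])
    scores = {name: cosine(q_vec, np.array(ef([desc])[0])) for name, desc in AGENTS.items()}
    return max(scores, key=scores.get)

print(route("Which crate should I use for an HTTP server?"))
```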
The `crate_docs.sh` script provides comprehensive crate documentation management:

```bash
# View all available commands
./crate_docs.sh

# Fetch specific crates
./crate_docs.sh fetch serde
./crate_docs.sh fetch tokio

# Fetch popular crates automatically
./crate_docs.sh popular

# Re-embed all fetched crates
./crate_docs.sh embed

# View collection statistics
./crate_docs.sh stats

# Clear and rebuild everything
./crate_docs.sh all
```

Install additional Rust docs locally:
```bash
rustup component add rust-docs
```

The `rustup_docs.sh` script will automatically extract and embed them with optimal chunking.
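As a rough illustration of what that extraction involves (an assumption about the script, which may do this differently), the docs installed by `rustup` live under the active toolchain's sysroot and can be stripped to plain text before chunking:

```python
# Rough sketch of locating and text-extracting the rustup-installed HTML docs.
# Requires beautifulsoup4; this is not the real rustup_docs.sh.
import pathlib
import subprocess
from bs4 import BeautifulSoup

# rust-docs are installed under the active toolchain's sysroot.
sysroot = subprocess.run(
    ["rustc", "--print", "sysroot"], capture_output=True, text=True
).stdout.strip()
docs_root = pathlib.Path(sysroot) / "share" / "doc" / "rust" / "html"

for page in docs_root.glob("book/ch*.html"):
    text = BeautifulSoup(page.read_text(encoding="utf-8"), "html.parser").get_text(" ", strip=True)
    print(page.name, len(text), "characters extracted")
```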
MIT