Hello! Welcome to WiiDcos, a full-stack local RAG (Retrieval-Augmented Generation) system — built from scratch with LlamaIndex + Qdrant + FastAPI + Streamlit + Docker.
This project started life as a "hello-production-RAG" experiment — a way of stitching together the moving parts behind how ChatGPT-style apps read documents and answer questions intelligently. It has since grown into a full-stack system that takes in PDFs, indexes them in a vector database, and lets you chat with them directly in your browser.
| Layer | Tools / Frameworks | Description |
|---|---|---|
| Core AI | LlamaIndex, OpenAI Embeddings, Qdrant | Text chunking, embedding, and semantic search |
| Backend | FastAPI | REST API for upload + query |
| Frontend | Streamlit | Clean UI for drag-and-drop & chat |
| Infra | Docker | Containerized local Qdrant DB |
| Language | Python 3.10+ | Clean, modular `src/` layout |
```
WiiDcos/
├── Data/
│   └── uploads/              # uploaded PDFs live here
├── docker/
│   └── docker-compose.yml    # Qdrant container setup
├── src/
│   ├── api/                  # FastAPI backend
│   │   ├── main.py
│   │   └── routes/
│   │       ├── upload.py
│   │       └── query.py
│   ├── core/                 # core configs & logic
│   │   ├── config.py
│   │   ├── qdrant_store.py
│   │   └── retriever.py
│   ├── workers/              # ingestion jobs
│   │   └── ingestion.py
│   └── ui/                   # Streamlit front-end
│       └── app.py
├── .env                      # store your API keys here
├── requirements.txt
└── README.md
```
### 1️⃣ PDF Upload → Ingestion
- Upload a PDF through the Streamlit UI or call the backend `/upload` route.
- The document is stored in `Data/uploads/`.
- LlamaIndex reads and chunks the document.
- OpenAI Embeddings (or any embedding model) converts each chunk into a vector embedding.
- Qdrant stores the vectors for efficient semantic retrieval (a minimal sketch follows this list).
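A minimal sketch of this ingestion path, assuming llama-index 0.9-era imports and a Qdrant collection named `wiidcos` (both assumptions; the repo's actual logic lives in `src/workers/ingestion.py`):

```python
# Hypothetical ingestion sketch (assumed names; not the repo's exact code).
from llama_index import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores import QdrantVectorStore
from qdrant_client import QdrantClient

# Point LlamaIndex at the local Qdrant container started via Docker.
client = QdrantClient(url="http://localhost:6333")
vector_store = QdrantVectorStore(client=client, collection_name="wiidcos")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Read the uploaded PDFs, chunk them, embed each chunk (OpenAI by default),
# and push the resulting vectors into Qdrant.
documents = SimpleDirectoryReader("Data/uploads").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```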
### 2️⃣ Query → Retrieval + Answer
- The user submits a query in the chat interface.
- The top-k most semantically similar chunks are retrieved from Qdrant.
- Retrieved chunks are passed as context to an LLM (OpenAI or another model) at low temperature (0.1–0.2).
- The LLM generates an answer grounded in the retrieved chunks (see the sketch after this list).
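A minimal sketch of the query path, under the same assumptions as the ingestion sketch above (`wiidcos` collection, llama-index 0.9-era API):

```python
# Hypothetical query sketch (assumed names; not the repo's exact code).
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.llms import OpenAI
from llama_index.vector_stores import QdrantVectorStore
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
vector_store = QdrantVectorStore(client=client, collection_name="wiidcos")

# Low temperature keeps the answer close to the retrieved context.
service_context = ServiceContext.from_defaults(llm=OpenAI(temperature=0.1))
index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)

# Retrieve the top-3 most similar chunks and synthesize a grounded answer.
query_engine = index.as_query_engine(similarity_top_k=3)
print(query_engine.query("What is this PDF about?"))
```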
### 3️⃣ Local-first architecture
- Qdrant runs locally in Docker.
- PDFs and indexes are stored on local disk.
- Only embedding and LLM calls go out to external APIs (unless you switch to fully local models).
These commands are tested for local development; adjust them for macOS / Linux if necessary.
```bash
git clone https://github.com/<your-username>/WiiDcos.git
cd WiiDcos
conda create -n willdocs python=3.10 -y
conda activate willdocs
pip install -r requirements.txt
```

Create a `.env` file in the project root and insert your API keys (example):

```
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxx
# OPTIONAL:
# QDRANT_URL=http://localhost:6333
# QDRANT_API_KEY=your_qdrant_key_if_any
```

Start the Qdrant container:

```bash
cd docker
docker compose up -d
```

Open the Qdrant dashboard at: http://localhost:6333/dashboard
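For reference, a minimal `docker/docker-compose.yml` for local Qdrant might look like this (a sketch, not necessarily the repo's exact file):

```yaml
services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"                        # REST API + dashboard
    volumes:
      - ./qdrant_storage:/qdrant/storage   # persist vectors on local disk
```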
From the project root:

```bash
uvicorn src.api.main:app --reload --port 8000
```

Open the API docs at: http://127.0.0.1:8000/docs
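The entry point `src.api.main:app` presumably just wires the routers together; a minimal sketch (the wiring is an assumption based on the tree above):

```python
# Hypothetical src/api/main.py (assumed wiring; matches the repo layout above).
from fastapi import FastAPI

from src.api.routes import query, upload

app = FastAPI(title="WiiDcos")
app.include_router(upload.router)
app.include_router(query.router)
```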
Routes you should expect:

- `POST /upload` — Accepts PDF uploads and triggers ingestion (sketched below)
- `GET /query` — Query the indexed documents (or `POST /query`, depending on implementation)
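A sketch of what `src/api/routes/upload.py` might contain (an assumption, not the shipped code):

```python
# Hypothetical upload route (assumed body; the real one may differ).
from pathlib import Path

from fastapi import APIRouter, File, UploadFile

router = APIRouter()
UPLOAD_DIR = Path("Data/uploads")

@router.post("/upload")
async def upload_pdf(file: UploadFile = File(...)):
    """Save the PDF to Data/uploads/ and trigger ingestion."""
    UPLOAD_DIR.mkdir(parents=True, exist_ok=True)
    (UPLOAD_DIR / file.filename).write_bytes(await file.read())
    # ingest(...)  # hand off to the ingestion worker (src/workers/ingestion.py)
    return {"status": "uploaded", "filename": file.filename}
```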
```bash
streamlit run src/ui/app.py
```

Open the UI at: http://localhost:8501
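A minimal sketch of what `src/ui/app.py` could look like (the endpoint names and the `q` query parameter are assumptions):

```python
# Hypothetical Streamlit front-end (assumed endpoints and parameter names).
import requests
import streamlit as st

API = "http://localhost:8000"

st.title("WiiDcos")

pdf = st.file_uploader("Drag & drop a PDF", type="pdf")
if pdf is not None:
    files = {"file": (pdf.name, pdf.getvalue(), "application/pdf")}
    requests.post(f"{API}/upload", files=files, timeout=120)
    st.success("Uploaded & indexed")

question = st.text_input("Ask a question about your documents")
if question:
    resp = requests.get(f"{API}/query", params={"q": question}, timeout=120)
    st.write(resp.json())
```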
Typical flow:
- Drag & drop a PDF
- Wait for "Uploaded & indexed"
- Ask: "What is this PDF about?"
- Receive a short, sourced answer
Query:
What is this PDF about?
Answer (example):
The PDF covers research papers and studies on speech separation using deep learning, transformer, RNN, and LSTM models.
- Chat History — Store chat sessions for greater continuity in the UI.
- Streaming Responses — Offer streaming token-by-token output for a ChatGPT-like experience.
- Multi-PDF Support — Index multiple PDFs and enable users to choose which document(s) to query.
- Offline Embeddings — Replace OpenAI embeddings with offline sentence-transformers to reduce API usage.
- Full Docker Compose — Add FastAPI + Streamlit services to `docker-compose` for one-command runs.
- Model Flexibility — Add a config toggle to switch between OpenAI and local LLMs (e.g., the Llama 2 family).
- Qdrant unreachable: Ensure the Docker container is running and the port (default `6333`) is open.
- Embedding errors: Ensure your `OPENAI_API_KEY` is valid and has quota. Use local embeddings if rate-limited.
- Slow indexing: Chunk size and overlap settings trade speed against retrieval quality. Adjust them in `src/core/retriever.py` or via LlamaIndex settings (see the sketch below).
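For example, with llama-index 0.9-era settings (the class name and values here are assumptions; check your installed version):

```python
# Hypothetical chunking knobs (llama-index 0.9-era API; adjust per version).
from llama_index.node_parser import SentenceSplitter

# Smaller chunks index faster and match more precisely; larger chunks keep
# more context per retrieved hit. Overlap preserves continuity at boundaries.
parser = SentenceSplitter(chunk_size=512, chunk_overlap=64)
```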
- For quick sharing, you can deploy the Streamlit app to Streamlit Cloud or the backend to services like Render or Railway.
- For production, remember: managed vector DB (Qdrant Cloud), secrets management, HTTPS, authentication, and rate-limiting.
A starting `requirements.txt` snippet; pin versions to what you've tested locally:

```
fastapi>=0.95
uvicorn[standard]>=0.22
streamlit>=1.25
qdrant-client>=1.2
llama-index>=0.8
openai>=0.27
python-dotenv>=1.0
pydantic>=1.10
pandas>=1.5
tqdm>=4.64
pdfminer.six>=20221105
sentence-transformers>=2.2
```

Bump package versions as needed for compatibility with your environment.
WiiDcos RAG = Upload PDFs → Store in Qdrant → Query via LlamaIndex → Answer with OpenAI → Chat via Streamlit.
A small but end-to-end, production-ready RAG stack, built with logs, tests, and plenty of debugging.
Author: Vishal Fulsundar
M.S. Data Science — The George Washington University
LinkedIn