This tutorial will guide you step-by-step through building a full-stack Retrieval-Augmented Generation (RAG) chatbot using FastAPI, OpenAI's language model, and Streamlit. By the end, you will have a working chatbot that can answer questions based on the content of uploaded PDF documents.
Retrieval-Augmented Generation (RAG) is a powerful approach that combines information retrieval with generative AI models. In this project, we will build a chatbot that can answer user questions based on the content of uploaded PDF documents. The system uses:
- FastAPI for building a RESTful API backend.
- LangChain for chaining together retrieval and generation logic.
- OpenAI for the language model and embeddings.
- Chroma as a local vector database for storing and searching document embeddings.
- Streamlit for a simple, interactive web UI.
Your project should have the following structure:
chatbot-rag/
├── data/ # Directory to hold the local vector database
├── api.py # FastAPI server
├── app.py # Streamlit web application
├── chatbot.py # Core chatbot logic
├── requirements.txt # Python dependencies
├── README.md # Project documentation
└── .env # Environment variables (e.g., API keys)
A virtual environment isolates your project dependencies. You can use venv, conda, or uv.
Using venv:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Using conda:
conda create -n chatbot-rag python=3.11
conda activate chatbot-rag
Using uv:
uv init
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
Navigate to your project directory and install dependencies:
With pip:
pip install -r requirements.txt
With conda:
conda install --file requirements.txt
With uv:
uv add -r requirements.txt
uv sync
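The tutorial relies on a requirements.txt but never lists its contents. Based on the imports used throughout this project, a plausible starting point looks like this (unpinned; add version pins as needed):
# requirements.txt -- inferred from the imports in this tutorial
fastapi[standard]   # FastAPI plus the `fastapi dev` / `fastapi run` CLI
streamlit
requests
python-dotenv
langchain
langchain-openai
langchain-chroma
langchain-community
pypdf               # backend used by PyPDFParser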
To keep sensitive information like API keys secure, store them in a .env
file. This file should not be committed to version control.
Create a .env
file in your project root and add your OpenAI API key:
OPENAI_API_KEY=your_openai_api_key
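Since .env must never be committed, it is worth adding it to a .gitignore right away. A minimal sketch (adjust to your setup):
.env
venv/
.venv/
__pycache__/
data/   # local vector store; drop this line if you want to version it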
The chatbot.py file contains the core logic for document storage, retrieval, and question answering.
We use dotenv for loading environment variables, langchain for chaining logic, and logging for monitoring.
import os
from dotenv import load_dotenv, find_dotenv
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain.prompts import ChatPromptTemplate
from langchain_core.documents.base import Document
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.document_loaders.blob_loaders import Blob
from langchain_community.document_loaders.parsers import PyPDFParser
import logging
Logging helps you monitor your application and debug issues.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
We load the API key from .env
and initialize the embedding and LLM objects.
load_dotenv(find_dotenv())
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
if not OPENAI_API_KEY:
logger.error("OPENAI_API_KEY is not set")
raise ValueError("OPENAI_API_KEY is not set")
embeddings = OpenAIEmbeddings(model="text-embedding-3-large", api_key=OPENAI_API_KEY)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=OPENAI_API_KEY)
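Before building the rest of the pipeline, you can sanity-check your key and both models from a Python shell (a throwaway snippet, not part of chatbot.py):
# Embed a string and send a trivial chat request to verify credentials.
vector = embeddings.embed_query("hello world")
print(len(vector))  # text-embedding-3-large produces 3072-dimensional vectors

reply = llm.invoke("Reply with the single word: ok")
print(reply.content)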
A RAG system needs a vector database to store document embeddings for efficient similarity search. Chroma
is a simple, local vector database.
chroma = Chroma(
collection_name="documents",
collection_metadata={"name": "documents", "description": "store documents"},
persist_directory="./data",
embedding_function=embeddings,
)
retriever = chroma.as_retriever(search_kwargs={"k": 2}) # Retrieve top 2 relevant docs
The prompt guides the LLM to answer based on the retrieved context.
TEMPLATE = """
Here is the context:
<context>
{context}
</context>
And here is the question that must be answered using that context:
<question>
{input}
</question>
Please read through the provided context carefully. Then, analyze the question and attempt to find a
direct answer to the question within the context.
If you are able to find a direct answer, provide it and elaborate on relevant points from the
context using bullet points "-".
If you cannot find a direct answer based on the provided context, outline the most relevant points
that give hints to the answer of the question.
If no answer or relevant points can be found, or the question is not related to the context, simply
state the following sentence without any additional text:
I could not find an answer to your question.
Output your response in plain text without using the tags <answer> and </answer> and ensure you are not
quoting context text in your response since it must not be part of the answer.
"""
PROMPT = ChatPromptTemplate.from_template(TEMPLATE)
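To inspect exactly what the model will receive, you can render the template with placeholder values (dummy strings, for inspection only):
# Format the chat prompt into a plain string with dummy values.
print(PROMPT.format(context="Example context.", input="Example question?"))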
These chains connect the retriever and the LLM, so that relevant documents are injected into the prompt before generating an answer.
llm_chain = create_stuff_documents_chain(llm, PROMPT)
retrieval_chain = create_retrieval_chain(retriever, llm_chain)
def store_document(documents: list[Document]) -> str:
    """Embed the given documents and persist them in the Chroma collection."""
    chroma.add_documents(documents=documents)
    return "document stored successfully"
parser = PyPDFParser()
def parse_pdf(file_content: bytes) -> list[Document]:
    """Parse raw PDF bytes into a list of Documents, one per page."""
    blob = Blob(data=file_content)
    return list(parser.lazy_parse(blob))
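Note that PyPDFParser yields one Document per PDF page. For long pages you could optionally split documents into smaller chunks before storing them, which often improves retrieval precision. A sketch using LangChain's text splitter (not part of the original code; it requires the langchain-text-splitters package):
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split page-level documents into overlapping ~1000-character chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

def parse_pdf_chunked(file_content: bytes) -> list[Document]:
    pages = list(parser.lazy_parse(Blob(data=file_content)))
    return splitter.split_documents(pages)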
def retrieve_document(query: str) -> list[Document]:
    """Return the documents most similar to the query (top k=2)."""
    return retriever.invoke(input=query)
def ask_question(query: str) -> str:
    """Run the full RAG chain: retrieve context, then generate an answer."""
    response = retrieval_chain.invoke({"input": query})
    return response["answer"]
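With chatbot.py complete, you can smoke-test the core functions from a Python shell before adding the API layer (sample.pdf is a placeholder; use any PDF you have at hand):
from chatbot import parse_pdf, store_document, ask_question

# Parse and index a local PDF, then ask a question about it.
with open("sample.pdf", "rb") as f:
    docs = parse_pdf(f.read())
print(store_document(docs))  # "document stored successfully"
print(ask_question("What is this document about?"))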
FastAPI
provides a modern, fast web framework for building APIs.
from fastapi import FastAPI, UploadFile
from chatbot import retrieve_document, store_document, parse_pdf, ask_question
from pydantic import BaseModel
from typing import List, Optional
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
app = FastAPI(
title="Chatbot RAG",
description="A simple chatbot using OpenAI. Enables asking questions and getting answers based on uploaded documents.",
version="0.1",
)
Pydantic
models ensure that API requests and responses have the correct structure and types.
class DocumentResponse(BaseModel):
    documents: List
    total: int
    query: str
    error: Optional[str] = None

class DocumentUploadResponse(BaseModel):
    documents: List
    total: int
    status: str
    error: Optional[str] = None

class AskResponse(BaseModel):
    query: str
    answer: str
    error: Optional[str] = None
@app.get("/")
def read_root():
return {
"service": "RAG Chatbot using OPENAI",
"description": "Welcome to Chatbot RAG API",
"status": "running",
}
@app.get("/documents/{query}")
def search_documents(query: str) -> DocumentResponse:
try:
documents = retrieve_document(query)
return {"documents": documents, "total": len(documents), "query": query}
except Exception as e:
logger.error(f"Error searching documents: {e}", exc_info=True)
return {"error": str(e), "documents": [], "total": 0, "query": query}
@app.post("/documents")
async def upload_documents(files: List[UploadFile]) -> DocumentUploadResponse:
try:
documents = []
for file in files:
if file.content_type != "application/pdf":
logger.error(f"Unsupported file type: {file.content_type}")
raise ValueError("Only PDF files are supported")
content = await file.read()
parsed_docs = parse_pdf(content)
documents.extend(parsed_docs)
status = store_document(documents)
return {"documents": documents, "total": len(documents), "status": status}
except Exception as e:
logger.error(f"Error uploading documents: {e}", exc_info=True)
return {"error": str(e), "status": "failed", "documents": [], "total": 0}
@app.get("/ask")
def ask(query: str) -> AskResponse:
try:
answer = ask_question(query)
return {"query": query, "answer": answer}
except Exception as e:
logger.error(f"Error asking question: {e}", exc_info=True)
return {"error": str(e), "query": query, "answer": ""}
Streamlit
provides a simple way to build interactive web apps for your Python projects.
import streamlit as st
import requests
This function sends a question to the FastAPI
backend and returns the answer.
def ask(query: str) -> str:
    with st.spinner("Asking the chatbot..."):
        # Pass the query via params so requests URL-encodes it safely.
        response = requests.get(f"{API_URL}/ask", params={"query": query})
        if response.status_code == 200:
            data = response.json()
            return data["answer"]
        else:
            return "I couldn't find an answer to your question."
API_URL = "http://localhost:8000" # Change if deploying elsewhere
st.set_page_config(page_title="Chatbot", page_icon="🤖")
st.title("Chatbot RAG")
Allow users to upload multiple PDF files, which are sent to the backend for parsing and storage.
uploaded_files = st.file_uploader(
"Upload your PDF documents", type="pdf", accept_multiple_files=True
)
if uploaded_files:
files = [
("files", (file.name, file.getvalue(), "application/pdf"))
for file in uploaded_files
]
try:
with st.spinner("Uploading files..."):
        response = requests.post(f"{API_URL}/documents", files=files)
if response.status_code == 200:
st.success("Files uploaded successfully")
uploaded_files = None
else:
st.error("Failed to upload files")
except Exception as e:
st.error(f"Error uploading files: {e}")
Provide a chat-like interface for users to interact with the chatbot.
with st.chat_message(name="ai"):
st.write("Hello! I'm the Chatbot RAG. How can I help you today?")
query = st.chat_input(placeholder="Type your question here...")
if query:
with st.chat_message("user"):
st.write(query)
answer = ask(query)
with st.chat_message("ai"):
st.write(answer)
fastapi dev api.py  # for production, run: fastapi run api.py
streamlit run app.py
The FastAPI server runs at http://127.0.0.1:8000 and the Streamlit app at http://localhost:8501 by default.
To help you understand how to use the RAG chatbot, this section provides a step-by-step walkthrough with example screenshots from the chatbot-rag/images folder.
Start by uploading one or more PDF files that the chatbot will use to answer your questions. On the Streamlit web interface, use the Upload your PDF documents uploader and select your files.
Once uploaded, you should see a confirmation message indicating that your files were uploaded successfully.
After uploading your documents, you can interact with the chatbot using the chat input at the bottom of the page. Type your question related to the content of your uploaded PDFs and press Enter.
The chatbot will process your question, retrieve relevant information from your documents, and display an answer in the chat window.
Congratulations! You have built a full-stack Retrieval-Augmented Generation (RAG) chatbot using FastAPI, OpenAI, and Streamlit. You can now upload PDF documents and interact with the chatbot to get answers based on the content of those documents.
This project demonstrates how to combine modern Python tools to create a practical, educational AI application. You can extend this project by adding authentication, deploying to the cloud, or supporting more document types.