
Nasser Maronie

Posted on Jul 1, 2024

Build Your Own RAG App: A Step-by-Step Guide to Set Up an LLM Locally Using Ollama, Python, and ChromaDB

#ollama #llm #python #rag

In an era where data privacy is paramount, setting up your own local language model (LLM) provides a crucial solution for companies and individuals alike. This tutorial is designed to guide you through the process of creating a custom chatbot using Ollama, Python 3, and ChromaDB, all hosted locally on your system. Here are the key reasons why you need this tutorial:

- Full Customization: Hosting your own Retrieval-Augmented Generation (RAG) application locally means you have complete control over the setup and customization. You can fine-tune the model to fit your specific needs without relying on external services.
- Enhanced Privacy: By setting up your LLM locally, you avoid the risks associated with sending sensitive data over the internet. This is especially important for companies that handle confidential information. Training your model with private data locally ensures that your data stays within your control.
- Data Security: Using third-party LLMs can expose your data to potential breaches and misuse. Local deployment mitigates these risks by keeping your training data, such as PDF documents, within your secure environment.
- Control Over Data Processing: When you host your own LLM, you manage and process your data exactly how you want. This includes embedding your private data into your ChromaDB vector store, ensuring that your data processing meets your standards and requirements.
- Independence from Internet Connectivity: Running your chatbot locally means you are not dependent on an internet connection. This guarantees uninterrupted service and access to your chatbot, even in offline scenarios.

This tutorial will empower you to build a robust and secure local chatbot, tailored
to your needs, without compromising on privacy or control.
Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) is an advanced technique that combines
the strengths of information retrieval and text generation to create more accurate
and contextually relevant responses. Here's a breakdown of how RAG works and
why it's beneficial:

What is RAG?
RAG is a hybrid model that enhances the capabilities of language models by
incorporating an external knowledge base or document store. The process
involves two main components:

- Retrieval: In this phase, the model retrieves relevant documents or pieces of information from an external source, such as a database or a vector store, based on the input query.
- Generation: The retrieved information is then used by a generative language model to produce a coherent and contextually appropriate response.

How Does RAG Work?

1. Query Input: The user inputs a query or question.
2. Document Retrieval: The system uses the query to search an external knowledge base, retrieving the most relevant documents or snippets of information.
3. Response Generation: The generative model processes the retrieved information, integrating it with its own knowledge to generate a detailed and accurate response.
4. Output: The final response, enriched with specific and relevant details from the knowledge base, is presented to the user.
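
To make this flow concrete, here is a minimal, illustrative sketch of the retrieve-then-generate loop in Python. The helper functions are hypothetical placeholders, not part of the app built later in this tutorial:

```python
# Minimal RAG loop (illustrative only; the helpers are assumed, not real libraries).
# `search_vector_store(question, k)` is assumed to return the k most relevant text
# snippets, and `generate_answer(prompt)` is assumed to call an LLM with a prompt.

def rag_answer(question, search_vector_store, generate_answer, k=4):
    # 1. Retrieval: fetch the most relevant snippets for the question.
    snippets = search_vector_store(question, k=k)

    # 2. Augmentation: pack the retrieved context into the prompt.
    context = "\n\n".join(snippets)
    prompt = (
        "Answer the question using ONLY this context:\n"
        f"{context}\n\nQuestion: {question}"
    )

    # 3. Generation: let the language model produce the grounded response.
    return generate_answer(prompt)
```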

Benefits of RAG

- Enhanced Accuracy: By leveraging external data, RAG models can provide more precise and detailed answers, especially for domain-specific queries.
- Contextual Relevance: The retrieval component ensures that the generated response is grounded in relevant and up-to-date information, improving the overall quality of the response.
- Scalability: RAG systems can be easily scaled to incorporate vast amounts of data, enabling them to handle a wide range of queries and topics.
- Flexibility: These models can be adapted to various domains by simply updating or expanding the external knowledge base, making them highly versatile.

Why Use RAG Locally?

- Privacy and Security: Running a RAG model locally ensures that sensitive data remains secure and private, as it does not need to be sent to external servers.
- Customization: You can tailor the retrieval and generation processes to suit your specific needs, including integrating proprietary data sources.
- Independence: A local setup ensures that your system remains operational even without internet connectivity, providing consistent and reliable service.

By setting up a local RAG application with tools like Ollama, Python, and
ChromaDB, you can enjoy the benefits of advanced language models while
maintaining control over your data and customization options.

GPU
Running large language models (LLMs) like the ones used in Retrieval-Augmented
Generation (RAG) requires significant computational power. One of the key
components that enable efficient processing and embedding of data in these
models is the Graphics Processing Unit (GPU). Here's why GPUs are essential for
this task and how they impact the performance of your local LLM setup:

What is a GPU?
A GPU is a specialized processor designed to accelerate the rendering of images
and videos. Unlike Central Processing Units (CPUs), which are optimized for
sequential processing tasks, GPUs excel at parallel processing. This makes them
particularly well-suited for the complex mathematical computations required by
machine learning and deep learning models.

Why GPUs Matter for LLMs

- Parallel Processing Power: GPUs can handle thousands of operations simultaneously, significantly speeding up tasks such as training and inference in LLMs. This parallelism is crucial for the heavy computational loads associated with processing large datasets and generating responses in real time.
- Efficiency in Handling Large Models: LLMs like those used in RAG require substantial memory and computational resources. GPUs are equipped with high-bandwidth memory (HBM) and multiple cores, making them capable of managing the large-scale matrix multiplications and tensor operations needed by these models.
- Faster Data Embedding and Retrieval: In a local RAG setup, embedding data into a vector store like ChromaDB and retrieving relevant documents quickly is essential for performance. High-performance GPUs can accelerate these processes, ensuring that your chatbot responds promptly and accurately.
- Improved Training Times: Training an LLM involves adjusting millions (or even billions) of parameters. GPUs can drastically reduce the time required for this training phase compared to CPUs, enabling more frequent updates and refinements to your model.

Choosing the Right GPU

When setting up a local LLM, the choice of GPU can significantly impact performance. Here are some factors to consider:

- Memory Capacity: Larger models require more GPU memory. Look for GPUs with higher VRAM (video RAM) to accommodate extensive datasets and model parameters.
- Compute Capability: The more CUDA cores a GPU has, the better it can handle parallel processing tasks. GPUs with higher compute capabilities are more efficient for deep learning tasks.
- Bandwidth: Higher memory bandwidth allows for faster data transfer between the GPU and its memory, improving overall processing speed.
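
If you are not sure what GPU or how much VRAM your machine has, one quick check on NVIDIA hardware (assuming the driver and the nvidia-smi tool are installed) is:

```python
import subprocess

# Print the GPU name and total VRAM reported by nvidia-smi (NVIDIA GPUs only).
result = subprocess.run(
    ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # e.g. "NVIDIA GeForce RTX 3090, 24576 MiB"
```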

Examples of High-Performance GPUs for LLMs

- NVIDIA RTX 3090: Known for its high VRAM (24 GB) and powerful CUDA cores, it's a popular choice for deep learning tasks.
- NVIDIA A100: Designed specifically for AI and machine learning, it offers exceptional performance with large memory capacity and high compute power.
- AMD Radeon Pro VII: Another strong contender, with high memory bandwidth and efficient processing capabilities.

Investing in a high-performance GPU is crucial for running LLMs locally. It ensures faster data processing, efficient model training, and quick response generation, making your local RAG application more robust and reliable. By leveraging the power of GPUs, you can fully realize the benefits of hosting your own custom chatbot, tailored to your specific needs and data privacy requirements.

Prerequisites
Before diving into the setup, ensure you have the following prerequisites in place:

- Python 3: A versatile programming language that you'll use to write the code for your RAG app.
- ChromaDB: A vector database that will store and manage the embeddings of our data.
- Ollama: Used to download and serve custom LLMs on our local machine.

Step 1: Install Python 3 and set up your environment

To install and set up our Python 3 environment, follow these steps:

Download and install Python 3 on your machine. Then make sure Python 3 is installed and runs successfully:

$ python3 --version
# Python 3.11.7

Create a folder for your project, for example, local-rag:

$ mkdir local-rag
$ cd local-rag

Create a virtual environment named venv:

$ python3 -m venv venv

Activate the virtual environment:

$ source venv/bin/activate
# Windows
# venv\Scripts\activate

Step 2: Install ChromaDB and other dependencies


Install ChromaDB using pip:
$ pip install --q chromadb

Install LangChain tools to work seamlessly with your model:

$ pip install --q unstructured langchain langchain-text-splitters


$ pip install --q "unstructured[all-docs]"

Install Flask to serve your app as an HTTP service:

$ pip install --q flask
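
The application code later in this tutorial also imports python-dotenv (to load the .env file) and modules from langchain-community. Depending on your LangChain version, these may not be pulled in by the commands above, so you may need to install them explicitly:

$ pip install --q python-dotenv langchain-community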

Step 3: Install Ollama


To install Ollama, follow these steps:
Head to the Ollama download page and download the installer for your operating system.
Verify your Ollama installation by running:

$ ollama --version
# ollama version is 0.1.47

Pull the LLM model you need. For example, to use the Mistral model:
$ ollama pull mistral

Pull the text embedding model. For instance, to use the Nomic Embed Text model:

$ ollama pull nomic-embed-text

Then start the Ollama server so your models can be used:

$ ollama serve
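
With the server running, you can sanity-check that Ollama is reachable and that both models were pulled, either with ollama list or through Ollama's HTTP API, which listens on port 11434 by default. A minimal check in Python, assuming the default port:

```python
import json
import urllib.request

# Ask the local Ollama server which models it has available.
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    tags = json.load(resp)

print([m["name"] for m in tags.get("models", [])])
# You should see entries for the mistral and nomic-embed-text models pulled above.
```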

Build the RAG app


Now that you've set up your environment with Python, Ollama, ChromaDB and
other dependencies, it's time to build your custom local RAG app. In this section,
we'll walk through the hands-on Python code and provide an overview of how to
structure your application.

app.py

This is the main Flask application file. It defines routes for embedding files into the vector database and for retrieving responses from the model.
import os
from dotenv import load_dotenv

load_dotenv()

from flask import Flask, request, jsonify

from embed import embed
from query import query
from get_vector_db import get_vector_db

TEMP_FOLDER = os.getenv('TEMP_FOLDER', './_temp')
os.makedirs(TEMP_FOLDER, exist_ok=True)

app = Flask(__name__)

@app.route('/embed', methods=['POST'])
def route_embed():
    if 'file' not in request.files:
        return jsonify({"error": "No file part"}), 400

    file = request.files['file']

    if file.filename == '':
        return jsonify({"error": "No selected file"}), 400

    embedded = embed(file)

    if embedded:
        return jsonify({"message": "File embedded successfully"}), 200

    return jsonify({"error": "File embedded unsuccessfully"}), 400

@app.route('/query', methods=['POST'])
def route_query():
    data = request.get_json()
    response = query(data.get('query'))

    if response:
        return jsonify({"message": response}), 200

    return jsonify({"error": "Something went wrong"}), 400

if __name__ == '__main__':
    app.run(host="0.0.0.0", port=8080, debug=True)

embed.py

This module handles the embedding process, including saving uploaded files,
loading and splitting data, and adding documents to the vector database.

import os
from datetime import datetime
from werkzeug.utils import secure_filename
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from get_vector_db import get_vector_db

TEMP_FOLDER = os.getenv('TEMP_FOLDER', './_temp')

# Function to check if the uploaded file is allowed (only PDF files)
def allowed_file(filename):
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in {'pdf'}

# Function to save the uploaded file to the temporary folder
def save_file(file):
    # Save the uploaded file with a secure filename and return the file path
    ct = datetime.now()
    ts = ct.timestamp()
    filename = str(ts) + "_" + secure_filename(file.filename)
    file_path = os.path.join(TEMP_FOLDER, filename)
    file.save(file_path)

    return file_path

# Function to load and split the data from the PDF file
def load_and_split_data(file_path):
    # Load the PDF file and split the data into chunks
    loader = UnstructuredPDFLoader(file_path=file_path)
    data = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=7500, chunk_overlap=100)
    chunks = text_splitter.split_documents(data)

    return chunks

# Main function to handle the embedding process
def embed(file):
    # Check if the file is valid, save it, load and split the data, add to the database
    if file.filename != '' and file and allowed_file(file.filename):
        file_path = save_file(file)
        chunks = load_and_split_data(file_path)
        db = get_vector_db()
        db.add_documents(chunks)
        db.persist()
        os.remove(file_path)

        return True

    return False
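
If you want to try the chunking step on its own before wiring everything together, a quick interactive check might look like the following (sample.pdf is just a placeholder path to any local PDF):

```python
# Ad-hoc check of the loading/splitting step, run from the project's virtualenv.
from embed import load_and_split_data

chunks = load_and_split_data("sample.pdf")  # placeholder path
print(f"Produced {len(chunks)} chunk(s)")
print(chunks[0].page_content[:200])
```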

query.py

This module processes user queries by generating multiple versions of the query,
retrieving relevant documents, and providing answers based on the context.

import os
from langchain_community.chat_models import ChatOllama
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever
from get_vector_db import get_vector_db

LLM_MODEL = os.getenv('LLM_MODEL', 'mistral')

# Function to get the prompt templates for generating alternative questions and answering based on context
def get_prompt():
    QUERY_PROMPT = PromptTemplate(
        input_variables=["question"],
        template="""You are an AI language model assistant. Your task is to generate five
different versions of the given user question to retrieve relevant documents from
a vector database. By generating multiple perspectives on the user question, your
goal is to help the user overcome some of the limitations of the distance-based
similarity search. Provide these alternative questions separated by newlines.
Original question: {question}""",
    )

    template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

    prompt = ChatPromptTemplate.from_template(template)

    return QUERY_PROMPT, prompt

# Main function to handle the query process
def query(input):
    if input:
        # Initialize the language model with the specified model name
        llm = ChatOllama(model=LLM_MODEL)
        # Get the vector database instance
        db = get_vector_db()
        # Get the prompt templates
        QUERY_PROMPT, prompt = get_prompt()

        # Set up the retriever to generate multiple queries using the language model
        retriever = MultiQueryRetriever.from_llm(
            db.as_retriever(),
            llm,
            prompt=QUERY_PROMPT
        )

        # Define the processing chain to retrieve context, generate the answer, and parse the output
        chain = (
            {"context": retriever, "question": RunnablePassthrough()}
            | prompt
            | llm
            | StrOutputParser()
        )

        response = chain.invoke(input)

        return response

    return None
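
Once Ollama is running and at least one document has been embedded, you can also exercise this module directly from a Python shell; the question below is only an example:

```python
# Ad-hoc test of the full retrieval + generation chain.
from query import query

answer = query("Summarize the uploaded document in two sentences.")
print(answer)
```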

get_vector_db.py

This module initializes and returns the vector database instance used for storing
and retrieving document embeddings.

import os
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores.chroma import Chroma

CHROMA_PATH = os.getenv('CHROMA_PATH', 'chroma')
COLLECTION_NAME = os.getenv('COLLECTION_NAME', 'local-rag')
TEXT_EMBEDDING_MODEL = os.getenv('TEXT_EMBEDDING_MODEL', 'nomic-embed-text')

def get_vector_db():
    embedding = OllamaEmbeddings(model=TEXT_EMBEDDING_MODEL, show_progress=True)

    db = Chroma(
        collection_name=COLLECTION_NAME,
        persist_directory=CHROMA_PATH,
        embedding_function=embedding
    )

    return db
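
It can also be handy to poke at the vector store directly, for example to confirm that documents were actually persisted. similarity_search is the standard LangChain Chroma method; the probe query below is arbitrary:

```python
# Inspect the persisted Chroma collection directly.
from get_vector_db import get_vector_db

db = get_vector_db()
docs = db.similarity_search("experience", k=2)  # any probe query works
for doc in docs:
    print(doc.page_content[:120])
```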
Run your app!
Create a .env file to store your environment variables:

TEMP_FOLDER = './_temp'
CHROMA_PATH = 'chroma'
COLLECTION_NAME = 'local-rag'
LLM_MODEL = 'mistral'
TEXT_EMBEDDING_MODEL = 'nomic-embed-text'

Run the app.py file to start your app server:

$ python3 app.py

Once the server is running, you can start making requests to the following
endpoints:

Example command to embed a PDF file (e.g., resume.pdf):

```bash
$ curl --request POST \
  --url http://localhost:8080/embed \
  --header 'Content-Type: multipart/form-data' \
  --form file=@/Users/nassermaronie/Documents/Nasser-resume.pdf
```

Response:

{
  "message": "File embedded successfully"
}

Example command to ask a question to your model:

```bash
$ curl --request POST \
  --url http://localhost:8080/query \
  --header 'Content-Type: application/json' \
  --data '{ "query": "Who is Nasser?" }'
```

Response:

{
  "message": "Nasser Maronie is a Full Stack Developer with experience in web and mo
}

Conclusion
By following these instructions, you can effectively run and interact with your
custom local RAG app using Python, Ollama, and ChromaDB, tailored to your
needs. Adjust and expand the functionality as necessary to enhance the
capabilities of your application.

By harnessing the capabilities of local deployment, you not only safeguard sensitive information but also optimize performance and responsiveness. Whether you're enhancing customer interactions or streamlining internal processes, a locally deployed RAG application offers the flexibility and robustness to adapt and grow with your requirements.

Check the source code in this repo: https://github.com/firstpersoncode/local-rag

Happy coding!
Top comments (1)

KRISHNANUNNI RAYIRAMKANDATH • Jan 2

Perfect. The only issue I encountered was related to LangChain. The command below was required as part of step 2.

pip install langchain-community langchain-core or pip install --upgrade langchain



