
Spring AI project 🚀 with Ollama and Retrieval-augmented generation (RAG)

Note

Ollama is an open-source platform that enables users to run LLaMA and other LLMs directly on their local machines.

LLaMA (Large Language Model Meta AI) is the LLM we will use; it is a family of open-source large language models developed by Meta.

Objective:

  • Set up the Ollama server and the ollama CLI (locally/offline).
  • Download and run the llama3.2 model (3B parameters, 2.0GB size).
  • Interact with the LLM through the chat GUI application, the CLI, and the API.
  • Build a Spring AI project that talks to the Ollama server and uses our llama3.2 model to respond to our questions.
  • Implement RAG using the in-memory SimpleVectorStore.
  • Feed articles and documents into the VectorStore using the llama3.2 embedding model.
  • Expose 5 endpoints (a controller sketch follows this list):
    • GET /chat/{ID}/ask?token=<QUESTION> - Ask a question without using RAG.
    • GET /chat/{ID}/askUsingRAG?token=<QUESTION> - Ask a question using RAG.
    • GET /chat/{ID}/history - View the last 5 entries of the chat history.
    • POST /rag/load?fromAbsolutePath=<PATH> - Replenish the VectorStore with fresh, up-to-date information.
    • POST /rag/offloadToFile - Offload all the data in the VectorStore to a JSON file so we can look at the embeddings.
  • Add an In-Memory Chat Memory Repository to maintain chat history.
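
Here is a rough sketch (not the repository's actual code) of how those endpoints could be wired with Spring AI's ChatClient. QuestionAnswerAdvisor is Spring AI's RAG advisor, but its exact constructor/builder has shifted between releases, so verify it against your generated project; the history here is a plain in-memory map rather than Spring AI's chat memory repository, purely to keep the sketch self-contained:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/chat")
public class ChatController {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;
    // Sketch-only history store; the real project uses a chat memory repository.
    private final Map<String, Deque<String>> history = new ConcurrentHashMap<>();

    public ChatController(ChatClient.Builder builder, VectorStore vectorStore) {
        this.chatClient = builder.build();
        this.vectorStore = vectorStore;
    }

    // Plain LLM call: the question goes straight to llama3.2, no retrieval.
    @GetMapping("/{id}/ask")
    public String ask(@PathVariable String id, @RequestParam String token) {
        String answer = chatClient.prompt().user(token).call().content();
        remember(id, token, answer);
        return answer;
    }

    // RAG call: the advisor searches the VectorStore for similar documents
    // and prepends them to the prompt before the model answers.
    @GetMapping("/{id}/askUsingRAG")
    public String askUsingRag(@PathVariable String id, @RequestParam String token) {
        String answer = chatClient.prompt()
                .user(token)
                .advisors(new QuestionAnswerAdvisor(vectorStore))
                .call()
                .content();
        remember(id, token, answer);
        return answer;
    }

    // Last 5 question/answer pairs for this conversation id.
    @GetMapping("/{id}/history")
    public List<String> history(@PathVariable String id) {
        return List.copyOf(history.getOrDefault(id, new ArrayDeque<>()));
    }

    private void remember(String id, String question, String answer) {
        Deque<String> entries = history.computeIfAbsent(id, k -> new ArrayDeque<>());
        entries.addLast("Q: " + question + " | A: " + answer);
        while (entries.size() > 5) {
            entries.removeFirst();
        }
    }
}
```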

Where can I learn more about SpringAI?

There is no better place to learn than the direct source: https://docs.spring.io/spring-ai/reference/index.html

Setup Ollama on macOS (for Windows and Linux instructions, see Windows / Linux)

  • Download and install Ollama.dmg from https://ollama.com/download (the ollama CLI is installed along with the Ollama chat GUI client and the Ollama server).
  • Choose the model that you would like to work with from https://ollama.com/library. For this demo, given the limitations of my laptop (MacBook Air, Apple M2, 8GB), I'm choosing Llama 3.2 (2.0GB size, 3 billion parameters).
  • Open terminal and run
    ollama run llama3.2
  • You are all set 👍 By default the Ollama server runs on port 11434, but if you would like to change it, you can do so with the command below:
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"
  • Open the Ollama settings and put Ollama in Airplane mode. This makes sure our data stays local by disabling Turbo mode and web search.

Interacting with Ollama model

Using Terminal

With ollama run llama3.2 running, type your prompts directly at the interactive prompt in the terminal.

Using Ollama Chat GUI client

Alternatively, use the chat GUI client that was installed along with the Ollama app.

Using Ollama APIs - Full API list

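As a minimal sketch of calling the Ollama HTTP API from plain Java (java.net.http, JDK 11+, no Spring), the example below posts to the /api/generate endpoint, assuming the server is on the default port 11434 and llama3.2 has been pulled; the prompt text is just an example:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaApiDemo {
    public static void main(String[] args) throws Exception {
        // "stream": false returns a single JSON object instead of a stream of chunks.
        String body = "{\"model\": \"llama3.2\", \"prompt\": \"Why is the sky blue?\", \"stream\": false}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The generated text is in the "response" field of the returned JSON.
        System.out.println(response.body());
    }
}
```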

Using a Spring Boot application built with the SpringAI project


Building SpringAI project

  • As always, we start by opening the Spring Initializr portal
  • Choose the following dependencies
    • Lombok
    • Spring-WebFlux or Spring-Web
    • Spring-ai-pdf-document-reader
    • Spring-ai-starter-model-ollama
    • Spring-ai-tika-document-reader
    • Spring-ai-starter-model-chat-memory (If time permits)
  • Download/generate the project, unzip it, and follow along with me
  • Manually add the implementation 'org.springframework.ai:spring-ai-vector-store' dependency to enable use of SimpleVectorStore (see the configuration sketch after this list)
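
As a rough sketch of what that dependency enables, the configuration below wires a SimpleVectorStore on top of the Ollama embedding model. SimpleVectorStore.builder(...) is the construction style of recent Spring AI releases (older milestones used new SimpleVectorStore(embeddingModel)), and the file name and sample document are illustrative only:

```java
import java.io.File;
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.vectorstore.SimpleVectorStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RagConfig {

    // The Ollama starter auto-configures an EmbeddingModel backed by llama3.2.
    @Bean
    public SimpleVectorStore vectorStore(EmbeddingModel embeddingModel) {
        SimpleVectorStore store = SimpleVectorStore.builder(embeddingModel).build();

        // Feed a document: the text is embedded via Ollama and kept in memory.
        // The real project loads PDFs/articles through the document readers
        // and the POST /rag/load endpoint.
        store.add(List.of(new Document("Spring AI supports Ollama as a model provider.")));

        // Offload content plus embeddings to a JSON file, as the
        // POST /rag/offloadToFile endpoint does.
        store.save(new File("vectorstore.json"));
        return store;
    }
}
```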

Additional links:

https://medium.com/@gareth.hallberg_55290/part-7-implementing-rag-part-1-embeddings-and-vector-stores-with-spring-ai-6ae97926d13e
