This deployment is based on the validated pattern framework, using GitOps for seamless provisioning of all operators and applications. It deploys a Retail Recommendation System that leverages machine learning models to provide customers with personalized item suggestions based on their preferences and demographics, enhancing store sales.
The pattern harnesses Red Hat OpenShift AI to deploy and serve recommendation models at scale. It integrates the Feast Feature Store for feature management, EDB Postgres to store user and item embeddings, and a simple user interface (UI) for customer interactions with the system. Running on Red Hat OpenShift, this demo showcases a scalable, enterprise-ready solution for retail recommendations.
- Podman
- Red Hat OpenShift cluster running in AWS. Supported regions are: us-east-1, us-east-2, us-west-1, us-west-2, ca-central-1, sa-east-1, eu-west-1, eu-west-2, eu-west-3, eu-central-1, eu-north-1, ap-northeast-1, ap-northeast-2, ap-northeast-3, ap-southeast-1, ap-southeast-2, ap-south-1.
- GPU node to run the Hugging Face Text Generation Inference server on the Red Hat OpenShift cluster.
- Create a fork of the Git repository.
- Red Hat OpenShift AI: Deploys and serves recommendation models at scale.
- Two-Tower Architecture: Utilizes separate neural networks to generate user and item embeddings for personalized recommendations.
- Feast Feature Store: Manages and serves features for training and real-time inference.
- EDB Postgres with PGVector: Stores user and item embeddings, enabling fast similarity searches.
- Simple UI: Allows users to browse recommendations, add items to cart, purchase, or rate products.
- Kafka Integration: Records user interactions for continuous learning and dataset updates.
- Monitoring Dashboard: Provides performance metrics using Prometheus and Grafana.
- GitOps Deployment: Ensures an end-to-end, reproducible setup of the demo.
[Diagram Placeholder: Schematic diagram for workflow of Retail Recommendation System]
The workflow consists of the following steps:
- Data Ingestion
- Data originates from parquet files containing users, items, and interactions.
- Sample datasets are generated and embedded into parquet format.
- Feast scans feature definitions, validates them, and syncs metadata to its registry.
- Feature Definition with Feast
- Defines data sources, features, views, and services.
- Initializes EDB PGVector as the offline store for historical data.
- Training
- Retrieves historical features from the offline store using get_historical_features.
- Trains a Two-Tower model:
- User Tower: Encodes user specifics (e.g., interaction history, demographics).
- Item Tower: Encodes item specifics (e.g., metadata).
- Outputs fixed-length embeddings in a shared vector space.
- Batch Scoring
- Generates embeddings for all users and items using trained encoders.
- Attaches timestamps and pushes embeddings to the online store (EDB PGVector).
- Materialization
- Computes the latest feature values and precomputes top-k recommendations for each user.
- Stores results in the online store for fast retrieval.
- Serving
- Existing Users: Fetches precomputed top items from the online store.
- New Users: Embeds the user in real-time using the user tower model, performs a similarity search against item embeddings in PGVector, and retrieves top-k items.
- User interactions are logged via Kafka.
Data Ingestion
[Diagram Placeholder: Schematic diagram for ingestion of data into Feast]
Raw data (users, items, interactions) from parquet files is ingested into Feast, stored as feature views in the offline store (EDB PGVector).
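To make the feature-definition step concrete, here is a minimal Feast sketch. The entity, parquet paths, and field names are hypothetical stand-ins for illustration, not the pattern's actual definitions.

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Hypothetical entity and parquet source; the pattern's real feature
# repository defines its own names, paths, and schemas.
user = Entity(name="user", join_keys=["user_id"])

user_source = FileSource(
    path="data/users.parquet",          # raw user data in parquet format
    timestamp_field="event_timestamp",  # used for point-in-time joins
)

user_features = FeatureView(
    name="user_features",
    entities=[user],
    ttl=timedelta(days=365),
    schema=[
        Field(name="age", dtype=Int64),
        Field(name="avg_purchase_value", dtype=Float32),
    ],
    source=user_source,
)
```

Running `feast apply` over a repository of such definitions is what scans, validates, and syncs the feature metadata to the Feast registry, as described above.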
Training and Batch Scoring
[Diagram Placeholder: Schematic diagram for training and batch scoring]
The Two-Tower model is trained on historical data, and embeddings are generated and stored in PGVector for later use.
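The sketch below condenses the training-side flow, assuming PyTorch and hypothetical feature references; the pattern's actual model code and feature names will differ.

```python
import pandas as pd
import torch
import torch.nn as nn
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # path to the Feast feature repository

# Entity dataframe: (user_id, event_timestamp, ...) rows from the
# interactions parquet file; column names are illustrative.
entity_df = pd.read_parquet("data/interactions.parquet")

# Point-in-time-correct historical features for training.
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "user_features:age",
        "user_features:avg_purchase_value",
    ],
).to_df()

class Tower(nn.Module):
    """One tower of the Two-Tower model: encodes raw features into a
    fixed-length embedding in the shared vector space."""

    def __init__(self, in_dim: int, emb_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128),
            nn.ReLU(),
            nn.Linear(128, emb_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize so dot products act as cosine similarity
        return nn.functional.normalize(self.net(x), dim=-1)

user_tower = Tower(in_dim=8)   # input widths are placeholders
item_tower = Tower(in_dim=12)
```

After training, batch scoring runs every user and item through its tower, and Feast materialization moves the resulting embeddings and precomputed top-k lists into the online store for serving.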
Serving Recommendations
[Diagram Placeholder: Schematic diagram for serving recommendations]
For existing users, precomputed recommendations are fetched; for new users, embeddings are computed on-the-fly, followed by a similarity search.
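A rough sketch of the two serving paths, assuming Feast's Python client for the online store and psycopg2 for the PGVector search; the feature references, table, and column names are illustrative assumptions.

```python
import psycopg2
from feast import FeatureStore

store = FeatureStore(repo_path=".")

def recommend_existing_user(user_id: int) -> list:
    # Fast path: read the precomputed top-k items from the online store.
    resp = store.get_online_features(
        features=["user_top_k:item_ids"],  # hypothetical feature reference
        entity_rows=[{"user_id": user_id}],
    ).to_dict()
    return resp["item_ids"][0]

def recommend_new_user(user_embedding: list, k: int = 10) -> list:
    # Slow path: the user was embedded on the fly by the user tower; run a
    # cosine-distance search against item embeddings stored in PGVector.
    conn = psycopg2.connect("dbname=feast user=app host=edb-postgres")
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT item_id FROM item_embeddings "
            "ORDER BY embedding <=> %s::vector LIMIT %s",
            (str(user_embedding), k),
        )
        return [row[0] for row in cur.fetchall()]
```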
Download Diagrams
View and download all diagrams on our open-source tooling site: Open Diagrams
Components Deployed
- Model Servers for User and Item Towers: Deployed via OpenShift AI to serve the Two-Tower models for real-time embedding generation.
- Feast Feature Store: Manages feature definitions and serves data for training and inference, using EDB PGVector as the offline store.
- EDB Postgres with PGVector: Acts as both the offline (historical data) and online (real-time embeddings) stores.
- Embedding Generation Job: A batch job that generates and populates embeddings into the vector database.
- Recommendation UI: A simple web application for users to interact with recommendations.
- Kafka: Logs user interactions to enable continuous dataset updates (see the sketch after this list).
- Prometheus: Collects metrics from the application and model servers.
- Grafana: Visualizes system performance and recommendation metrics.
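As a sketch of how the Kafka interaction logging might look from the application side, using the kafka-python client; the bootstrap address, topic name, and event schema are assumptions for illustration.

```python
import json

from kafka import KafkaProducer  # kafka-python client

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",  # hypothetical in-cluster service address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Log one user interaction; downstream jobs consume this topic to
# refresh the training dataset for continuous learning.
producer.send(
    "user-interactions",  # hypothetical topic name
    {"user_id": 42, "item_id": 7, "event": "purchase", "rating": 5},
)
producer.flush()
```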
To run the demo, ensure Podman is running on your machine. Fork the rag-llm-gitops repository into your organization.
Replace the token and the API server URL in the command below to log in to the OpenShift cluster.
oc login --token=<token> --server=<api_server_url> # login to OpenShift cluster
git clone https://github.com/<your-username>/rag-llm-gitops.git
cd rag-llm-gitops
This pattern deploys IBM Granite 3.1-8B-Instruct out of the box. Run the following command to configure the vault with the model ID.
# Copy values-secret.yaml.template to ~/values-secret-rag-llm-gitops.yaml.
# You should never check in these files.
# Add the secrets that need to be stored in the vault to this file.
cp values-secret.yaml.template ~/values-secret-rag-llm-gitops.yaml
To deploy a model that requires a Hugging Face token, grab the Hugging Face token and accept the terms and conditions on the model page. Edit ~/values-secret-rag-llm-gitops.yaml to replace the model ID and the Hugging Face token.
secrets:
  - name: hfmodel
    fields:
      - name: hftoken
        value: null
      - name: modelId
        value: "ibm-granite/granite-3.1-8b-instruct"
  - name: minio
    fields:
      - name: MINIO_ROOT_USER
        value: minio
      - name: MINIO_ROOT_PASSWORD
        value: null
        onMissingValue: generate
As a prerequisite to deploying the application using the validated pattern, GPU nodes should be provisioned along with the Node Feature Discovery Operator and the NVIDIA GPU Operator. To provision GPU nodes, run the following command; it takes about 5-10 minutes.
./pattern.sh make create-gpu-machineset
Wait until the nodes are provisioned and running.
Alternatively, follow the instructions to manually install GPU nodes, the Node Feature Discovery Operator, and the NVIDIA GPU Operator.
Note: This pattern supports two types of vector databases: EDB Postgres for Kubernetes and Redis. By default, the pattern deploys EDB Postgres for Kubernetes as the vector DB. To deploy Redis, change global.db.type to REDIS in values-global.yaml.
---
global:
  pattern: rag-llm-gitops
  options:
    useCSV: false
    syncPolicy: Automatic
    installPlanApproval: Automatic
  # Possible values for db.type: [REDIS, EDB]
  db:
    index: docs
    type: EDB # <--- Default is EDB; change the db type to REDIS for a Redis deployment
main:
  clusterGroupName: hub
  multiSourceConfig:
    enabled: true
The following command takes about 15-20 minutes and deploys the validated pattern:
./pattern.sh make install
- Log in to the OpenShift web console.
- Navigate to Workloads --> Pods.
- Select the rag-llm project from the drop-down.
- The following pods should be up and running.
Note: If the hf-text-generation-server is not running, make sure you have followed the steps above to configure a node with a GPU.
- Click the Application box icon in the header, and select Retrieval-Augmented-Generation (RAG) LLM Demonstration UI.
- It will use the default provider and model configured as part of the application deployment. The default provider is a Hugging Face model server running in OpenShift. The model server is deployed with this validated pattern and requires a node with a GPU.
- Enter any company name.
- Enter the product as RedHat OpenShift.
- Click the Generate button; a project proposal should be generated. The proposal also contains references to the RAG content and can be downloaded as a PDF document.
You can optionally add additional providers. The application supports the following providers:
- Hugging Face Text Generation Inference Server
- OpenAI
- NVIDIA
Click on the Add Provider tab to add a new provider. Fill in the details and click the Add Provider button. The provider should appear in the Providers dropdown under the Chatbot tab.
Follow the instructions in step 3 to generate the proposal document using the OpenAI provider.
You can rate the model by clicking the Rate the model radio button. The rating is captured as part of the metrics and can help the company decide which model to deploy in production.
By default, the Grafana application is deployed in the llm-monitoring namespace. To launch the Grafana dashboard, follow the instructions below:
- Grab the credentials of the Grafana application:
- Navigate to Workloads --> Secrets.
- Click on the grafana-admin-credentials secret and copy the GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD values.
- Launch the Grafana dashboard.
GOTO: Test Plan
EDB Postgres for Kubernetes is distributed under the EDB Limited Usage License Agreement, available at enterprisedb.com/limited-use-license.