Commit 0392821

slopp and Sean Lopp authored
First draft glean example (NVIDIA#272)
* first draft glean example
* pr feedback
* clean up readme and add notebook
* clean up readme and add notebook

---------

Co-authored-by: Sean Lopp <[email protected]>
1 parent 8b01b67 commit 0392821

15 files changed: +3702 -1 lines changed

community/README.md

Lines changed: 5 additions & 1 deletion
@@ -66,4 +66,8 @@ Community examples are sample code and deployments for RAG pipelines that are no
 
 * [LLM Prompt Design Helper using NIM](./llm-prompt-design-helper/)
 
-This tool demonstrates how to utilize a user-friendly interface to interact with NVIDIA NIMs, including those available in the API catalog, self-deployed NIM endpoints, and NIMs hosted on Hugging Face. It also provides settings to integrate RAG pipelines with either local and temporary vector stores or self-hosted search engines. Developers can use this tool to design system prompts, few-shot prompts, and configure LLM settings.
+This tool demonstrates how to utilize a user-friendly interface to interact with NVIDIA NIMs, including those available in the API catalog, self-deployed NIM endpoints, and NIMs hosted on Hugging Face. It also provides settings to integrate RAG pipelines with either local and temporary vector stores or self-hosted search engines. Developers can use this tool to design system prompts, few-shot prompts, and configure LLM settings.
+
+* [Chatbot with RAG and Glean](./chat-and-rag-glean/)
+
+This tool shows how to build a chat interface that uses NVIDIA NIMs along with the Glean Search API to enable internal knowledge base search, chat, and retrieval.
Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
# Enterprise Knowledge Base Chatbot

This repository includes a demo of a simple chatbot that answers questions based on a company's internal knowledge repository.

![chat_interface_1](./chat_interfaced_1.png)


![chat_interface_2](./chat_interface_2.png)


The implementation includes:

- Gradio chat interface
- LangGraph agent
- NVIDIA NIM microservices
- Chroma DB for a lightweight vector DB
- An internal knowledge base stored in Glean and available over the Glean Search API

This example uses NVIDIA NIMs, which can be hosted completely on-premise. Combined with the Glean on-premise offering, this allows organizations to use LLMs for internal knowledge search, chat, and retrieval without any data leaving their environment.

The example architecture and possible extensions are shown below.

![sample_architecture](./glean_example_architecture.png)

## Pre-requisites

This example uses hosted NVIDIA NIMs for the foundational LLMs. To use these hosted LLMs, you will need an NVIDIA API key, which is available at https://build.nvidia.com.

```bash
export NVIDIA_API_KEY="nvapi-YOUR-KEY"
```

This example also requires a Glean instance and API key. We recommend using a development sandbox for initial testing.

```bash
export GLEAN_API_KEY="YOUR-GLEAN-API-KEY"
export GLEAN_API_BASE_URL="https://your-org.glean.com/rest/api/v1"
```
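
Before launching the app, it can help to confirm that the three variables above are actually set. The snippet below is a minimal sketch of such a check. The helper name `missing_env_vars` is hypothetical and is not part of the example code.

```python
import os
from typing import List, Mapping

# Variable names taken from the export commands above.
REQUIRED_VARS = ("NVIDIA_API_KEY", "GLEAN_API_KEY", "GLEAN_API_BASE_URL")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty (hypothetical helper)."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing the environment as an argument keeps the check easy to exercise with a plain dictionary.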

## Getting Started - Demo Application

- Clone the repository and navigate to this example.

```bash
git clone https://github.com/NVIDIA/GenerativeAIExamples
cd GenerativeAIExamples/community/chat-and-rag-glean
```

- Install the necessary dependencies. We recommend using `uv` as the Python installation and package manager.

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh # install uv
uv python install # install python
uv sync # install the dependencies for this project
```

- Run the chat app.

```bash
uv run glean_example/src/app/app.py
```

After running this command, open a browser window to `http://127.0.0.1:7860`. The web application allows a user to enter a prompt. The logs will show the main steps the application takes to answer the prompt. Full logs will be displayed in the terminal.

### Customizing the LLMs

The specific LLMs used for the agent and embeddings are specified in the file `glean_example/src/agent.py`:

```python
model = ChatNVIDIA(
    model="meta/llama-3.3-70b-instruct", api_key=os.getenv("NVIDIA_API_KEY")
)
embeddings = NVIDIAEmbeddings(
    model="nvidia/llama-3.2-nv-embedqa-1b-v2",
    api_key=os.getenv("NVIDIA_API_KEY"),
    truncate="NONE",
)
```

The main LLM used is `meta/llama-3.3-70b-instruct`. Update this model name to use a different LLM.

The main embedding model used is `nvidia/llama-3.2-nv-embedqa-1b-v2`. Update this model name to use a different embedding model.

### Using on-prem

You may want to build an application similar to this demo that is hosted on-premise or in your private cloud so that no internal data leaves your systems.

- Ensure you are using the [Glean "Cloud-prem" option](https://help.glean.com/en/articles/10093412-glean-deployment-options). Update the `GLEAN_API_BASE_URL` to use your on-prem Glean installation.
- Follow the appropriate [NVIDIA NIM deployment guide](https://docs.nvidia.com/nim/large-language-models/latest/deployment-guide.html) for your environment. You will need to deploy at least one NVIDIA NIM foundational LLM and one NVIDIA NIM embedding model. The result of following this guide will be two on-premise URL endpoints.
- Update the file `glean_example/src/agent.py` to use the on-prem endpoints:

```python
model = ChatNVIDIA(
    model="meta/llama-3.3-70b-instruct",
    base_url="http://localhost:8000/v1",  # Update to the on-prem URL where your NVIDIA NIM LLM is running
    api_key=os.getenv("NVIDIA_API_KEY")
)
embeddings = NVIDIAEmbeddings(
    model="nvidia/llama-3.2-nv-embedqa-1b-v2",
    base_url="http://localhost:8000/v1",  # Update to the on-prem URL where your NVIDIA NIM embedding model is running
    api_key=os.getenv("NVIDIA_API_KEY"),
    truncate="NONE",
)
```
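
One way to avoid editing `agent.py` for each environment is to read the endpoint URLs from environment variables and fall back to the hosted endpoints when they are unset. The sketch below only builds the keyword arguments. The variable names `NIM_LLM_BASE_URL` and `NIM_EMBED_BASE_URL` and the helper `nim_kwargs` are hypothetical and do not appear in the example code.

```python
import os
from typing import Dict, Mapping, Optional


def nim_kwargs(
    model: str, base_url_var: str, env: Mapping[str, str] = os.environ
) -> Dict[str, Optional[str]]:
    """Build keyword arguments for a NIM client (hypothetical helper).

    `base_url` is included only when the named environment variable is set,
    so the client otherwise keeps its default (hosted) endpoint.
    """
    kwargs: Dict[str, Optional[str]] = {
        "model": model,
        "api_key": env.get("NVIDIA_API_KEY"),
    }
    base_url = env.get(base_url_var)
    if base_url:
        kwargs["base_url"] = base_url
    return kwargs


# Intended use inside agent.py (not executed here):
#   model = ChatNVIDIA(**nim_kwargs("meta/llama-3.3-70b-instruct", "NIM_LLM_BASE_URL"))
#   embeddings = NVIDIAEmbeddings(
#       **nim_kwargs("nvidia/llama-3.2-nv-embedqa-1b-v2", "NIM_EMBED_BASE_URL"),
#       truncate="NONE",
#   )
```

Using a separate variable per model matches the note above that the deployment guide results in two endpoints.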

## Getting Started - Jupyter Notebook

Further details about the code, and an example that calls the chatbot without a web application, are available in the Jupyter Notebook `nvidia_nim_langgraph_glean_example.ipynb`.

```bash
uv run jupyter lab
```
2 binary files added (173 KB, 159 KB); previews not shown.
Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
import os
from langchain_chroma import Chroma
from typing import List, Tuple, Optional, Any
from langgraph.graph import StateGraph, START, END
from pydantic import BaseModel
from glean_example.src.glean_utils.utils import (
    glean_search,
    documents_from_glean_response,
)
from glean_example.src.prompts import PROMPT_GLEAN_QUERY, PROMPT_ANSWER
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings
import logging

model = ChatNVIDIA(
    model="meta/llama-3.3-70b-instruct", api_key=os.getenv("NVIDIA_API_KEY")
)
embeddings = NVIDIAEmbeddings(
    model="nvidia/llama-3.2-nv-embedqa-1b-v2",
    api_key=os.getenv("NVIDIA_API_KEY"),
    truncate="NONE",
)

glean_api_key = os.getenv("GLEAN_API_KEY")
base_url = os.getenv("GLEAN_API_BASE_URL")
chroma_db_path = "."

logger = logging.getLogger("gradio_log")


class InfoBotState(BaseModel):
    # Each message is a (role, content) tuple; the last entry is the newest.
    messages: Optional[List[Tuple[str, str]]] = None
    glean_query: Optional[str] = None
    glean_results: Optional[List[str]] = None
    db: Optional[Any] = None
    answer_candidate: Optional[str] = None


def call_glean(state: InfoBotState):
    """Call the Glean Search API with the generated query and store the relevant results"""
    logger.info("Calling Glean")
    response = glean_search(
        query=state.glean_query, api_key=glean_api_key, base_url=base_url
    )
    state.glean_results = documents_from_glean_response(response)
    return state


def add_embeddings(state: InfoBotState):
    """Update the vector DB with glean search results"""
    logger.info("Adding Embeddings")
    db = Chroma.from_texts(
        state.glean_results, embedding=embeddings, persist_directory=chroma_db_path
    )
    state.db = db
    return state


def answer_candidates(state: InfoBotState):
    """Use RAG to get most likely answer"""
    logger.info("RAG on Embeddings")
    most_recent_message: Tuple[str, str] = state.messages[-1]
    role, query = most_recent_message
    retriever = state.db.as_retriever(search_kwargs={"k": 1})
    docs = retriever.invoke(query)
    state.answer_candidate = docs[0].page_content
    return state


def create_glean_query(state: InfoBotState):
    """parses the user message and creates an appropriate glean query"""
    logger.info("Glean Query from User Message")
    most_recent_message: Tuple[str, str] = state.messages[-1]
    role, query = most_recent_message

    llm = PROMPT_GLEAN_QUERY | model
    response = llm.invoke({"query": query})

    state.glean_query = response.content

    return state


def call_bot(state: InfoBotState):
    """the main agent responsible for taking all the context and answering the question"""
    logger.info("Generate final answer")

    llm = PROMPT_ANSWER | model

    response = llm.invoke(
        {
            "messages": state.messages,
            "glean_query": state.glean_query,
            "glean_search_result_documents": state.glean_results,
            "answer_candidate": state.answer_candidate,
        }
    )
    state.messages.append(("agent", response.content))
    return state


# Define the graph

graph = StateGraph(InfoBotState)
graph.add_node("call_bot", call_bot)
graph.add_node("call_glean", call_glean)
graph.add_node("answer_candidates", answer_candidates)
graph.add_node("create_glean_query", create_glean_query)
graph.add_node("add_embeddings", add_embeddings)

graph.add_edge(START, "create_glean_query")
graph.add_edge("create_glean_query", "call_glean")
graph.add_edge("call_glean", "add_embeddings")
graph.add_edge("add_embeddings", "answer_candidates")
graph.add_edge("answer_candidates", "call_bot")
graph.add_edge("call_bot", END)
agent = graph.compile()


if __name__ == "__main__":
    msg = "do I need to take PTO if I am sick"
    history = []
    history.append(("user", msg))
    messages = history
    response = agent.invoke({"messages": messages})
    # No log handler is configured when this file runs directly, so print the answer.
    print(response["messages"][-1][1])
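
The graph in this file is a straight line: each node reads the shared state, fills in one field, and hands the state to the next node. The dependency-free sketch below mirrors that ordering with stand-in nodes so the data flow can be followed without an LLM, Glean, or a vector DB. The canned documents, the word-overlap scoring, and `run_pipeline` are illustrative assumptions, not the behaviour of the real nodes.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


def create_glean_query(state: State) -> State:
    # Stand-in for the LLM call: reuse the latest user message as the query.
    state["glean_query"] = state["messages"][-1][1]
    return state


def call_glean(state: State) -> State:
    # Stand-in for the Glean Search API: return canned documents.
    state["glean_results"] = [
        "Sick leave is tracked separately from PTO.",
        "Company holidays are listed on the HR site.",
    ]
    return state


def add_embeddings(state: State) -> State:
    # Stand-in for the vector DB: index each document as a set of lowercase words.
    state["db"] = [(set(doc.lower().split()), doc) for doc in state["glean_results"]]
    return state


def answer_candidates(state: State) -> State:
    # Stand-in for top-1 retrieval: pick the document sharing the most words.
    query_words = set(state["messages"][-1][1].lower().split())
    _, best = max(state["db"], key=lambda item: len(item[0] & query_words))
    state["answer_candidate"] = best
    return state


def call_bot(state: State) -> State:
    # Stand-in for the final LLM call: return the best candidate as the answer.
    state["messages"].append(("agent", state["answer_candidate"]))
    return state


# Same order as the edges from START to END in the graph definition.
PIPELINE: List[Callable[[State], State]] = [
    create_glean_query,
    call_glean,
    add_embeddings,
    answer_candidates,
    call_bot,
]


def run_pipeline(message: str) -> State:
    state: State = {"messages": [("user", message)]}
    for node in PIPELINE:
        state = node(state)
    return state
```

Calling `run_pipeline("do I need to take PTO if I am sick")` returns a state whose last message carries the `"agent"` role, which is the same shape `agent.invoke` produces in the `__main__` block.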
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
import logging

import gradio as gr
from glean_example.src.app.css import css, theme
from glean_example.src.agent import agent
from typing import List
from pathlib import Path
from gradio_log import Log

log_file = "/tmp/gradio_log.txt"
Path(log_file).touch()

ch = logging.FileHandler(log_file)
ch.setLevel(logging.DEBUG)


logger = logging.getLogger("gradio_log")
logger.setLevel(logging.DEBUG)
# Iterate over a copy: removing handlers from the list being iterated can skip entries.
for handler in logger.handlers[:]:
    logger.removeHandler(handler)
logger.addHandler(ch)


def convert_to_langchain_history(history: List):
    if len(history) < 1:
        return []

    langchain_history = []
    for msg_pair in history:
        msg1, msg2 = msg_pair
        if msg1 == "user":
            langchain_history.append(("user", msg2))
        else:
            langchain_history.append(("system", msg2))

    return langchain_history


def agent_predict(msg: str, history: List) -> str:
    history = convert_to_langchain_history(history)

    history.append(("user", msg))
    response = agent.invoke(input={"messages": history})
    return response["messages"][-1][1]


chatbot = gr.Chatbot(label="NVBot Lite", elem_id="chatbot", show_copy_button=True)

with gr.Blocks(theme=theme, css=css) as chat:
    chat_interface = gr.ChatInterface(
        fn=agent_predict,
        chatbot=chatbot,
        title="NVIDIA Information Demo",
        autofocus=True,
        fill_height=True,
    )

    Log(log_file=log_file)

# chat_interface.render()

if __name__ == "__main__":
    chat.queue().launch(share=False)
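
`convert_to_langchain_history` unpacks each history entry as a `(role, content)` pair. If Gradio instead supplies history in its pair-style format, where each entry is `[user_message, assistant_message]`, the first element is never the literal string `"user"` and the user turns are dropped. The sketch below shows a conversion under that assumption. The function name is hypothetical, and the default role label `"agent"` is borrowed from `call_bot` in `agent.py`.

```python
from typing import List, Optional, Sequence, Tuple


def pairs_to_role_messages(
    history: Sequence[Sequence[Optional[str]]], assistant_role: str = "agent"
) -> List[Tuple[str, str]]:
    """Flatten [user_message, assistant_message] pairs into (role, content) tuples."""
    messages: List[Tuple[str, str]] = []
    for user_msg, assistant_msg in history:
        # Skip empty or missing entries on either side of the pair.
        if user_msg:
            messages.append(("user", user_msg))
        if assistant_msg:
            messages.append((assistant_role, assistant_msg))
    return messages
```

The result has the same `(role, content)` shape that `agent_predict` appends to before calling `agent.invoke`, so it could stand in for `convert_to_langchain_history` if the assumption about the history format holds.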
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
import os
import pathlib
import gradio as gr

bot_title = os.getenv("BOT_TITLE", "NVIDIA Inference Microservice")

header = f"""
<span style="color:#76B900;font-weight:600;font-size:28px">
{bot_title}
</span>
"""

styles = pathlib.Path(__file__).parent.joinpath("style.css").resolve()
with open(styles, "r") as file:
    css = file.read()

theme = gr.themes.Monochrome(
    primary_hue="emerald", secondary_hue="green", font=["sans-serif"]
).set(
    button_primary_background_fill="#76B900",
    button_primary_background_fill_dark="#76B900",
    button_primary_background_fill_hover="#569700",
    button_primary_background_fill_hover_dark="#569700",
    button_primary_text_color="#000000",
    button_primary_text_color_dark="#ffffff",
    button_secondary_background_fill="#76B900",
    button_secondary_background_fill_dark="#76B900",
    button_secondary_background_fill_hover="#569700",
    button_secondary_background_fill_hover_dark="#569700",
    button_secondary_text_color="#000000",
    button_secondary_text_color_dark="#ffffff",
    slider_color="#76B900",
    color_accent="#76B900",
    color_accent_soft="#76B900",
    body_text_color="#000000",
    body_text_color_dark="#ffffff",
    color_accent_soft_dark="#76B900",
    border_color_accent="#ededed",
    border_color_accent_dark="#3d3c3d",
    block_title_text_color="#000000",
    block_title_text_color_dark="#ffffff",
)
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
.header {
  padding: 60px;
  text-align: center;
  color: #76b900;
  font-size: 30px;
}

#chatbot {
  flex-grow: 2;
  overflow: auto;
}

footer {
  visibility: hidden;
}

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
import os
from glean_example.src.glean_utils.utils import (
    glean_search,
    documents_from_glean_response,
)

api_key = os.getenv("GLEAN_API_KEY")
base_url = "https://nvidia-be.glean.com/rest/api/v1"

response = glean_search(
    query="us holidays",
    api_key=api_key,
    base_url=base_url,
)

documents = documents_from_glean_response(response)

print(documents)
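
The `glean_search` helper is imported from `glean_example.src.glean_utils.utils`, which is not shown in this diff. As a rough sketch of what a call with these three arguments might involve, the snippet below builds an HTTP request without sending it. The `/search` path, the bearer-token header, and the `query` and `pageSize` fields are assumptions about the Glean Search API and should be checked against the Glean documentation. `build_search_request` is a hypothetical name, not the helper used by the example.

```python
import json
import urllib.request


def build_search_request(
    query: str, api_key: str, base_url: str, page_size: int = 10
) -> urllib.request.Request:
    """Build (but do not send) a search request; endpoint details are assumed."""
    payload = json.dumps({"query": query, "pageSize": page_size}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/search",  # assumed endpoint path
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and payload before any call reaches a real Glean instance.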
