Commit df391f7

revisions based on feedback (NVIDIA#274)

* revisions based on feedback
* feedback on prompt use case

1 parent 0392821 · commit df391f7

File tree: 8 files changed, +100 -75 lines changed


community/chat-and-rag-glean/README.md

Lines changed: 10 additions & 14 deletions
````diff
@@ -2,11 +2,7 @@
 
 This repository includes a demo of a simple chat bot that answers questions based on a company's internal knowledge repository.
 
-![chat_interace_1](./chat_interfaced_1.png)
-
-
-![chat_interace_2](./chat_interface_2.png)
-
+![chat_interface_1](./chat_interface_1.png)
 
 The implementation includes:
 
@@ -16,15 +12,15 @@ The implementation includes:
 - Chroma DB for a lightweight vector DB
 - An internal knowledge base stored in Glean and available over the Glean Search API
 
-This example uses NVIDIA NIMs which can be hosted completely on-premise, which combined with the Glean on-premise offering, allows organizations to use LLMs for internal knowledge search, chat, and retrieval without any data leaving their environment.
+This example uses NVIDIA NIM microservices, which can be hosted completely on-premise or in a company's private cloud. Combined with the Glean cloud-prem offering, this allows organizations to create internal knowledge search, chat, and retrieval applications without any data leaving their environment.
 
 The example architecture and possible extensions are shown below.
 
 ![sample_architecture](./glean_example_architecture.png)
 
 ## Pre-requisites
 
-This example uses hosted NVIDIA NIMs for the foundational LLMs. In order to use these hosted LLMds you will need a NVIDIA API key which is available at https://build.nvidia.com.
+This example uses hosted NVIDIA NIMs for the foundational LLMs. To use these hosted LLMs you will need an NVIDIA API key, which is available at https://build.nvidia.com.
 
 ```bash
 export NVIDIA_API_KEY="nvapi-YOUR-KEY"
@@ -82,23 +78,23 @@ The main LLM used is `meta/llama-3.3-70b-instruct`. Update this model name to us
 
 The main embedding model used is `meta/llama-3.2-nv-embedqa-1b-v2`. Update this model name to use a different embedding model.
 
-### Using on-prem
+### Using in a private network
 
-You may way to build an application similar to this demo that is hosted on-premise or in your private cloud so that no internal data leaves your systems.
+You may want to build an application similar to this demo that is hosted in your private environment so that no internal data leaves your systems.
 
-- Ensure you are using the [Glean "Cloud-prem" option](https://help.glean.com/en/articles/10093412-glean-deployment-options). Update the `GLEAN_API_BASE_URL` to use your on-prem Glean installation.
-- Follow the appropriate [NVIDIA NIM deployment guide](https://docs.nvidia.com/nim/large-language-models/latest/deployment-guide.html) for your environment. You will need to deploy at least one NVIDIA NIM foundational LLM and one NVIDIA NIM embedding model. The result of following this guide will be two on-premise URL endpoints.
-- Update the file `glean_example/src/agent.py` to use the on-prem endpoints:
+- Ensure you are using the [Glean "Cloud-prem" option](https://help.glean.com/en/articles/10093412-glean-deployment-options). Update the `GLEAN_API_BASE_URL` to use your cloud-prem Glean installation.
+- Follow the appropriate [NVIDIA NIM deployment guide](https://docs.nvidia.com/nim/large-language-models/latest/deployment-guide.html) for your environment. You will need to deploy at least one NVIDIA NIM foundational LLM and one NVIDIA NIM embedding model. The result of following this guide will be two private URL endpoints.
+- Update the file `glean_example/src/agent.py` to use the private endpoints:
 
 ```python
 model = ChatNVIDIA(
     model="meta/llama-3.3-70b-instruct",
-    base_url="http://localhost:8000/v1", # Update to the on-prem URL where your NVIDIA NIM is running
+    base_url="http://localhost:8000/v1", # Update to the URL where your NVIDIA NIM is running
     api_key=os.getenv("NVIDIA_API_KEY")
 )
 embeddings = NVIDIAEmbeddings(
     model="nvidia/llama-3.2-nv-embedqa-1b-v2",
-    base_url="http://localhost:8000/v1", # Update to the on-prem URL where your NVIDIA NIM is running
+    base_url="http://localhost:8000/v1", # Update to the URL where your NVIDIA NIM is running
    api_key=os.getenv("NVIDIA_API_KEY"),
     truncate="NONE",
 )
````
Binary image files changed: one image added (132 KB), two images removed (-173 KB and -159 KB; binary files not shown).
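A side note on the `base_url` edits in this README: rather than editing the file in place, the endpoint switch can be driven by an environment variable. A minimal sketch, assuming a hypothetical `NIM_BASE_URL` variable that is not part of this example:

```python
# Sketch only: pick hosted vs. self-hosted NIM endpoints from the environment.
# NIM_BASE_URL is a hypothetical variable name, not part of this commit.
import os

from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

base_url = os.getenv("NIM_BASE_URL")  # e.g. "http://localhost:8000/v1"; unset falls back to the hosted API
endpoint_kwargs = {"base_url": base_url} if base_url else {}

model = ChatNVIDIA(
    model="meta/llama-3.3-70b-instruct",
    api_key=os.getenv("NVIDIA_API_KEY"),
    **endpoint_kwargs,
)
embeddings = NVIDIAEmbeddings(
    model="nvidia/llama-3.2-nv-embedqa-1b-v2",
    api_key=os.getenv("NVIDIA_API_KEY"),
    truncate="NONE",
    **endpoint_kwargs,
)
```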

community/chat-and-rag-glean/glean_example/src/agent.py

Lines changed: 53 additions & 30 deletions
```diff
@@ -29,25 +29,60 @@
 
 class InfoBotState(BaseModel):
     messages: List[Tuple[str, str]] = None
-    glean_query: Optional[str] = None
+    glean_query_required: Optional[bool] = None
     glean_results: Optional[List[str]] = None
     db: Optional[Any] = None
     answer_candidate: Optional[str] = None
 
+def determine_user_intent(state: InfoBotState):
+    """parses the user message and determines whether or not to call glean"""
+
+    # in this example the intent mapping is straightforward, either:
+    # - determining the question requires context and routing to glean
+    # - or answering with the LLM's foundational world knowledge
+    # in practice, this initial step could be an agent responsible for many actions such as
+    # - parsing multi-modal inputs
+    # - asking the user clarifying questions
+    # - running the prompt through custom guardrails, eg screening for sensitive HR topics
+
+    logger.info("Thinking about question")
+    most_recent_message: Tuple[str, str] = state.messages[-1]
+    role, query = most_recent_message
+
+    llm = PROMPT_GLEAN_QUERY | model
+    response = llm.invoke({"query": query})
+
+    if "Yes" in response.content:
+        logger.info("I will need to check Glean to answer")
+        state.glean_query_required = True
+
+    if "No" in response.content:
+        state.glean_query_required = False
+
+    return state
+
+def route_glean(state: InfoBotState):
+    if state.glean_query_required:
+        return "call_glean"
+
+    if not state.glean_query_required:
+        return "summarize_answer"
 
 def call_glean(state: InfoBotState):
     """Call the Glean Search API with a user query and it will return relevant results"""
     logger.info("Calling Glean")
+    most_recent_message: Tuple[str, str] = state.messages[-1]
+    role, query = most_recent_message
     response = glean_search(
-        query=state.glean_query, api_key=glean_api_key, base_url=base_url
+        query=query, api_key=glean_api_key, base_url=base_url
     )
     state.glean_results = documents_from_glean_response(response)
     return state
 
 
 def add_embeddings(state: InfoBotState):
     """Update the vector DB with glean search results"""
-    logger.info("Adding Embeddings")
+    logger.info("Understanding search results... adding embeddings")
     db = Chroma.from_texts(
         state.glean_results, embedding=embeddings, persist_directory=chroma_db_path
     )
@@ -57,7 +92,7 @@ def add_embeddings(state: InfoBotState):
 
 def answer_candidates(state: InfoBotState):
     """Use RAG to get most likely answer"""
-    logger.info("RAG on Embeddings")
+    logger.info("Understanding search results... querying embeddings")
     most_recent_message: Tuple[str, str] = state.messages[-1]
     role, query = most_recent_message
     retriever = state.db.as_retriever(search_kwargs={"k": 1})
@@ -66,30 +101,15 @@
     return state
 
 
-def create_glean_query(state: InfoBotState):
-    """parses the user message and creates an appropriate glean query"""
-    logger.info("Glean Query from User Message")
-    most_recent_message: Tuple[str, str] = state.messages[-1]
-    role, query = most_recent_message
-
-    llm = PROMPT_GLEAN_QUERY | model
-    response = llm.invoke({"query": query})
-
-    state.glean_query = response.content
-
-    return state
-
-
-def call_bot(state: InfoBotState):
+def summarize_answer(state: InfoBotState):
     """the main agent responsible for taking all the context and answering the question"""
-    logger.info("Generate final answer")
+    logger.info("Generating final answer")
 
     llm = PROMPT_ANSWER | model
 
     response = llm.invoke(
         {
             "messages": state.messages,
-            "glean_query": state.glean_query,
             "glean_search_result_documents": state.glean_results,
             "answer_candidate": state.answer_candidate,
         }
@@ -101,23 +121,26 @@ def call_bot(state: InfoBotState):
 # Define the graph
 
 graph = StateGraph(InfoBotState)
-graph.add_node("call_bot", call_bot)
+graph.add_node("determine_user_intent", determine_user_intent)
 graph.add_node("call_glean", call_glean)
-graph.add_node("answer_candidates", answer_candidates)
-graph.add_node("create_glean_query", create_glean_query)
 graph.add_node("add_embeddings", add_embeddings)
-
-graph.add_edge(START, "create_glean_query")
-graph.add_edge("create_glean_query", "call_glean")
+graph.add_node("answer_candidates", answer_candidates)
+graph.add_node("summarize_answer", summarize_answer)
+graph.add_edge(START, "determine_user_intent")
+graph.add_conditional_edges(
+    "determine_user_intent",
+    route_glean,
+    {"call_glean": "call_glean", "summarize_answer": "summarize_answer"}
+)
 graph.add_edge("call_glean", "add_embeddings")
 graph.add_edge("add_embeddings", "answer_candidates")
-graph.add_edge("answer_candidates", "call_bot")
-graph.add_edge("call_bot", END)
+graph.add_edge("answer_candidates", "summarize_answer")
+graph.add_edge("summarize_answer", END)
 agent = graph.compile()
 
 
 if __name__ == "__main__":
-    msg = "do I need to take PTO if I am sick"
+    msg = "What's the latest on the new API project?"
     history = []
     history.append(("user", msg))
     messages = history
```

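The structural change in `agent.py` is the move from a linear chain to conditional routing. A self-contained sketch of the same pattern, with stub nodes standing in for the real LLM, Glean, and RAG steps (and the embedding/retrieval nodes collapsed away) so it runs without any API keys:

```python
# Sketch of the conditional-edge pattern this commit introduces.
from typing import Optional

from pydantic import BaseModel
from langgraph.graph import StateGraph, START, END


class State(BaseModel):
    question: str = ""
    glean_query_required: Optional[bool] = None
    answer: Optional[str] = None


def determine_user_intent(state: State):
    # agent.py asks the LLM; this stub fakes it with a keyword check
    state.glean_query_required = "project" in state.question.lower()
    return state


def call_glean(state: State):
    state.answer = "stub: would search Glean here"
    return state


def summarize_answer(state: State):
    state.answer = state.answer or "stub: answer from world knowledge"
    return state


def route_glean(state: State):
    # the returned key is looked up in the mapping given to add_conditional_edges
    return "call_glean" if state.glean_query_required else "summarize_answer"


graph = StateGraph(State)
graph.add_node("determine_user_intent", determine_user_intent)
graph.add_node("call_glean", call_glean)
graph.add_node("summarize_answer", summarize_answer)
graph.add_edge(START, "determine_user_intent")
graph.add_conditional_edges(
    "determine_user_intent",
    route_glean,
    {"call_glean": "call_glean", "summarize_answer": "summarize_answer"},
)
graph.add_edge("call_glean", "summarize_answer")
graph.add_edge("summarize_answer", END)
agent = graph.compile()

print(agent.invoke({"question": "What's the latest on the new API project?"}))
```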
community/chat-and-rag-glean/glean_example/src/app/app.py

Lines changed: 6 additions & 2 deletions
```diff
@@ -6,8 +6,12 @@
 from typing import List
 from pathlib import Path
 from gradio_log import Log
+import os
 
 log_file = "/tmp/gradio_log.txt"
+if Path(log_file).exists():
+    os.remove(log_file)
+
 Path(log_file).touch()
 
 ch = logging.FileHandler(log_file)
@@ -44,13 +48,13 @@ def agent_predict(msg: str, history: List) -> str:
     return response["messages"][-1][1]
 
 
-chatbot = gr.Chatbot(label="NVBot Lite", elem_id="chatbot", show_copy_button=True)
+chatbot = gr.Chatbot(label="Ask away!", elem_id="chatbot", show_copy_button=True)
 
 with gr.Blocks(theme=theme, css=css) as chat:
     chat_interface = gr.ChatInterface(
         fn=agent_predict,
         chatbot=chatbot,
-        title="NVIDIA Information Demo",
+        title="ACME Corp Help Agent",
         autofocus=True,
         fill_height=True,
     )
```

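One note on the log-file reset added above: `Path.unlink(missing_ok=True)` (Python 3.8+) achieves the same effect without the exists/remove pair. A sketch:

```python
from pathlib import Path

log_file = Path("/tmp/gradio_log.txt")
log_file.unlink(missing_ok=True)  # remove a stale log from a previous run, if any
log_file.touch()
```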
community/chat-and-rag-glean/glean_example/src/prompts.py

Lines changed: 8 additions & 7 deletions
```diff
@@ -2,20 +2,23 @@
 
 PROMPT_GLEAN_QUERY_TEMPLATE = """
 
-You are part of an agent graph. Your job is to take the user input message and construct a simple and optimized natural language query that will be passed to an API. The API expects a natural language query and returns documents that might answer that query. The documents are sourced from an internal knowledge base at a company.
+You are part of an agent graph. Your job is to take the user input message and decide if access to a company knowledge base is needed to answer the question.
 
 Examples
 
 User Query: how many days off do I get this year?
-Suggested API Query: holiday benefit page
+Answer: Yes
 
 User Query: tell me about the company mission
-Suggested API Query: company mission statement
+Answer: Yes
 
-Please reply with a Suggested API Query for the following User Query. Reply with only the suggested query, nothing else.
+User Query: tell me a funny joke
+Answer: No
+
+Reply with only Yes or No, nothing else.
 
 User Query: {query}
-Suggested API Query:
+Answer:
 
 """
 
@@ -27,8 +30,6 @@
 
 Message History: {messages}
 
-Glean Search: {glean_query}
-
 All Supporting Documents from Glean:
 
 {glean_search_result_documents}
```

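For context, the reworked template is consumed by the `PROMPT_GLEAN_QUERY | model` chain in `agent.py`. A sketch of that wiring, assuming the prompt object wraps the template via `ChatPromptTemplate` (the repo's actual constructor may differ) and using a shortened stand-in template:

```python
# Sketch: how a Yes/No routing prompt like the one above is typically wired up.
from langchain_core.prompts import ChatPromptTemplate
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# stand-in for PROMPT_GLEAN_QUERY_TEMPLATE above
template = (
    "Decide if a company knowledge base is needed to answer the question.\n"
    "Reply with only Yes or No, nothing else.\n\n"
    "User Query: {query}\nAnswer:"
)

PROMPT_GLEAN_QUERY = ChatPromptTemplate.from_template(template)
model = ChatNVIDIA(model="meta/llama-3.3-70b-instruct")  # needs NVIDIA_API_KEY set

chain = PROMPT_GLEAN_QUERY | model
response = chain.invoke({"query": "how many days off do I get this year?"})
glean_query_required = "Yes" in response.content  # agent.py parses the bare Yes/No the same way
```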
community/chat-and-rag-glean/nvidia_nim_langgraph_glean_example.ipynb

Lines changed: 23 additions & 22 deletions
```diff
@@ -19,7 +19,7 @@
 "- Chroma DB for storing cached query results and performing RAG\n",
 "- LangGraph for creating an agent\n",
 "\n",
-"Best of all, because both Glean and NVIDIA NIMs can be deployed in your private cloud, it is possible to create this type of enterprise chatbot without any data leaving your control.\n",
+"Best of all, because both Glean and NVIDIA NIMs can be deployed in your private environment, it is possible to create this type of enterprise chatbot without any data leaving your control.\n",
 "\n",
 "To get started with this notebook, set the following environment variables. You will need a Glean deployment, a Glean API key, and a [NVIDA API Key](https://build.nvidia.com)."
 ]
@@ -41,7 +41,7 @@
 "id": "d0e2689f-335d-4203-8121-530641257de9",
 "metadata": {},
 "source": [
-"We start by instantiating the LLM and embedding model. You can update this code to use different foundational LLMs, or add the `base_url` parameter if you are using on-premise NVIDIA NIMs."
+"We start by instantiating the LLM and embedding model. You can update this code to use different foundational LLMs, or add the `base_url` parameter if you are using private NVIDIA NIM microservices."
 ]
 },
 {
@@ -88,47 +88,51 @@
 "id": "3beda74c-7885-4885-babd-8166a049fd38",
 "metadata": {},
 "source": [
-"While the model is able to interpret our question and formulate a response, it does not have access to any information about company-specific policies. To add this type of information we will follow a two multi-step process: \n",
+"While the model is able to interpret our question and formulate a response, it does not have access to any information about company-specific policies. To add this type of information we will follow a multi-step process: \n",
 "\n",
-"1. Have the LLM translate the user's question into a query for the Glean knowledge base.\n",
-"2. Query the Glean knowledge base using the Glean search API to get the most relevant supporting documents.\n",
+"1. Have the LLM interpret the user's question and add any relevant context. Most free form questions can be passed directly to the Glean Search API.\n",
+"2. Add relevant context about the user and then query the Glean knowledge base using the Glean search API to get the most relevant supporting documents. \n",
 "3. Embed those supporting documents into a local vector DB.\n",
 "4. Use a retriever model to fetch the most relevant supporting document based on the user's original question.\n",
-"5. Take the most relevant supporting document and add it to the LLM by adding it to the model's prompt (RAG).\n",
-"6. Ask the model to answer the user's question with this new relevant context.\n",
+"5. Take the most relevant supporting document and add it to the LLM by adding it to the LLM's prompt (RAG).\n",
+"6. Ask the LLM to summarize the results and answer the user's question with this new relevant context.\n",
 "\n",
 "To help organize these steps we use a LangGraph agent. The full implementation of the agent is available in the file `glean_example/src/agent.py`. The following code samples explain some core concepts of that code.\n",
 "\n",
 "```python\n",
 "class InfoBotState(BaseModel):\n",
 "    messages: List[Tuple[str, str]] = None\n",
-"    glean_query: Optional[str] = None\n",
+"    glean_query_required: Optional[bool] = None\n",
 "    glean_results: Optional[List[str]] = None\n",
 "    db: Optional[Any] = None\n",
 "    answer_candidate: Optional[str] = None\n",
 "\n",
 "graph = StateGraph(InfoBotState)\n",
-"graph.add_node(\"call_bot\", call_bot)\n",
+"graph.add_node(\"determine_user_intent\", determine_user_intent)\n",
 "graph.add_node(\"call_glean\", call_glean)\n",
-"graph.add_node(\"answer_candidates\", answer_candidates)\n",
-"graph.add_node(\"create_glean_query\", create_glean_query)\n",
 "graph.add_node(\"add_embeddings\", add_embeddings)\n",
-"\n",
-"graph.add_edge(START, \"create_glean_query\")\n",
-"graph.add_edge(\"create_glean_query\", \"call_glean\")\n",
+"graph.add_node(\"answer_candidates\", answer_candidates)\n",
+"graph.add_node(\"summarize_answer\", summarize_answer)\n",
+"graph.add_edge(START, \"determine_user_intent\")\n",
+"graph.add_conditional_edges(\n",
+"    \"determine_user_intent\",\n",
+"    route_glean, \n",
+"    {\"call_glean\": \"call_glean\", \"summarize_answer\": \"summarize_answer\"}\n",
+")\n",
 "graph.add_edge(\"call_glean\", \"add_embeddings\")\n",
 "graph.add_edge(\"add_embeddings\", \"answer_candidates\")\n",
-"graph.add_edge(\"answer_candidates\", \"call_bot\")\n",
-"graph.add_edge(\"call_bot\", END)\n",
+"graph.add_edge(\"answer_candidates\", \"summarize_answer\")\n",
+"graph.add_edge(\"summarize_answer\", END)\n",
 "agent = graph.compile()\n",
+"\n",
 "```\n",
 "\n",
 "This code is responsible for creating the agent. Each node represents a function responsible for implementing one of the six steps in our process. The `InfoBotState` is a special type of dictionary that will hold all of the information the agent needs through each step of the process. \n",
 "\n",
 "The source of each function is also available in `glean_example/src/agent.py`. For example, the implementation of `call_bot` is: \n",
 "\n",
 "```python\n",
-"def call_bot(state: InfoBotState):\n",
+"def summarize_answer(state: InfoBotState):\n",
 "    \"\"\"the main agent responsible for taking all the context and answering the question\"\"\"\n",
 "    logger.info(\"Generate final answer\")\n",
 "\n",
@@ -137,7 +141,6 @@
 "    response = llm.invoke(\n",
 "        {\n",
 "            \"messages\": state.messages,\n",
-"            \"glean_query\": state.glean_query,\n",
 "            \"glean_search_result_documents\": state.glean_results,\n",
 "            \"answer_candidate\": state.answer_candidate,\n",
 "        }\n",
@@ -146,15 +149,13 @@
 "    return state\n",
 "```\n",
 "\n",
-"This function takes the NVIDIA NIM foundational LLM model and invokes it with a specific prompt and the information available in the agent state. The prompt tells the agent what to do, injecting the relevant information fronm the agent state. You can see the prompts in the file `glean_example/src/prompts.py`. For example, the `PROMPT_ANSWER` is: \n",
+"This function takes the NVIDIA NIM foundational LLM model and invokes it with a specific prompt and the information available in the agent state. The prompt tells the agent what to do, injecting the relevant information from the agent state. You can see the prompts in the file `glean_example/src/prompts.py`. For example, the `PROMPT_ANSWER` is: \n",
 "\n",
 "```raw\n",
 "You are the final part of an agent graph. Your job is to answer the user's question based on the information below. Include a url citation in your answer.\n",
 "\n",
 "Message History: {messages}\n",
 "\n",
-"Glean Search: {glean_query}\n",
-"\n",
 "All Supporting Documents from Glean: \n",
 "\n",
 "{glean_search_result_documents}\n",
@@ -227,7 +228,7 @@
 "source": [
 "from glean_example.src.agent import agent\n",
 "\n",
-"msg = \"do I need to take PTO if I am sick\"\n",
+"msg = \"What's the latest on the new API project?\"\n",
 "history = []\n",
 "history.append((\"user\", msg))\n",
 "response = agent.invoke({\"messages\": history})\n",
```

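The updated notebook cell and `app.py`'s `agent_predict` read the final reply the same way; a compact sketch:

```python
# Mirrors the notebook cell above and app.py's agent_predict.
from glean_example.src.agent import agent

history = [("user", "What's the latest on the new API project?")]
response = agent.invoke({"messages": history})
print(response["messages"][-1][1])  # the assistant's final message
```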