Commit df391f7

revisions based on feedback (NVIDIA#274)

* revisions based on feedback
* feedback on prompt use case

1 parent 0392821 · commit df391f7

File tree: 8 files changed, +100 -75 lines changed


community/chat-and-rag-glean/README.md

Lines changed: 10 additions & 14 deletions
````diff
@@ -2,11 +2,7 @@
 
 This repository includes a demo of a simple chat bot that answers questions based on a company's internal knowledge repository.
 
-![chat_interace_1](./chat_interfaced_1.png)
-
-
-![chat_interace_2](./chat_interface_2.png)
-
+![chat_interface_1](./chat_interface_1.png)
 
 The implementation includes:
 
@@ -16,15 +12,15 @@ The implementation includes:
 - Chroma DB for a lightweight vector DB
 - An internal knowledge base stored in Glean and available over the Glean Search API
 
-This example uses NVIDIA NIMs which can be hosted completely on-premise, which combined with the Glean on-premise offering, allows organizations to use LLMs for internal knowledge search, chat, and retrieval without any data leaving their environment.
+This example uses NVIDIA NIM microservices, which can be hosted completely on-premise or in a company's private cloud. Combined with the Glean cloud-prem offering, this allows organizations to create internal knowledge search, chat, and retrieval applications without any data leaving their environment.
 
 The example architecture and possible extensions are shown below.
 
 ![sample_architecture](./glean_example_architecture.png)
 
 ## Pre-requisites
 
-This example uses hosted NVIDIA NIMs for the foundational LLMs. In order to use these hosted LLMds you will need a NVIDIA API key which is available at https://build.nvidia.com.
+This example uses hosted NVIDIA NIMs for the foundational LLMs. To use these hosted LLMs you will need an NVIDIA API key, which is available at https://build.nvidia.com.
 
 ```bash
 export NVIDIA_API_KEY="nvapi-YOUR-KEY"
@@ -82,23 +78,23 @@ The main LLM used is `meta/llama-3.3-70b-instruct`. Update this model name to us
 
 The main embedding model used is `meta/llama-3.2-nv-embedqa-1b-v2`. Update this model name to use a different embedding model.
 
-### Using on-prem
+### Using in a private network
 
-You may way to build an application similar to this demo that is hosted on-premise or in your private cloud so that no internal data leaves your systems.
+You may want to build an application similar to this demo that is hosted in your private environment so that no internal data leaves your systems.
 
-- Ensure you are using the [Glean "Cloud-prem" option](https://help.glean.com/en/articles/10093412-glean-deployment-options). Update the `GLEAN_API_BASE_URL` to use your on-prem Glean installation.
-- Follow the appropriate [NVIDIA NIM deployment guide](https://docs.nvidia.com/nim/large-language-models/latest/deployment-guide.html) for your environment. You will need to deploy at least one NVIDIA NIM foundational LLM and one NVIDIA NIM embedding model. The result of following this guide will be two on-premise URL endpoints.
-- Update the file `glean_example/src/agent.py` to use the on-prem endpoints:
+- Ensure you are using the [Glean "Cloud-prem" option](https://help.glean.com/en/articles/10093412-glean-deployment-options). Update the `GLEAN_API_BASE_URL` to use your cloud-prem Glean installation.
+- Follow the appropriate [NVIDIA NIM deployment guide](https://docs.nvidia.com/nim/large-language-models/latest/deployment-guide.html) for your environment. You will need to deploy at least one NVIDIA NIM foundational LLM and one NVIDIA NIM embedding model. The result of following this guide will be two private URL endpoints.
+- Update the file `glean_example/src/agent.py` to use the private endpoints:
 
 ```python
 model = ChatNVIDIA(
     model="meta/llama-3.3-70b-instruct",
-    base_url="http://localhost:8000/v1", # Update to the on-prem URL where your NVIDIA NIM is running
+    base_url="http://localhost:8000/v1", # Update to the URL where your NVIDIA NIM is running
     api_key=os.getenv("NVIDIA_API_KEY")
 )
 embeddings = NVIDIAEmbeddings(
     model="nvidia/llama-3.2-nv-embedqa-1b-v2",
-    base_url="http://localhost:8000/v1", # Update to the on-prem URL where your NVIDIA NIM is running
+    base_url="http://localhost:8000/v1", # Update to the URL where your NVIDIA NIM is running
    api_key=os.getenv("NVIDIA_API_KEY"),
     truncate="NONE",
 )
````
Binary image files changed: one image added (132 KB), two images removed (-173 KB and -159 KB; binary files not shown).
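A side note on the `base_url` edits in this README: rather than editing the file in place, the endpoint switch can be driven by an environment variable. A minimal sketch, assuming a hypothetical `NIM_BASE_URL` variable that is not part of this example:

```python
# Sketch only: pick hosted vs. self-hosted NIM endpoints from the environment.
# NIM_BASE_URL is a hypothetical variable name, not part of this commit.
import os

from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

base_url = os.getenv("NIM_BASE_URL")  # e.g. "http://localhost:8000/v1"; unset falls back to the hosted API
endpoint_kwargs = {"base_url": base_url} if base_url else {}

model = ChatNVIDIA(
    model="meta/llama-3.3-70b-instruct",
    api_key=os.getenv("NVIDIA_API_KEY"),
    **endpoint_kwargs,
)
embeddings = NVIDIAEmbeddings(
    model="nvidia/llama-3.2-nv-embedqa-1b-v2",
    api_key=os.getenv("NVIDIA_API_KEY"),
    truncate="NONE",
    **endpoint_kwargs,
)
```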

community/chat-and-rag-glean/glean_example/src/agent.py

Lines changed: 53 additions & 30 deletions
```diff
@@ -29,25 +29,60 @@
 
 class InfoBotState(BaseModel):
     messages: List[Tuple[str, str]] = None
-    glean_query: Optional[str] = None
+    glean_query_required: Optional[bool] = None
     glean_results: Optional[List[str]] = None
     db: Optional[Any] = None
     answer_candidate: Optional[str] = None
 
+def determine_user_intent(state: InfoBotState):
+    """parses the user message and determines whether or not to call glean"""
+
+    # in this example the intent mapping is straightforward, either:
+    # - determining the question requires context and routing to glean
+    # - or answering with the LLM's foundational world knowledge
+    # in practice, this initial step could be an agent responsible for many actions such as
+    # - parsing multi-modal inputs
+    # - asking the user clarifying questions
+    # - running the prompt through custom guardrails, eg screening for sensitive HR topics
+
+    logger.info("Thinking about question")
+    most_recent_message: Tuple[str, str] = state.messages[-1]
+    role, query = most_recent_message
+
+    llm = PROMPT_GLEAN_QUERY | model
+    response = llm.invoke({"query": query})
+
+    if "Yes" in response.content:
+        logger.info("I will need to check Glean to answer")
+        state.glean_query_required = True
+
+    if "No" in response.content:
+        state.glean_query_required = False
+
+    return state
+
+def route_glean(state: InfoBotState):
+    if state.glean_query_required:
+        return "call_glean"
+
+    if not state.glean_query_required:
+        return "summarize_answer"
 
 def call_glean(state: InfoBotState):
     """Call the Glean Search API with a user query and it will return relevant results"""
     logger.info("Calling Glean")
+    most_recent_message: Tuple[str, str] = state.messages[-1]
+    role, query = most_recent_message
     response = glean_search(
-        query=state.glean_query, api_key=glean_api_key, base_url=base_url
+        query=query, api_key=glean_api_key, base_url=base_url
     )
     state.glean_results = documents_from_glean_response(response)
     return state
 
 
 def add_embeddings(state: InfoBotState):
     """Update the vector DB with glean search results"""
-    logger.info("Adding Embeddings")
+    logger.info("Understanding search results... adding embeddings")
     db = Chroma.from_texts(
         state.glean_results, embedding=embeddings, persist_directory=chroma_db_path
     )
@@ -57,7 +92,7 @@ def add_embeddings(state: InfoBotState):
 
 def answer_candidates(state: InfoBotState):
     """Use RAG to get most likely answer"""
-    logger.info("RAG on Embeddings")
+    logger.info("Understanding search results... querying embeddings")
     most_recent_message: Tuple[str, str] = state.messages[-1]
     role, query = most_recent_message
     retriever = state.db.as_retriever(search_kwargs={"k": 1})
@@ -66,30 +101,15 @@
     return state
 
 
-def create_glean_query(state: InfoBotState):
-    """parses the user message and creates an appropriate glean query"""
-    logger.info("Glean Query from User Message")
-    most_recent_message: Tuple[str, str] = state.messages[-1]
-    role, query = most_recent_message
-
-    llm = PROMPT_GLEAN_QUERY | model
-    response = llm.invoke({"query": query})
-
-    state.glean_query = response.content
-
-    return state
-
-
-def call_bot(state: InfoBotState):
+def summarize_answer(state: InfoBotState):
     """the main agent responsible for taking all the context and answering the question"""
-    logger.info("Generate final answer")
+    logger.info("Generating final answer")
 
     llm = PROMPT_ANSWER | model
 
     response = llm.invoke(
         {
             "messages": state.messages,
-            "glean_query": state.glean_query,
             "glean_search_result_documents": state.glean_results,
             "answer_candidate": state.answer_candidate,
         }
@@ -101,23 +121,26 @@ def call_bot(state: InfoBotState):
 # Define the graph
 
 graph = StateGraph(InfoBotState)
-graph.add_node("call_bot", call_bot)
+graph.add_node("determine_user_intent", determine_user_intent)
 graph.add_node("call_glean", call_glean)
-graph.add_node("answer_candidates", answer_candidates)
-graph.add_node("create_glean_query", create_glean_query)
 graph.add_node("add_embeddings", add_embeddings)
-
-graph.add_edge(START, "create_glean_query")
-graph.add_edge("create_glean_query", "call_glean")
+graph.add_node("answer_candidates", answer_candidates)
+graph.add_node("summarize_answer", summarize_answer)
+graph.add_edge(START, "determine_user_intent")
+graph.add_conditional_edges(
+    "determine_user_intent",
+    route_glean,
+    {"call_glean": "call_glean", "summarize_answer": "summarize_answer"}
+)
 graph.add_edge("call_glean", "add_embeddings")
 graph.add_edge("add_embeddings", "answer_candidates")
-graph.add_edge("answer_candidates", "call_bot")
-graph.add_edge("call_bot", END)
+graph.add_edge("answer_candidates", "summarize_answer")
+graph.add_edge("summarize_answer", END)
 agent = graph.compile()
 
 
 if __name__ == "__main__":
-    msg = "do I need to take PTO if I am sick"
+    msg = "What's the latest on the new API project?"
     history = []
     history.append(("user", msg))
     messages = history
```

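The structural change in `agent.py` is the move from a linear chain to conditional routing. A self-contained sketch of the same pattern, with stub nodes standing in for the real LLM, Glean, and RAG steps (and the embedding/retrieval nodes collapsed away) so it runs without any API keys:

```python
# Sketch of the conditional-edge pattern this commit introduces.
from typing import Optional

from pydantic import BaseModel
from langgraph.graph import StateGraph, START, END


class State(BaseModel):
    question: str = ""
    glean_query_required: Optional[bool] = None
    answer: Optional[str] = None


def determine_user_intent(state: State):
    # agent.py asks the LLM; this stub fakes it with a keyword check
    state.glean_query_required = "project" in state.question.lower()
    return state


def call_glean(state: State):
    state.answer = "stub: would search Glean here"
    return state


def summarize_answer(state: State):
    state.answer = state.answer or "stub: answer from world knowledge"
    return state


def route_glean(state: State):
    # the returned key is looked up in the mapping given to add_conditional_edges
    return "call_glean" if state.glean_query_required else "summarize_answer"


graph = StateGraph(State)
graph.add_node("determine_user_intent", determine_user_intent)
graph.add_node("call_glean", call_glean)
graph.add_node("summarize_answer", summarize_answer)
graph.add_edge(START, "determine_user_intent")
graph.add_conditional_edges(
    "determine_user_intent",
    route_glean,
    {"call_glean": "call_glean", "summarize_answer": "summarize_answer"},
)
graph.add_edge("call_glean", "summarize_answer")
graph.add_edge("summarize_answer", END)
agent = graph.compile()

print(agent.invoke({"question": "What's the latest on the new API project?"}))
```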
community/chat-and-rag-glean/glean_example/src/app/app.py

Lines changed: 6 additions & 2 deletions
```diff
@@ -6,8 +6,12 @@
 from typing import List
 from pathlib import Path
 from gradio_log import Log
+import os
 
 log_file = "/tmp/gradio_log.txt"
+if Path(log_file).exists():
+    os.remove(log_file)
+
 Path(log_file).touch()
 
 ch = logging.FileHandler(log_file)
@@ -44,13 +48,13 @@ def agent_predict(msg: str, history: List) -> str:
     return response["messages"][-1][1]
 
 
-chatbot = gr.Chatbot(label="NVBot Lite", elem_id="chatbot", show_copy_button=True)
+chatbot = gr.Chatbot(label="Ask away!", elem_id="chatbot", show_copy_button=True)
 
 with gr.Blocks(theme=theme, css=css) as chat:
     chat_interface = gr.ChatInterface(
         fn=agent_predict,
         chatbot=chatbot,
-        title="NVIDIA Information Demo",
+        title="ACME Corp Help Agent",
         autofocus=True,
         fill_height=True,
     )
```

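One note on the log-file reset added above: `Path.unlink(missing_ok=True)` (Python 3.8+) achieves the same effect without the exists/remove pair. A sketch:

```python
from pathlib import Path

log_file = Path("/tmp/gradio_log.txt")
log_file.unlink(missing_ok=True)  # remove a stale log from a previous run, if any
log_file.touch()
```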
community/chat-and-rag-glean/glean_example/src/prompts.py

Lines changed: 8 additions & 7 deletions
```diff
@@ -2,20 +2,23 @@
 
 PROMPT_GLEAN_QUERY_TEMPLATE = """
 
-You are part of an agent graph. Your job is to take the user input message and construct a simple and optimized natural language query that will be passed to an API. The API expects a natural language query and returns documents that might answer that query. The documents are sourced from an internal knowledge base at a company.
+You are part of an agent graph. Your job is to take the user input message and decide if access to a company knowledge base is needed to answer the question.
 
 Examples
 
 User Query: how many days off do I get this year?
-Suggested API Query: holiday benefit page
+Answer: Yes
 
 User Query: tell me about the company mission
-Suggested API Query: company mission statement
+Answer: Yes
 
-Please reply with a Suggested API Query for the following User Query. Reply with only the suggested query, nothing else.
+User Query: tell me a funny joke
+Answer: No
+
+Reply with only Yes or No, nothing else.
 
 User Query: {query}
-Suggested API Query:
+Answer:
 
 """
 
@@ -27,8 +30,6 @@
 
 Message History: {messages}
 
-Glean Search: {glean_query}
-
 All Supporting Documents from Glean:
 
 {glean_search_result_documents}
```

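For context, the reworked template is consumed by the `PROMPT_GLEAN_QUERY | model` chain in `agent.py`. A sketch of that wiring, assuming the prompt object wraps the template via `ChatPromptTemplate` (the repo's actual constructor may differ) and using a shortened stand-in template:

```python
# Sketch: how a Yes/No routing prompt like the one above is typically wired up.
from langchain_core.prompts import ChatPromptTemplate
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# stand-in for PROMPT_GLEAN_QUERY_TEMPLATE above
template = (
    "Decide if a company knowledge base is needed to answer the question.\n"
    "Reply with only Yes or No, nothing else.\n\n"
    "User Query: {query}\nAnswer:"
)

PROMPT_GLEAN_QUERY = ChatPromptTemplate.from_template(template)
model = ChatNVIDIA(model="meta/llama-3.3-70b-instruct")  # needs NVIDIA_API_KEY set

chain = PROMPT_GLEAN_QUERY | model
response = chain.invoke({"query": "how many days off do I get this year?"})
glean_query_required = "Yes" in response.content  # agent.py parses the bare Yes/No the same way
```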
community/chat-and-rag-glean/nvidia_nim_langgraph_glean_example.ipynb

Lines changed: 23 additions & 22 deletions
```diff
@@ -19,7 +19,7 @@
 "- Chroma DB for storing cached query results and performing RAG\n",
 "- LangGraph for creating an agent\n",
 "\n",
-"Best of all, because both Glean and NVIDIA NIMs can be deployed in your private cloud, it is possible to create this type of enterprise chatbot without any data leaving your control.\n",
+"Best of all, because both Glean and NVIDIA NIMs can be deployed in your private environment, it is possible to create this type of enterprise chatbot without any data leaving your control.\n",
 "\n",
 "To get started with this notebook, set the following environment variables. You will need a Glean deployment, a Glean API key, and a [NVIDA API Key](https://build.nvidia.com)."
 ]
@@ -41,7 +41,7 @@
 "id": "d0e2689f-335d-4203-8121-530641257de9",
 "metadata": {},
 "source": [
-"We start by instantiating the LLM and embedding model. You can update this code to use different foundational LLMs, or add the `base_url` parameter if you are using on-premise NVIDIA NIMs."
+"We start by instantiating the LLM and embedding model. You can update this code to use different foundational LLMs, or add the `base_url` parameter if you are using private NVIDIA NIM microservices."
 ]
 },
 {
@@ -88,47 +88,51 @@
 "id": "3beda74c-7885-4885-babd-8166a049fd38",
 "metadata": {},
 "source": [
-"While the model is able to interpret our question and formulate a response, it does not have access to any information about company-specific policies. To add this type of information we will follow a two multi-step process: \n",
+"While the model is able to interpret our question and formulate a response, it does not have access to any information about company-specific policies. To add this type of information we will follow a multi-step process: \n",
 "\n",
-"1. Have the LLM translate the user's question into a query for the Glean knowledge base.\n",
-"2. Query the Glean knowledge base using the Glean search API to get the most relevant supporting documents.\n",
+"1. Have the LLM interpret the user's question and add any relevant context. Most free form questions can be passed directly to the Glean Search API.\n",
+"2. Add relevant context about the user and then query the Glean knowledge base using the Glean search API to get the most relevant supporting documents. \n",
 "3. Embed those supporting documents into a local vector DB.\n",
 "4. Use a retriever model to fetch the most relevant supporting document based on the user's original question.\n",
-"5. Take the most relevant supporting document and add it to the LLM by adding it to the model's prompt (RAG).\n",
-"6. Ask the model to answer the user's question with this new relevant context.\n",
+"5. Take the most relevant supporting document and add it to the LLM by adding it to the LLM's prompt (RAG).\n",
+"6. Ask the LLM to summarize the results and answer the user's question with this new relevant context.\n",
 "\n",
 "To help organize these steps we use a LangGraph agent. The full implementation of the agent is available in the file `glean_example/src/agent.py`. The following code samples explain some core concepts of that code.\n",
 "\n",
 "```python\n",
 "class InfoBotState(BaseModel):\n",
 "    messages: List[Tuple[str, str]] = None\n",
-"    glean_query: Optional[str] = None\n",
+"    glean_query_required: Optional[bool] = None\n",
 "    glean_results: Optional[List[str]] = None\n",
 "    db: Optional[Any] = None\n",
 "    answer_candidate: Optional[str] = None\n",
 "\n",
 "graph = StateGraph(InfoBotState)\n",
-"graph.add_node(\"call_bot\", call_bot)\n",
+"graph.add_node(\"determine_user_intent\", determine_user_intent)\n",
 "graph.add_node(\"call_glean\", call_glean)\n",
-"graph.add_node(\"answer_candidates\", answer_candidates)\n",
-"graph.add_node(\"create_glean_query\", create_glean_query)\n",
 "graph.add_node(\"add_embeddings\", add_embeddings)\n",
-"\n",
-"graph.add_edge(START, \"create_glean_query\")\n",
-"graph.add_edge(\"create_glean_query\", \"call_glean\")\n",
+"graph.add_node(\"answer_candidates\", answer_candidates)\n",
+"graph.add_node(\"summarize_answer\", summarize_answer)\n",
+"graph.add_edge(START, \"determine_user_intent\")\n",
+"graph.add_conditional_edges(\n",
+"    \"determine_user_intent\",\n",
+"    route_glean, \n",
+"    {\"call_glean\": \"call_glean\", \"summarize_answer\": \"summarize_answer\"}\n",
+")\n",
 "graph.add_edge(\"call_glean\", \"add_embeddings\")\n",
 "graph.add_edge(\"add_embeddings\", \"answer_candidates\")\n",
-"graph.add_edge(\"answer_candidates\", \"call_bot\")\n",
-"graph.add_edge(\"call_bot\", END)\n",
+"graph.add_edge(\"answer_candidates\", \"summarize_answer\")\n",
+"graph.add_edge(\"summarize_answer\", END)\n",
 "agent = graph.compile()\n",
+"\n",
 "```\n",
 "\n",
 "This code is responsible for creating the agent. Each node represents a function responsible for implementing one of the six steps in our process. The `InfoBotState` is a special type of dictionary that will hold all of the information the agent needs through each step of the process. \n",
 "\n",
 "The source of each function is also available in `glean_example/src/agent.py`. For example, the implementation of `call_bot` is: \n",
 "\n",
 "```python\n",
-"def call_bot(state: InfoBotState):\n",
+"def summarize_answer(state: InfoBotState):\n",
 "    \"\"\"the main agent responsible for taking all the context and answering the question\"\"\"\n",
 "    logger.info(\"Generate final answer\")\n",
 "\n",
@@ -137,7 +141,6 @@
 "    response = llm.invoke(\n",
 "        {\n",
 "            \"messages\": state.messages,\n",
-"            \"glean_query\": state.glean_query,\n",
 "            \"glean_search_result_documents\": state.glean_results,\n",
 "            \"answer_candidate\": state.answer_candidate,\n",
 "        }\n",
@@ -146,15 +149,13 @@
 "    return state\n",
 "```\n",
 "\n",
-"This function takes the NVIDIA NIM foundational LLM model and invokes it with a specific prompt and the information available in the agent state. The prompt tells the agent what to do, injecting the relevant information fronm the agent state. You can see the prompts in the file `glean_example/src/prompts.py`. For example, the `PROMPT_ANSWER` is: \n",
+"This function takes the NVIDIA NIM foundational LLM model and invokes it with a specific prompt and the information available in the agent state. The prompt tells the agent what to do, injecting the relevant information from the agent state. You can see the prompts in the file `glean_example/src/prompts.py`. For example, the `PROMPT_ANSWER` is: \n",
 "\n",
 "```raw\n",
 "You are the final part of an agent graph. Your job is to answer the user's question based on the information below. Include a url citation in your answer.\n",
 "\n",
 "Message History: {messages}\n",
 "\n",
-"Glean Search: {glean_query}\n",
-"\n",
 "All Supporting Documents from Glean: \n",
 "\n",
 "{glean_search_result_documents}\n",
@@ -227,7 +228,7 @@
 "source": [
 "from glean_example.src.agent import agent\n",
 "\n",
-"msg = \"do I need to take PTO if I am sick\"\n",
+"msg = \"What's the latest on the new API project?\"\n",
 "history = []\n",
 "history.append((\"user\", msg))\n",
 "response = agent.invoke({\"messages\": history})\n",
```

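The updated notebook cell and `app.py`'s `agent_predict` read the final reply the same way; a compact sketch:

```python
# Mirrors the notebook cell above and app.py's agent_predict.
from glean_example.src.agent import agent

history = [("user", "What's the latest on the new API project?")]
response = agent.invoke({"messages": history})
print(response["messages"][-1][1])  # the assistant's final message
```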