@@ -509,7 +509,7 @@
"\n",
"To achieve that, we will do following.\n",
"\n",
"1. **Generate embedings for each of document in the knowledge library with Huggingface all-MiniLM-L6-v2 embedding model.**\n",
"1. **Generate embeddings for each of document in the knowledge library with Huggingface all-MiniLM-L6-v2 embedding model.**\n",
"2. **Identify top K most relevant documents based on user query.**\n",
" - 2.1 **For a query of your interest, generate the embedding of the query using the same embedding model.**\n",
" - 2.2 **Search the indexes of top K most relevant documents in the embedding space using in-memory Faiss search.**\n",
@@ -689,7 +689,7 @@
"id": "cfe4a131-9b09-4141-96c5-6a13751e99ff",
"metadata": {},
"source": [
"We generate embedings for each of document in the knowledge library with Huggingface all-MiniLM-L6-v2 embedding model.documents"
"We generate embeddings for each of document in the knowledge library with Huggingface all-MiniLM-L6-v2 embedding model.documents"
]
},
{
@@ -253,7 +253,7 @@
"source": [
"## Step 2. Ask a question to LLM without providing the context\n",
"\n",
"To better illustrate why we need retrieval-augmented generation (RAG) based approach to solve the question and anwering problem. Let's directly ask the model a question and see how they respond."
"To better illustrate why we need retrieval-augmented generation (RAG) based approach to solve the question and answering problem. Let's directly ask the model a question and see how they respond."
]
},
{
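A minimal sketch of asking a deployed LLM endpoint directly, with no retrieved context. The endpoint name is hypothetical, and the `text_inputs`/`generated_texts` fields assume the JumpStart text2text payload convention, so adjust them for your model:

```python
import json

import boto3

smr = boto3.client("sagemaker-runtime")
question = "Which instances can I use with Managed Spot Training in SageMaker?"

response = smr.invoke_endpoint(
    EndpointName="jumpstart-example-llm-endpoint",  # hypothetical name
    ContentType="application/json",
    Body=json.dumps({"text_inputs": question, "max_length": 100}),
)
print(json.loads(response["Body"].read())["generated_texts"][0])
```

Without the FAQ context, the model can only answer from its pretraining data, which is the failure mode this step demonstrates.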
@@ -390,7 +390,7 @@
"\n",
"To achieve that, we will do following.\n",
"\n",
"1. **Generate embedings for each of document in the knowledge library with Cohere Multilingual embedding model.**\n",
"1. **Generate embeddings for each of document in the knowledge library with Cohere Multilingual embedding model.**\n",
"2. **Identify top K most relevant documents based on user query.**\n",
" - 2.1 **For a query of your interest, generate the embedding of the query using the same embedding model.**\n",
" - 2.2 **Search the indexes of top K most relevant documents in the embedding space using in-memory Faiss search.**\n",
@@ -741,7 +741,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Firstly, we **generate embedings for each of document in the knowledge library with SageMaker GPT-J-6B embedding model.**"
"Firstly, we **generate embeddings for each of document in the knowledge library with SageMaker GPT-J-6B embedding model.**"
]
},
{
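A sketch of an embedding helper against a deployed GPT-J-6B endpoint; the endpoint name is hypothetical and the `text_inputs`/`embedding` payload keys assume the JumpStart convention for this model:

```python
import json

import boto3
import numpy as np

smr = boto3.client("sagemaker-runtime")

def embed(texts, endpoint_name="jumpstart-example-gpt-j-6b-embedding"):  # hypothetical name
    """Return an (N, P) float32 matrix of embeddings for a list of texts."""
    response = smr.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"text_inputs": texts}),
    )
    return np.asarray(json.loads(response["Body"].read())["embedding"], dtype="float32")

# e.g. doc_embeddings = embed(df_knowledge["Answer"].tolist())
```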
@@ -788,7 +788,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Print out the top 3 most relevant docuemnts as below."
"Print out the top 3 most relevant documents as below."
]
},
{
@@ -839,7 +839,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Send the top 3 most relevant docuemnts and question into LLM to get a answer."
"Send the top 3 most relevant documents and question into LLM to get a answer."
]
},
{
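Sending the documents and question to the LLM amounts to stuffing the retrieved text into the prompt. A sketch, with an illustrative template and the same hypothetical endpoint and payload convention as above:

```python
import json

import boto3

smr = boto3.client("sagemaker-runtime")
question = "Which instances can I use with Managed Spot Training in SageMaker?"
top_docs = ["<retrieved doc 1>", "<retrieved doc 2>", "<retrieved doc 3>"]  # placeholders

context = "\n".join(top_docs)
prompt = f"Answer based on the context below.\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"

response = smr.invoke_endpoint(
    EndpointName="jumpstart-example-llm-endpoint",  # hypothetical name
    ContentType="application/json",
    Body=json.dumps({"text_inputs": prompt, "max_length": 200}),
)
print(json.loads(response["Body"].read())["generated_texts"][0])
```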
@@ -195,7 +195,7 @@
"source": [
"## Step 2. Ask a question to LLM without providing the context\n",
"\n",
"To better illustrate why we need retrieval-augmented generation (RAG) based approach to solve the question and anwering problem. Let's directly ask the model a question and see how they respond."
"To better illustrate why we need retrieval-augmented generation (RAG) based approach to solve the question and answering problem. Let's directly ask the model a question and see how they respond."
]
},
{
@@ -302,7 +302,7 @@
"\n",
"To achieve that, we will do following.\n",
"\n",
"* **Generate embedings for each of document in the knowledge library with the GPT-J-6B embedding model.**\n",
"* **Generate embeddings for each of document in the knowledge library with the GPT-J-6B embedding model.**\n",
"* **Identify top K most relevant documents based on user query.**\n",
" * **For a query of your interest, generate the embedding of the query using the same embedding model.**\n",
" * **Search the indexes of top K most relevant documents in the embedding space using the SageMaker KNN algorithm.**\n",
@@ -405,7 +405,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.2. Generate embedings for each of document in the knowledge library with the GPT-J-6B embedding model.\n",
"### 4.2. Generate embeddings for each of document in the knowledge library with the GPT-J-6B embedding model.\n",
"\n",
"For the purpose of the demo we will use [Amazon SageMaker FAQs](https://aws.amazon.com/sagemaker/faqs/) as knowledge library. The data are formatted in a CSV file with two columns Question and Answer. We use **only** the Answer column as the documents of knowledge library, from which relevant documents are retrieved based on a query. \n",
"\n",
@@ -541,7 +541,7 @@
"1. Start a training job to index the embedding knowledge data. The underlying algorithm used to index the data is [Faiss](https://github.com/facebookresearch/faiss).\n",
"2. Start an endpoint to take the embedding of the query as input and return the top K nearest indexes of the documents.\n",
"\n",
"**Note.** For the KNN training job, the features are N by P matrix, where N is the number of documetns in the knowledge library, P is the embedding dimension, and each row corresponds to an embedding of a document. The labels are ordinal integers starting from 0. During inference, given an embedding of query, the labels of the top K nearest documents with respect to the query are used as indexes to retrieve the corresponded textual documents.\n",
"**Note.** For the KNN training job, the features are N by P matrix, where N is the number of documents in the knowledge library, P is the embedding dimension, and each row corresponds to an embedding of a document. The labels are ordinal integers starting from 0. During inference, given an embedding of query, the labels of the top K nearest documents with respect to the query are used as indexes to retrieve the corresponded textual documents.\n",
"\n",
"\n"
]
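A sketch of that indexing step with the SageMaker Python SDK's built-in KNN estimator, following the N-by-P features and ordinal-labels layout described in the note; the role, instance type, and `doc_embeddings` matrix are assumptions:

```python
import numpy as np
import sagemaker
from sagemaker import KNN

# doc_embeddings: an (N, P) float32 matrix of document embeddings (assumed computed earlier).
train_features = doc_embeddings.astype("float32")
train_labels = np.arange(len(train_features), dtype="float32")  # ordinal indexes 0..N-1

knn = KNN(
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type="ml.m5.xlarge",  # assumed instance type
    k=3,                           # top K documents to retrieve
    sample_size=len(train_features),
    predictor_type="classifier",
    index_type="faiss.Flat",       # Faiss index under the hood
)
knn.fit(knn.record_set(train_features, labels=train_labels))
```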
@@ -649,7 +649,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Deploy the KNN endpoint for retrieving indexes of top K most relevant docuemnts."
"Deploy the KNN endpoint for retrieving indexes of top K most relevant documents."
]
},
{
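Deploying and querying the index could look like this sketch; the `verbose=true` accept header asks the KNN endpoint to return the labels of all k neighbors rather than a single prediction, and the exact response shape is an assumption to verify against the KNN inference documentation:

```python
import json

import boto3

predictor = knn.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")

# embed() is the helper sketched earlier; same embedding model as the documents.
query_emb = embed(["Which instances can I use with Managed Spot Training?"])

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName=predictor.endpoint_name,
    ContentType="text/csv",
    Accept="application/json; verbose=true",  # include neighbor labels, not just the prediction
    Body=",".join(str(x) for x in query_emb[0]),
)
neighbor_idx = json.loads(response["Body"].read())["predictions"][0]["labels"]
top_docs = [df_knowledge["Answer"].iloc[int(i)] for i in neighbor_idx]
```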
@@ -740,7 +740,7 @@
"context_embed_retrieve = construct_context(context_predictions_arr, df_knowledge[\"Answer\"])\n",
"\n",
"print(\n",
" f\"{newline}{bold}Elastic time for computing the embedding of a query and retrieved the top K most relevant docuemnts: {time.time() - start} seconds.{unbold}{newline}\"\n",
" f\"{newline}{bold}Elastic time for computing the embedding of a query and retrieved the top K most relevant documents: {time.time() - start} seconds.{unbold}{newline}\"\n",
")"
]
},
@@ -330,7 +330,7 @@
"\n",
"To achieve that, we will do following.\n",
"\n",
"1. **Generate embedings for each of document in the knowledge library with SageMaker GPT-J-6B embedding model.**\n",
"1. **Generate embeddings for each of document in the knowledge library with SageMaker GPT-J-6B embedding model.**\n",
"2. **Identify top K most relevant documents based on user query.**\n",
" - 2.1 **For a query of your interest, generate the embedding of the query using the same embedding model.**\n",
" - 2.2 **Search the indexes of top K most relevant documents in the embedding space using in-memory Faiss search.**\n",
@@ -670,7 +670,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Firstly, we **generate embedings for each of document in the knowledge library with SageMaker GPT-J-6B embedding model.**"
"Firstly, we **generate embeddings for each of document in the knowledge library with SageMaker GPT-J-6B embedding model.**"
]
},
{
@@ -711,7 +711,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Print out the top 3 most relevant docuemnts as below."
"Print out the top 3 most relevant documents as below."
]
},
{
@@ -756,7 +756,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Send the top 3 most relevant docuemnts and question into LLM to get a answer."
"Send the top 3 most relevant documents and question into LLM to get a answer."
]
},
{
@@ -128,7 +128,7 @@
"source": [
"## Step 2. Ask a question to LLM without providing the context\n",
"\n",
"To better illustrate why we need retrieval-augmented generation (RAG) based approach to solve the question and anwering problem. Let's directly ask the model a question and see how they respond."
"To better illustrate why we need retrieval-augmented generation (RAG) based approach to solve the question and answering problem. Let's directly ask the model a question and see how they respond."
]
},
{
@@ -335,7 +335,7 @@
"\n",
"To achieve that, we will do following.\n",
"\n",
"* Generate embedings for each of document in the knowledge library with the MiniLM embedding model.\n",
"* Generate embeddings for each of document in the knowledge library with the MiniLM embedding model.\n",
"* Identify top K most relevant documents based on user query.\n",
" * For a query of your interest, generate the embedding of the query using the same embedding model.\n",
" * Search the indexes of top K most relevant documents in the embedding space using the SageMaker KNN algorithm.\n",