
Commit 2cfd448

Update jupyter notebook references (NVIDIA#227)
Signed-off-by: Emmanuel Ferdman <[email protected]>
1 parent 1054225 commit 2cfd448

3 files changed: +3 −3 lines changed

finetuning/Codegemma/lora.ipynb

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
     "\n",
     "CodeGemma is a groundbreaking new open model in the Gemini family of models from Google. CodeGemma is just as powerful as previous models but compact enough to run locally on NVIDIA RTX GPUs. CodeGemma is available in 2 sizes: 2B and 7B parameters. With NVIDIA NeMo, you can customize CodeGemma to fit your usecase and deploy an optimized model on your NVIDIA GPU.\n",
     "\n",
-    "In this tutorial, we'll go over a specific kind of customization -- Low-rank adapter tuning to follow a specific output format (also known as LoRA). To learn how to perform full parameter supervised fine-tuning for instruction following (also known as SFT), see the [SFT notebook on Gemma Base Model](https://github.com/NVIDIA/GenerativeAIExamples/blob/main/models/Gemma/sft.ipynb). For LoRA, we'll perform all operations within the notebook on a single GPU. The compute resources needed for training depend on which CodeGemma model you use. For the 7 billion parameter variant, you'll need a GPU with 80GB of memory. For the 2 billion parameter model, 40GB will do.\n",
+    "In this tutorial, we'll go over a specific kind of customization -- Low-rank adapter tuning to follow a specific output format (also known as LoRA). To learn how to perform full parameter supervised fine-tuning for instruction following (also known as SFT), see the [SFT notebook on Gemma Base Model](https://github.com/NVIDIA/GenerativeAIExamples/blob/main/finetuning/Gemma/sft.ipynb). For LoRA, we'll perform all operations within the notebook on a single GPU. The compute resources needed for training depend on which CodeGemma model you use. For the 7 billion parameter variant, you'll need a GPU with 80GB of memory. For the 2 billion parameter model, 40GB will do.\n",
     "\n",
     "We'll also learn how to export your custom model to TensorRT-LLM, an open-source library that accelerates and optimizes inference performance of the latest LLMs on the NVIDIA AI platform."
    ]

finetuning/StarCoder2/inference.ipynb

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@
     "cell_type": "markdown",
     "metadata": {},
     "source": [
-    "In the previous [notebook](https://github.com/NVIDIA/GenerativeAIExamples/blob/main/models/StarCoder2/lora.ipynb), we showed how to parameter-efficiently finetune the StarCoder2 model with a custom code (instruction, completion) pair dataset. We chose LoRA as our PEFT algorithm and finetuned for 50 iterations. In this notebook, the goal is to demonstrate how to compile the finetuned .nemo model into optimized TensorRT-LLM engines. The converted model engine can perform accelerated inference locally or be deployed to Triton Inference Server."
+    "In the previous [notebook](https://github.com/NVIDIA/GenerativeAIExamples/blob/main/finetuning/StarCoder2/lora.ipynb), we showed how to parameter-efficiently finetune the StarCoder2 model with a custom code (instruction, completion) pair dataset. We chose LoRA as our PEFT algorithm and finetuned for 50 iterations. In this notebook, the goal is to demonstrate how to compile the finetuned .nemo model into optimized TensorRT-LLM engines. The converted model engine can perform accelerated inference locally or be deployed to Triton Inference Server."
     ]
    },
    {

finetuning/StarCoder2/lora.ipynb

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@
     "\n",
     "In this tutorial, we'll go over a popular Parameter-Efficient Fine-Tuning (PEFT) customization technique -- i.e. Low-Rank Adaptation (also known as LoRA), which enables the already upgraded StarCoder2 model to learn a new coding language or coding style.\n",
     "\n",
-    "Note that the subject 15B StarCoder2 model takes 30GB of disk space and requires more than 80GB of CUDA memory while performing PEFT on a single GPU. Therefore, the verified hardware configuration for this notebook and the subsequent [inference notebook](https://github.com/NVIDIA/GenerativeAIExamples/blob/main/models/StarCoder2/inference.ipynb) employs a single-node machine with 8 80GB NVIDIA GPUs."
+    "Note that the subject 15B StarCoder2 model takes 30GB of disk space and requires more than 80GB of CUDA memory while performing PEFT on a single GPU. Therefore, the verified hardware configuration for this notebook and the subsequent [inference notebook](https://github.com/NVIDIA/GenerativeAIExamples/blob/main/finetuning/StarCoder2/inference.ipynb) employs a single-node machine with 8 80GB NVIDIA GPUs."
     ]
    },
    {
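All three edits follow the same pattern: notebook cross-references that pointed at the old `models/` tree now point at `finetuning/`. As a minimal sketch (not part of this commit, and assuming it is run from the repository root), stale references like these could be found across all tracked notebooks with a short standard-library script:

```python
# Hypothetical helper: scan .ipynb files for GitHub links that still point at
# the old "models/" tree and print the suggested "finetuning/" replacement,
# mirroring the edits made in this commit.
import json
import pathlib
import re

# Path prefixes below mirror this commit; they are otherwise an assumption.
OLD_PREFIX = "GenerativeAIExamples/blob/main/models/"
NEW_PREFIX = "GenerativeAIExamples/blob/main/finetuning/"

for nb_path in pathlib.Path(".").rglob("*.ipynb"):
    cells = json.loads(nb_path.read_text(encoding="utf-8")).get("cells", [])
    for cell in cells:
        for line in cell.get("source", []):
            # Match a full notebook URL that still uses the old prefix.
            for match in re.finditer(re.escape(OLD_PREFIX) + r"\S+?\.ipynb", line):
                stale = match.group(0)
                print(f"{nb_path}: {stale} -> {stale.replace(OLD_PREFIX, NEW_PREFIX)}")
```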
