Readme: Fix broken links in top-level README (NVIDIA#79)

shubhadeepd · web-flow · commit eff83618a4bc · 2024-03-26T23:28:54.000+05:30
diff --git a/README.md b/README.md
@@ -7,7 +7,8 @@
 State-of-the-art Generative AI examples that are easy to deploy, test, and extend. All examples run on the high performance NVIDIA CUDA-X software stack and NVIDIA GPUs.
 
 ## NVIDIA NGC
-Generative AI Examples uses resources from the [NVIDIA NGC AI Development Catalog](https://ngc.nvidia.com).
+
+Generative AI Examples can use models and GPUs from the [NVIDIA NGC: AI Development Catalog](https://catalog.ngc.nvidia.com).
 
 Sign up for a [free NGC developer account](https://ngc.nvidia.com/signin) to access:
 
@@ -16,64 +17,72 @@ Sign up for a [free NGC developer account](https://ngc.nvidia.com/signin) to acc
 
 ## Retrieval Augmented Generation (RAG)
 
-A RAG pipeline embeds multimodal data --  such as documents, images, and video -- into a database connected to a LLM.  RAG lets users chat with their data!
+A RAG pipeline embeds multimodal data, such as documents, images, and video, into a database connected to an LLM.
+RAG lets users chat with their data!
 
 ### Developer RAG Examples
 
-The developer RAG examples run on a single VM. They demonstrate how to combine NVIDIA GPU acceleration with popular LLM programming frameworks using NVIDIA's [open source connectors](#open-source-integrations). The examples are easy to deploy via [Docker Compose](https://docs.docker.com/compose/).
-
-Examples support local and remote inference endpoints. If you have a GPU, you can inference locally via [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM). If you don't have a GPU, you can inference and embed remotely via [NVIDIA AI Foundations endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/).
-
-| Model         | Embedding           | Framework        | Description               | Multi-GPU | TRT-LLM | NVIDIA AI Foundation | Triton | Vector Database |
-|---------------|-----------------------|------------|-------------------------|-----------|------------|-------------|---------|--------|
-| llama-2 | e5-large-v2 | Llamaindex | Canonical QA Chatbot | [YES](RetrievalAugmentedGeneration/README.md#3-qa-chatbot-multi-gpu----a100h100l40s)        | [YES](RetrievalAugmentedGeneration/README.md#2-qa-chatbot----a100h100l40s-gpu)       | No | YES     | Milvus/[PGVector]((RetrievalAugmentedGeneration/README.md#2-qa-chatbot----a100h100l40s-gpu))|
-| mixtral_8x7b | nvolveqa_40k | Langchain | [Nvidia AI foundation based QA Chatbot](RetrievalAugmentedGeneration/README.md#1-qa-chatbot----nvidia-ai-foundation-inference-endpoint)  | No        | No       | YES | YES     | Milvus|
-| llama-2 | all-MiniLM-L6-v2 | Llama Index | [QA Chatbot, GeForce, Windows](https://github.com/NVIDIA/trt-llm-rag-windows/tree/release/1.0)  | NO        | YES        | NO | NO     | FAISS |
-| llama-2 | nvolveqa_40k | Langchain | [QA Chatbot, Task Decomposition Agent](./RetrievalAugmentedGeneration/README.md#5-qa-chatbot-with-task-decomposition-example----a100h100l40s) | No | No | YES | YES | FAISS
-| mixtral_8x7b | nvolveqa_40k | Langchain | [Minimilastic example showcasing RAG using Nvidia AI foundation models](./examples/README.md#rag-in-5-minutes-example)  | No        | No       | YES | YES     | FAISS|
-
-
+The developer RAG examples run on a single VM.
+The examples demonstrate how to combine NVIDIA GPU acceleration with popular LLM programming frameworks using NVIDIA's [open source connectors](#open-source-integrations).
+The examples are easy to deploy with [Docker Compose](https://docs.docker.com/compose/).
+
+Examples support local and remote inference endpoints.
+If you have a GPU, you can run inference locally with [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM).
+If you don't have a GPU, you can run inference and embedding remotely with [NVIDIA API Catalog endpoints](https://build.nvidia.com/explore/discover).
+
+| Model                              | Embedding        | Framework  | Description                                                                                                                                                                                               | Multi-GPU                                                                  | TRT-LLM | NVIDIA Endpoints | Triton | Vector Database    |
+| ---------------------------------- | ---------------- | ---------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------- | ------- | ---------------- | ------ | ------------------ |
+| mixtral_8x7b                       | nvolveqa_40k     | LangChain  | NVIDIA API Catalog endpoints chat bot [[code](./RetrievalAugmentedGeneration/examples/nvidia_api_catalog/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/api-catalog.html)]                | No                                                                         | No      | Yes              | Yes    | Milvus or pgvector |
+| llama-2                            | e5-large-v2      | LlamaIndex | Canonical QA Chatbot [[code](./RetrievalAugmentedGeneration/examples/developer_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/local-gpu.html)]                                        | [Yes](https://nvidia.github.io/GenerativeAIExamples/latest/multi-gpu.html) | Yes     | No               | Yes    | Milvus or pgvector |
+| llama-2                            | all-MiniLM-L6-v2 | LlamaIndex | Chat bot, GeForce, Windows [[repo](https://github.com/NVIDIA/trt-llm-rag-windows/tree/release/1.0)]                                                                                                       | No                                                                         | Yes     | No               | No     | FAISS              |
+| llama-2                            | nvolveqa_40k     | LangChain  | Chat bot with query decomposition agent [[code](./RetrievalAugmentedGeneration/examples/query_decomposition_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/query-decomposition.html)] | No                                                                         | No      | Yes              | Yes    | Milvus or pgvector |
+| mixtral_8x7b                       | nvolveqa_40k     | LangChain  | Minimalistic example: RAG with NVIDIA AI Foundation Models [[code](./examples/5_mins_rag_no_gpu/), [README](./examples/README.md#rag-in-5-minutes-example)]                                               | No                                                                         | No      | Yes              | Yes    | FAISS              |
+| mixtral_8x7b<br>Deplot<br>Neva-22b | nvolveqa_40k     | Custom     | Chat bot with multimodal data [[code](./RetrievalAugmentedGeneration/examples/multimodal_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multimodal-data.html)]                        | No                                                                         | No      | Yes              | No     | Milvus or pgvector |
+| llama-2                            | e5-large-v2      | LlamaIndex | Chat bot with quantized LLM model [[docs](https://nvidia.github.io/GenerativeAIExamples/latest/quantized-llm-model.html)]                                                                                 | Yes                                                                        | Yes     | No               | Yes    | Milvus or pgvector |
+| mixtral_8x7b                       | none             | PandasAI   | Chat bot with structured data [[code](./RetrievalAugmentedGeneration/examples/structured_data_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/structured-data.html)]                   | No                                                                         | No      | Yes              | No     | none               |
+| llama-2                            | nvolveqa_40k     | LangChain  | Chat bot with multi-turn conversation [[code](./RetrievalAugmentedGeneration/examples/multi_turn_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multi-turn.html)]                     | No                                                                         | No      | Yes              | No     | Milvus or pgvector |
 
 ### Enterprise RAG Examples
 
-The enterprise RAG examples run as microservies distributed across multiple VMs and GPUs. They show how RAG pipelines can be orchestrated with [Kubernetes](https://kubernetes.io/) and deployed with [Helm](https://helm.sh/).
+The enterprise RAG examples run as microservices distributed across multiple VMs and GPUs.
+These examples show how to orchestrate RAG pipelines with [Kubernetes](https://kubernetes.io/) and deploy them with [Helm](https://helm.sh/).
 
-Enterprise RAG examples include a [Kubernetes operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) for LLM lifecycle management. It is compatible with the [NVIDIA GPU operator](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/gpu-operator) that automates GPU discovery and lifecycle management in a Kubernetes cluster.
+Enterprise RAG examples include a [Kubernetes operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) for LLM lifecycle management.
+The operator is compatible with the [NVIDIA GPU Operator](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/gpu-operator), which automates GPU discovery and lifecycle management in a Kubernetes cluster.
 
-Enterprise RAG examples also support local and remote inference via [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) and [NVIDIA AI Foundations endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/).
+Enterprise RAG examples also support local and remote inference with [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) and [NVIDIA API Catalog endpoints](https://build.nvidia.com/explore/discover).
 
-| Model         | Embedding           | Framework        | Description               | Multi-GPU | Multi-node | TRT-LLM | NVIDIA AI Foundation | Triton | Vector Database |
-|---------------|-----------------------|------------|--------|-------------------------|-----------|------------|-------------|---------|--------|
-| llama-2 | NV-Embed-QA | Llamaindex | QA Chatbot, Helm, k8s  | NO        | NO | [YES](./docs/developer-llm-operator/)         | NO | YES     | Milvus|
+| Model   | Embedding   | Framework  | Description                                                                | Multi-GPU | Multi-node | TRT-LLM | NVIDIA Endpoints | Triton | Vector Database |
+| ------- | ----------- | ---------- | -------------------------------------------------------------------------- | --------- | ---------- | ------- | ---------------- | ------ | --------------- |
+| llama-2 | NV-Embed-QA | LlamaIndex | Chat bot, Kubernetes deployment [[README](./docs/developer-llm-operator/)] | No        | No         | Yes     | No               | Yes    | Milvus          |
 
 ## Tools
 
 Example tools and tutorials to enhance LLM development and productivity when using NVIDIA RAG pipelines.
 
-| Name | Description | Deployment | Tutorial |
-|------|-------------|------|--------|
-| Evaluation | Example open source RAG eval tool that uses synthetic data generation and LLM-as-a-judge |  [Docker compose file](./deploy/compose/docker-compose-evaluation.yaml) | [README](./docs/rag/evaluation.md) |]
-| Observability | Observability serves as an efficient mechanism for both monitoring and debugging RAG pipelines. |  [Docker compose file](./deploy/compose/docker-compose-observability.yaml) | [README](./docs/rag/observability.md) |]
+| Name          | Description                                                                                                                                                                   | NVIDIA Endpoints |
+| ------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------- |
+| Evaluation    | RAG evaluation using synthetic data generation and LLM-as-a-judge [[code](./tools/evaluation/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/evaluation.html)] | Yes              |
+| Observability | Monitoring and debugging RAG pipelines [[code](./tools/observability/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/observability.html)]                      | Yes              |
 
 ## Open Source Integrations
 
 These are open source connectors for NVIDIA-hosted and self-hosted API endpoints. These open source connectors are maintained and tested by NVIDIA engineers.
 
 | Name | Framework | Chat | Text Embedding | Python | Description |
-|------|-----------|------|-----------|--------|-------------|
-|[NVIDIA AI Foundation Endpoints](https://python.langchain.com/docs/integrations/providers/nvidia) | [Langchain](https://www.langchain.com/) |[YES](https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints)|[YES](https://python.langchain.com/docs/integrations/text_embedding/nvidia_ai_endpoints)|[YES](https://pypi.org/project/langchain-nvidia-ai-endpoints/)|Easy access to NVIDIA hosted models. Supports chat, embedding, code generation, steerLM, multimodal, and RAG.|
-|[NVIDIA Triton + TensorRT-LLM](https://github.com/langchain-ai/langchain/tree/master/libs/partners/nvidia-trt) | [Langchain](https://www.langchain.com/) |[YES](https://github.com/langchain-ai/langchain-nvidia/blob/main/libs/trt/docs/llms.ipynb)|[YES](https://github.com/langchain-ai/langchain-nvidia/blob/main/libs/trt/docs/llms.ipynb)|[YES](https://pypi.org/project/langchain-nvidia-trt/)|This connector allows Langchain to remotely interact with a Triton inference server over GRPC or HTTP tfor optimized LLM inference.|
-|[NVIDIA Triton Inference Server](https://docs.llamaindex.ai/en/stable/examples/llm/nvidia_triton.html) | [LlamaIndex](https://www.llamaindex.ai/) |YES|YES|NO|Triton inference server provides API access to hosted LLM models over gRPC. |
-|[NVIDIA TensorRT-LLM](https://docs.llamaindex.ai/en/stable/examples/llm/nvidia_tensorrt.html) | [LlamaIndex](https://www.llamaindex.ai/) |YES|YES|NO|TensorRT-LLM provides a Python API to build TensorRT engines with state-of-the-art optimizations for LLM inference on NVIDIA GPUs. |
+|------|-----------|------|----------------|--------|-------------|
+|[NVIDIA AI Foundation Endpoints](https://python.langchain.com/docs/integrations/providers/nvidia) | [LangChain](https://www.langchain.com/) |[Yes](https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints)|[Yes](https://python.langchain.com/docs/integrations/text_embedding/nvidia_ai_endpoints)|[Yes](https://pypi.org/project/langchain-nvidia-ai-endpoints/)|Easy access to NVIDIA hosted models. Supports chat, embedding, code generation, SteerLM, multimodal, and RAG.|
+|[NVIDIA Triton + TensorRT-LLM](https://github.com/langchain-ai/langchain/tree/master/libs/partners/nvidia-trt) | [LangChain](https://www.langchain.com/) |[Yes](https://github.com/langchain-ai/langchain-nvidia/blob/main/libs/trt/docs/llms.ipynb)|[Yes](https://github.com/langchain-ai/langchain-nvidia/blob/main/libs/trt/docs/llms.ipynb)|[Yes](https://pypi.org/project/langchain-nvidia-trt/)|This connector lets LangChain interact remotely with a Triton Inference Server over gRPC or HTTP for optimized LLM inference.|
+|[NVIDIA Triton Inference Server](https://docs.llamaindex.ai/en/stable/examples/llm/nvidia_triton.html) | [LlamaIndex](https://www.llamaindex.ai/) |Yes|Yes|No|Triton Inference Server provides API access to hosted LLMs over gRPC. |
+|[NVIDIA TensorRT-LLM](https://docs.llamaindex.ai/en/stable/examples/llm/nvidia_tensorrt.html) | [LlamaIndex](https://www.llamaindex.ai/) |Yes|Yes|No|TensorRT-LLM provides a Python API to build TensorRT engines with state-of-the-art optimizations for LLM inference on NVIDIA GPUs. |
 
+## Support, Feedback, and Contributing
 
-## NVIDIA support
-In each example README we indicate the level of support provided.
+We're posting these examples on GitHub to support the NVIDIA LLM community and facilitate feedback.
+We invite contributions via GitHub Issues or pull requests!
 
-## Feedback / Contributions
-We're posting these examples on GitHub to support the NVIDIA LLM community, facilitate feedback. We invite contributions via GitHub Issues or pull requests!
+## Known Issues
 
-## Known issues
-- In each of the READMEs, we indicate any known issues and encourage the community to provide feedback.
-- The datasets provided as part of this project is under a different license for research and evaluation purposes.
-- This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.
+- Some known issues are identified as TODOs in the Python code.
+- The datasets provided as part of this project are under a different license for research and evaluation purposes.
+- This project downloads and installs third-party open source software projects. Review the license terms of these open source projects before use.