Commit eff8361

Readme: Fix broken links in top level readme (NVIDIA#79)
1 parent 6de0008 commit eff8361

1 file changed

+47
-38
lines changed

README.md

Lines changed: 47 additions & 38 deletions
@@ -7,7 +7,8 @@
77
State-of-the-art Generative AI examples that are easy to deploy, test, and extend. All examples run on the high performance NVIDIA CUDA-X software stack and NVIDIA GPUs.
88

99
## NVIDIA NGC
10-
Generative AI Examples uses resources from the [NVIDIA NGC AI Development Catalog](https://ngc.nvidia.com).
10+
11+
Generative AI Examples can use models and GPUs from the [NVIDIA NGC: AI Development Catalog](https://catalog.ngc.nvidia.com).
1112

1213
Sign up for a [free NGC developer account](https://ngc.nvidia.com/signin) to access:
1314

@@ -16,64 +17,72 @@ Sign up for a [free NGC developer account](https://ngc.nvidia.com/signin) to acc
1617

1718
## Retrieval Augmented Generation (RAG)
1819

19-
A RAG pipeline embeds multimodal data -- such as documents, images, and video -- into a database connected to a LLM. RAG lets users chat with their data!
20+
A RAG pipeline embeds multimodal data -- such as documents, images, and video -- into a database connected to an LLM.
21+
RAG lets users chat with their data!
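
The embed, store, and retrieve flow described above can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions: `embed`, `ToyVectorStore`, and `build_prompt` are hypothetical names that do not come from this repository, the bag-of-words "embedding" stands in for a real embedding model such as the ones listed in the tables, and the in-memory list stands in for a vector database such as Milvus. No LLM is called; the sketch stops at the prompt that would be sent to one.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): lowercase word counts, not a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self._rows = []  # list of (embedding, original text)

    def add(self, text):
        self._rows.append((embed(text), text))

    def search(self, query, k=1):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def build_prompt(question, store, k=1):
    """Retrieve context and wrap it in a prompt for an LLM (not called here)."""
    context = "\n".join(store.search(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


store = ToyVectorStore()
store.add("Milvus stores the document embeddings as a vector database.")
store.add("Helm deploys the enterprise examples on Kubernetes.")

question = "Which vector database stores embeddings?"
prompt = build_prompt(question, store)
print(prompt)
```

In a real deployment the retrieval step is typically handled by a framework connector and a dedicated vector database rather than hand-rolled code like this.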
2022

2123
### Developer RAG Examples
2224

23-
The developer RAG examples run on a single VM. They demonstrate how to combine NVIDIA GPU acceleration with popular LLM programming frameworks using NVIDIA's [open source connectors](#open-source-integrations). The examples are easy to deploy via [Docker Compose](https://docs.docker.com/compose/).
24-
25-
Examples support local and remote inference endpoints. If you have a GPU, you can inference locally via [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM). If you don't have a GPU, you can inference and embed remotely via [NVIDIA AI Foundations endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/).
26-
27-
| Model | Embedding | Framework | Description | Multi-GPU | TRT-LLM | NVIDIA AI Foundation | Triton | Vector Database |
28-
|---------------|-----------------------|------------|-------------------------|-----------|------------|-------------|---------|--------|
29-
| llama-2 | e5-large-v2 | Llamaindex | Canonical QA Chatbot | [YES](RetrievalAugmentedGeneration/README.md#3-qa-chatbot-multi-gpu----a100h100l40s) | [YES](RetrievalAugmentedGeneration/README.md#2-qa-chatbot----a100h100l40s-gpu) | No | YES | Milvus/[PGVector]((RetrievalAugmentedGeneration/README.md#2-qa-chatbot----a100h100l40s-gpu))|
30-
| mixtral_8x7b | nvolveqa_40k | Langchain | [Nvidia AI foundation based QA Chatbot](RetrievalAugmentedGeneration/README.md#1-qa-chatbot----nvidia-ai-foundation-inference-endpoint) | No | No | YES | YES | Milvus|
31-
| llama-2 | all-MiniLM-L6-v2 | Llama Index | [QA Chatbot, GeForce, Windows](https://github.com/NVIDIA/trt-llm-rag-windows/tree/release/1.0) | NO | YES | NO | NO | FAISS |
32-
| llama-2 | nvolveqa_40k | Langchain | [QA Chatbot, Task Decomposition Agent](./RetrievalAugmentedGeneration/README.md#5-qa-chatbot-with-task-decomposition-example----a100h100l40s) | No | No | YES | YES | FAISS
33-
| mixtral_8x7b | nvolveqa_40k | Langchain | [Minimilastic example showcasing RAG using Nvidia AI foundation models](./examples/README.md#rag-in-5-minutes-example) | No | No | YES | YES | FAISS|
34-
35-
25+
The developer RAG examples run on a single VM.
26+
The examples demonstrate how to combine NVIDIA GPU acceleration with popular LLM programming frameworks using NVIDIA's [open source connectors](#open-source-integrations).
27+
The examples are easy to deploy with [Docker Compose](https://docs.docker.com/compose/).
28+
29+
Examples support local and remote inference endpoints.
30+
If you have a GPU, you can run inference locally with [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM).
31+
If you don't have a GPU, you can run inference and embedding remotely with [NVIDIA API Catalog endpoints](https://build.nvidia.com/explore/discover).
32+
33+
| Model | Embedding | Framework | Description | Multi-GPU | TRT-LLM | NVIDIA Endpoints | Triton | Vector Database |
34+
| ---------------------------------- | ---------------- | ---------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------- | ------- | ---------------- | ------ | ------------------ |
35+
| mixtral_8x7b | nvolveqa_40k | LangChain | NVIDIA API Catalog endpoints chat bot [[code](./RetrievalAugmentedGeneration/examples/nvidia_api_catalog/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/api-catalog.html)] | No | No | Yes | Yes | Milvus or pgvector |
36+
| llama-2 | e5-large-v2 | LlamaIndex | Canonical QA Chatbot [[code](./RetrievalAugmentedGeneration/examples/developer_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/local-gpu.html)] | [Yes](https://nvidia.github.io/GenerativeAIExamples/latest/multi-gpu.html) | Yes | No | Yes | Milvus or pgvector |
37+
| llama-2 | all-MiniLM-L6-v2 | LlamaIndex | Chat bot, GeForce, Windows [[repo](https://github.com/NVIDIA/trt-llm-rag-windows/tree/release/1.0)] | No | Yes | No | No | FAISS |
38+
| llama-2 | nvolveqa_40k | LangChain | Chat bot with query decomposition agent [[code](./RetrievalAugmentedGeneration/examples/query_decomposition_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/query-decomposition.html)] | No | No | Yes | Yes | Milvus or pgvector |
39+
| mixtral_8x7b | nvolveqa_40k | LangChain | Minimalistic example: RAG with NVIDIA AI Foundation Models [[code](./examples/5_mins_rag_no_gpu/), [README](./examples/README.md#rag-in-5-minutes-example)] | No | No | Yes | Yes | FAISS |
40+
| mixtral_8x7b<br>Deplot<br>Neva-22b | nvolveqa_40k | Custom | Chat bot with multimodal data [[code](./RetrievalAugmentedGeneration/examples/multimodal_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multimodal-data.html)] | No | No | Yes | No | Milvus or pgvector |
41+
| llama-2 | e5-large-v2 | LlamaIndex | Chat bot with quantized LLM model [[docs](https://nvidia.github.io/GenerativeAIExamples/latest/quantized-llm-model.html)] | Yes | Yes | No | Yes | Milvus or pgvector |
42+
| mixtral_8x7b | none | PandasAI | Chat bot with structured data [[code](./RetrievalAugmentedGeneration/examples/structured_data_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/structured-data.html)] | No | No | Yes | No | none |
43+
| llama-2 | nvolveqa_40k | LangChain | Chat bot with multi-turn conversation [[code](./RetrievalAugmentedGeneration/examples/multi_turn_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multi-turn.html)] | No | No | Yes | No | Milvus or pgvector |
3644

3745
### Enterprise RAG Examples
3846

39-
The enterprise RAG examples run as microservies distributed across multiple VMs and GPUs. They show how RAG pipelines can be orchestrated with [Kubernetes](https://kubernetes.io/) and deployed with [Helm](https://helm.sh/).
47+
The enterprise RAG examples run as microservices distributed across multiple VMs and GPUs.
48+
These examples show how to orchestrate RAG pipelines with [Kubernetes](https://kubernetes.io/) and deploy them with [Helm](https://helm.sh/).
4049

41-
Enterprise RAG examples include a [Kubernetes operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) for LLM lifecycle management. It is compatible with the [NVIDIA GPU operator](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/gpu-operator) that automates GPU discovery and lifecycle management in a Kubernetes cluster.
50+
Enterprise RAG examples include a [Kubernetes operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) for LLM lifecycle management.
51+
It is compatible with the [NVIDIA GPU operator](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/gpu-operator) that automates GPU discovery and lifecycle management in a Kubernetes cluster.
4252

43-
Enterprise RAG examples also support local and remote inference via [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) and [NVIDIA AI Foundations endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/).
53+
Enterprise RAG examples also support local and remote inference with [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) and [NVIDIA API Catalog endpoints](https://build.nvidia.com/explore/discover).
4454

45-
| Model | Embedding | Framework | Description | Multi-GPU | Multi-node | TRT-LLM | NVIDIA AI Foundation | Triton | Vector Database |
46-
|---------------|-----------------------|------------|--------|-------------------------|-----------|------------|-------------|---------|--------|
47-
| llama-2 | NV-Embed-QA | Llamaindex | QA Chatbot, Helm, k8s | NO | NO | [YES](./docs/developer-llm-operator/) | NO | YES | Milvus|
55+
| Model | Embedding | Framework | Description | Multi-GPU | Multi-node | TRT-LLM | NVIDIA Endpoints | Triton | Vector Database |
56+
| ------- | ----------- | ---------- | -------------------------------------------------------------------------- | --------- | ---------- | ------- | ---------------- | ------ | --------------- |
57+
| llama-2 | NV-Embed-QA | LlamaIndex | Chat bot, Kubernetes deployment [[README](./docs/developer-llm-operator/)] | No | No | Yes | No | Yes | Milvus |
4858

4959
## Tools
5060

5161
Example tools and tutorials to enhance LLM development and productivity when using NVIDIA RAG pipelines.
5262

53-
| Name | Description | Deployment | Tutorial |
54-
|------|-------------|------|--------|
55-
| Evaluation | Example open source RAG eval tool that uses synthetic data generation and LLM-as-a-judge | [Docker compose file](./deploy/compose/docker-compose-evaluation.yaml) | [README](./docs/rag/evaluation.md) |]
56-
| Observability | Observability serves as an efficient mechanism for both monitoring and debugging RAG pipelines. | [Docker compose file](./deploy/compose/docker-compose-observability.yaml) | [README](./docs/rag/observability.md) |]
63+
| Name | Description | NVIDIA Endpoints |
64+
| ------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------- |
65+
| Evaluation | RAG evaluation using synthetic data generation and LLM-as-a-judge [[code](./tools/evaluation/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/evaluation.html)] | Yes |
66+
| Observability | Monitoring and debugging RAG pipelines [[code](./tools/observability/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/observability.html)] | Yes |
5767

5868
## Open Source Integrations
5969

6070
These are open source connectors for NVIDIA-hosted and self-hosted API endpoints. These open source connectors are maintained and tested by NVIDIA engineers.
6171

6272
| Name | Framework | Chat | Text Embedding | Python | Description |
63-
|------|-----------|------|-----------|--------|-------------|
64-
|[NVIDIA AI Foundation Endpoints](https://python.langchain.com/docs/integrations/providers/nvidia) | [Langchain](https://www.langchain.com/) |[YES](https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints)|[YES](https://python.langchain.com/docs/integrations/text_embedding/nvidia_ai_endpoints)|[YES](https://pypi.org/project/langchain-nvidia-ai-endpoints/)|Easy access to NVIDIA hosted models. Supports chat, embedding, code generation, steerLM, multimodal, and RAG.|
65-
|[NVIDIA Triton + TensorRT-LLM](https://github.com/langchain-ai/langchain/tree/master/libs/partners/nvidia-trt) | [Langchain](https://www.langchain.com/) |[YES](https://github.com/langchain-ai/langchain-nvidia/blob/main/libs/trt/docs/llms.ipynb)|[YES](https://github.com/langchain-ai/langchain-nvidia/blob/main/libs/trt/docs/llms.ipynb)|[YES](https://pypi.org/project/langchain-nvidia-trt/)|This connector allows Langchain to remotely interact with a Triton inference server over GRPC or HTTP tfor optimized LLM inference.|
66-
|[NVIDIA Triton Inference Server](https://docs.llamaindex.ai/en/stable/examples/llm/nvidia_triton.html) | [LlamaIndex](https://www.llamaindex.ai/) |YES|YES|NO|Triton inference server provides API access to hosted LLM models over gRPC. |
67-
|[NVIDIA TensorRT-LLM](https://docs.llamaindex.ai/en/stable/examples/llm/nvidia_tensorrt.html) | [LlamaIndex](https://www.llamaindex.ai/) |YES|YES|NO|TensorRT-LLM provides a Python API to build TensorRT engines with state-of-the-art optimizations for LLM inference on NVIDIA GPUs. |
73+
|------|-----------|------|----------------|--------|-------------|
74+
|[NVIDIA AI Foundation Endpoints](https://python.langchain.com/docs/integrations/providers/nvidia) | [Langchain](https://www.langchain.com/) |[Yes](https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints)|[Yes](https://python.langchain.com/docs/integrations/text_embedding/nvidia_ai_endpoints)|[Yes](https://pypi.org/project/langchain-nvidia-ai-endpoints/)|Easy access to NVIDIA hosted models. Supports chat, embedding, code generation, steerLM, multimodal, and RAG.|
75+
|[NVIDIA Triton + TensorRT-LLM](https://github.com/langchain-ai/langchain/tree/master/libs/partners/nvidia-trt) | [Langchain](https://www.langchain.com/) |[Yes](https://github.com/langchain-ai/langchain-nvidia/blob/main/libs/trt/docs/llms.ipynb)|[Yes](https://github.com/langchain-ai/langchain-nvidia/blob/main/libs/trt/docs/llms.ipynb)|[Yes](https://pypi.org/project/langchain-nvidia-trt/)|This connector allows Langchain to remotely interact with a Triton inference server over gRPC or HTTP for optimized LLM inference.|
76+
|[NVIDIA Triton Inference Server](https://docs.llamaindex.ai/en/stable/examples/llm/nvidia_triton.html) | [LlamaIndex](https://www.llamaindex.ai/) |Yes|Yes|No|Triton inference server provides API access to hosted LLM models over gRPC. |
77+
|[NVIDIA TensorRT-LLM](https://docs.llamaindex.ai/en/stable/examples/llm/nvidia_tensorrt.html) | [LlamaIndex](https://www.llamaindex.ai/) |Yes|Yes|No|TensorRT-LLM provides a Python API to build TensorRT engines with state-of-the-art optimizations for LLM inference on NVIDIA GPUs. |
6878

79+
## Support, Feedback, and Contributing
6980

70-
## NVIDIA support
71-
In each example README we indicate the level of support provided.
81+
We're posting these examples on GitHub to support the NVIDIA LLM community and facilitate feedback.
82+
We invite contributions via GitHub Issues or pull requests!
7283

73-
## Feedback / Contributions
74-
We're posting these examples on GitHub to support the NVIDIA LLM community, facilitate feedback. We invite contributions via GitHub Issues or pull requests!
84+
## Known Issues
7585

76-
## Known issues
77-
- In each of the READMEs, we indicate any known issues and encourage the community to provide feedback.
78-
- The datasets provided as part of this project is under a different license for research and evaluation purposes.
79-
- This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.
86+
- Some known issues are identified as TODOs in the Python code.
87+
- The datasets provided as part of this project are under a different license for research and evaluation purposes.
88+
- This project downloads and installs third-party open source software projects. Review the license terms of these open source projects before use.
