This application implements a GPU-accelerated Retrieval-Augmented Generation (RAG) question-answering system using NVIDIA Inference Microservices (NIMs) and the LlamaIndex framework. Users can upload documents, process them, and then ask questions about their content.
   cd GenerativeAIExamples/community/llm_video_series/video_1_llm_assistant_cloud_app
   ```
2. Create a virtual environment (using Python 3.9 as an example):

   - Using `venv`:

     ```
     python3.9 -m venv venv
     source venv/bin/activate
     ```

   - Using `conda`:

     ```
     conda create -n llm-assistant-env python=3.9
     conda activate llm-assistant-env
     ```
3. Install the required Python libraries using the `requirements.txt` file:

   ```
   pip install -r requirements.txt
   ```
4. Set up your NVIDIA API Key:

   - Sign up for an NVIDIA API Key on [build.nvidia.com](https://build.nvidia.com) if you haven't already.
   - Set the API key as an environment variable:

     ```
     export NVIDIA_API_KEY='your-api-key-here'
     ```

   - Alternatively, edit the script directly and add your API key on this line:

     ```python
     os.environ["NVIDIA_API_KEY"] = 'nvapi-XXXXXXXXXXXXXXXXXXXXXX'  # Add NVIDIA API Key
     ```
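As a quick sanity check, you can verify the key before launching the app. A minimal sketch, assuming the key is exported as `NVIDIA_API_KEY` as shown above (the helper name `check_api_key` is illustrative, not part of the script):

```python
import os

def check_api_key() -> bool:
    """Return True if NVIDIA_API_KEY is set and looks like an NVIDIA key."""
    key = os.environ.get("NVIDIA_API_KEY", "")
    # Keys issued on build.nvidia.com start with the "nvapi-" prefix.
    return key.startswith("nvapi-")

if __name__ == "__main__":
    print("API key OK" if check_api_key() else "NVIDIA_API_KEY missing or malformed")
```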
## Usage

1. Run the script:

   ```
   python app.py
   ```
2. Open the provided URL in your web browser to access the Gradio interface.

3. Use the interface to:
   - Upload document files
   - Load and process the documents
   - Ask questions about the loaded documents
## How It Works

1. **Document Loading**: Users can upload multiple document files through the Gradio interface.

2. **Document Processing**: The application uses LlamaIndex to read and process the uploaded documents, splitting them into chunks.
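The chunking idea can be sketched in plain Python. This is a simplified stand-in for LlamaIndex's text splitter, not its actual implementation; the `chunk_size` and `overlap` values are illustrative:

```python
def split_into_chunks(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into overlapping character chunks (simplified sketch)."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Advance by chunk_size minus overlap so adjacent chunks share context.
        start += chunk_size - overlap
    return chunks
```

Overlap between adjacent chunks helps keep sentences that straddle a chunk boundary retrievable from either side.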
3. **Embedding and Indexing**: The processed documents are embedded using NVIDIA's embedding model and stored in a Milvus vector database.
4. **Question Answering**: Users can ask questions through the chat interface. The application uses the cloud-hosted Llama 3 70B Instruct NIM to generate responses based on the relevant information retrieved from the indexed documents.
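The retrieval step boils down to nearest-neighbor search over the chunk embeddings. A minimal sketch using cosine similarity over an in-memory list — Milvus performs this at scale; the `top_k` helper and the toy vectors below are illustrative, not the application's actual code:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], index: list[tuple[list[float], str]], k: int = 2) -> list[str]:
    """Return the k chunk texts whose embeddings are most similar to the query."""
    scored = sorted(index, key=lambda entry: cosine_similarity(query_vec, entry[0]), reverse=True)
    return [text for _, text in scored[:k]]
```

The retrieved chunks are then passed to the LLM as context alongside the user's question.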
## Customization

You can customize various aspects of the application:

- Change the chunk size for text splitting
- Use different NVIDIA or open-source models for embedding or language modeling
- Adjust the number of similar documents retrieved for each query
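A hedged sketch of how these knobs might be grouped in one place; the variable names and model identifiers below are illustrative, not necessarily those used in `app.py`:

```python
# Illustrative settings; the actual variable names in app.py may differ.
SETTINGS = {
    "chunk_size": 512,      # characters or tokens per chunk when splitting text
    "chunk_overlap": 64,    # overlap between consecutive chunks
    "top_k": 4,             # number of similar chunks retrieved per query
    "embedding_model": "nvidia/nv-embedqa-e5-v5",  # assumed example embedding model id
    "llm_model": "meta/llama3-70b-instruct",       # the model this README names
}
```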
## Troubleshooting

If you encounter any issues:

1. Ensure your NVIDIA API Key is correctly set.
2. Check that all required libraries are installed correctly.
3. Verify that the Milvus database is properly initialized.