docs/api/local_language_model_clients/TensorRTLLM.md (5 additions, 5 deletions)
@@ -1,12 +1,12 @@
# dspy.TensorRTModel
- TensorRT LLM by Nvidia happens to be one of the most optimized inference engine to run opensource Large Language Models locally or in production.
+ TensorRT LLM by Nvidia is one of the most optimized inference engines for running open-source Large Language Models locally or in production.
### Prerequisites
- Install TensorRT LLM by the following instruction[here](https://nvidia.github.io/TensorRT-LLM/installation/linux.html). You need to install dspy inside the same docker environment in which tensorrt is installed.
+ Install TensorRT LLM by following the instructions [here](https://nvidia.github.io/TensorRT-LLM/installation/linux.html). You need to install `dspy` inside the same Docker environment in which `tensorrt` is installed.
- In order to use this module, you should have model weights file in engine format. To understand how we convert weights in torch (from huggingface models) to tensorrt engine format, you can check out [this documentation](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#build-tensorrt-engines).
+ In order to use this module, you should have the model weights file in engine format. To understand how to convert PyTorch weights (from HuggingFace models) to the TensorRT engine format, check out [this documentation](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#build-tensorrt-engines).
### Running TensorRT model inside dspy
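The initialization snippet that follows this heading is elided from the diff view. For orientation, here is a minimal sketch of how a built engine might be loaded through this client; the parameter names `engine_dir` and `model` are assumptions for illustration, not confirmed by this diff:

```python
import dspy

# A minimal sketch, not the documented snippet. `engine_dir` (the
# directory holding the built TensorRT engine) and `model` (the
# HuggingFace id used to load the matching tokenizer) are assumed
# parameter names.
model = dspy.TensorRTModel(
    model="meta-llama/Llama-2-7b-chat-hf",
    engine_dir="/path/to/llama/engine",
)
```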
@@ -36,7 +36,7 @@ If `use_py_session` is set to `False`, the following kwargs are supported (This
- **sink_token_length** (`int`, optional): The sink token length. Defaults to `1`.
- > Please note that, you have done the build processes properly before applying these customization. Because lot of customization depends on how the model engine was built. You can learn more about[here](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#build-tensorrt-engines).
+ > Please note that you need to complete the build processes properly before applying these customizations, because a lot of customization depends on how the model engine was built. You can learn more [here](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#build-tensorrt-engines).
Now to run the model, we need to add the following code:
@@ -51,7 +51,7 @@ This gives this result:
["nobody is perfect, and we all have our own unique struggles and challenges. But what sets us apart is how we respond to those challenges. Do we let them define us, or do we use them as opportunities to grow and learn?\nI know that I have my own personal struggles, and I'm sure you do too. But I also know that we are capable of overcoming them, and becoming the best versions of ourselves. So let's embrace our imperfections, and use them to fuel our growth and success.\nRemember, nobody is perfect, but everybody has the potential to be amazing. So let's go out there and make it happen!"]
```
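The invocation itself is elided from this diff view; a plausible sketch, assuming the client follows dspy's usual callable LM interface and inferring the prompt from the sample output above:

```python
# Sketch only: the prompt is inferred from the completion shown above,
# and the callable interface is assumed from dspy's other LM clients.
completions = model("nobody is perfect, and")
print(completions)
```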
- You can also invoke chat mode, by just changing the prompt to chat format like this:
+ You can also invoke chat mode by just changing the prompt to chat format like this:
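The chat-format example is likewise elided here; a hedged sketch, assuming the client accepts an OpenAI-style list of role/content messages (the accepted message shape is an assumption):

```python
# Sketch only: assumes chat mode is triggered by passing a list of
# role/content messages instead of a flat prompt string.
chat_prompt = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short motivational message."},
]
completions = model(chat_prompt)
print(completions[0])
```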