README.md: 9 additions & 4 deletions
@@ -233,8 +233,9 @@ DeepSeek-V3 can be deployed locally using the following hardware and open-source
233
233
3. **LMDeploy**: Enables efficient FP8 and BF16 inference for local and cloud deployment.
234
234
4. **TensorRT-LLM**: Currently supports BF16 inference and INT4/8 quantization, with FP8 support coming soon.
235
235
5. **vLLM**: Supports the DeepSeek-V3 model in FP8 and BF16 modes with tensor parallelism and pipeline parallelism.
236
-
6. **AMD GPU**: Enables running the DeepSeek-V3 model on AMD GPUs via SGLang in both BF16 and FP8 modes.
237
-
7. **Huawei Ascend NPU**: Supports running DeepSeek-V3 on Huawei Ascend devices.
236
+
6. **LightLLM**: Supports efficient single-node or multi-node deployment for FP8 and BF16.
237
+
7. **AMD GPU**: Enables running the DeepSeek-V3 model on AMD GPUs via SGLang in both BF16 and FP8 modes.
238
+
8. **Huawei Ascend NPU**: Supports running DeepSeek-V3 on Huawei Ascend devices.
238
239
239
240
Since FP8 training is natively adopted in our framework, we only provide FP8 weights. If you require BF16 weights for experimentation, you can use the provided conversion script to perform the transformation.
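As a rough illustration of the conversion step mentioned above, the following Python sketch only assembles the command line for the conversion script; it does not run the conversion itself. The script name `fp8_cast_bf16.py` and the flags `--input-fp8-hf-path` and `--output-bf16-hf-path` are assumptions based on the repository's `inference` directory and should be checked against your checkout. The helper `build_cast_command` and both directory paths are hypothetical placeholders.

```python
import shlex


def build_cast_command(fp8_dir: str, bf16_dir: str,
                       script: str = "fp8_cast_bf16.py") -> list[str]:
    """Return the argv for an FP8 -> BF16 weight conversion.

    The script name and flag names are assumptions; verify them
    against the conversion script shipped in your checkout.
    """
    return [
        "python", script,
        "--input-fp8-hf-path", fp8_dir,     # directory holding the FP8 weights
        "--output-bf16-hf-path", bf16_dir,  # directory to write BF16 weights to
    ]


# Placeholder paths; replace them with real directories before running.
cmd = build_cast_command("/path/to/fp8_weights", "/path/to/bf16_weights")
print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains the script.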
240
241
@@ -328,11 +329,15 @@ For comprehensive step-by-step instructions on running DeepSeek-V3 with LMDeploy
328
329
329
330
[vLLM](https://github.com/vllm-project/vllm) v0.6.6 supports DeepSeek-V3 inference for FP8 and BF16 modes on both NVIDIA and AMD GPUs. Aside from standard techniques, vLLM offers _pipeline parallelism_ allowing you to run this model on multiple machines connected by networks. For detailed guidance, please refer to the [vLLM instructions](https://docs.vllm.ai/en/latest/serving/distributed_serving.html). Please feel free to follow [the enhancement plan](https://github.com/vllm-project/vllm/issues/11539) as well.
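To make the combination of tensor and pipeline parallelism more concrete, here is a minimal Python sketch that assembles a `vllm serve` command and computes how many GPUs the chosen layout needs (tensor-parallel size multiplied by pipeline-parallel size). The helper names `gpus_required` and `build_vllm_serve_command` are hypothetical, and the layout of two machines with eight GPUs each is an assumed example, not a recommendation from this README. Multi-machine runs also need a cluster set up as described in the vLLM instructions linked above.

```python
import shlex


def gpus_required(tensor_parallel: int, pipeline_parallel: int) -> int:
    """Total GPUs used: each pipeline stage holds one tensor-parallel group."""
    return tensor_parallel * pipeline_parallel


def build_vllm_serve_command(model: str, tensor_parallel: int,
                             pipeline_parallel: int) -> list[str]:
    """Return the argv for `vllm serve` with the given parallel layout."""
    if tensor_parallel < 1 or pipeline_parallel < 1:
        raise ValueError("parallel sizes must be positive integers")
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel),      # GPUs per pipeline stage
        "--pipeline-parallel-size", str(pipeline_parallel),  # number of pipeline stages
        "--trust-remote-code",
    ]


# Assumed example layout: 2 machines x 8 GPUs, one pipeline stage per machine.
cmd = build_vllm_serve_command("deepseek-ai/DeepSeek-V3", 8, 2)
print(shlex.join(cmd))
print("GPUs required:", gpus_required(8, 2))
```

Keeping the tensor-parallel group inside one machine and placing pipeline stages across machines is a common choice, since tensor parallelism communicates more frequently than pipeline parallelism.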
330
331
331
-
### 6.6 Recommended Inference Functionality with AMD GPUs
332
+
### 6.6 Inference with LightLLM (recommended)
333
+
334
+
[LightLLM](https://github.com/ModelTC/lightllm/tree/main) v1.0.1 supports single-machine and multi-machine tensor parallel deployment for DeepSeek-R1 (FP8/BF16) and provides mixed-precision deployment, with more quantization modes continuously integrated. For more details, please refer to [LightLLM instructions](https://lightllm-en.readthedocs.io/en/latest/getting_started/quickstart.html). Additionally, LightLLM offers PD-disaggregation deployment for DeepSeek-V2, and the implementation of PD-disaggregation for DeepSeek-V3 is in development.
335
+
336
+
### 6.7 Recommended Inference Functionality with AMD GPUs
332
337
333
338
In collaboration with the AMD team, we have achieved Day-One support for AMD GPUs using SGLang, with full compatibility for both FP8 and BF16 precision. For detailed guidance, please refer to the [SGLang instructions](#62-inference-with-sglang-recommended).
334
339
335
-
### 6.7 Recommended Inference Functionality with Huawei Ascend NPUs
340
+
### 6.8 Recommended Inference Functionality with Huawei Ascend NPUs
336
341
The [MindIE](https://www.hiascend.com/en/software/mindie) framework from the Huawei Ascend community has successfully adapted the BF16 version of DeepSeek-V3. For step-by-step guidance on Ascend NPUs, please follow the [instructions here](https://modelers.cn/models/MindIE/deepseekv3).