1 parent 94527a0 commit 7aff172
torchtitan/models/deepseek_v3/README.md
@@ -33,3 +33,16 @@ CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/deepseek_v3_16b.toml"
 - Activation checkpointing
 - Tensor Parallel (TP)
 - Expert Parallel (EP)
+
+
+## To be added
+- Modeling
+  - Merge DeepSeek-V3 and Llama4 MoE common components
+- Parallelism
+  - Context Parallel support for DeepSeek-V3
+  - PP support for DeepSeek-V3
+- torch.compile
+- Quantization
+- Testing
+  - performance and loss-convergence tests
+  - CI integration
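For context, the `CONFIG_FILE` path in the hunk header above is the training config this README section documents. A minimal launch sketch is shown below; it assumes torchtitan's usual `run_train.sh` entry point and `NGPU` environment variable, neither of which is part of this diff.

```bash
# Sketch only: assumes torchtitan's standard run_train.sh wrapper, which reads
# CONFIG_FILE and NGPU and dispatches the job via torchrun.
# Adjust NGPU to match your hardware.
NGPU=8 \
CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/deepseek_v3_16b.toml" \
./run_train.sh
```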