Commit 27028bb

[DSV3] Add PP support for DSV3
1 parent b74918a commit 27028bb

5 files changed, with 403 additions and 9 deletions

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
Download tokenizer:

```
# DeepSeek tokenizer (automatically downloads tokenizer.json and tokenizer_config.json)
python scripts/download_tokenizer.py --repo_id deepseek-ai/DeepSeek-V3
```

Run:

Single GPU (debug_model):

```
NGPU=1 LOG_RANK=0 CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/debug_model.toml" ./run_train.sh
```

FSDP:

```
NGPU=8 LOG_RANK=0 CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/debug_model.toml" ./run_train.sh --parallelism.data_parallel_shard_degree 8

# OOM (out of memory) with the 16B config
NGPU=8 LOG_RANK=0 CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/deepseek_v3_16b.toml" ./run_train.sh --parallelism.data_parallel_shard_degree 8
```

PP:

For additional logging, use TORCH_LOGS=+pp.

```
NGPU=2 LOG_RANK=0,1 CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/debug_model.toml" ./run_train.sh --parallelism.pipeline_parallel_degree 2

NGPU=4 LOG_RANK=0,3 CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/debug_model.toml" ./run_train.sh --parallelism.pipeline_parallel_degree 4

# Works with activation checkpointing (AC) set to none; why doesn't this work with AC=full?
NGPU=8 LOG_RANK=0,7 CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/deepseek_v3_16b.toml" ./run_train.sh --parallelism.pipeline_parallel_degree 8 --parallelism.pipeline_parallel_schedule Interleaved1F1B
```
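
To make the `pipeline_parallel_degree` and `Interleaved1F1B` options above more concrete, here is a minimal Python sketch of how layers might be divided into pipeline stages and how stages might be placed on ranks. It is an illustration only, not the splitting logic from this commit: `split_layers` and `stages_for_rank` are hypothetical names, the even contiguous split is an assumption, and the loop-style placement (rank `r` holds stages `r`, `r + pp_degree`, ...) is assumed to match how interleaved schedules assign more than one stage per rank.

```python
def split_layers(n_layers: int, n_stages: int) -> list[list[int]]:
    """Split layer indices into n_stages contiguous, near-equal chunks (assumed policy)."""
    base, extra = divmod(n_layers, n_stages)
    chunks, start = [], 0
    for stage in range(n_stages):
        # The first `extra` stages take one additional layer each.
        size = base + (1 if stage < extra else 0)
        chunks.append(list(range(start, start + size)))
        start += size
    return chunks


def stages_for_rank(pp_rank: int, pp_degree: int, stages_per_rank: int) -> list[int]:
    """Loop-style placement: rank r holds stages r, r + pp_degree, r + 2 * pp_degree, ..."""
    return [pp_rank + k * pp_degree for k in range(stages_per_rank)]


if __name__ == "__main__":
    # Hypothetical example: 8 layers, pipeline degree 2, two stages per rank.
    layers_per_stage = split_layers(n_layers=8, n_stages=4)
    for rank in range(2):
        owned = stages_for_rank(rank, pp_degree=2, stages_per_rank=2)
        print(rank, [(s, layers_per_stage[s]) for s in owned])
```

With one stage per rank (as a plain 1F1B schedule would use), `stages_for_rank` reduces to a single stage whose index equals the rank.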

torchtitan/models/deepseek_v3/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -15,6 +15,7 @@
 from torchtitan.protocols.train_spec import register_train_spec, TrainSpec

 from .infra.parallelize import parallelize_deepseekv3
+from .infra.pipeline import pipeline_deepseekv3
 from .model.args import DeepSeekV3ModelArgs
 from .model.model import DeepSeekV3Model

@@ -116,7 +117,7 @@
 cls=DeepSeekV3Model,
 config=deepseekv3_configs,
 parallelize_fn=parallelize_deepseekv3,
-pipelining_fn=None,
+pipelining_fn=pipeline_deepseekv3,
 build_optimizers_fn=build_optimizers,
 build_lr_schedulers_fn=build_lr_schedulers,
 build_dataloader_fn=build_hf_dataloader,
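
The hunk above swaps `pipelining_fn=None` for a real function in the registered spec. The sketch below shows the general pattern this enables, under stated assumptions: `MiniTrainSpec`, `register`, and `require_pipelining` are hypothetical, trimmed-down stand-ins written for this example, not torchtitan's `TrainSpec` or `register_train_spec`, and the check that rejects a pipeline degree above 1 when no function is registered is an assumed behavior.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MiniTrainSpec:
    """Hypothetical stand-in for a train spec; not torchtitan's class."""
    name: str
    parallelize_fn: Callable
    pipelining_fn: Optional[Callable] = None  # None: pipeline parallelism not wired up


_registry: dict[str, MiniTrainSpec] = {}


def register(spec: MiniTrainSpec) -> None:
    """Store a spec under its name, refusing duplicates."""
    if spec.name in _registry:
        raise ValueError(f"{spec.name!r} is already registered")
    _registry[spec.name] = spec


def require_pipelining(name: str, pp_degree: int) -> None:
    """Fail early if PP is requested but the spec has no pipelining function."""
    spec = _registry[name]
    if pp_degree > 1 and spec.pipelining_fn is None:
        raise RuntimeError(f"{name!r} does not support pipeline parallelism")


# "before" mirrors pipelining_fn=None; "after" mirrors a registered function.
register(MiniTrainSpec("before", parallelize_fn=lambda model: model))
register(
    MiniTrainSpec(
        "after",
        parallelize_fn=lambda model: model,
        pipelining_fn=lambda model: [model],  # placeholder: returns a one-stage list
    )
)
```

Under these assumptions, `require_pipelining("after", 2)` passes while `require_pipelining("before", 2)` raises, which is the difference the one-line change makes for any run with a pipeline degree above 1.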
