
SFT of Qwen3 on NPU fails with "Default process group has not been initialized, please make sure to call init_process_group." #4086

Open
Gpwner opened this issue May 5, 2025 · 2 comments

Gpwner commented May 5, 2025

Describe the bug
The following error occurred while running SFT on Qwen3 8B on Huawei NPUs:

Run command: `/home/ma-user/anaconda3/envs/PyTorch-2.1.0/bin/python3.9 /share/code/ms-swift/swift/cli/sft.py --model_type=qwen3 --dataset=/share/code/QwenInfer/gigaspeech_continuation_qwen.jsonl --model=/share/code/Qwen3-8B --num_train_epochs=5 --train_type=full --output_dir=outputs --eval_steps=1000 --save_steps=1000 --device_map=npu --ddp_backend hccl --per_device_train_batch_size=30 --dataloader_num_workers=20 --lazy_tokenize true --torch_dtype=bfloat16 --check_model=false --max_length=2048 --learning_rate=1e-3 --warmup_steps=1000 --lr_scheduler_type=cosine --dataset_prefix=/share/DATA/ --gradient_accumulation_steps=1 --dataset_num_proc=2 --save_total_limit=5`
[2025-05-05 20:04:57,571] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to npu (auto detect)
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[INFO:swift] Successfully registered `/share/code/ms-swift/swift/llm/dataset/data/dataset_info.json`.
[INFO:swift] rank: -1, local_rank: -1, world_size: 1, local_world_size: 1
[INFO:swift] Loading the model using model_dir: /share/code/Qwen3-8B
Traceback (most recent call last):
  File "/share/code/ms-swift/swift/cli/sft.py", line 7, in <module>
    sft_main()
  File "/share/code/ms-swift/swift/llm/train/sft.py", line 281, in sft_main
    return SwiftSft(args).main()
  File "/share/code/ms-swift/swift/llm/train/sft.py", line 29, in __init__
    super().__init__(args)
  File "/share/code/ms-swift/swift/llm/base.py", line 18, in __init__
    self.args = self._parse_args(args)
  File "/share/code/ms-swift/swift/llm/base.py", line 30, in _parse_args
    args, remaining_argv = parse_args(self.args_class, args)
  File "/share/code/ms-swift/swift/utils/utils.py", line 151, in parse_args
    args, remaining_args = parser.parse_args_into_dataclasses(argv, return_remaining_strings=True)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/transformers/hf_argparser.py", line 358, in parse_args_into_dataclasses
    obj = dtype(**inputs)
  File "<string>", line 303, in __init__
  File "/share/code/ms-swift/swift/llm/argument/train_args.py", line 170, in __post_init__
    self.training_args = TrainerFactory.get_training_args(self)
  File "/share/code/ms-swift/swift/trainers/trainer_factory.py", line 64, in get_training_args
    return training_args_cls(**args_dict)
  File "<string>", line 152, in __init__
  File "/share/code/ms-swift/swift/trainers/arguments.py", line 132, in __post_init__
    super().__post_init__()
  File "/share/code/ms-swift/swift/trainers/arguments.py", line 118, in __post_init__
    super().__post_init__()
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/transformers/training_args.py", line 1761, in __post_init__
    self.device
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/transformers/training_args.py", line 2297, in device
    return self._setup_devices
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/transformers/utils/generic.py", line 67, in __get__
    cached = self.fget(obj)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/transformers/training_args.py", line 2224, in _setup_devices
    self.distributed_state = PartialState(**accelerator_state_kwargs)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/accelerate/state.py", line 271, in __init__
    self.num_processes = torch.distributed.get_world_size()
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 1492, in get_world_size
    return _get_group_size(group)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 785, in _get_group_size
    default_pg = _get_default_group()
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 940, in _get_default_group
    raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
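
The traceback bottoms out in `torch.distributed.get_world_size()`, which only works after `init_process_group()` has been called. The same error can be reproduced in isolation, independent of swift (a minimal sketch):

```bash
# Minimal sketch: querying the world size without an initialized default
# process group raises the exact RuntimeError shown in the traceback above.
python -c "import torch.distributed as dist; dist.get_world_size()"
# RuntimeError: Default process group has not been initialized,
# please make sure to call init_process_group.
```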

Your hardware and system info
OS:

VERSION="2.0 (aarch64)"
ID="hce"
VERSION_ID="2.0"
PRETTY_NAME="Huawei Cloud EulerOS 2.0 (aarch64)"
ANSI_COLOR="0;31"

NPU:

+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.0                   Version: 24.1.0                                               |
+---------------------------+---------------+----------------------------------------------------+
| NPU   Name                | Health        | Power(W)    Temp(C)           Hugepages-Usage(page)|
| Chip                      | Bus-Id        | AICore(%)   Memory-Usage(MB)  HBM-Usage(MB)        |
+===========================+===============+====================================================+
| 0     910B2               | OK            | 93.6        49                0    / 0             |
| 0                         | 0000:C1:00.0  | 0           0    / 0          3383 / 65536         |
+===========================+===============+====================================================+
| 1     910B2               | OK            | 95.4        51                0    / 0             |
| 0                         | 0000:01:00.0  | 0           0    / 0          3367 / 65536         |
+===========================+===============+====================================================+
| 2     910B2               | OK            | 90.5        48                0    / 0             |
| 0                         | 0000:C2:00.0  | 0           0    / 0          3367 / 65536         |
+===========================+===============+====================================================+
| 3     910B2               | OK            | 91.6        51                0    / 0             |
| 0                         | 0000:02:00.0  | 0           0    / 0          3366 / 65536         |
+===========================+===============+====================================================+
| 4     910B2               | OK            | 91.5        49                0    / 0             |
| 0                         | 0000:81:00.0  | 0           0    / 0          3366 / 65536         |
+===========================+===============+====================================================+
| 5     910B2               | OK            | 97.1        52                0    / 0             |
| 0                         | 0000:41:00.0  | 0           0    / 0          3366 / 65536         |
+===========================+===============+====================================================+
| 6     910B2               | OK            | 98.0        49                0    / 0             |
| 0                         | 0000:82:00.0  | 0           0    / 0          3366 / 65536         |
+===========================+===============+====================================================+
| 7     910B2               | OK            | 94.1        51                0    / 0             |
| 0                         | 0000:42:00.0  | 0           0    / 0          3366 / 65536         |
+===========================+===============+====================================================+

Additional context
SFT script:

export HCCL_ASYNC_ERROR_HANDLING=0
export HCCL_CONNECT_TIMEOUT=7200
export HCCL_EXEC_TIMEOUT=7200
export HCCL_IF_BASE_PORT=64000

ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \


swift sft \
--model_type=qwen3 \
--dataset=/share/code/QwenInfer/gigaspeech_continuation_qwen.jsonl \
--model=/share/code/Qwen3-8B \
--num_train_epochs=5 \
--train_type=full \
--output_dir=outputs \
--eval_steps=1000 \
--save_steps=1000 \
--device_map=npu \
--ddp_backend hccl \
--per_device_train_batch_size=30 \
--dataloader_num_workers=20 \
--lazy_tokenize true \
--torch_dtype=bfloat16 \
--check_model=false \
--max_length=2048 \
--learning_rate=1e-3 \
--warmup_steps=1000 \
--lr_scheduler_type=cosine \
--dataset_prefix=/share/DATA/ \
--gradient_accumulation_steps=1 \
--dataset_num_proc=2  \
--save_total_limit=5


Thanks!

Collaborator

Jintao-Huang commented May 5, 2025

The blank lines in the middle of the shell script need to be removed:

NPROC_PER_NODE=8 \
swift sft \
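
Why the blank lines matter (a reading of the script above, based on standard shell semantics): a trailing backslash joins the next line into the same command, so the blank line after `NPROC_PER_NODE=8 \` terminates the command early. The two assignments then become ordinary, unexported shell variables, `swift sft` starts with no `NPROC_PER_NODE` in its environment, and swift presumably never spawns the distributed workers, so the process group the trainer later queries is never created. A minimal sketch of the pitfall, using a hypothetical stand-in variable `FOO`:

```bash
# Broken: the backslash escapes the newline, so the blank line ends the
# command. FOO=1 becomes a plain (unexported) shell variable.
FOO=1 \

env | grep FOO    # prints nothing

# Fixed: without the blank line, FOO=1 is a prefix assignment and lands
# in the environment of `env`.
FOO=1 \
env | grep FOO    # prints FOO=1
```

The same applies to `ASCEND_RT_VISIBLE_DEVICES` in the script above.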

Author

Gpwner commented May 5, 2025

Thanks a lot!
