
SFT of Qwen3 on NPU fails with "Default process group has not been initialized, please make sure to call init_process_group." #4086

Open
Gpwner opened this issue May 5, 2025 · 2 comments

Gpwner commented May 5, 2025

Describe the bug
The following error occurred while running SFT on Qwen3 8B on Huawei NPUs:

Run command: `/home/ma-user/anaconda3/envs/PyTorch-2.1.0/bin/python3.9 /share/code/ms-swift/swift/cli/sft.py --model_type=qwen3 --dataset=/share/code/QwenInfer/gigaspeech_continuation_qwen.jsonl --model=/share/code/Qwen3-8B --num_train_epochs=5 --train_type=full --output_dir=outputs --eval_steps=1000 --save_steps=1000 --device_map=npu --ddp_backend hccl --per_device_train_batch_size=30 --dataloader_num_workers=20 --lazy_tokenize true --torch_dtype=bfloat16 --check_model=false --max_length=2048 --learning_rate=1e-3 --warmup_steps=1000 --lr_scheduler_type=cosine --dataset_prefix=/share/DATA/ --gradient_accumulation_steps=1 --dataset_num_proc=2 --save_total_limit=5`
[2025-05-05 20:04:57,571] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to npu (auto detect)
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[INFO:swift] Successfully registered `/share/code/ms-swift/swift/llm/dataset/data/dataset_info.json`.
[INFO:swift] rank: -1, local_rank: -1, world_size: 1, local_world_size: 1
[INFO:swift] Loading the model using model_dir: /share/code/Qwen3-8B
Traceback (most recent call last):
  File "/share/code/ms-swift/swift/cli/sft.py", line 7, in <module>
    sft_main()
  File "/share/code/ms-swift/swift/llm/train/sft.py", line 281, in sft_main
    return SwiftSft(args).main()
  File "/share/code/ms-swift/swift/llm/train/sft.py", line 29, in __init__
    super().__init__(args)
  File "/share/code/ms-swift/swift/llm/base.py", line 18, in __init__
    self.args = self._parse_args(args)
  File "/share/code/ms-swift/swift/llm/base.py", line 30, in _parse_args
    args, remaining_argv = parse_args(self.args_class, args)
  File "/share/code/ms-swift/swift/utils/utils.py", line 151, in parse_args
    args, remaining_args = parser.parse_args_into_dataclasses(argv, return_remaining_strings=True)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/transformers/hf_argparser.py", line 358, in parse_args_into_dataclasses
    obj = dtype(**inputs)
  File "<string>", line 303, in __init__
  File "/share/code/ms-swift/swift/llm/argument/train_args.py", line 170, in __post_init__
    self.training_args = TrainerFactory.get_training_args(self)
  File "/share/code/ms-swift/swift/trainers/trainer_factory.py", line 64, in get_training_args
    return training_args_cls(**args_dict)
  File "<string>", line 152, in __init__
  File "/share/code/ms-swift/swift/trainers/arguments.py", line 132, in __post_init__
    super().__post_init__()
  File "/share/code/ms-swift/swift/trainers/arguments.py", line 118, in __post_init__
    super().__post_init__()
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/transformers/training_args.py", line 1761, in __post_init__
    self.device
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/transformers/training_args.py", line 2297, in device
    return self._setup_devices
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/transformers/utils/generic.py", line 67, in __get__
    cached = self.fget(obj)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/transformers/training_args.py", line 2224, in _setup_devices
    self.distributed_state = PartialState(**accelerator_state_kwargs)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/accelerate/state.py", line 271, in __init__
    self.num_processes = torch.distributed.get_world_size()
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 1492, in get_world_size
    return _get_group_size(group)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 785, in _get_group_size
    default_pg = _get_default_group()
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 940, in _get_default_group
    raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
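
The traceback bottoms out in `torch.distributed.get_world_size()`, which only works after `init_process_group()` has been called. The same error can be reproduced in isolation, independent of swift (a minimal sketch):

```bash
# Minimal sketch: querying the world size without an initialized default
# process group raises the exact RuntimeError shown in the traceback above.
python -c "import torch.distributed as dist; dist.get_world_size()"
# RuntimeError: Default process group has not been initialized,
# please make sure to call init_process_group.
```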

Your hardware and system info
OS:

VERSION="2.0 (aarch64)"
ID="hce"
VERSION_ID="2.0"
PRETTY_NAME="Huawei Cloud EulerOS 2.0 (aarch64)"
ANSI_COLOR="0;31"

NPU:

+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.0                   Version: 24.1.0                                               |
+---------------------------+---------------+----------------------------------------------------+
| NPU   Name                | Health        | Power(W)    Temp(C)           Hugepages-Usage(page)|
| Chip                      | Bus-Id        | AICore(%)   Memory-Usage(MB)  HBM-Usage(MB)        |
+===========================+===============+====================================================+
| 0     910B2               | OK            | 93.6        49                0    / 0             |
| 0                         | 0000:C1:00.0  | 0           0    / 0          3383 / 65536         |
+===========================+===============+====================================================+
| 1     910B2               | OK            | 95.4        51                0    / 0             |
| 0                         | 0000:01:00.0  | 0           0    / 0          3367 / 65536         |
+===========================+===============+====================================================+
| 2     910B2               | OK            | 90.5        48                0    / 0             |
| 0                         | 0000:C2:00.0  | 0           0    / 0          3367 / 65536         |
+===========================+===============+====================================================+
| 3     910B2               | OK            | 91.6        51                0    / 0             |
| 0                         | 0000:02:00.0  | 0           0    / 0          3366 / 65536         |
+===========================+===============+====================================================+
| 4     910B2               | OK            | 91.5        49                0    / 0             |
| 0                         | 0000:81:00.0  | 0           0    / 0          3366 / 65536         |
+===========================+===============+====================================================+
| 5     910B2               | OK            | 97.1        52                0    / 0             |
| 0                         | 0000:41:00.0  | 0           0    / 0          3366 / 65536         |
+===========================+===============+====================================================+
| 6     910B2               | OK            | 98.0        49                0    / 0             |
| 0                         | 0000:82:00.0  | 0           0    / 0          3366 / 65536         |
+===========================+===============+====================================================+
| 7     910B2               | OK            | 94.1        51                0    / 0             |
| 0                         | 0000:42:00.0  | 0           0    / 0          3366 / 65536         |
+===========================+===============+====================================================+

Additional context
SFT script:

export HCCL_ASYNC_ERROR_HANDLING=0
export HCCL_CONNECT_TIMEOUT=7200
export HCCL_EXEC_TIMEOUT=7200
export HCCL_IF_BASE_PORT=64000

ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \


swift sft \
--model_type=qwen3 \
--dataset=/share/code/QwenInfer/gigaspeech_continuation_qwen.jsonl \
--model=/share/code/Qwen3-8B \
--num_train_epochs=5 \
--train_type=full \
--output_dir=outputs \
--eval_steps=1000 \
--save_steps=1000 \
--device_map=npu \
--ddp_backend hccl \
--per_device_train_batch_size=30 \
--dataloader_num_workers=20 \
--lazy_tokenize true \
--torch_dtype=bfloat16 \
--check_model=false \
--max_length=2048 \
--learning_rate=1e-3 \
--warmup_steps=1000 \
--lr_scheduler_type=cosine \
--dataset_prefix=/share/DATA/ \
--gradient_accumulation_steps=1 \
--dataset_num_proc=2  \
--save_total_limit=5


Thanks!

Collaborator

Jintao-Huang commented May 5, 2025

The blank lines in the middle of the shell script need to be removed:

NPROC_PER_NODE=8 \
swift sft \
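
Why the blank lines matter (a reading of the script above, based on standard shell semantics): a trailing backslash joins the next line into the same command, so the blank line after `NPROC_PER_NODE=8 \` terminates the command early. The two assignments then become ordinary, unexported shell variables, `swift sft` starts with no `NPROC_PER_NODE` in its environment, and swift presumably never spawns the distributed workers, so the process group the trainer later queries is never created. A minimal sketch of the pitfall, using a hypothetical stand-in variable `FOO`:

```bash
# Broken: the backslash escapes the newline, so the blank line ends the
# command. FOO=1 becomes a plain (unexported) shell variable.
FOO=1 \

env | grep FOO    # prints nothing

# Fixed: without the blank line, FOO=1 is a prefix assignment and lands
# in the environment of `env`.
FOO=1 \
env | grep FOO    # prints FOO=1
```

The same applies to `ASCEND_RT_VISIBLE_DEVICES` in the script above.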

Author

Gpwner commented May 5, 2025

Thanks a lot!
