Description
Thank you for open-sourcing the ERNIE-4.5 series; this is truly exciting.
We have added Megatron training support for both ERNIE-4.5 and ERNIE-4.5-MoE, covering continued pre-training (CPT), supervised fine-tuning (SFT), and DPO. For best practices, please refer to this PR: modelscope/ms-swift#4757
Training script:

```shell
# 4 GPUs * 51 GiB each, 16 s/it
CUDA_VISIBLE_DEVICES=0,1,2,3 \
megatron sft \
    --load ERNIE-4.5-21B-A3B-PT-mcore \
    --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#500' \
              'AI-ModelScope/alpaca-gpt4-data-en#500' \
              'swift/self-cognition#500' \
    --expert_model_parallel_size 4 \
    --moe_grouped_gemm true \
    --moe_shared_expert_overlap true \
    --moe_aux_loss_coeff 0.01 \
    --micro_batch_size 4 \
    --global_batch_size 16 \
    --recompute_granularity full \
    --recompute_method uniform \
    --recompute_num_layers 1 \
    --finetune true \
    --cross_entropy_loss_fusion true \
    --lr 1e-5 \
    --lr_warmup_fraction 0.05 \
    --min_lr 1e-6 \
    --save megatron_output/ERNIE-4.5-21B-A3B-PT \
    --eval_interval 100 \
    --save_interval 100 \
    --max_length 2048 \
    --max_epochs 1 \
    --num_workers 8 \
    --dataset_num_proc 8 \
    --no_save_optim true \
    --no_save_rng true \
    --sequence_parallel true \
    --optimizer_cpu_offload true \
    --use_precision_aware_optimizer true \
    --attention_backend flash \
    --model_author swift \
    --model_name swift-robot
```
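In Megatron-style training, `--global_batch_size` must be a multiple of `--micro_batch_size` times the data-parallel size, and any remaining factor is covered by gradient accumulation. The following is a minimal sketch of that arithmetic for the command above. It assumes that tensor, pipeline, and context parallel sizes are all 1, since the command sets none of them. It also assumes that expert parallelism (`--expert_model_parallel_size 4`) is nested inside the data-parallel group rather than shrinking it. The variable names are illustrative only and are not ms-swift options.

```shell
#!/usr/bin/env bash
# Sketch: derive data-parallel size and gradient-accumulation steps.
# Assumption: TP = PP = CP = 1 (not set in the command), so every GPU
# is a data-parallel rank; EP=4 is nested inside the data-parallel group.
set -euo pipefail

num_gpus=4            # from CUDA_VISIBLE_DEVICES=0,1,2,3
tp=1; pp=1; cp=1      # assumed defaults (hypothetical variable names)
micro_batch_size=4    # --micro_batch_size
global_batch_size=16  # --global_batch_size

# Data-parallel size: GPUs left after tensor/pipeline/context splits.
dp=$(( num_gpus / (tp * pp * cp) ))
# Samples consumed by one forward/backward pass across all DP ranks.
per_step=$(( micro_batch_size * dp ))

if (( global_batch_size % per_step != 0 )); then
  echo "global_batch_size must be divisible by micro_batch_size * dp" >&2
  exit 1
fi

grad_accum=$(( global_batch_size / per_step ))
echo "data_parallel_size=${dp} grad_accum_steps=${grad_accum}"
```

With these values the sketch prints `data_parallel_size=4 grad_accum_steps=1`, meaning each optimizer step uses exactly one micro-batch per GPU. Under the same assumptions, raising `--global_batch_size` to 32 would give two accumulation steps.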
lugimzzz commented on Jul 3, 2025
We're thrilled to see ERNIE model support integrated into MS-SWIFT! Thank you for making our model accessible through your excellent framework. Great work! 👏