🚀[Fine-tuning] ERNIE-4.5-MoE Megatron Training Implementation and Best Practices👋 #966

@Jintao-Huang

Description

Thanks for open-sourcing the ERNIE-4.5 series; this is truly exciting.

We have added Megatron training support for both ERNIE-4.5 and ERNIE-4.5-MoE (CPT/SFT/DPO). For best practices, please refer to this PR: modelscope/ms-swift#4757

Training shell:

# 4 * 51GiB, 16s/it
CUDA_VISIBLE_DEVICES=0,1,2,3 \
megatron sft \
    --load ERNIE-4.5-21B-A3B-PT-mcore \
    --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#500' \
              'AI-ModelScope/alpaca-gpt4-data-en#500' \
              'swift/self-cognition#500' \
    --expert_model_parallel_size 4 \
    --moe_grouped_gemm true \
    --moe_shared_expert_overlap true \
    --moe_aux_loss_coeff 0.01 \
    --micro_batch_size 4 \
    --global_batch_size 16 \
    --recompute_granularity full \
    --recompute_method uniform \
    --recompute_num_layers 1 \
    --finetune true \
    --cross_entropy_loss_fusion true \
    --lr 1e-5 \
    --lr_warmup_fraction 0.05 \
    --min_lr 1e-6 \
    --save megatron_output/ERNIE-4.5-21B-A3B-PT \
    --eval_interval 100 \
    --save_interval 100 \
    --max_length 2048 \
    --max_epochs 1 \
    --num_workers 8 \
    --dataset_num_proc 8 \
    --no_save_optim true \
    --no_save_rng true \
    --sequence_parallel true \
    --optimizer_cpu_offload true \
    --use_precision_aware_optimizer true \
    --attention_backend flash \
    --model_author swift \
    --model_name swift-robot
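
For reference, the batch-size flags above imply a gradient-accumulation schedule. A minimal sketch of the standard Megatron relation, under the assumption that tensor- and pipeline-parallel sizes default to 1 here (the function name is illustrative, not part of ms-swift):

```python
# Sketch of how Megatron-style trainers derive gradient-accumulation steps
# from --micro_batch_size / --global_batch_size (assumed relation).
def grad_accum_steps(global_batch, micro_batch, world_size,
                     tensor_parallel=1, pipeline_parallel=1):
    # Data-parallel size: GPUs not consumed by tensor/pipeline parallelism.
    # (Expert parallelism shares ranks with data parallelism, so
    # --expert_model_parallel_size 4 does not shrink the DP group.)
    dp = world_size // (tensor_parallel * pipeline_parallel)
    assert global_batch % (micro_batch * dp) == 0, "global batch must divide evenly"
    return global_batch // (micro_batch * dp)

# With the flags above: 4 GPUs, micro_batch_size 4, global_batch_size 16
print(grad_accum_steps(16, 4, 4))  # -> 1 (no accumulation needed)
```

With one accumulation step per iteration, each optimizer update sees exactly 16 samples; raising `--global_batch_size` without adding GPUs would increase accumulation rather than memory.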

Training GPU memory usage: [screenshot]

Training log: [screenshot]

Results: [screenshot]

Activity

lugimzzz (Contributor) commented on Jul 3, 2025

We're thrilled to see ERNIE model support integrated into MS-SWIFT! Thank you for making our model accessible through your excellent framework. Great work! 👏

Issue #966 · PaddlePaddle/ERNIE