feat: support megatron wandb #4074

Merged
merged 1 commit into modelscope:main on May 4, 2025

Conversation

@firefighter-eric (Contributor) commented on May 4, 2025

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Add support for Megatron wandb logging in ms-swift.
Add arguments matching those of Megatron-LM:
https://github.com/NVIDIA/Megatron-LM/blob/89d758b28f57d71fe74a6e1b85d2f93a4302121a/megatron/training/arguments.py#L1514
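The new flags mirror the wandb arguments that Megatron-LM defines in arguments.py (linked above). As a rough, illustrative sketch only (the dataclass name and defaults below are assumptions, not the exact code added by this PR), the new arguments amount to:

# Illustrative sketch: wandb fields mirroring Megatron-LM's
# --wandb-project / --wandb-exp-name arguments (Megatron-LM also has
# --wandb-save-dir); names/defaults here are assumptions.
from dataclasses import dataclass

@dataclass
class MegatronWandbArguments:
    wandb_project: str = ''   # empty string disables wandb logging
    wandb_exp_name: str = ''  # run name shown in the wandb UI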


Experiment results

# Single-GPU Megatron SFT run of Qwen3-0.6B with the new --wandb_project / --wandb_exp_name flags
NPROC_PER_NODE=1 \
CUDA_VISIBLE_DEVICES=0 \
megatron sft \
    --load data/models/Qwen/Qwen3-0.6B-mcore \
    --dataset data/swift/Qwen3-SFT-Mixin/qwen3_32b_distill_1k.jsonl \
    --tensor_model_parallel_size 1 \
    --micro_batch_size 1 \
    --global_batch_size 16 \
    --recompute_granularity selective \
    --train_iters 31 \
    --eval_iters 5 \
    --finetune true \
    --cross_entropy_loss_fusion true \
    --lr 1e-5 \
    --lr_warmup_iters 10 \
    --min_lr 1e-6 \
    --save runs/test/qwen3-0.6b-megatron \
    --save_interval 100 \
    --max_length 2048 \
    --system 'You are a helpful assistant.' \
    --num_workers 4 \
    --no_save_optim true \
    --no_save_rng true \
    --dataset_num_proc 4 \
    --bf16 true \
    --log_interval 1 \
    --use_flash_attn true \
    --wandb_project llm101 \
    --wandb_exp_name qwen3-0.6b-megatron
[screenshot of experiment results]
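For reference, Megatron-style training loops typically turn these two flags into a wandb run roughly as sketched below. This is a minimal, hypothetical sketch (the function name and rank handling are assumptions), not the actual ms-swift or Megatron-LM code:

import wandb

def init_wandb_writer(args, is_logging_rank: bool) -> None:
    # Do nothing unless a project name was given and we are on the logging rank.
    if not args.wandb_project or not is_logging_rank:
        return
    wandb.init(
        project=args.wandb_project,        # e.g. "llm101"
        name=args.wandb_exp_name or None,  # e.g. "qwen3-0.6b-megatron"
        config=vars(args),                 # record the full training config
    )

With flags like the ones in the command above, the run should appear under the llm101 project as qwen3-0.6b-megatron in the wandb UI.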

Resolves #4071

@Jintao-Huang Jintao-Huang merged commit 11ed8b9 into modelscope:main May 4, 2025
2 checks passed
tastelikefeet added a commit to tastelikefeet/swift that referenced this pull request May 6, 2025
* main:
  fix enable_cache (modelscope#4091)
  Support ulysses for llm/mllm,dpo/sft (modelscope#4085)
  update docs (modelscope#4078)
  feat: support megatron wandb (modelscope#4074)
  feat: add run name support (modelscope#4072)
  fix padding_side left (modelscope#4069)
  bump version
  support MiMo-7B (modelscope#4067)
  fix packing eval streaming (modelscope#4066)
  Support empty think loss scale (modelscope#4065)
  support qwen3-moe awq (modelscope#4059)
  Fix grpo eval when gas > 1 (modelscope#4057)
  fix rollout(modelscope#4055)
  updates GRPOTrainer compatible with trl 0.17 (modelscope#3969)
  support Qwen2.5-Omni-3B (modelscope#4052)
  update wechat (modelscope#4047)

# Conflicts:
#	swift/llm/train/tuner.py
Successfully merging this pull request may close these issues.

Support wandb logging in Swift Megatron SFT