Pulse · modelscope/ms-swift · GitHub

July 5, 2025 – July 12, 2025

Overview

40 Active pull requests

65 Active issues

2 Releases published by 1 person

v3.6.0
published Jul 8, 2025
v3.6.1 Patch release v3.6.1
published Jul 11, 2025

36 Pull requests merged by 10 people

[model] support Kimi-K2 template
#4925 merged Jul 12, 2025
[megatron] support lora modules_to_save
#4916 merged Jul 11, 2025
[doc] fix reranker index
#4921 merged Jul 11, 2025
[bugfix] fix profiling patch
#4915 merged Jul 11, 2025
feat: rlhf generation samples log to swanlab
#4907 merged Jul 11, 2025
[megatron] Support dpo lora
#4913 merged Jul 11, 2025
[megatron] add logging jsonl
#4908 merged Jul 11, 2025
Fix FlorenceTemplate for florence2
#4871 merged Jul 11, 2025
Fix the template suffix of qwen3 embedding
#4909 merged Jul 11, 2025
[megatron] update to mcore 0.13
#4903 merged Jul 10, 2025
[megatron] fix pp mla
#4904 merged Jul 10, 2025
update framework.txt
#4896 merged Jul 10, 2025
[megatron] support LoRA & support loss_scale
#4812 merged Jul 9, 2025
Fix: Correct training hang for Keye-VL on DeepSpeed with mixed data
#4889 merged Jul 9, 2025
[model] fix qwen eos_token
#4888 merged Jul 9, 2025
optimize imports
#4883 merged Jul 9, 2025
fix seq_cls generation_config
#4882 merged Jul 9, 2025
fix loss_scale sp
#4880 merged Jul 9, 2025
[SP] clean up imports
#4878 merged Jul 9, 2025
[grpo] fix server arg check
#4865 merged Jul 8, 2025
[web-ui]Modify open parameter for Accordion
#4859 merged Jul 8, 2025
[dataset] fix dataset ddp write conflict
#4860 merged Jul 7, 2025
Support Kwai-Keye/Keye-VL-8B-Preview
#4856 merged Jul 7, 2025
[template] fix qwen3 remove '<think></think>'
#4857 merged Jul 7, 2025
[grpo] update doc
#4853 merged Jul 7, 2025
Fix test bug
#4851 merged Jul 7, 2025
[grpo] fix offpolicy check
#4852 merged Jul 7, 2025
[grpo] Fix bug when repeatedly calling inputs_to_rolloutrequest
#4823 merged Jul 7, 2025
[grpo] deprecated params for 3.6
#4848 merged Jul 7, 2025
[megatron] fix eval_iters -1
#4847 merged Jul 7, 2025
fix bug: grpo train error for deepseek model
#4833 merged Jul 7, 2025
[megatron] Fix the display issue for train_type=lora
#4845 merged Jul 7, 2025
update stream & fix bugs
#4842 merged Jul 7, 2025
[Feature] SwanLab Lark callback
#4830 merged Jul 6, 2025
fix multimodal padding_free prediction_step
#4839 merged Jul 6, 2025
[train] fix multimodal packing & padding_free
#4838 merged Jul 6, 2025

4 Pull requests opened by 4 people

[grpo] entropy mask
#4850 opened Jul 7, 2025
[Feature] Add Swanlab Slack notification
#4887 opened Jul 9, 2025
[Feature] Support a GYM-like environment training interface for end-to-end RL training
#4890 opened Jul 9, 2025
fix loss_scale bug when meeting <image>,<audio>,<video>
#4922 opened Jul 11, 2025

28 Issues closed by 14 people

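During evaluation, changing the generation parameters has no effect; the evaluation config file still shows the default parameters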
#3758 closed Jul 12, 2025
How does the generative reward model in grpo receive its input?
#4912 closed Jul 11, 2025
GPU memory error when training/exporting a 32B model with Megatron-SWIFT
#3768 closed Jul 11, 2025
[Question] Does Megatron-SWIFT restore the streaming-dataset offset when resuming from a checkpoint?
#4505 closed Jul 11, 2025
Error converting hf-format model files to megatron: CUDA error: operation not supported
#4713 closed Jul 11, 2025
Support Megatron LoRA training for Qwen3 MoE
#4126 closed Jul 11, 2025
SFT of the A3B model hangs here and never proceeds
#4799 closed Jul 11, 2025
During evaluation, the output length is capped at 2048 and I don't know why
#3761 closed Jul 11, 2025
SFT: reuse the cache from the previous data load
#3762 closed Jul 11, 2025
Does Swift support multimodal interleave data in training GRPO?
#4905 closed Jul 10, 2025
ModuleNotFoundError: No module named 'vllm_ascend.distributed'
#4886 closed Jul 10, 2025
Resuming post-training fails with error: Mixed using with peft is not allowed now.
#4894 closed Jul 10, 2025
megatron in the swift3.5.3 image cannot be found and loaded correctly
#4866 closed Jul 10, 2025
About the "500 Internal Server" Error in vllm-server
#4862 closed Jul 9, 2025
When fine-tuning qwen3-32B on multi-turn dialogue data with thinking traces, does the loss calculation only cover the <think>\n\n</think> content of the last turn?
#4810 closed Jul 9, 2025
Does GRPO support dynamic data filtering?
#4884 closed Jul 9, 2025
Feature Request: RTX 5090 Support with ms-swift docker image with CUDA 12.8
#4834 closed Jul 9, 2025
timeout error
#4872 closed Jul 8, 2025
With a custom registered dataset, does the second run load data from the cache?
#4869 closed Jul 8, 2025
rlhf.py: error: unrecognized arguments; GRPO training fails to start
#4868 closed Jul 8, 2025
error when finetuning qwen3 in modelscope notebook.
#4811 closed Jul 8, 2025
FileNotFoundError in a DDP environment
#4840 closed Jul 7, 2025
With ignore_empty_think enabled, the framework automatically removes <think>\n\n</think>\n\n, so the model stops thinking
#4854 closed Jul 7, 2025
ALL_PARALLEL_STYLES argument of type 'NoneType' is not iterable
#4843 closed Jul 7, 2025
Abnormal GRPO training results
#4800 closed Jul 7, 2025
Does GenRMPlugin process the data more than once in the grpo + gen_rm pipeline?
#4846 closed Jul 7, 2025
Padding free feature
#4439 closed Jul 6, 2025
Support the Gemma-3n model
#4759 closed Jul 5, 2025

37 Issues opened by 33 people

Multi-node multi-GPU fine-tuning: torch.distributed.DistStoreError: wait timeout after 900000ms
#4924 opened Jul 12, 2025
With remove_unused_columns set to false, extra fields still cannot be passed through to the gkd compute_loss function
#4923 opened Jul 11, 2025
Fine-tuning GME
#4920 opened Jul 11, 2025
Are there plans to support RLOO and REINFORCE++?
#4919 opened Jul 11, 2025
QwenVL2.5 training hangs when mixing image-text data with text-only data
#4918 opened Jul 11, 2025
Error when training the Gemma3 model
#4917 opened Jul 11, 2025
Can this project support BERT-style models, such as Modern-Bert?
#4914 opened Jul 11, 2025
Prefix prompt for Embedding training
#4911 opened Jul 11, 2025
web ui: set the mixing ratio for datasets
#4910 opened Jul 11, 2025
Multimodal GRPO best-practice training times out at 90 steps
#4906 opened Jul 10, 2025
Megatron-Swift: please support training multimodal large models, such as Qwen2.5VL
#4902 opened Jul 10, 2025
Does the generative reward model in GRPO support vllm inference?
#4901 opened Jul 10, 2025
Services started with swift deploy tend to trigger the stop token prematurely
#4900 opened Jul 10, 2025
task_type='seq_cls' cannot be used with infer directly?
#4899 opened Jul 10, 2025
Question about Qwen2.5VL grounding fine-tuning data
#4898 opened Jul 10, 2025
Multi-node training not working
#4897 opened Jul 10, 2025
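Does LoRA fine-tuning of embedding models support mixed multimodal training? The data includes single-modality text-to-text and image-to-image, cross-modal image-text, and mixed multimodal samples; writing all types into one jsonl currently fails with the following error: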
#4895 opened Jul 10, 2025
max_model_len is not working
#4893 opened Jul 9, 2025
TypeError: argument of type 'NoneType' is not iterable
#4892 opened Jul 9, 2025
How to construct my fine-tuning dataset?
#4891 opened Jul 9, 2025
MAX_PIXELS in the env variables is not working when model_type is mimo_vl
#4885 opened Jul 9, 2025
For the ovis model, how do I pass MAX_PARTITION correctly? Setting it via an environment variable has no effect when launching with vllm serve
#4881 opened Jul 9, 2025
Client cannot connect after deploy
#4879 opened Jul 9, 2025
max_length of completion exceed max_completion_length
#4877 opened Jul 8, 2025
When fine-tuning qwen3, training hangs at the use_logits_to_keep: True step
#4876 opened Jul 8, 2025
qwen3 training hangs at the use_logits_to_keep: True stage
#4875 opened Jul 8, 2025
channel loss training error
#4874 opened Jul 8, 2025
Is there an official swift image for npu?
#4873 opened Jul 8, 2025
grpo training errors out at an intermediate step
#4867 opened Jul 8, 2025
reward model dataset inference
#4864 opened Jul 8, 2025
Support for fine-tuning more multimodal embedding models (beyond GME)
#4861 opened Jul 7, 2025
Code errors out when per_device_train_batch_size is increased
#4858 opened Jul 7, 2025
Evaluation doesn't run during training for custom dataset
#4855 opened Jul 7, 2025
Does qwen2.5vl support 4-bit kv_cache quantization?
#4849 opened Jul 7, 2025
After ms-swift sft, the model's config.json changes, so I cannot deploy the model directly with vllm
#4844 opened Jul 7, 2025
grpo + gen_rm padding index error
#4841 opened Jul 7, 2025
Need to update requirements.txt
#4837 opened Jul 5, 2025

37 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Error when quantizing qwen2.5-vl-7b with awq
#4828 commented on Jul 5, 2025 • 0 new comments
Ovis-2B pre-training error
#4818 commented on Jul 5, 2025 • 0 new comments
Question: async_infer does not achieve asynchronous calls
#3717 commented on Jul 6, 2025 • 0 new comments
ModuleNotFoundError: No module named 'torch.distributed.device_mesh'
#4092 commented on Jul 6, 2025 • 0 new comments
Qwen2-VL-2B shows gradient explosion late in pre-training; other VLMs do not
#4819 commented on Jul 6, 2025 • 0 new comments
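After resuming from a checkpoint, why does the remaining time keep growing, and why do epoch and max_steps not line up with the values before resuming?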
#3783 commented on Jul 7, 2025 • 0 new comments
Use the feature of resume_from_checkpoint when using python code to run finetuning.
#3774 commented on Jul 7, 2025 • 0 new comments
[Has anyone run into this?] Coordinate offset problem when fine-tuning a qwen2.5vl agent
#4831 commented on Jul 7, 2025 • 0 new comments
Qwen2-Audio using flash attention: error occurs: RuntimeError: cu_seqlens_q must have shape (batch_size + 1)
#2542 commented on Jul 7, 2025 • 0 new comments
Question: during GRPO training the model repeatedly and abnormally hits Max_length
#4758 commented on Jul 7, 2025 • 0 new comments
deepspeed AutoTP + ZeRO
#3797 commented on Jul 8, 2025 • 0 new comments
Problem evaluating a custom dataset with ms-swift eval, forcing a fallback to evalscope; please support the system field soon
#3792 commented on Jul 8, 2025 • 0 new comments
How to specify the split (train/validation) for the dataset in cli
#3789 commented on Jul 8, 2025 • 0 new comments
When using a registered dataset, can binary videos or images be added?
#2286 commented on Jul 8, 2025 • 0 new comments
awq quantization problem with qwen2.5-vl
#4762 commented on Jul 8, 2025 • 0 new comments
Trained Qwen 3 model seems to be broken.
#4835 commented on Jul 8, 2025 • 0 new comments
Hangs while updating parameters after reading the first batch
#3809 commented on Jul 9, 2025 • 0 new comments
Results from testing the qwen2.5-omni model with swift infer differ from those of the official test method
#4595 commented on Jul 9, 2025 • 0 new comments
Does it support training with AMD graphics cards?
#3067 commented on Jul 9, 2025 • 0 new comments
How to save the final-step checkpoints during GRPO?
#4574 commented on Jul 9, 2025 • 0 new comments
SwanLab Notification Integration
#4829 commented on Jul 9, 2025 • 0 new comments
Multi-node training no longer works after updating swift to the latest version
#2057 commented on Jul 9, 2025 • 0 new comments
KTO training error: BFloat16 vs. float
#3830 commented on Jul 10, 2025 • 0 new comments
Are there plans to support Flash attention on Ascend NPU?
#2238 commented on Jul 10, 2025 • 0 new comments
Fine-tuning qwen2.5vl grounding
#3659 commented on Jul 10, 2025 • 0 new comments
qwen2.5-vl grounding GRPO
#4794 commented on Jul 10, 2025 • 0 new comments
qwen2.5-7B hangs during GRPO training with no error shown
#4603 commented on Jul 10, 2025 • 0 new comments
Does Megatron parallel training support NPU?
#3833 commented on Jul 11, 2025 • 0 new comments
🚀 Best Practices for Training Qwen3/Qwen3-MoE
#4030 commented on Jul 11, 2025 • 0 new comments
BUG: error when the training set contains the special character <unk>
#2688 commented on Jul 11, 2025 • 0 new comments
How do I disable automatic model parallelism?
#4684 commented on Jul 11, 2025 • 0 new comments
🍭[Roadmap] ms-swift3.6-3.8
#4561 commented on Jul 11, 2025 • 0 new comments
GKD code hangs while loading the model
#4724 commented on Jul 11, 2025 • 0 new comments
In the later stage of GRPO training, lr=0, grad_norm=NaN, kl=NaN, rewards=0.
#3136 commented on Jul 11, 2025 • 0 new comments
After multi-node multi-GPU zero3 lora fine-tuning, merge fails on read with safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer
#3854 commented on Jul 12, 2025 • 0 new comments
GRPO Example script results
#3852 commented on Jul 12, 2025 • 0 new comments
Update dataset_info.json
#3723 commented on Jul 6, 2025 • 0 new comments