Insights: modelscope/ms-swift
Overview
2 Releases published by 1 person
-
v3.6.0
published
Jul 8, 2025 -
v3.6.1 Patch release v3.6.1
published
Jul 11, 2025
36 Pull requests merged by 10 people
-
[model] support Kimi-K2 template
#4925 merged
Jul 12, 2025 -
[megatron] support lora modules_to_save
#4916 merged
Jul 11, 2025 -
[doc] fix reranker index
#4921 merged
Jul 11, 2025 -
[bugfix] fix profiling patch
#4915 merged
Jul 11, 2025 -
feat: rlhf generation samples log to swanlab
#4907 merged
Jul 11, 2025 -
[megatron] Support dpo lora
#4913 merged
Jul 11, 2025 -
[megatron] add logging jsonl
#4908 merged
Jul 11, 2025 -
Fix FlorenceTemplate for florence2
#4871 merged
Jul 11, 2025 -
Fix the template suffix of qwen3 embedding
#4909 merged
Jul 11, 2025 -
[megatron] update to mcore 0.13
#4903 merged
Jul 10, 2025 -
[megatron] fix pp mla
#4904 merged
Jul 10, 2025 -
update framework.txt
#4896 merged
Jul 10, 2025 -
[megatron] support LoRA & support loss_scale
#4812 merged
Jul 9, 2025 -
Fix: Correct training hang for Keye-VL on DeepSpeed with mixed data
#4889 merged
Jul 9, 2025 -
[model] fix qwen eos_token
#4888 merged
Jul 9, 2025 -
optimize imports
#4883 merged
Jul 9, 2025 -
fix seq_cls generation_config
#4882 merged
Jul 9, 2025 -
fix loss_scale sp
#4880 merged
Jul 9, 2025 -
[SP] clean up imports
#4878 merged
Jul 9, 2025 -
[grpo] fix server arg check
#4865 merged
Jul 8, 2025 -
[web-ui] Modify open parameter for Accordion
#4859 merged
Jul 8, 2025 -
[dataset] fix dataset ddp write conflict
#4860 merged
Jul 7, 2025 -
Support Kwai-Keye/Keye-VL-8B-Preview
#4856 merged
Jul 7, 2025 -
[template] fix qwen3 remove '<think></think>'
#4857 merged
Jul 7, 2025 -
[grpo] update doc
#4853 merged
Jul 7, 2025 -
Fix test bug
#4851 merged
Jul 7, 2025 -
[grpo] fix offpolicy check
#4852 merged
Jul 7, 2025 -
[grpo] Fix bug when repeatedly calling inputs_to_rolloutrequest
#4823 merged
Jul 7, 2025 -
[grpo] deprecated params for 3.6
#4848 merged
Jul 7, 2025 -
[megatron] fix eval_iters -1
#4847 merged
Jul 7, 2025 -
fix bug: grpo train error for deepseek model
#4833 merged
Jul 7, 2025 -
[megatron] Fix the display issue for train_type=lora
#4845 merged
Jul 7, 2025 -
update stream & fix bugs
#4842 merged
Jul 7, 2025 -
[Feature] SwanLab Lark callback
#4830 merged
Jul 6, 2025 -
fix multimodal padding_free prediction_step
#4839 merged
Jul 6, 2025 -
[train] fix multimodal packing & padding_free
#4838 merged
Jul 6, 2025
4 Pull requests opened by 4 people
-
[grpo] entropy mask
#4850 opened
Jul 7, 2025 -
[Feature] Add Swanlab Slack notification
#4887 opened
Jul 9, 2025 -
[Feature] Support a Gym-like environment training interface for end-to-end RL training
#4890 opened
Jul 9, 2025 -
fix loss_scale bug when meeting <image>,<audio>,<video>
#4922 opened
Jul 11, 2025
28 Issues closed by 14 people
-
During evaluation, changing generation parameters has no effect; the evaluation config file still shows the default parameters
#3758 closed
Jul 12, 2025 -
How does the generative reward model in GRPO receive input?
#4912 closed
Jul 11, 2025 -
Megatron-SWIFT: GPU memory error when training and exporting a 32B model
#3768 closed
Jul 11, 2025 -
[Question] Does Megatron-SWIFT restore the streaming-dataset offset when resuming from a checkpoint?
#4505 closed
Jul 11, 2025 -
Error converting HF-format model files to Megatron: CUDA error: operation not supported
#4713 closed
Jul 11, 2025 -
Support Megatron LoRA training for Qwen3 MoE
#4126 closed
Jul 11, 2025 -
SFT of the A3B model hangs at this point
#4799 closed
Jul 11, 2025 -
During evaluation, output length is capped at 2048 for unknown reasons
#3761 closed
Jul 11, 2025 -
SFT: reuse the cache from the previous data load
#3762 closed
Jul 11, 2025 -
Does Swift support multimodal interleave data in training GRPO?
#4905 closed
Jul 10, 2025 -
ModuleNotFoundError: No module named 'vllm_ascend.distributed'
#4886 closed
Jul 10, 2025 -
Resuming post-training raises an error: Mixed using with peft is not allowed now.
#4894 closed
Jul 10, 2025 -
Megatron in the swift3.5.3 image cannot be found and loaded correctly
#4866 closed
Jul 10, 2025 -
About the "500 Internal Server" Error in vllm-server
#4862 closed
Jul 9, 2025 -
When fine-tuning qwen3-32B on multi-turn dialogue data with thinking content, is the loss computed only on the <think>\n\n</think> content of the last turn?
#4810 closed
Jul 9, 2025 -
Does GRPO support dynamic data filtering?
#4884 closed
Jul 9, 2025 -
Feature Request: RTX 5090 Support with ms-swift docker image with CUDA 12.8
#4834 closed
Jul 9, 2025 -
Timeout error
#4872 closed
Jul 8, 2025 -
With a custom registered dataset, is data loaded from cache on the second run?
#4869 closed
Jul 8, 2025 -
rlhf.py: error: unrecognized arguments; GRPO training fails to start
#4868 closed
Jul 8, 2025 -
error when finetuning qwen3 in modelscope notebook.
#4811 closed
Jul 8, 2025 -
FileNotFoundError in a DDP environment
#4840 closed
Jul 7, 2025 -
With ignore_empty_think enabled, the framework automatically removes <think>\n\n</think>\n\n, so the model does not think
#4854 closed
Jul 7, 2025 -
ALL_PARALLEL_STYLES argument of type 'NoneType' is not iterable
#4843 closed
Jul 7, 2025 -
Abnormal GRPO training results
#4800 closed
Jul 7, 2025 -
Does GenRMPlugin in the grpo + gen_rm pipeline process the data more than once?
#4846 closed
Jul 7, 2025 -
Padding free feature
#4439 closed
Jul 6, 2025 -
Support the Gemma-3n model
#4759 closed
Jul 5, 2025
37 Issues opened by 33 people
-
Multi-node multi-GPU fine-tuning: torch.distributed.DistStoreError: wait timeout after 900000ms
#4924 opened
Jul 12, 2025 -
With remove_unused_columns set to false, extra fields still cannot be passed through to GKD's compute_loss function
#4923 opened
Jul 11, 2025 -
Fine-tuning GME
#4920 opened
Jul 11, 2025 -
Are there plans to support RLOO and REINFORCE++?
#4919 opened
Jul 11, 2025 -
QwenVL2.5 hangs when training on mixed image-text and text-only data
#4918 opened
Jul 11, 2025 -
Error training the Gemma3 model
#4917 opened
Jul 11, 2025 -
Can this project support BERT-style models, such as Modern-Bert?
#4914 opened
Jul 11, 2025 -
Prefix prompt for Embedding training
#4911 opened
Jul 11, 2025 -
Web UI: set mixing ratios for datasets
#4910 opened
Jul 11, 2025 -
Multimodal GRPO best-practice training times out at 90 steps
#4906 opened
Jul 10, 2025 -
Megatron-Swift: please support training multimodal large models, such as Qwen2.5VL
#4902 opened
Jul 10, 2025 -
Does the generative reward model in GRPO support vLLM inference?
#4901 opened
Jul 10, 2025 -
Services started with swift deploy tend to trigger the stop token prematurely
#4900 opened
Jul 10, 2025 -
Can task_type='seq_cls' not be used directly for infer?
#4899 opened
Jul 10, 2025 -
Qwen2.5VL grounding fine-tuning data issue
#4898 opened
Jul 10, 2025 -
Multi-node training fails
#4897 opened
Jul 10, 2025 -
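When fine-tuning embeddings with LoRA, is mixed multimodal training supported? The data includes unimodal text-to-text and image-to-image, cross-modal image-text, and mixed multimodal samples; putting all types in a single jsonl currently raises the following error: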
#4895 opened
Jul 10, 2025 -
max_model_len is not working
#4893 opened
Jul 9, 2025 -
TypeError: argument of type 'NoneType' is not iterable
#4892 opened
Jul 9, 2025 -
How to construct my fine-tuning dataset?
#4891 opened
Jul 9, 2025 -
MAX_PIXELS in the env variables is not working when model_type is mimo_vl
#4885 opened
Jul 9, 2025 -
For the ovis model, how do I pass MAX_PARTITION correctly? Setting it via an environment variable has no effect when launching with vllm serve
#4881 opened
Jul 9, 2025 -
Client cannot connect after deploy
#4879 opened
Jul 9, 2025 -
max_length of completion exceed max_completion_length
#4877 opened
Jul 8, 2025 -
When fine-tuning qwen3, training hangs at the use_logits_to_keep: True step
#4876 opened
Jul 8, 2025 -
qwen3 training hangs at the use_logits_to_keep: True stage
#4875 opened
Jul 8, 2025 -
Channel loss training error
#4874 opened
Jul 8, 2025 -
Is there an official NPU image for swift?
#4873 opened
Jul 8, 2025 -
GRPO training raises an error at an intermediate step
#4867 opened
Jul 8, 2025 -
reward model dataset inference
#4864 opened
Jul 8, 2025 -
Support for fine-tuning more multimodal embedding models (beyond GME)
#4861 opened
Jul 7, 2025 -
Code raises an error when per_device_train_batch_size is increased
#4858 opened
Jul 7, 2025 -
Evaluation doesn't run during training for a custom dataset
#4855 opened
Jul 7, 2025 -
Does qwen2.5vl support 4-bit kv_cache quantization?
#4849 opened
Jul 7, 2025 -
After SFT with ms-swift, the model's config.json changed, so I cannot deploy the model directly with vLLM
#4844 opened
Jul 7, 2025 -
grpo + gen_rm padding index error
#4841 opened
Jul 7, 2025 -
Need to update requirements.txt
#4837 opened
Jul 5, 2025
37 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Error when AWQ-quantizing qwen2.5-vl-7b
#4828 commented on
Jul 5, 2025 • 0 new comments -
Ovis-2B pre-training error
#4818 commented on
Jul 5, 2025 • 0 new comments -
Question: async_infer does not achieve asynchronous calls
#3717 commented on
Jul 6, 2025 • 0 new comments -
ModuleNotFoundError: No module named 'torch.distributed.device_mesh'
#4092 commented on
Jul 6, 2025 • 0 new comments -
Qwen2-VL-2B shows gradient explosion late in pre-training; other VLMs do not
#4819 commented on
Jul 6, 2025 • 0 new comments -
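After resuming from a checkpoint, why does the remaining time keep growing, and why don't epoch and max_steps line up with the values before resuming?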
#3783 commented on
Jul 7, 2025 • 0 new comments -
Use the feature of resume_from_checkpoint when using python code to run finetuning.
#3774 commented on
Jul 7, 2025 • 0 new comments -
[Has anyone seen this?] Coordinate offset when fine-tuning qwen2.5vl as an agent
#4831 commented on
Jul 7, 2025 • 0 new comments -
Qwen2-Audio using flash attention: error occurs: RuntimeError: cu_seqlens_q must have shape (batch_size + 1)
#2542 commented on
Jul 7, 2025 • 0 new comments -
Question: the model repeatedly and abnormally hits Max_length during GRPO training
#4758 commented on
Jul 7, 2025 • 0 new comments -
deepspeed AutoTP + ZeRO
#3797 commented on
Jul 8, 2025 • 0 new comments -
Problem evaluating a custom dataset with ms-swift eval, forcing use of evalscope instead; please support the system field soon
#3792 commented on
Jul 8, 2025 • 0 new comments -
How to specify the split (train/validation) for the dataset in cli
#3789 commented on
Jul 8, 2025 • 0 new comments -
When using a registered dataset, can binary video or image data be added?
#2286 commented on
Jul 8, 2025 • 0 new comments -
AWQ quantization issue with qwen2.5-vl
#4762 commented on
Jul 8, 2025 • 0 new comments -
Trained Qwen 3 model seems to be broken.
#4835 commented on
Jul 8, 2025 • 0 new comments -
Hangs when updating parameters after reading the first batch
#3809 commented on
Jul 9, 2025 • 0 new comments -
Results from testing the qwen2.5-omni model with swift infer differ from those of the official test method
#4595 commented on
Jul 9, 2025 • 0 new comments -
Does it support training with AMD graphics cards?
#3067 commented on
Jul 9, 2025 • 0 new comments -
How to save the final-step checkpoint during GRPO?
#4574 commented on
Jul 9, 2025 • 0 new comments -
SwanLab Notification Integration
#4829 commented on
Jul 9, 2025 • 0 new comments -
Multi-node training no longer works after updating swift to the latest version
#2057 commented on
Jul 9, 2025 • 0 new comments -
KTO training error: BFloat16 vs. float
#3830 commented on
Jul 10, 2025 • 0 new comments -
Are there plans to support Flash Attention on Ascend NPU?
#2238 commented on
Jul 10, 2025 • 0 new comments -
Fine-tuning qwen2.5vl grounding
#3659 commented on
Jul 10, 2025 • 0 new comments -
qwen2.5-vl grounding GRPO
#4794 commented on
Jul 10, 2025 • 0 new comments -
qwen2.5-7B GRPO training hangs without showing any error
#4603 commented on
Jul 10, 2025 • 0 new comments -
Does Megatron parallel training support NPU?
#3833 commented on
Jul 11, 2025 • 0 new comments -
🚀 Best Practices for Training Qwen3/Qwen3-MoE
#4030 commented on
Jul 11, 2025 • 0 new comments -
BUG: error when the training set contains the special character <unk>
#2688 commented on
Jul 11, 2025 • 0 new comments -
How to disable automatic model parallelism?
#4684 commented on
Jul 11, 2025 • 0 new comments -
🍭[Roadmap] ms-swift3.6-3.8
#4561 commented on
Jul 11, 2025 • 0 new comments -
GKD code hangs when loading the model
#4724 commented on
Jul 11, 2025 • 0 new comments -
In the later stage of GRPO training, lr=0, grad_norm=NaN, kl=NaN, rewards=0.
#3136 commented on
Jul 11, 2025 • 0 new comments -
After multi-node multi-GPU ZeRO-3 LoRA fine-tuning, merge fails on read with safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer
#3854 commented on
Jul 12, 2025 • 0 new comments -
GRPO Example script results
#3852 commented on
Jul 12, 2025 • 0 new comments -
Update dataset_info.json
#3723 commented on
Jul 6, 2025 • 0 new comments