Insights: modelscope/ms-swift
Overview
26 Pull requests merged by 6 people
-
[megatron]Support packing & CP
#4163 merged
May 11, 2025 -
Support ulysses streaming
#4160 merged
May 10, 2025 -
update readme
#4157 merged
May 9, 2025 -
Add more evaluation args
#4155 merged
May 9, 2025 -
Add sp script
#4154 merged
May 9, 2025 -
fix init parameters
#4148 merged
May 9, 2025 -
Fix bugs
#4150 merged
May 9, 2025 -
fix ulysses dpo
#4149 merged
May 9, 2025 -
Support init parameters
#4141 merged
May 9, 2025 -
Feature freezing/activating parameters via regex
#4143 merged
May 9, 2025 -
grpo code reward by judge0
#4140 merged
May 9, 2025 -
[megatron] support max_epochs
#4125 merged
May 9, 2025 -
[grpo] fix labels pop and peftmodel parameter check
#4136 merged
May 8, 2025 -
update qwen3 more models
#4123 merged
May 8, 2025 -
fix sequence_parallel
#4122 merged
May 7, 2025 -
fix omni aligner
#4117 merged
May 7, 2025 -
Fix ulysses eval
#4114 merged
May 7, 2025 -
fix packing
#4113 merged
May 7, 2025 -
fix enable_cache
#4109 merged
May 7, 2025 -
fix requirements
#4108 merged
May 7, 2025 -
[megatron] Update long text shell
#4106 merged
May 7, 2025 -
support max_epochs
#4102 merged
May 7, 2025 -
Update liger code
#4095 merged
May 6, 2025 -
fix enable_cache
#4091 merged
May 6, 2025 -
Support ulysses for llm/mllm,dpo/sft
#4085 merged
May 5, 2025 -
update docs
#4078 merged
May 4, 2025
7 Pull requests opened by 3 people
-
fix enable_cache
#4075 opened
May 4, 2025 -
refactor grpo internal mode
#4097 opened
May 6, 2025 -
Refactor SP
#4121 opened
May 7, 2025 -
[grpo] fix multi modal doc
#4124 opened
May 8, 2025 -
fix model_type mismatch
#4127 opened
May 8, 2025 -
support more vision dataset
#4132 opened
May 8, 2025 -
[grpo] support gen rm
#4151 opened
May 9, 2025
14 Issues closed by 8 people
-
Megatron SFT raises a CUDA error when context_parallel_size>1
#4144 closed
May 11, 2025 -
Running pip install 'ms-swift[all]' -U downloads many package versions
#4137 closed
May 9, 2025 -
Support for Qwen2-Audio and Qwen2.5-Omni
#4088 closed
May 8, 2025 -
qwen2.5-omni-7b merge-lora results differ
#3756 closed
May 8, 2025 -
raise IndexError(f"Index {index} out of range for dataset of size {size}.")
#4120 closed
May 8, 2025 -
Qwen2.5-7B-Base: error after some steps during ultra-long-text training
#4105 closed
May 7, 2025 -
DeepSpeed multi-GPU training creates data copies in .cache proportional to the number of GPUs, using excessive storage space
#3965 closed
May 6, 2025 -
Qwen3-8B-Base SFT full-parameter fine-tuning hangs after saving the first model
#4053 closed
May 6, 2025 -
Qwen3 dataset configuration is inelegant
#4087 closed
May 6, 2025 -
Too many dataloader workers
#4061 closed
May 6, 2025 -
qwen3 seq_cls
#4073 closed
May 6, 2025 -
Package versions in requirements are problematic
#4080 closed
May 5, 2025
47 Issues opened by 39 people
-
grpo use special token
#4162 opened
May 10, 2025 -
Full SFT raises an error during eval after val_dataset is set
#4159 opened
May 10, 2025 -
model.cuda() cannot be used inside the Template _encode function
#4158 opened
May 9, 2025 -
swift sft reports No such file or directory when --streaming true is set
#4156 opened
May 9, 2025 -
Is deploying a service with a qwen2.5-VL base model + multiple LoRAs currently supported?
#4153 opened
May 9, 2025 -
wandb keeps failing even with an overseas proxy enabled (network connection timeout, network error (connectiontimeout))
#4152 opened
May 9, 2025 -
After fine-tuning Qwen3-30B-A3B with megatron swift sft, the checkpoint cannot be converted back to huggingface format
#4147 opened
May 9, 2025 -
DPO training efficiency is very low
#4146 opened
May 9, 2025 -
Qwen2.5vl32B merge lora OOM issue
#4145 opened
May 9, 2025 -
Does dpo support packing?
#4142 opened
May 9, 2025 -
Can different data use different loss_scale values?
#4139 opened
May 8, 2025 -
Custom registered model hangs during dataset map (version 3.3.1)
#4138 opened
May 8, 2025 -
Request Failed with 422 Error: Input Should Be a Valid String for Image Paths
#4135 opened
May 8, 2025 -
Some problems about loading Janus-Pro - traceback : Signal 11 (SIGSEGV) received by PID xxx
#4134 opened
May 8, 2025 -
swift megatron sys._base_executable problem
#4133 opened
May 8, 2025 -
Continuing training with other data on top of a trained LoRA
#4131 opened
May 8, 2025 -
swift infer with tp=2 does not support batch inference for the deepseek-r1-distill-qwen series and qwq32B models
#4130 opened
May 8, 2025 -
swift infer batch processing works very well, but could it write to result_path in near real time instead of only at the end?
#4129 opened
May 8, 2025 -
Tensor dimension mismatch during inference after LoRA fine-tuning of Qwen2-audio-instruct
#4128 opened
May 8, 2025 -
Support Megatron LoRA training for Qwen3 MoE
#4126 opened
May 8, 2025 -
raise IndexError(f"Index {index} out of range for dataset of size {size}.")
#4119 opened
May 7, 2025 -
Building multi-turn multimodal dialogue datasets for GRPO
#4118 opened
May 7, 2025 -
A never-before-seen bug occurs during inference
#4116 opened
May 7, 2025 -
Can anyone knowledgeable explain how to do AWQ quantization after fine-tuning internvl3_8B?
#4115 opened
May 7, 2025 -
The beta parameter has no effect in GRPO
#4112 opened
May 7, 2025 -
qwen omni registration issue
#4110 opened
May 7, 2025 -
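For a task that has already completed SFT, if I want to add new knowledge without degrading performance, should I use the reinforcement fine-tuning implemented in ms-swift or GRPO?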
#4107 opened
May 7, 2025 -
dpo model RuntimeError: CUDA driver error: invalid argument
#4104 opened
May 7, 2025 -
Training always reports: RuntimeError: CUDA driver error: invalid argument
#4103 opened
May 7, 2025 -
Index error when fine-tuning LLama-omni on audio
#4101 opened
May 7, 2025 -
ulysses raise NotImplementedError
#4100 opened
May 7, 2025 -
Does the framework support passing a rope theta parameter?
#4099 opened
May 6, 2025 -
Sequence classification model shuffles the dataset during inference
#4098 opened
May 6, 2025 -
How to freeze different modules and set their LoRA ranks when fine-tuning the internvl3_8B multimodal model?
#4096 opened
May 6, 2025 -
Is there a parameter to adjust the dataset sampling ratio?
#4094 opened
May 6, 2025 -
sequence classification inference
#4093 opened
May 6, 2025 -
ModuleNotFoundError: No module named 'torch.distributed.device_mesh'
#4092 opened
May 6, 2025 -
Can results be saved during eval?
#4090 opened
May 6, 2025 -
Why is sequence_parallel currently not supported for RLHF?
#4089 opened
May 6, 2025 -
LoRA fine-tuning of gte embedding: inference results after merge differ greatly from the fine-tuned results
#4084 opened
May 5, 2025 -
Error when using Streaming + Packing + resume_from_checkpoint
#4083 opened
May 5, 2025 -
function call fine-tuning raises TypeError: string indices must be integers, not 'str'
#4082 opened
May 5, 2025 -
Training runs normally but eval raises an assert error
#4081 opened
May 5, 2025 -
Pre-offline tokenize for ultra large multimodal datasets
#4079 opened
May 4, 2025 -
making llm_max_batch_size and mllm_max_batch_size configurable
#4077 opened
May 4, 2025 -
InternVL3-9B LoRA fine-tuning: slow dataset preprocessing (about 7h)
#4076 opened
May 4, 2025
25 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Fine-tuned qwen2-audio-7b-instruct
#2637 commented on
May 5, 2025 • 0 new comments -
QwenVL2 72B sequence parallel raises a dimension mismatch error
#2972 commented on
May 5, 2025 • 0 new comments -
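[HELP] Error when running inference on a reward model; thanks everyone. How do I run vLLM inference on a model after RM training on a qwen base?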
#4045 commented on
May 6, 2025 • 0 new comments -
Specifying --max_length 4096 during inference seems to have no effect
#3967 commented on
May 6, 2025 • 0 new comments -
While training GRPO, I noticed that my model crashes. Its loss is 0, its grad_norm and kl are both Nan, and it completes as “!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!””
#3930 commented on
May 6, 2025 • 0 new comments -
Training qwen2.5-7b-instruct with grpo outputs !!!!
#4060 commented on
May 6, 2025 • 0 new comments -
Sudden error mid-training: NCCL watchdog thread terminated with exception
#1817 commented on
May 6, 2025 • 0 new comments -
Weight_decay seems to have no effect in GRPO training?
#3931 commented on
May 6, 2025 • 0 new comments -
Customized Image Data Augmentation
#2345 commented on
May 7, 2025 • 0 new comments -
cannot import name 'LoRA' from 'swift'
#3665 commented on
May 7, 2025 • 0 new comments -
AWQ quantization after LoRA fine-tuning raises an error; details below:
#2318 commented on
May 7, 2025 • 0 new comments -
About qLoRA training
#4007 commented on
May 7, 2025 • 0 new comments -
Qwen2.5-vl grounding fine-tuning: how to train with my own local dataset
#3204 commented on
May 8, 2025 • 0 new comments -
Request support for health checks
#3474 commented on
May 8, 2025 • 0 new comments -
After fine-tuning DS_32B and running merge_lora, inference with the merged model shows no effect
#3974 commented on
May 8, 2025 • 0 new comments -
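The original gte 7B model is about 29G; using the GitHub training script with the example training parameters changed to full-parameter training, the weights become 14G. Loading the GTE model after full-parameter training raises an error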
#4005 commented on
May 8, 2025 • 0 new comments -
GRPO training error: Fatal Python error: none_dealloc: deallocating None: bug likely caused by a refcount error in a C extension
#3864 commented on
May 8, 2025 • 0 new comments -
qwen2.5-vl-72b, running via vllm_server_host, CUDA out of memory
#4023 commented on
May 8, 2025 • 0 new comments -
SimPO and ORPO support for VLM (Qwen2.5VL)
#3718 commented on
May 8, 2025 • 0 new comments -
Multi-GPU multi-process orpo hangs, triggering watchdog caught collective operation timeout.
#3564 commented on
May 8, 2025 • 0 new comments -
[WARNING:swift] No training was carried out, which may be due to the dataset being too small or incorrect usage of resume_from_checkpoint.
#3863 commented on
May 9, 2025 • 0 new comments -
🚀 Best Practices for Training Qwen3/Qwen3-MoE
#4030 commented on
May 9, 2025 • 0 new comments -
Is GME fine-tuning supported?
#3019 commented on
May 10, 2025 • 0 new comments -
Video QA fine-tuning of minicpmV2.6 on a single 4090 always OOMs midway
#3849 commented on
May 10, 2025 • 0 new comments -
[Bug]: Wrong context length for Qwen 2.5 7B-Instruct?
#3907 commented on
May 10, 2025 • 0 new comments