Pulse · modelscope/ms-swift · GitHub

June 7, 2025 – July 7, 2025

Overview

118 Active pull requests

339 Active issues

4 Releases published by 1 person

v3.5.0
published Jun 8, 2025
v3.5.1 Patch release v3.5.1
published Jun 13, 2025
v3.5.2 Patch release v3.5.2
published Jun 20, 2025
v3.5.3 Patch release v3.5.3
published Jun 27, 2025

110 Pull requests merged by 13 people

[web-ui]Modify open parameter for Accordion
#4859 merged Jul 8, 2025
[dataset] fix dataset ddp write conflict
#4860 merged Jul 7, 2025
Support Kwai-Keye/Keye-VL-8B-Preview
#4856 merged Jul 7, 2025
[template] fix qwen3 remove '<think></think>'
#4857 merged Jul 7, 2025
[grpo] update doc
#4853 merged Jul 7, 2025
Fix test bug
#4851 merged Jul 7, 2025
[grpo] fix offpolicy check
#4852 merged Jul 7, 2025
[grpo]Fix bug when repeatedly call inputs_to_rolloutrequest
#4823 merged Jul 7, 2025
[grpo] deprecated params for 3.6
#4848 merged Jul 7, 2025
[megatron] fix eval_iters -1
#4847 merged Jul 7, 2025
fix bug: grpo train error for deepseek model
#4833 merged Jul 7, 2025
[megatron] Fix the display issue for train_type=lora
#4845 merged Jul 7, 2025
update stream & fix bugs
#4842 merged Jul 7, 2025
[Feature] SwanLab Lark callback
#4830 merged Jul 6, 2025
fix multimodal padding_free prediction_step
#4839 merged Jul 6, 2025
[train] fix multimodal packing & padding_free
#4838 merged Jul 6, 2025
Support gemma3n
#4836 merged Jul 4, 2025
[grpo] fix apply_chat_template
#4827 merged Jul 4, 2025
[rollout] fix request from dict
#4826 merged Jul 4, 2025
[rollout] Fix non-serializable torch.dtype bug in VLLM weight sync
#4825 merged Jul 4, 2025
[rollout] fix external plugins
#4822 merged Jul 4, 2025
[GITHUB WORKFLOW]add close stale issues workflow
#4816 merged Jul 3, 2025
[RM] support margin & update doc
#4817 merged Jul 3, 2025
Support ring attention for llm sft/dpo/grpo (packing/padding_free only).
#4814 merged Jul 3, 2025
Refactor Web-UI
#4687 merged Jul 3, 2025
[train] Update split_dataset_ratio
#4798 merged Jul 3, 2025
[model] support GLM4.1V
#4804 merged Jul 2, 2025
fix template bug for qwen3 reranker
#4795 merged Jul 2, 2025
update custom_dataset_docs
#4792 merged Jul 2, 2025
update resume from checkpoint & update timeout
#4774 merged Jul 1, 2025
Fix media downloading from hf
#4788 merged Jul 1, 2025
[grpo] check eval_dataset length
#4781 merged Jul 1, 2025
[grpo] pass trainer state to reward funcs
#4779 merged Jul 1, 2025
[docs] fix grpo docs
#4777 merged Jul 1, 2025
[grpo] update vllm weight sync & wake up
#4770 merged Jul 1, 2025
update megatron shell
#4773 merged Jun 30, 2025
update wechat
#4769 merged Jun 30, 2025
[model] support ERNIE-4.5
#4757 merged Jun 30, 2025
fix remove_unused_columns
#4749 merged Jun 28, 2025
[megatron] support fp8
#4730 merged Jun 28, 2025
[model] support Tencent-Hunyuan/Hunyuan-A13B-Instruct
#4745 merged Jun 27, 2025
[grpo]Tool rl: add reward func for ToolRL
#4694 merged Jun 27, 2025
compat transformers==4.52 (vlm)
#4738 merged Jun 26, 2025
[grpo] check liger & sp
#4734 merged Jun 26, 2025
[grpo] fix max_step for dataloader when applying sequence parallel
#4731 merged Jun 26, 2025
[quant] Support fp8
#4729 merged Jun 26, 2025
support Kimi-VL-A3B-Thinking-2506 & Kimi-Dev-72B
#4719 merged Jun 25, 2025
[doc] simplify environment variables & update best practices documentation
#4715 merged Jun 25, 2025
[grpo] fix colocate seed
#4712 merged Jun 25, 2025
[megatron] support rednote-hilab/dots.llm1.inst
#4707 merged Jun 25, 2025
[megatron] support DeepseekV2ForCausalLM and DeepseekV3ForCausalLM
#4659 merged Jun 25, 2025
fix links
#4690 merged Jun 24, 2025
[feat] support fine-tuning of reranker models
#4671 merged Jun 24, 2025
[grpo] fix grpo pt
#4683 merged Jun 24, 2025
[rollout] fix dp args
#4678 merged Jun 23, 2025
[doc] fix doc
#4675 merged Jun 23, 2025
[doc] fix image link
#4674 merged Jun 23, 2025
docs: correct typo "resonse" to "response"
#4672 merged Jun 23, 2025
[channel loss]support packing & padding free
#4666 merged Jun 23, 2025
[docs] update docs
#4665 merged Jun 23, 2025
[dataset] fix grounding_dataset
#4664 merged Jun 23, 2025
[grpo] refactor multi turn & support async engine & refactor grpo docs
#4380 merged Jun 23, 2025
[template] optimize remove_unused_columns
#4661 merged Jun 22, 2025
[gkd] support use_logits_to_keep/padding_free/packing & update gkd shell
#4658 merged Jun 21, 2025
[docs] update gkd
#4657 merged Jun 20, 2025
compat megatron-core 0.11
#4655 merged Jun 20, 2025
[megatron] fix eval data_collator
#4654 merged Jun 20, 2025
fix device_map & ddp rank0
#4650 merged Jun 20, 2025
fix packing & load_from_cache_file
#4649 merged Jun 20, 2025
[model] fix model_meta
#4647 merged Jun 20, 2025
[template] optimize get_length
#4641 merged Jun 20, 2025
[docs] update qwen3 best_practice
#4300 merged Jun 19, 2025
update docs readme
#4639 merged Jun 19, 2025
update docs & shell
#4637 merged Jun 19, 2025
[infer/deploy/eval/app] support sglang engine
#3810 merged Jun 19, 2025
[doc] LaTeX rendering
#4629 merged Jun 18, 2025
[rollout] swift rollout add template
#4626 merged Jun 17, 2025
[loss_scale] support last_round_with_ignore_empty_think for rag
#4623 merged Jun 17, 2025
fix max_epochs tp
#4624 merged Jun 17, 2025
[ppo] fix ppo
#4622 merged Jun 17, 2025
[docs] remove Qwen3-32B-Base
#4621 merged Jun 17, 2025
[gkd] support gkd_trainer
#4587 merged Jun 17, 2025
Fix minimax & fix agent_template
#4618 merged Jun 17, 2025
[megatron] fix megatron pp max_epochs
#4608 merged Jun 16, 2025
Update FAQ
#4612 merged Jun 16, 2025
[model] support minimax
#4610 merged Jun 16, 2025
[megatron] compat megatron-core main branch
#4606 merged Jun 15, 2025
[mirror] update swift mirror
#4601 merged Jun 14, 2025
Fix UI llm_train
#4592 merged Jun 13, 2025
fix gc_kwargs
#4591 merged Jun 13, 2025
[grpo] restore num_generations check
#4590 merged Jun 13, 2025
[megatron] support more rope_scaling & support deepseek-r1-qwen3-8b/internlm3/mimo-7b
#4576 merged Jun 12, 2025
[grpo] model weight synchronization before first turn rollout with async generation
#4584 merged Jun 12, 2025
[grpo] remove data collator to top-level to avoid pickle error in spawn mode
#4582 merged Jun 12, 2025
[megatron] Fix megatron all_reduce warning
#4568 merged Jun 12, 2025
[model] fix ovis gradient_checkpointing vit no_grad
#4571 merged Jun 12, 2025
fix args.json
#4566 merged Jun 11, 2025
[Bug]Fix ulysses train steps, embedding negative sample length
#4565 merged Jun 11, 2025
[dataset] fix toolbench (local)
#4563 merged Jun 11, 2025
[grpo] fix the pickle data collator
#4562 merged Jun 11, 2025
[grpo] support offloading reference model
#4554 merged Jun 11, 2025
support dots1
#4560 merged Jun 11, 2025
[megatron] support DPO
#4193 merged Jun 11, 2025
[megatron/dpo] fix megatron packing_cache & update DPOTrainer
#4556 merged Jun 11, 2025
fix qwen3 embedding saving
#4548 merged Jun 10, 2025
fix: handle INFONCE_HARD_NEGATIVES as integer if provided
#4545 merged Jun 10, 2025
support cognitivecomputations/DeepSeek-R1-0528-AWQ
#4537 merged Jun 9, 2025
fix LoraModel
#4536 merged Jun 9, 2025
[grpo] update doc about move_model_batches
#4523 merged Jun 8, 2025
fix emb script and docs
#4521 merged Jun 8, 2025

8 Pull requests opened by 6 people

fix: add SO_REUSEADDR to find_free_port to handle TIME_WAIT state
#4573 opened Jun 12, 2025
solve the default 'template_backend' bug in llm.template.base.Template._encode
#4669 opened Jun 23, 2025
support ernie_vl
#4763 opened Jun 30, 2025
[Safety]Fix torch load
#4802 opened Jul 2, 2025
[WIP][megatron] support LoRA
#4812 opened Jul 3, 2025
Update template_meta.prefix bug
#4813 opened Jul 3, 2025
Aacedar patch 3
#4832 opened Jul 4, 2025
[grpo] entropy mask
#4850 opened Jul 7, 2025

199 Issues closed by 37 people

error when finetuning qwen3 in modelscope notebook.
#4811 closed Jul 8, 2025
FileNotFoundError in a DDP environment
#4840 closed Jul 7, 2025
With ignore_empty_think enabled, the framework automatically removes <think>\n\n</think>\n\n, so the model stops thinking
#4854 closed Jul 7, 2025
ALL_PARALLEL_STYLES argument of type 'NoneType' is not iterable
#4843 closed Jul 7, 2025
Abnormal GRPO training results
#4800 closed Jul 7, 2025
Does GenRMPlugin process the data twice in the grpo + gen_rm pipeline?
#4846 closed Jul 7, 2025
Padding free feature
#4439 closed Jul 6, 2025
Support the Gemma-3n model
#4759 closed Jul 5, 2025
Zero loss in case lora qwen3-4b-reranker tuning
#4820 closed Jul 4, 2025
grpo can not support deepseek-6.7b-base model
#4785 closed Jul 4, 2025
Missing attribute when generate infer_request in VLM GRPO
#4824 closed Jul 4, 2025
Communication timeout error when training qwen with grpo
#4797 closed Jul 4, 2025
Rollout Stuck after “Core engine process 0 ready.” if use custom plugin
#4807 closed Jul 4, 2025
Load lora finetuned model and further finetune with GRPO
#4821 closed Jul 4, 2025
Error when saving after PPO training finishes: zero_gather_16bit_weights_on_model_save
#4815 closed Jul 4, 2025
Could the reward model training documentation be organized in more detail?
#4379 closed Jul 3, 2025
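I want to fine-tune qwen3-32B on scenario-specific multi-turn dialogue data with thinking traces distilled from Gemini. When computing the loss, is only the <think>\n\n</think> content of the last turn included?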
#4809 closed Jul 2, 2025
Bug in custom data construction for Qwen3Reranker
#4784 closed Jul 2, 2025
How to Input Frame Sequence Without Original Video?
#4776 closed Jul 2, 2025
megatron sft raises a “cannot pickle _io.TextIOWrapper” error when packing is used
#4778 closed Jul 2, 2025
GuidedDecodingParams has no effect in vllmengine inference
#4790 closed Jul 1, 2025
Why is my reward always 0 when fine-tuning with GRPO?
#4789 closed Jul 1, 2025
grpo field parsing bug
#4783 closed Jul 1, 2025
GRPO: why is a video that I pass in recognized as a picture?
#4772 closed Jul 1, 2025
grpo input format is described inconsistently in several places
#4782 closed Jul 1, 2025
Is GRPO training supported for Qwen2-audio-7B-Instruct?
#4768 closed Jul 1, 2025
grpo_trainer.py TypeError: must be real number, not NoneType
#4751 closed Jun 30, 2025
Long-text inference raises swift.llm.template.base.MaxLengthError: Current length of row(57972) is larger than the max_length(32768)
#4767 closed Jun 30, 2025
Error when quantizing Qwen2.5-VL-3B-Instruct with awq
#4761 closed Jun 30, 2025
Question about how eval acc is computed with ulysses
#4753 closed Jun 29, 2025
Reinforcement fine-tuning script raises AttributeError: 'NoneType' object has no attribute 'infer'
#4201 closed Jun 29, 2025
Inter-GPU communication error during batch inference with vllm
#4433 closed Jun 29, 2025
GRPO training fails; the model seems to have trouble learning
#4679 closed Jun 29, 2025
How to set the local_repo_path parameter in a Python script
#4714 closed Jun 27, 2025
qwen2.5vl lora sft: question about freeze_vit
#4722 closed Jun 27, 2025
In SFT training on all turns, the end token of intermediate turns is not learned, so the trained model cannot stop
#4732 closed Jun 27, 2025
Megatron does not support GRPO training
#4744 closed Jun 27, 2025
Qwen3-4B no longer outputs think after full DPO fine-tuning
#4701 closed Jun 27, 2025
How to customize the format reward in GRPO
#4667 closed Jun 26, 2025
[grpo] loading BERT model in reward
#4580 closed Jun 26, 2025
Loss and grad_norm stay at 0 during GRPO training
#4570 closed Jun 26, 2025
When will GRPO support multi-node megatron training?
#4558 closed Jun 26, 2025
Reward std is always 0 during GRPO training
#4512 closed Jun 26, 2025
Multi-node training hangs and cannot run with --vllm_mode server
#4532 closed Jun 26, 2025
GRPO Qwen3 32B training torch issue
#4491 closed Jun 26, 2025
qwen3 reinforcement training: communication error after grpo training finishes
#4170 closed Jun 26, 2025
The expanded size of the tensor (8) must match the existing size (5) at non-singleton dimension 0.
#4056 closed Jun 26, 2025
Error at the end of training: /data/chatglm/retrieval_agent_new/ms_swift_train/ms-swift/swift/cli/rlhf.py FAILED
#4302 closed Jun 26, 2025
dapo hangs at "UserWarning: None of the inputs have requires_grad=True. Gradients will be None" until timeout
#4050 closed Jun 26, 2025
Training qwen2.5-7b-instruct with grpo outputs !!!!
#4060 closed Jun 26, 2025
Training runs normally, but eval raises an assert error
#4081 closed Jun 26, 2025
Batch size in GRPO.
#4341 closed Jun 26, 2025
Reward function registration fails in grpo training
#4351 closed Jun 26, 2025
GRPO data passing fails
#4362 closed Jun 26, 2025
Qwen-Omni full-parameter grpo fine-tuning raises ValueError: `max_new_tokens` must be greater than 0, but is -16384
#4392 closed Jun 26, 2025
Error in GRPO multimodal fine-tuning
#4470 closed Jun 26, 2025
Will GRPO fine-tuning of Qwen2.5-VL-3B on two A6000 GPUs OOM?
#4477 closed Jun 26, 2025
Running sft-rlhf-grpo fine-tuning on RTX3090 raises: torch.distributed.DistBackendError: [3] is setting up NCCL communicator and retrieving ncclUniqueId from [0] via c10d key-value store by key '0', but store->get('0') got error: wait timeout after 1800000ms,
#3612 closed Jun 26, 2025
Any plans to support megatron for GRPO training?
#3760 closed Jun 26, 2025
GRPO cannot run with LLava
#3928 closed Jun 26, 2025
QWQ: GRPO training cannot run, error “RuntimeError: ACL stream synchronize failed, error code:107020”
#3932 closed Jun 26, 2025
While training GRPO, I noticed that my model crashes. Its loss is 0, its grad_norm and kl are both Nan, and it completes as “!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!””
#3930 closed Jun 26, 2025
Error partway through GRPO training
#3771 closed Jun 26, 2025
grpo training hangs and keeps showing the following problem.
#3794 closed Jun 26, 2025
GRPO training error
#3769 closed Jun 26, 2025
Various traceback error during GRPO training
#3836 closed Jun 26, 2025
Contributing a dockerfile: tested with multimodal grpo training, it can roughly reproduce the results in the examples
#3812 closed Jun 26, 2025
In GRPO, if reward_model is set instead of --reward_funcs, the reward model and the model are both loaded onto one GPU
#3843 closed Jun 26, 2025
Meet GPU OutOfMemory in GRPO training
#3848 closed Jun 26, 2025
OOM when training a 32b model with grpo
#3871 closed Jun 26, 2025
Performance drops sharply after 100 steps of GRPO training; what is the cause?
#3876 closed Jun 26, 2025
Bug! Checkpoint resume failure - deepspeed different DP size. Is there a quick checkpoint converter anywhere?
#3989 closed Jun 26, 2025
Bug! Help! MS-SWIFT GRPO + LoRA training hung/stuck after training 1 step from full merged model merged from lora adapter
#3990 closed Jun 26, 2025
if sleep_level > 0, gradient_accumulation_steps will be forced to 1
#3943 closed Jun 26, 2025
The GRPO training process hangs for multi-node training.
#3934 closed Jun 26, 2025
Training speed issue in an NPU environment
#3331 closed Jun 26, 2025
Looking for a script that can run GRPO on Qwen2.5 72B with 8 A100 GPUs
#3416 closed Jun 26, 2025
GRPO training raises an error when using 2 nodes with --num_infer_workers 2
#3393 closed Jun 26, 2025
grpo training based on qwenvl-7b-instruct OOMs during eval
#3541 closed Jun 26, 2025
Running the lora_vllm script on 4*v100 raises: Assertion `!(srcMmaLayout && dstMmaLayout && !srcMmaLayout.isAmpere()) && "mma -> mma layout conversion is only supported on Ampere"' failed.
#3549 closed Jun 26, 2025
Single-node multi-GPU grpo raises an error after several steps
#3576 closed Jun 26, 2025
Loss goes to 0, Gibberish Outputs
#3582 closed Jun 26, 2025
How to add fields from the training data to the logs
#3591 closed Jun 26, 2025
Multi-node multi-GPU GRPO: assert self.cpu_group is not None
#3583 closed Jun 26, 2025
Setting NPROC_PER_NODE immediately raises: failed (exitcode: -11) local_rank: 1
#3611 closed Jun 26, 2025
GPU memory surges late in GRPO training
#3600 closed Jun 26, 2025
grpo results are still not reproducible with a fixed seed
#3607 closed Jun 26, 2025
Bug when using grpo with vllm on gemma3
#3660 closed Jun 26, 2025
[bug] Failed to open local file in cache
#3667 closed Jun 26, 2025
[Bug]: RuntimeError: setup failed!
#3662 closed Jun 26, 2025
Error during eval when training llava-1.5 and qwen2-vl with GRPO using vllm inference
#3666 closed Jun 26, 2025
Is there a GRPO training script and environment configuration that runs on 4*V100?
#3671 closed Jun 26, 2025
ValueError: RLHF do not support sequence parallel
#3673 closed Jun 26, 2025
Hanging after tqdm starts [COLOCATE MODE]
#3702 closed Jun 26, 2025
GRPO max_grad_norm doesn't seem to work
#3713 closed Jun 26, 2025
It is recommended to use a dedicated device for vLLM
#3719 closed Jun 26, 2025
GRPO training on NPU with vllm: the official script fails to start, while other scripts work
#3726 closed Jun 26, 2025
GRPO training: bug in data format parsing
#3728 closed Jun 26, 2025
TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not NoneType
#3730 closed Jun 26, 2025
Support Ulysses in Swift
#3731 closed Jun 26, 2025
GRPO tutorial bug: world_size (8) is not equal to tensor_model_parallel_size (4) x pipeline_model_parallel_size (1)
#3739 closed Jun 26, 2025
Error in grpo experiment with multimodal qwen2.5-vl-3B
#3398 closed Jun 26, 2025
grpo fine-tuning of deepseek v2 hangs at the eval stage and then training stops
#3528 closed Jun 26, 2025
How to configure a custom dataset path in grpo and convert the data format?
#3525 closed Jun 26, 2025
2workers_async_iterations2_vllm help
#3522 closed Jun 26, 2025
Bug in GRPO best practices document!
#3501 closed Jun 26, 2025
unhashable type: 'list'
#3490 closed Jun 26, 2025
🙏 Request for support for using multiple cards in the vLLM inference backend during GRPO training
#3477 closed Jun 26, 2025
GRPO training of Qwen2.5-vl-7B-Instruct fails: multi-GPU training does not work; the model loads on only 1 GPU and OOMs
#3404 closed Jun 26, 2025
Feature suggestions for GRPO training
#3415 closed Jun 26, 2025
Abnormal loss and reward in GRPO training
#3372 closed Jun 26, 2025
grpo multi-node multi-GPU training timeout
#3343 closed Jun 26, 2025
CUDA Error when training LLAVA with GRPO
#3264 closed Jun 26, 2025
GRPO LLava training error: multi-GPU training fails, single GPU works
#3228 closed Jun 26, 2025
Bug in GRPO training on 4 A100 GPUs
#3223 closed Jun 26, 2025
How to do sft and grpo fine-tuning on deepseek r1
#3211 closed Jun 26, 2025
Problem loading my already-trained LLava model with GRPO
#3195 closed Jun 26, 2025
Error training InternVL2d5 with GRPO deepspeed lmdeploy
#3151 closed Jun 26, 2025
Using Unsloth in conjunction with GRPO to train a model for OOM
#3183 closed Jun 26, 2025
How to set vllm_device to use multiple GPUs in grpo training
#3098 closed Jun 26, 2025
Does ms-swift support tensor(model)-parallel GRPO training?
#3068 closed Jun 26, 2025
ValueError: Image features and image tokens do not match: tokens: 5589, features 5805
#2460 closed Jun 26, 2025
grad_norm nan
#2280 closed Jun 26, 2025
Hoping RLHF can support sequence parallelism (sequence_parallel)
#1958 closed Jun 26, 2025
Is there a bug in the old_per_token_logps computation in GRPO training?
#4727 closed Jun 26, 2025
rerank data loading error
#4728 closed Jun 26, 2025
Issue with Multi-GPU Training
#4718 closed Jun 26, 2025
Qwen3 Full Sft raises keyerror "messages" with predict_with_generate=true; training finishes normally when it is false
#4695 closed Jun 26, 2025
Support moonshotai/Kimi-VL-A3B-Thinking-2506
#4708 closed Jun 25, 2025
Error training qwen2.5-vl with grpo
#4364 closed Jun 25, 2025
Full-parameter grpo fine-tuning: with the same number of samples, ms-swift performs much worse than unsloth
#4393 closed Jun 25, 2025
GRPO OOM USE resume_from_checkpoint
#4406 closed Jun 25, 2025
GRPO training error: AssertionError: Forward context is not set. Please use `set_forward_context` to set the forward context.
#4418 closed Jun 25, 2025
Does the supported DeepSeek-R1 training refer to the 671B model or the distilled models?
#3132 closed Jun 25, 2025
In seq_cls training, metrics with flash_attn enabled are much lower than without flash_attn
#4384 closed Jun 25, 2025
Multi-regression task: inference problem
#4705 closed Jun 25, 2025
Why does using zero2/zero3 make max_steps differ by a factor of eight?
#4616 closed Jun 23, 2025
Request to add a notebook for self-cognition training of Qwen3-8B
#4034 closed Jun 23, 2025
raise KeyError(f"Column {key} not in the dataset. Current columns in the dataset: {columns}") [rank1]: KeyError: 'Column length not in the dataset. Current columns in the dataset: []'
#4058 closed Jun 23, 2025
Slow dataset preprocessing for InternVL3-9B LoRA fine-tuning (about 7h)
#4076 closed Jun 23, 2025
Locating an object with a single coordinate point
#4292 closed Jun 23, 2025
data_load
#4288 closed Jun 23, 2025
Question about Seq CLS Infer
#4325 closed Jun 23, 2025
UI-TARS inference with frozen parameters cannot distribute GPU memory evenly, causing out-of-memory
#4359 closed Jun 23, 2025
Fine-tuning Qwen3 OOMs when zero2/3 is added to the default script
#4371 closed Jun 23, 2025
Question about VLLM Engine batch inference
#4386 closed Jun 23, 2025
How are commands such as swift infer translated into Python calls, and how does it work internally?
#4555 closed Jun 23, 2025
Failing to preprocess hf dataset
#4564 closed Jun 23, 2025
How-to use on Apple Mac?
#4572 closed Jun 23, 2025
Multimodal finetune llava1.6-mistral bug: RuntimeError: Tensors must have same number of dimensions
#4578 closed Jun 23, 2025
On the differences between the ms-swift 3.x and 2.x templates
#4602 closed Jun 23, 2025
ovis2 fine-tuning fails; loss computation raises ValueError: Expected input batch_size (1384) to match target batch_size (16384)
#4611 closed Jun 23, 2025
Question about installing with pip install -e '.[all]' and installing evalscope
#4605 closed Jun 23, 2025
loss_scale hermes does not work
#4607 closed Jun 23, 2025
Huawei 910B lora qwen2.5vl error: AssertionError: Torch not compiled with CUDA enabled
#4619 closed Jun 23, 2025
OOM converting HF weights to megatron for full-size R1/Qwen3-235B-30A
#4648 closed Jun 23, 2025
The “change an LLM's self-cognition in 10 minutes” tutorial raises 'Qwen2_5VLTemplate' object has no attribute 'model'
#4662 closed Jun 23, 2025
StopIteration ERROR during training at step 100 of DPO training
#4644 closed Jun 23, 2025
Multi-GPU multi-process orpo hangs, triggering watchdog caught collective operation timeout.
#3564 closed Jun 20, 2025
None
#4481 closed Jun 20, 2025
None
#4504 closed Jun 20, 2025
How can I add a classification loss when training a reward model?
#4640 closed Jun 20, 2025
How can I create and train a new reward model architecture based on qwen2.5vl?
#4635 closed Jun 19, 2025
RuntimeError: shape '[-1, 151936]' is invalid for input of size 6266880
#4318 closed Jun 19, 2025
Finetune Qwen3-235B-a22b with LoRA end with error: AttributeError: 'LigerQwen3MoeSwiGLUMLP' object has no attribute 'down_proj'
#4531 closed Jun 19, 2025
When to support SGLang
#3510 closed Jun 19, 2025
RuntimeError on NPU: a leaf Variable that requires grad is being used in an in-place operation.
#4613 closed Jun 19, 2025
[INFO:swift.trainers.rlhf_trainer.vllm_client] Server is not up yet.
#4525 closed Jun 18, 2025
Implement the on policy distillation from the Qwen3 technical report
#4533 closed Jun 18, 2025
Liger kernel not working with Qwen2.5 VL
#4543 closed Jun 18, 2025
qwen3-1.7B GRPO training runs out of GPU memory after a few rounds
#4388 closed Jun 18, 2025
qwenvl2.5 lora training
#4615 closed Jun 18, 2025
agent template issue with no argument
#4600 closed Jun 17, 2025
Training on multi-turn dialogue data, but only on the assistant response of the last turn
#4596 closed Jun 16, 2025
Agent training of qwen2.5-base: eos token inconsistency between training and inference
#4498 closed Jun 15, 2025
GRPO lora
#4593 closed Jun 14, 2025
Full GRPO pipeline on 8*A100: OOM with vllm; training works without vllm but is slow
#4594 closed Jun 14, 2025
Changing the num_generations parameter raises ValueError: range() arg 3 must not be zero
#4589 closed Jun 13, 2025
Very long text training fails on the latest code branch
#4583 closed Jun 13, 2025
lora training: Can't pickle local object 'DeepSpeedEngine._create_module_forward_post_hook.<locals>._module_forward_post_hook'
#4586 closed Jun 13, 2025
Support DeepSeek-R1-0528-Qwen3-8B in megatron-swift
#4438 closed Jun 12, 2025
Error in Internvl2.5-4B GRPO training on video data
#4579 closed Jun 12, 2025
Self-cognition demo fails when run on mac
#4575 closed Jun 12, 2025
loss and grad_norm are both 0 during GRPO training, with a message that there are no label_names
#4547 closed Jun 12, 2025
transformers4.52: if v not in ALL_PARALLEL_STYLES: TypeError: argument of type 'NoneType' is not iterable
#4577 closed Jun 12, 2025
grpo training hangs
#4549 closed Jun 12, 2025
Why does applying sequence parallelism reduce the step count?
#4553 closed Jun 11, 2025
Error in the InfoNCE data processing stage
#4546 closed Jun 11, 2025
Error using the ToolBench dataset
#3947 closed Jun 11, 2025
What is CoundownORM?
#4528 closed Jun 11, 2025
GRPO training with the new ms-swift version raises AttributeError: Can't pickle local object 'GRPOTrainer.__init__.<locals>.<lambda>'
#4557 closed Jun 11, 2025
null
#3208 closed Jun 11, 2025
How to run inference with hg after Qwen2.5-vl lora GRPO fine-tuning?
#3187 closed Jun 11, 2025
export to hf fails after Qwen3 Full Sft
#4550 closed Jun 11, 2025
GRPO training of Intern3VL raises KeyError: 'input_ids'
#4519 closed Jun 10, 2025
Something odd when loading a model
#4539 closed Jun 10, 2025
The function that checks whether the peft hot patch is loaded has a problem
#4534 closed Jun 9, 2025
dpo train qwen2.5-7b
#4526 closed Jun 9, 2025
DPO Sequence_parallel_size == 8 Error NotImplementedError
#4420 closed Jun 8, 2025

140 Issues opened by 116 people

reward model dataset inference
#4864 opened Jul 8, 2025
About the "500 Internal Server" Error in vllm-server
#4862 opened Jul 8, 2025
Support for fine-tuning more multimodal embedding models (beyond GME)
#4861 opened Jul 7, 2025
Code raises an error when per_device_train_batch_size is increased
#4858 opened Jul 7, 2025
Evaluation doesn't run during training for custom dataset
#4855 opened Jul 7, 2025
Does qwen2.5vl support 4-bit kv_cache quantization?
#4849 opened Jul 7, 2025
After sft with ms-swift, the model's config.json changed, so I cannot deploy the model directly with vllm
#4844 opened Jul 7, 2025
grpo + gen_rm padding index error
#4841 opened Jul 7, 2025
Need to update requirements.txt
#4837 opened Jul 5, 2025
Trained Qwen 3 model seems to be broken.
#4835 opened Jul 4, 2025
Feature Request: RTX 5090 Support with ms-swift docker image with CUDA 12.8
#4834 opened Jul 4, 2025
[Has anyone run into this?] Coordinate offset problem when fine-tuning a qwen2.5vl agent
#4831 opened Jul 4, 2025
SwanLab Notification Integration
#4829 opened Jul 4, 2025
Error quantizing qwen2.5-vl-7b with awq
#4828 opened Jul 4, 2025
Qwen2-VL-2B shows gradient explosion late in pre-training; other VLMs do not
#4819 opened Jul 3, 2025
Ovis-2B pre-training error
#4818 opened Jul 3, 2025
Fine-tuning qwen3-32B on multi-turn dialogue data with thinking traces: when computing the loss, is only the <think>\n\n</think> content of the last turn included?
#4810 opened Jul 2, 2025
Incorrect padding information when fine-tuning a deepseek_coder model with grpo
#4808 opened Jul 2, 2025
Reranker training requires A LOT of VRAM
#4805 opened Jul 2, 2025
After setting packing_cache, the second training run does not read from the cache and packs the data again.
#4803 opened Jul 2, 2025
Pre-training/fine-tuning is very slow
#4801 opened Jul 2, 2025
sft of the A3B model is stuck here and not moving
#4799 opened Jul 2, 2025
Logged accuracy doesn't change when training a reranker
#4796 opened Jul 2, 2025
qwen2.5-vl grounding GRPO
#4794 opened Jul 2, 2025
llava-next-110b seq_cls fine-tuning error: AttributeError: 'Identity' object has no attribute 'weight'
#4793 opened Jul 2, 2025
About the SFT fine-tuning corpus
#4791 opened Jul 2, 2025
ValueError: Failed to retrieve the dataset. You can avoid this issue by increasing `max_length` or modifying the `truncation_strategy`.
#4787 opened Jul 1, 2025
Error when testing kimi vl thinking!
#4780 opened Jul 1, 2025
GRPO training of DeepSeek-VL2
#4775 opened Jun 30, 2025
GRPO training: skip_special_tokens config
#4771 opened Jun 30, 2025
Support the keye-vl-8b model
#4766 opened Jun 30, 2025
About loading deepspeed with resume_from_checkpoint
#4765 opened Jun 30, 2025
Does agent inference still not support actual tool calls? See demo_agent.py
#4764 opened Jun 30, 2025
awq quantization problem with qwen2.5-vl
#4762 opened Jun 30, 2025
When xtuner is used as the sequence parallel implementation, pad_and_split_inputs is not called to pad and split the inputs
#4760 opened Jun 30, 2025
Question: during GRPO training the model abnormally hits Max_length many times
#4758 opened Jun 30, 2025
After fine-tuning InternVL3-1B, pt and vllm inference differ greatly
#4756 opened Jun 30, 2025
Distilling the Qwen2.5-Omni model raises: IndexError: max(): Expected reduction dim 1 to have non-zero size.
#4755 opened Jun 29, 2025
MaxLengthError
#4754 opened Jun 29, 2025
Is a custom dataloader supported? I want to train on two datasets with different formats, with each batch containing only one type of data
#4750 opened Jun 28, 2025
GRPO memory problem on multi-node multi-card NPU
#4748 opened Jun 28, 2025
npu training fails because decord cannot be installed on npu
#4747 opened Jun 28, 2025
How to save the best classification model by F1
#4746 opened Jun 27, 2025
Multi-GPU parallel training with a locally loaded dataset stops at Init COMPLETE... and cannot enter the train stage
#4743 opened Jun 27, 2025
Numbering of multiple input images
#4742 opened Jun 27, 2025
How to add early stop in ms swift
#4741 opened Jun 27, 2025
[WARNING:swift] Please install the package: pip install "decord" -U
#4740 opened Jun 27, 2025
Qwen2.5-omni GRPO training runs out of memory (OOM)
#4739 opened Jun 27, 2025
Fine-tuning a DeepSeek model raises: AssertionError: noaux_tc not supported for training
#4737 opened Jun 26, 2025
Does the packing feature block attention score between different samples?
#4736 opened Jun 26, 2025
a question for rl
#4735 opened Jun 26, 2025
Please open Security Advisories for vulnerability reporting
#4733 opened Jun 26, 2025
Precision differences in swift inference
#4726 opened Jun 26, 2025
After training qwen2.5vl3b with lora (lora not merged) and deploying with deploy, pt and vllm results are inconsistent
#4725 opened Jun 26, 2025
GKD code hangs when loading the model
#4724 opened Jun 26, 2025
Continuing sft from a lora checkpoint in the Swift codebase: trainable parameters are 0% after loading the model and checkpoint
#4723 opened Jun 26, 2025
[rank4]: AssertionError: Expected multimodal embeddings to be a list/tuple of 2D tensors, or a single 3D tensor, but got <class 'NoneType'> instead.
#4721 opened Jun 25, 2025
qwen3 embedding fine-tuning raises an error in the evaluation stage: 'NoneType' object has no attribute 'get'
#4720 opened Jun 25, 2025
Add Python example code
#4717 opened Jun 25, 2025
How to pass in a custom causal_attention_mask
#4716 opened Jun 25, 2025
Error converting hf-format model files to megatron: CUDA error: operation not supported
#4713 opened Jun 25, 2025
lora fine-tuning of Ovis2-34B: loss=0.0
#4711 opened Jun 25, 2025
Cannot set batch_size to 1 in Deepspeed zero3 multi-GPU training
#4710 opened Jun 25, 2025
[Bug]: [WARNING:swift] Please install the package: pip install "decord" -U
#4709 opened Jun 25, 2025
Output problem in multi-regression task
#4706 opened Jun 25, 2025
Can sequence classification tasks be trained on multiple GPUs?
#4704 opened Jun 25, 2025
Swift rollout hangs
#4703 opened Jun 25, 2025
GRPO training on the NuminaMath-TIR dataset does not work
#4702 opened Jun 25, 2025
Fine-tuning QwQ-32B on a self-made function-call dataset with the msswift framework gives poor results for unknown reasons
#4700 opened Jun 25, 2025
My custom dataset contains keys other than 'messages', 'rejected_response', 'label', 'images', 'videos', 'audios', 'tools' and 'objects'; how should I write the template?
#4699 opened Jun 24, 2025
lora fine-tuning of the qwen3 embedding model shows a find_unused_parameters warning
#4698 opened Jun 24, 2025
Qwen2-VL merge lora error
#4697 opened Jun 24, 2025
Question: training cost of Qwen 235B A22 compared with Qwen 32B dense
#4696 opened Jun 24, 2025
When running inference from the CLI, is there a way to keep extra fields from the input dataset in the inference results?
#4693 opened Jun 24, 2025
Question about VLLM Engine
#4692 opened Jun 24, 2025
When loading a large dataset on multiple nodes, the machines load it serially one after another
#4691 opened Jun 24, 2025
UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
#4689 opened Jun 24, 2025
I want to give PPO two reward models and two value models and compute the loss from a weighted combination of their values and rewards; how should I do this?
#4688 opened Jun 24, 2025
Is incremental (continued) pre-training supported for multimodal models such as QWenVL?
#4686 opened Jun 24, 2025
Does changing IMAGE_FACTOR mean the vision part needs to be retrained?
#4685 opened Jun 24, 2025
How to turn off automatic model parallelism?
#4684 opened Jun 24, 2025
Help: Multi Turn SFT
#4681 opened Jun 24, 2025
mllm training: the job hangs after one epoch with GPU utilization at 100% and cannot save a checkpoint
#4680 opened Jun 23, 2025
The reward keeps oscillating without rising; the model seems to learn nothing
#4677 opened Jun 23, 2025
After SFT training on a regression task, loading the model fails when using vllm to accelerate inference; is there a fix?
#4676 opened Jun 23, 2025
"Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead." is printed continuously without any error; what is the reason?
#4673 opened Jun 23, 2025
About preprocessing of rlhf data
#4670 opened Jun 23, 2025
Fine-tuning MiniCPM-o-2_6 raises assert media_type in {'image', 'video'}
#4668 opened Jun 23, 2025
Applying sequence parallelism causes the training to finish early, even though it hasn't reached the specified max_steps
#4663 opened Jun 23, 2025
Any way to run evaluation before training starts?
#4660 opened Jun 22, 2025
'weight' must be 2-D
#4656 opened Jun 20, 2025
In grpo multi-task training, reward functions are set to return None; as a result, the reward curve of a single task shows nan in tensorboard
#4653 opened Jun 20, 2025
Why is there no loss?
#4652 opened Jun 20, 2025
Training Omni hangs
#4651 opened Jun 20, 2025
Does the lora-ga integrated in swift support multi-GPU training?
#4646 opened Jun 20, 2025
How do I get the history after the update?
#4645 opened Jun 20, 2025
multi-gpu GRPO training with sequence parallelism terminated unexpectedly during execution, with the following warning: destroy_process_group() was not called before program exit, which may lead to resource leaks.
#4643 opened Jun 19, 2025
How to add a new vlm for embedding tasks?
#4642 opened Jun 19, 2025
How to add system or instructions to embedding training?
#4638 opened Jun 19, 2025
shape mismatch internvl3
#4636 opened Jun 19, 2025
Loss suddenly spikes during Qwen2.5-vl pre-training
#4634 opened Jun 19, 2025
Training logs stop and GPU utilization is abnormal
#4633 opened Jun 19, 2025
qwen2.5vl grounding training: results fluctuate greatly across runs with the same parameters and environment
#4632 opened Jun 18, 2025
Is RL training of agents supported?
#4631 opened Jun 18, 2025
Add support for the freeze_parameters_regex parameter in megatron sft
#4630 opened Jun 18, 2025
QWEN3-32B LORA GRPO hangs without any error
#4628 opened Jun 18, 2025
swift infer generates the same result every time even though temperature and top_p are set
#4627 opened Jun 18, 2025
With the environment variable NPROC_PER_NODE=2 set on one machine with 2 GPUs, why does inference still use MP instead of DP?
#4625 opened Jun 17, 2025
Support for Large Language Diffusion Models
#4620 opened Jun 17, 2025
Qwen3 Embedding training throws a multithreading error
#4617 opened Jun 17, 2025
Is classification within a qwen2.5-vl grounding task supported?
#4614 opened Jun 16, 2025
After lora fine-tuning and merging, lmdeploy inference takes twice as long as Qwen2.5-VL-7B-Instruct; why?
#4609 opened Jun 16, 2025
Any example on training llama on function calling dataset?
#4604 opened Jun 15, 2025
qwen2.5-7B GRPO training hangs without showing any error
#4603 opened Jun 15, 2025
qwen3-32B full-parameter ppo training fails after one step
#4599 opened Jun 13, 2025
GRPO training fails at step 100 at the evaluation/save point, before evaluation has run
#4598 opened Jun 13, 2025
About training on a multimodal object detection multi-turn dialogue dataset
#4597 opened Jun 13, 2025
Results of testing the qwen2.5-omni model with swift infer are inconsistent with the official test method
#4595 opened Jun 13, 2025
Infonce loss hard negatives type error
#4588 opened Jun 12, 2025
Single-node multi-GPU lora fine-tuning of the latest Qwen3_embedding model raises an error
#4585 opened Jun 12, 2025
When training qwen2.5vl with sequence parallel + packing enabled, the loss drops to 0
#4581 opened Jun 12, 2025
How to save the checkpoint of the last step in GRPO
#4574 opened Jun 12, 2025
cachedqwen2tokenizer does not exist
#4569 opened Jun 12, 2025
Error when using --device_map auto
#4567 opened Jun 11, 2025
🍭[Roadmap] ms-swift3.6
#4561 opened Jun 11, 2025
Does Megatron-SWIFT support Qwen2.5 VL models?
#4559 opened Jun 11, 2025
How to train on multi-turn dialogues
#4552 opened Jun 10, 2025
GRPO hangs and cannot train with a multimodal dataset loaded by a custom preprocessor
#4551 opened Jun 10, 2025
In the evaluation-during-training example, why is a Chinese qa dataset used for training but a math dataset for evaluation?
#4544 opened Jun 10, 2025
vllm does not support the fine-tuned qwen2.5-omni model
#4542 opened Jun 10, 2025
Is math_verify limited in its ability to handle expressions?
#4541 opened Jun 10, 2025
Fine-tuning qwen3-235B-A22B-AWQ
#4540 opened Jun 10, 2025
How to save the few best-performing checkpoints
#4538 opened Jun 10, 2025
Constructing KTO training data
#4535 opened Jun 9, 2025
How can I see what my custom dataset finally looks like?
#4530 opened Jun 9, 2025
Question about custom datasets for object detection
#4529 opened Jun 9, 2025
Multimodal pre-training performance problem
#4527 opened Jun 9, 2025
Any plan to add DeepEyes implementation?
#4524 opened Jun 8, 2025
LISA GPU memory usage
#4522 opened Jun 8, 2025
Implement an advantage sample replay (SSR) mechanism
#4520 opened Jun 8, 2025

57 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Abnormal GPU memory usage in Qwen2.5vl 7b full-parameter training
#3504 commented on Jun 9, 2025 • 0 new comments
Qwen2.5-omni vllm inference anomaly
#4492 commented on Jun 9, 2025 • 0 new comments
Error training qwen2.5-vl on NPU
#3408 commented on Jun 9, 2025 • 0 new comments
Problem exporting the lora model of qwen2.5-vl-3b
#4511 commented on Jun 10, 2025 • 0 new comments
DPO fine-tuning error: "Storage size calculation overflowed with sizes" keeps appearing.
#2538 commented on Jun 10, 2025 • 0 new comments
Error when ms-swift runs qwen2.5-omni-7b inference with vllm as the backend
#4210 commented on Jun 10, 2025 • 0 new comments
Strange out of memory error
#3964 commented on Jun 11, 2025 • 0 new comments
Multi-node slurm training?
#4448 commented on Jun 11, 2025 • 0 new comments
--max_model_len 80000 has no effect in the swift deploy command
#4464 commented on Jun 12, 2025 • 0 new comments
Full-parameter fine-tuning of Qwen2VL on text-only data with added special tokens: the model does not converge
#3804 commented on Jun 13, 2025 • 0 new comments
After training a reward model, how do I use the rm to predict the score of a single sample?
#4410 commented on Jun 14, 2025 • 0 new comments
grpo hangs after every eval, then training is interrupted by a timeout
#4355 commented on Jun 14, 2025 • 0 new comments
Port listening error
#3988 commented on Jun 15, 2025 • 0 new comments
Error with Streaming + Packing + resume_from_checkpoint
#4083 commented on Jun 15, 2025 • 0 new comments
Out Of Memory during lora fine-tuning of gme 7b on H20
#4361 commented on Jun 16, 2025 • 0 new comments
Support fine-tuning the talker of Qwen/Qwen2.5-Omni-7B, for tuning voice timbre, dialects, etc.
#3690 commented on Jun 17, 2025 • 0 new comments
GPU memory usage of lora fine-tuning **gradually grows** until it **explodes**
#2364 commented on Jun 17, 2025 • 0 new comments
Does Swift support enabling sequence parallelism on NPU?
#4412 commented on Jun 18, 2025 • 0 new comments
Grounding dataset format: how to write multiple classes + multiple boxes
#3732 commented on Jun 18, 2025 • 0 new comments
DPO fine-tuning of multimodal qwen2.5-7B fails during image processing: Caught ValueError in DataLoader worker process 0 and cannot reshape array of size 1843200 into shape (1,2,3,17,2,14,22,2,14)
#4181 commented on Jun 18, 2025 • 0 new comments
Training hangs: the process is in a sleep state and GPU utilization is 0
#3290 commented on Jun 19, 2025 • 0 new comments
How to set different lora ranks for the vision and text parts of Qwen2.5-VL?
#4223 commented on Jun 20, 2025 • 0 new comments
Will the hermes-style function call training format of qwen2.5 be supported?
#3523 commented on Jun 20, 2025 • 0 new comments
Pretrain error: abnormal progress
#2692 commented on Jun 20, 2025 • 0 new comments
Can training on the audio modality of MiniCPM-o 2.6 be supported?
#2961 commented on Jun 23, 2025 • 0 new comments
Can an expert-parallelism parameter be added for moe model training?
#1631 commented on Jun 24, 2025 • 0 new comments
lora fine-tuning of ovis2-34B: loss=0.0 grad_norm=nan
#3494 commented on Jun 25, 2025 • 0 new comments
Error occurred when saving checkpoints during Qwen3 multi-GPU SFT
#4411 commented on Jun 25, 2025 • 0 new comments
Error when saving a checkpoint during training, although the corresponding files exist locally
#3420 commented on Jun 25, 2025 • 0 new comments
🚀 Best Practices for Training Qwen3/Qwen3-MoE
#4030 commented on Jun 25, 2025 • 0 new comments
Support deploying the trained RM model with the sglang/vllm inference engines
#3610 commented on Jun 26, 2025 • 0 new comments
Can anyone knowledgeable explain how to do awq quantization after fine-tuning internvl3_8B?
#4115 commented on Jun 29, 2025 • 0 new comments
Fatal Python error: none_dealloc: deallocating None
#4353 commented on Jul 1, 2025 • 0 new comments
wandb keeps failing even with an overseas proxy enabled (network connection timed out, network error (connectiontimeout))
#4152 commented on Jul 2, 2025 • 0 new comments
Error in GRPO reinforcement fine-tuning of a GPTQ-quantized model: AttributeError: 'GPTQLoraLinear' object has no attribute 'get_delta_weight'
#3949 commented on Jul 2, 2025 • 0 new comments
Is webdataset supported as input for qwen2.5VL?
#3214 commented on Jul 2, 2025 • 0 new comments
About sequence-parallel training
#2837 commented on Jul 3, 2025 • 0 new comments
SFT: reuse the cache from the previous data load
#3762 commented on Jul 4, 2025 • 0 new comments
During evaluation, the output length is capped at 2048 and I don't know why
#3761 commented on Jul 4, 2025 • 0 new comments
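During evaluation, changing the generation parameters has no effect; the evaluation config file still shows the default parameters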
#3758 commented on Jul 4, 2025 • 0 new comments
Error with a custom evaluation dataset
#3757 commented on Jul 4, 2025 • 0 new comments
ValueError: Cannot use chat template functions because tokenizer.chat_template
#3755 commented on Jul 4, 2025 • 0 new comments
Support SGLang in Swift
#3750 commented on Jul 4, 2025 • 0 new comments
About introducing rejected_response
#3748 commented on Jul 4, 2025 • 0 new comments
Is GME fine-tuning supported?
#3019 commented on Jul 4, 2025 • 0 new comments
After training grounding with qwen32b-vl, the model cannot even follow the basic format
#3746 commented on Jul 5, 2025 • 0 new comments
more logs in wandb
#3737 commented on Jul 5, 2025 • 0 new comments
function_calling training has no effect on the deepseek-r distilled model
#3733 commented on Jul 5, 2025 • 0 new comments
Question: async_infer does not achieve asynchronous calls
#3717 commented on Jul 6, 2025 • 0 new comments
ModuleNotFoundError: No module named 'torch.distributed.device_mesh'
#4092 commented on Jul 6, 2025 • 0 new comments
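After resuming training from a checkpoint, why does the remaining time keep growing, and why do epoch and max_steps not line up with their values before the resume?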
#3783 commented on Jul 7, 2025 • 0 new comments
Use the feature of resume_from_checkpoint when using python code to run finetuning.
#3774 commented on Jul 7, 2025 • 0 new comments
Qwen2-Audio using flash attention: RuntimeError: cu_seqlens_q must have shape (batch_size + 1)
#2542 commented on Jul 7, 2025 • 0 new comments
deepspeed AutoTP + ZeRO
#3797 commented on Jul 8, 2025 • 0 new comments
Problem evaluating a custom dataset with ms-swift eval, which forces using evalscope instead; please support the system field soon
#3792 commented on Jul 8, 2025 • 0 new comments
How to specify the split (train/validation) for the dataset in cli
#3789 commented on Jul 8, 2025 • 0 new comments
Update dataset_info.json
#3723 commented on Jul 6, 2025 • 0 new comments