
updates GRPOTrainer compatible with trl 0.17 #3969


Merged
merged 33 commits into modelscope:main from hjh0119:trl-dev
Apr 30, 2025

Conversation

hjh0119
Collaborator

@hjh0119 hjh0119 commented Apr 23, 2025

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Update trl compatibility from 0.16 to 0.17:

  • Optional dataset shuffle (huggingface/trl#3334)
  • Generate once per effective batch (huggingface/trl#3283)
  • Fix train and eval mode checking (huggingface/trl#3337)
  • Optional uvicorn log level for vLLM serve (huggingface/trl#3338)
  • Metrics for low- and high-clipped token probabilities (huggingface/trl#3289)
  • More loss types (huggingface/trl#3256)
  • Remove mini_batch: since generation now happens once per effective batch, the mini_batch argument can be deprecated.
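The clipped-token metric listed above can be sketched as follows. This is a minimal, hedged illustration (the function name, argument names, and the `eps_low`/`eps_high` defaults are assumptions for this sketch, not TRL's actual identifiers): it counts the fraction of tokens whose PPO-style importance ratio falls outside the clipping band in the direction where clipping actually binds the objective.

```python
import math

def clip_fraction_metrics(log_ratios, advantages, eps_low=0.2, eps_high=0.2):
    """Fraction of tokens whose importance ratio pi_new/pi_old lands
    outside the band [1 - eps_low, 1 + eps_high] where clipping binds.

    In the clipped surrogate min(r*A, clip(r, 1-eps_low, 1+eps_high)*A),
    clipping only takes effect on the low side when A < 0 and on the
    high side when A > 0; other out-of-band tokens still get gradient.
    """
    n = len(log_ratios)
    low = high = 0
    for lr, adv in zip(log_ratios, advantages):
        ratio = math.exp(lr)  # importance ratio from per-token log-prob diff
        if ratio < 1.0 - eps_low and adv < 0:
            low += 1
        elif ratio > 1.0 + eps_high and adv > 0:
            high += 1
    return {"clip_ratio/low": low / n, "clip_ratio/high": high / n}
```

Logging both directions separately is useful for diagnosing GRPO runs: a growing low-clip fraction means the policy is drifting away from tokens it is being penalized on, while a high low-clip plus high-clip total signals the policy has moved far from the sampling policy.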


@hjh0119 hjh0119 marked this pull request as draft April 23, 2025 11:16
@hjh0119 hjh0119 changed the title from "updates GRPOTrainer compatible with trl dev" to "updates GRPOTrainer compatible with trl 0.17" Apr 29, 2025
@hjh0119 hjh0119 marked this pull request as ready for review April 30, 2025 07:38
@hjh0119 hjh0119 merged commit ee831f5 into modelscope:main Apr 30, 2025
1 of 2 checks passed
@hjh0119 hjh0119 deleted the trl-dev branch April 30, 2025 08:59
@hjh0119 hjh0119 mentioned this pull request May 1, 2025
tastelikefeet added a commit to tastelikefeet/swift that referenced this pull request May 6, 2025
* main:
  fix enable_cache (modelscope#4091)
  Support ulysses for llm/mllm,dpo/sft (modelscope#4085)
  update docs (modelscope#4078)
  feat: support megatron wandb (modelscope#4074)
  feat: add run name support (modelscope#4072)
  fix padding_side left (modelscope#4069)
  bump version
  support MiMo-7B (modelscope#4067)
  fix packing eval streaming (modelscope#4066)
  Support empty think loss scale (modelscope#4065)
  support qwen3-moe awq (modelscope#4059)
  Fix grpo eval when gas > 1 (modelscope#4057)
  fix rollout (modelscope#4055)
  updates GRPOTrainer compatible with trl 0.17 (modelscope#3969)
  support Qwen2.5-Omni-3B (modelscope#4052)
  update wechat (modelscope#4047)

# Conflicts:
#	swift/llm/train/tuner.py
3 participants