
updates GRPOTrainer compatible with trl 0.17 #3969


Merged
merged 33 commits into modelscope:main from hjh0119:trl-dev
Apr 30, 2025

Conversation

hjh0119
Collaborator

@hjh0119 hjh0119 commented Apr 23, 2025

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Update trl compatibility from 0.16 to 0.17:

  • Optional dataset shuffle (huggingface/trl#3334)
  • Generate once per effective batch (huggingface/trl#3283)
  • Fix train and eval mode checking (huggingface/trl#3337)
  • Optional uvicorn log level for vLLM serve (huggingface/trl#3338)
  • Metrics for low- and high-clipped token probabilities (huggingface/trl#3289)
  • More loss types (huggingface/trl#3256)
  • Remove mini_batch: since generation now happens once per effective batch, the mini_batch argument can be deprecated.
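The clipped-token metric listed above can be sketched as follows. This is a minimal, hedged illustration (the function name, argument names, and the `eps_low`/`eps_high` defaults are assumptions for this sketch, not TRL's actual identifiers): it counts the fraction of tokens whose PPO-style importance ratio falls outside the clipping band in the direction where clipping actually binds the objective.

```python
import math

def clip_fraction_metrics(log_ratios, advantages, eps_low=0.2, eps_high=0.2):
    """Fraction of tokens whose importance ratio pi_new/pi_old lands
    outside the band [1 - eps_low, 1 + eps_high] where clipping binds.

    In the clipped surrogate min(r*A, clip(r, 1-eps_low, 1+eps_high)*A),
    clipping only takes effect on the low side when A < 0 and on the
    high side when A > 0; other out-of-band tokens still get gradient.
    """
    n = len(log_ratios)
    low = high = 0
    for lr, adv in zip(log_ratios, advantages):
        ratio = math.exp(lr)  # importance ratio from per-token log-prob diff
        if ratio < 1.0 - eps_low and adv < 0:
            low += 1
        elif ratio > 1.0 + eps_high and adv > 0:
            high += 1
    return {"clip_ratio/low": low / n, "clip_ratio/high": high / n}
```

Logging both directions separately is useful for diagnosing GRPO runs: a growing low-clip fraction means the policy is drifting away from tokens it is being penalized on, while a high low-clip plus high-clip total signals the policy has moved far from the sampling policy.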


@hjh0119 hjh0119 marked this pull request as draft April 23, 2025 11:16
@hjh0119 hjh0119 changed the title from "updates GRPOTrainer compatible with trl dev" to "updates GRPOTrainer compatible with trl 0.17" Apr 29, 2025
@hjh0119 hjh0119 marked this pull request as ready for review April 30, 2025 07:38
@hjh0119 hjh0119 merged commit ee831f5 into modelscope:main Apr 30, 2025
1 of 2 checks passed
@hjh0119 hjh0119 deleted the trl-dev branch April 30, 2025 08:59
@hjh0119 hjh0119 mentioned this pull request May 1, 2025
tastelikefeet added a commit to tastelikefeet/swift that referenced this pull request May 6, 2025
* main:
  fix enable_cache (modelscope#4091)
  Support ulysses for llm/mllm,dpo/sft (modelscope#4085)
  update docs (modelscope#4078)
  feat: support megatron wandb (modelscope#4074)
  feat: add run name support (modelscope#4072)
  fix padding_side left (modelscope#4069)
  bump version
  support MiMo-7B (modelscope#4067)
  fix packing eval streaming (modelscope#4066)
  Support empty think loss scale (modelscope#4065)
  support qwen3-moe awq (modelscope#4059)
  Fix grpo eval when gas > 1 (modelscope#4057)
  fix rollout (modelscope#4055)
  updates GRPOTrainer compatible with trl 0.17 (modelscope#3969)
  support Qwen2.5-Omni-3B (modelscope#4052)
  update wechat (modelscope#4047)

# Conflicts:
#	swift/llm/train/tuner.py
3 participants