huggingface / trl Public

generated from fastai/nbdev_template

Pull requests: huggingface/trl

66 Open 1,541 Closed

Pull requests list

🧪 testing support for Qwen3 tiny

#3415 opened May 5, 2025 by shirinyamani

5 tasks

Feature: Implemented DAPO

#3413 opened May 5, 2025 by lavaman131

Reintroducing step method in ppo_trainer

#3410 opened May 3, 2025 by jskaf34 • Draft

2 of 5 tasks

fix setup chat format

#3404 opened May 2, 2025 by qgallouedec • Draft

5 tasks

[DPO] Truncation leading to zero'd out samples

#3398 opened May 1, 2025 by LeonEricsson

2 of 5 tasks

Fix GRPO/DAPO/Dr.GRPO documentation: formula corrections and KL divergence clarification

#3395 opened Apr 30, 2025 by JenWei0312

1 of 5 tasks

Reintroduce generate method for PPOTrainer

#3374 opened Apr 27, 2025 by CloseChoice

4 tasks done

A Unified Example Format Checker

#3373 opened Apr 27, 2025 by innerNULL

1 of 5 tasks

add support for reward func using nn.Module in GRPOTrainer

#3372 opened Apr 27, 2025 by Tavish9

1 of 5 tasks

[Feat] Support SGLang as rollout engine of GRPO trainer

#3370 opened Apr 27, 2025 by ryang-max

2 of 8 tasks

Environments

#3367 opened Apr 26, 2025 by August-murr

[GRPO] adds experimental support for the SSR replay buffer

#3325 opened Apr 18, 2025 by edbeeching • Draft

[vllm] support base_url parameter for vLLM client initialization

#3324 opened Apr 18, 2025 by re-imagined

Allow for saving the PPOTrainer value model (critic model)

#3308 opened Apr 16, 2025 by AMindToThink

PPO value_model can't be None, so it shouldn't be Optional

#3300 opened Apr 15, 2025 by AMindToThink

Modified GRPOTrainer to accumulate gradient within a single training batch

#3288 opened Apr 13, 2025 by jarrelscy

3 of 5 tasks

add vllm support for token ids as input

#3280 opened Apr 11, 2025 by wybryan

🦙 Llama 4

#3267 opened Apr 9, 2025 by qgallouedec • Draft

5 tasks

[NOT MEANT TO BE MERGED] Log correct/incorrect lengths

#3263 opened Apr 8, 2025 by qgallouedec • Draft

[SFT] support for ring_attn in SFTTrainer

#3262 opened Apr 8, 2025 by kashif

5 tasks

Add a raw generate API to the vLLM server

#3227 opened Apr 3, 2025 by wilrop

5 tasks

Support iterable datasets in GRPO

#3226 opened Apr 3, 2025 by wilrop

5 tasks

feat(trainer): Support multi-role & consecutive turns in DataCollatorForCompletionOnlyLM (#3223)

#3224 opened Apr 3, 2025 by Kirili4ik

4 tasks done

Adding sampling parameters for vllm generation

#3210 opened Apr 2, 2025 by shaipranesh2

Support for Models With Pre-Finetuned LoRA Adapters in GRPO: Add use_peft_as_reference Flag

#3196 opened Mar 31, 2025 by LoganVegnaSHOP

5 tasks done
