Can I do RL on gpt-oss-20b now? #4136

EmilRyd · 2025-09-24T15:50:55Z


The docs say that "fine-tuning gpt-oss-20b is now supported". Does this mean that RL is also supported, both GRPO and PPO? Also, is it supported with vLLM for gpt-oss-20b?

Does anyone have a script for this that already works? I've been struggling to set this up all day.

qgallouedec · 2025-09-24T17:34:38Z

Maintainer

> The docs say that "fine-tuning gpt-oss-20b is now supported". Does this mean that RL is also supported, both GRPO

Yes.

> and PPO?

Probably, but I haven't tried it.

> Also, is it supported with vLLM for gpt-oss-20b?

Yes.

> Does anyone have a script for this that already works? I've been struggling to set this up all day.

I found this one: https://pavankunchalapk.medium.com/windows-friendly-grpo-fine-tuning-with-trl-from-zero-to-verifiable-rewards-f28008c89323
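For readers who want a starting point in-thread, here is a minimal sketch of GRPO with vLLM generation using TRL's `GRPOTrainer`. It is not the script from the linked article and has not been verified on gpt-oss-20b. The toy reward, the output path, and all hyperparameters are hypothetical placeholders; it assumes `trl[vllm]` is installed and enough GPU memory is available.

```python
# Minimal GRPO + vLLM sketch for gpt-oss-20b (untested assumption, not an official recipe).
# Assumes `trl` and `vllm` are installed; the reward below is a hypothetical toy reward.


def reward_short_completions(completions, **kwargs):
    """Toy reward (hypothetical): prefer completions close to 200 characters."""
    rewards = []
    for completion in completions:
        # Conversational datasets give a list of message dicts per completion;
        # standard (plain-text) datasets give a string.
        if isinstance(completion, list):
            text = completion[0]["content"]
        else:
            text = completion
        rewards.append(-abs(len(text) - 200) / 200)
    return rewards


def main():
    # Imported here so the reward function can be inspected without trl installed.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    # trl-lib/tldr is a prompt dataset used in the TRL docs.
    dataset = load_dataset("trl-lib/tldr", split="train")

    args = GRPOConfig(
        output_dir="gpt-oss-20b-grpo",  # hypothetical output path
        use_vllm=True,                  # generate rollouts with vLLM
        vllm_mode="colocate",           # vLLM inside the training processes;
                                        # "server" mode expects a separate `trl vllm-serve` process
        bf16=True,
        per_device_train_batch_size=4,  # illustrative values, not tuned
        num_generations=4,
        max_completion_length=256,
    )
    trainer = GRPOTrainer(
        model="openai/gpt-oss-20b",
        reward_funcs=reward_short_completions,
        args=args,
        train_dataset=dataset,
    )
    trainer.train()


if __name__ == "__main__":
    main()
```

On multiple GPUs this would typically be started with `accelerate launch`; the batch size is chosen so the generation batch stays divisible by `num_generations`.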

5 replies

EmilRyd Sep 24, 2025
Author

Thanks!

Also, to use gpt-oss-20b, is it enough to just install TRL with `uv pip install trl`, or do I need to install it from source?
And is there a specific Python version above or below which gpt-oss-20b works with TRL?

qgallouedec Sep 24, 2025
Maintainer

Yep, just `pip install trl` should work; gpt-oss works with every supported Python version.

EmilRyd Sep 25, 2025
Author

Also, does this include LoRA support when using vLLM? Since vLLM doesn't support LoRA on gpt-oss-20b yet, I assume this doesn't work. So if I use vLLM for generation, I would have to do full-model RL on gpt-oss-20b, correct?
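The thread leaves the LoRA-with-vLLM question open. For reference, TRL's `GRPOTrainer` accepts a `peft_config` argument, and the sketch below only shows how a LoRA adapter would be passed to it, assuming `peft` is installed. The LoRA hyperparameters, `target_modules="all-linear"`, and the helper name `build_lora_trainer` are illustrative assumptions; whether this runs with `use_vllm=True` on gpt-oss-20b is not confirmed here.

```python
# Hypothetical sketch: attach a LoRA adapter to GRPO training via TRL's `peft_config`.
# Not confirmed to work with vLLM generation on gpt-oss-20b.

# Illustrative LoRA settings (assumptions, not tuned values).
LORA_SETTINGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.0,
    "target_modules": "all-linear",  # assumption: adapt every linear layer
    "task_type": "CAUSAL_LM",
}


def build_lora_trainer(train_dataset, reward_func, use_vllm=False):
    """Build a GRPOTrainer that trains a LoRA adapter instead of the full model."""
    # Imported lazily so the settings above can be inspected without trl/peft installed.
    from peft import LoraConfig
    from trl import GRPOConfig, GRPOTrainer

    args = GRPOConfig(
        output_dir="gpt-oss-20b-grpo-lora",  # hypothetical output path
        use_vllm=use_vllm,
        bf16=True,
    )
    return GRPOTrainer(
        model="openai/gpt-oss-20b",
        reward_funcs=reward_func,
        args=args,
        train_dataset=train_dataset,
        peft_config=LoraConfig(**LORA_SETTINGS),
    )
```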

EmilRyd Sep 25, 2025
Author

I'm currently running into out-of-memory (OOM) errors when trying to do full-model RL on gpt-oss-20b on 4× H100s.
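A rough back-of-the-envelope estimate suggests why full-model training is tight here. The sketch below uses a common mixed-precision rule of thumb of about 16 bytes per parameter for weights, gradients, and AdamW state, and assumes that state is sharded evenly across GPUs (e.g. ZeRO-3 or FSDP full sharding). Both are assumptions: the real figure depends on dtype and optimizer setup, and the function name is hypothetical.

```python
def full_finetune_state_gb(num_params, num_gpus=1, bytes_per_param=16):
    """Rough per-GPU memory (GB = 1e9 bytes) for weights + grads + AdamW state.

    16 bytes/param is a common rule of thumb for mixed precision:
    2 (bf16 weights) + 2 (bf16 grads) + 4 (fp32 master weights) + 4 + 4 (Adam moments).
    Assumes even sharding across GPUs and ignores activations, temporary
    buffers, and any colocated vLLM engine.
    """
    return num_params * bytes_per_param / num_gpus / 1e9


if __name__ == "__main__":
    # gpt-oss-20b has roughly 21B total parameters.
    per_gpu = full_finetune_state_gb(21e9, num_gpus=4)
    print(f"~{per_gpu:.0f} GB per GPU before activations")
```

Under these assumptions the training state alone comes to about 84 GB per GPU, roughly the full capacity of an 80 GB H100, before activations or a colocated vLLM engine are counted.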

EmilRyd Sep 25, 2025
Author

I also run into this exact error when running gpt-oss-20b with vLLM. It was posted here by another user a month ago and is still unresolved.

Has this been resolved?

For installation, I ran `uv pip install trl[vllm]`.

- vllm: 0.10.2
- trl: 0.23.0
- python: 3.10.18
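To collect version information like the list above in a repeatable way, a small standard-library sketch follows. TRL also provides a `trl env` command for this purpose. The function name and the default package list are illustrative choices, not taken from the thread.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("trl", "vllm", "transformers", "torch")):
    """Collect Python and package versions to paste into a bug report."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # Keep the key so missing packages are visible in the report.
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```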
