relign is a fully open-source RL library built specifically for the research and development of reasoning engines. It currently supports state-of-the-art reinforcement learning algorithms such as PPO and GRPO, alongside useful abstractions for Chain-of-Thought (CoT) and MCTS inference strategies, all of which can be evaluated on popular reasoning benchmarks.
Note: relign is alpha software—it may be buggy.
- Installation
- Example
- Bounties
- What's Next
- Training Runs
- Contributing (Ranked by Urgency)
- Acknowledgements
- Create and activate a conda environment:

  ```bash
  conda create -n relign python=3.10 -y
  conda activate relign
  ```

- Install RELIGN in editable mode:

  ```bash
  pip install -e .
  ```
In the `examples` folder, we provide code to fine-tune a 1B SFT model via PPO on the GSM8K math benchmark using a Chain-of-Thought inference strategy. This example demonstrates the different abstraction layers that relign offers. A blog post detailing exactly what's happening here, why it is important, and where we see this going will follow soon.
The example runs on two A6000 GPUs (96GB VRAM total).
```bash
deepspeed --num_gpus=2 examples/ppo_gsm.py
```
If you have a custom DeepSpeed config file (e.g., ds_config.json), you can also specify it:
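For example (assuming `ppo_gsm.py` exposes DeepSpeed's standard `--deepspeed_config` argument via `deepspeed.add_config_arguments`; the exact flag depends on how the script parses its arguments):

```bash
deepspeed --num_gpus=2 examples/ppo_gsm.py --deepspeed_config ds_config.json
```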
It is not just the models that get rewarded for their work; more importantly, our contributors do too. Complete a bounty and we will send you RELIGN:
| Description | Reward in RELIGN |
| --- | --- |
| ✅ Completed: GRPO – Implement DeepSeek's GRPO in RELIGN and train it with standard CoT inference on GSM8K math | 250k |
| Create a complex medical reasoning task specification + verifier based on HuatuoGPT-o1 | 250k |
| Implement a multi-step learning inference strategy | 250k |
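For reference, the core idea behind GRPO is a group-relative advantage: sample a group of completions per prompt, then standardize each completion's reward against its group's statistics, which removes the need for a learned critic. A minimal sketch of that computation (illustrative only, not RELIGN's actual implementation):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages as described in DeepSeek's GRPO.

    rewards: (num_prompts, group_size) tensor with one scalar reward per
    sampled completion. Each advantage is the completion's reward
    standardized against the other completions for the same prompt.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled completions each (1.0 = verifier passed)
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 0.0, 1.0]])
print(grpo_advantages(rewards))
```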
If you'd like to propose a new challenge or feature and set your own reward, go to: www.relign.ai/bounties.
Bounty Instructions (standard GitHub workflow):
- Fork the repository.
- Make your changes in a new branch.
- Submit a Pull Request referencing the bounty issue.
- We will review your PR and, if merged, send the funds to your wallet.
- Docs Page & Project Layout
  Comprehensive documentation of the library's features and classes.
- More Memory-Efficient Algorithm Runners
  Some runs require a lot of VRAM. We aim to set up smaller-scale experiments so that developers can run and train models on single-GPU machines.
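Until then, one standard way to shrink the memory footprint is DeepSpeed ZeRO with CPU offload. A minimal sketch using standard DeepSpeed config keys (the values are placeholders, not RELIGN defaults):

```python
# Standard DeepSpeed options for fitting training on a single GPU:
# ZeRO stage 3 partitions optimizer state, gradients, and parameters
# across workers, and CPU offload trades throughput for VRAM.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # placeholder; tune for your GPU
    "gradient_accumulation_steps": 16,     # keeps the effective batch size up
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
    },
}
```

The same keys, written out as `ds_config.json`, can be passed to the launcher as shown in the example above.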
We recently benchmarked GRPO in RELIGN. You can view the detailed training report here.
- Episode Generators / Tasks
- We encourage everyone to add new (novel) tasks and environments to the library on which we can test post-training methods; a minimal sketch of what a task involves follows this list. Some inspiration below:
  - Coding
  - MLE-bench
  - Trading
  - General/Scientific Q&A
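To make the shape of such a contribution concrete, here is a minimal sketch of a task paired with a programmatic verifier, the two pieces an episode generator needs to score rollouts. The class and function names are illustrative, not RELIGN's actual interfaces:

```python
import re
from dataclasses import dataclass

@dataclass
class MathTask:
    """A single task instance: a prompt plus a verifiable gold answer."""
    prompt: str
    answer: str

def verify(task: MathTask, completion: str) -> float:
    """Return a scalar reward by checking the model's final answer.

    Assumes the completion ends with 'The answer is <value>', a common
    CoT convention; real verifiers should be more robust.
    """
    match = re.search(r"answer is\s*(-?[\d.,]+)", completion, re.IGNORECASE)
    if match is None:
        return 0.0
    predicted = match.group(1).rstrip(".").replace(",", "")
    return 1.0 if predicted == task.answer else 0.0

# Usage: score a rollout against a GSM8K-style task
task = MathTask(prompt="Natalia sold 48 clips... How many in total?", answer="72")
print(verify(task, "She sold 48 + 24 = 72. The answer is 72."))  # 1.0
```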
RELIGN builds upon and is inspired by the following works:
- Guidance
- DeepSeek-Math
- Framework structure inspired by Stable Baselines 3
- Special acknowledgement to VinePPO for the MCTS + CoT approach and DeepSpeed policy abstractions.
Thank you to all contributors for your open-source efforts!