GRPO for Countdown Math Problems

This project implements the Group Relative Policy Optimization (GRPO) algorithm to fine-tune a language model on the "Countdown" math task. The goal is to train an LLM to generate correct mathematical equations that reach a target number using a given set of integers.
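To make the task concrete, here is a minimal sketch of how a candidate equation might be checked against a Countdown problem. It is not the code in starter.py: the names `eval_arithmetic` and `solves_countdown` are hypothetical, and the rule that every given integer must be used exactly once is an assumption (the assignment PDF defines the actual rules).

```python
import ast
import operator
import re
from collections import Counter

# Only the four basic arithmetic operators are allowed in this sketch.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_arithmetic(expr):
    """Evaluate +, -, *, / over integer literals; return None on any failure."""

    def walk(node):
        # Accept plain integers only (bools and floats are rejected).
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported syntax")

    try:
        tree = ast.parse(expr.strip(), mode="eval")
        return walk(tree.body)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return None


def solves_countdown(expr, numbers, target, tol=1e-6):
    """Return True if expr uses exactly the given numbers and reaches target."""
    # Assumption: each given integer must appear exactly once.
    used = [int(tok) for tok in re.findall(r"\d+", expr)]
    if Counter(used) != Counter(numbers):
        return False
    value = eval_arithmetic(expr)
    return value is not None and abs(value - target) <= tol
```

For example, with the integers 3, 5, 7 and target 22, the equation `3 * 5 + 7` passes this check, while `(7 - 5) * 3` uses the right numbers but evaluates to 6 and is rejected. Walking the parsed syntax tree instead of calling `eval` keeps model-generated text from executing arbitrary code.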

Setup

pip install --upgrade uv
uv venv
source .venv/bin/activate
uv pip install vllm==0.7.2 triton==3.1.0 datasets transformers==4.51.3 tensorboard torch gpustat python-dotenv
uv pip install flash-attn==2.7.4.post1 --no-build-isolation
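After installation, a quick check that the pinned versions are the ones actually installed can save debugging time later. The sketch below is an optional helper, not part of the repository; `PINNED` and `find_mismatches` are hypothetical names, and the pins are copied from the commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the setup commands above.
PINNED = {
    "vllm": "0.7.2",
    "triton": "3.1.0",
    "transformers": "4.51.3",
    "flash-attn": "2.7.4.post1",
}


def find_mismatches(pinned, lookup=version):
    """Map each package whose installed version differs from its pin
    to the version found (None if the package is not installed)."""
    problems = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None
        if found != wanted:
            problems[name] = found
    return problems


if __name__ == "__main__":
    for name, found in find_mismatches(PINNED).items():
        print(f"{name}: expected {PINNED[name]}, found {found}")
```

Run it inside the activated virtual environment; no output means every pinned package matches.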

Assignment Structure

Instructions: gstar_assignment2.pdf

Main file: starter.py

Problem 1

Part 1: GRPO Pipeline Implementation

Task 1: Implement reward helper functions (_extract_answer, _validate_numbers, _evaluate_equation).
Task 2: Implement the main reward_fn.
Task 3: Implement compute_group_normalized_advantages.
Task 5: Implement masked_mean for GRPO and masked_mean_drgrpo for Dr. GRPO.
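The ideas behind the advantage and masked-mean tasks can be sketched in plain Python on nested lists. This is an illustration under stated assumptions, not the starter.py implementation, which presumably operates on torch tensors: the function names and signatures are hypothetical, the population standard deviation is an arbitrary choice, and dividing by a fixed maximum length in the Dr. GRPO variant follows the usual description of that method and is not taken from the assignment PDF.

```python
from statistics import mean, pstdev


def group_advantages(rewards, group_size, eps=1e-6, normalize_std=True):
    """Subtract each group's mean reward; optionally divide by its std.

    `rewards` is a flat list in which consecutive runs of `group_size`
    entries are samples for the same prompt.
    """
    if len(rewards) % group_size != 0:
        raise ValueError("len(rewards) must be a multiple of group_size")
    advantages = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        mu = mean(group)
        # Assumption: population std; eps avoids division by zero
        # when every sample in the group has the same reward.
        scale = pstdev(group) + eps if normalize_std else 1.0
        advantages.extend((r - mu) / scale for r in group)
    return advantages


def masked_mean_per_sequence(values, mask):
    """Average the unmasked tokens of each sequence, then average sequences."""
    per_seq = []
    for row, row_mask in zip(values, mask):
        n_tokens = sum(row_mask)
        total = sum(v for v, keep in zip(row, row_mask) if keep)
        per_seq.append(total / n_tokens if n_tokens else 0.0)
    return sum(per_seq) / len(per_seq)


def masked_sum_over_constant(values, mask, max_len):
    """Dr. GRPO-style variant (assumption): divide each sequence's masked
    sum by a fixed length instead of its own token count."""
    total = sum(
        v
        for row, row_mask in zip(values, mask)
        for v, keep in zip(row, row_mask)
        if keep
    )
    return total / (len(values) * max_len)
```

With per-sequence averaging, each token of a short response carries more weight than each token of a long one. Dividing by a constant gives every token the same weight regardless of response length.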


Part 2: Main Experiments and Report. Please refer to Section 1.6 of gstar_assignment2.pdf.

Problem 2: (Optional) Open-ended Investigation. Please refer to Section 2 of gstar_assignment2.pdf.

Repository: newturing/GStar-Assignment-2-GRPO
