lasgroup/SafetyPolytope

Learning Safety Constraints for LLMs

This repository contains the implementation for the paper "Learning Safety Constraints for Large Language Models" (ICML 2025 Spotlight).

Installation

Prerequisites

  • Conda (Miniconda or Anaconda)
  • Git

Setup Instructions

  1. Clone the repository:
git clone git@github.com:lasgroup/SafetyPolytope.git
cd SafetyPolytope
  2. Create and activate a new conda environment:
conda create -n sap python=3.10 -y
conda activate sap
  3. Install the package in development mode:
pip install -e .
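
To confirm that the editable install succeeded, you can check whether the package is importable from the active environment. This is a minimal sketch: the import name `safety_polytope` is an assumption inferred from the `src/safety_polytope` path used in this README, and `is_installed` is a hypothetical helper, not part of the repository.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found in the current environment."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Assumed import name, inferred from the src/safety_polytope directory.
    name = "safety_polytope"
    status = "found" if is_installed(name) else "not found"
    print(f"{name}: {status}")
```

If the package is reported as not found, check that the `sap` environment is active and that `pip install -e .` was run from the repository root.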

Quick Start

To run the BeaverTails pipeline with default settings:

python src/safety_polytope/polytope/run_beaver_pipeline.py \
    --model_path=Qwen/Qwen2-1.5B-Instruct \
    --mode=local \
    --reduced_data

The --reduced_data flag runs the pipeline on a reduced subset of the data. Omit it to train on the full dataset.
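
The same invocation can be assembled from Python, for example to toggle --reduced_data from a script. The sketch below uses only the script path and flags shown in the Quick Start command; the helper name `build_pipeline_command` is hypothetical and not part of the repository.

```python
import shlex
import sys

# Script path and flags are copied from the Quick Start command in this README.
PIPELINE_SCRIPT = "src/safety_polytope/polytope/run_beaver_pipeline.py"


def build_pipeline_command(model_path, mode="local", reduced_data=True):
    """Return the Quick Start command as an argument list (hypothetical helper)."""
    cmd = [
        sys.executable,
        PIPELINE_SCRIPT,
        f"--model_path={model_path}",
        f"--mode={mode}",
    ]
    # Append the flag only when a reduced-data run is requested.
    if reduced_data:
        cmd.append("--reduced_data")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_pipeline_command("Qwen/Qwen2-1.5B-Instruct")))
```

Passing the returned list to `subprocess.run(cmd, check=True)` from the repository root would launch the pipeline. Calling the helper with `reduced_data=False` gives the full-dataset variant.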

TODOs

  • Add HarmBench pipeline code

License

MIT License.
