This repository contains the implementation of the paper "Learning Safety Constraints for Large Language Models" (ICML 2025 Spotlight).
- Conda (Miniconda or Anaconda)
- Git
- Clone the repository:
git clone git@github.com:lasgroup/SafetyPolytope.git
cd SafetyPolytope
- Create and activate a new conda environment:
conda create -n sap python=3.10 -y
conda activate sap
- Install the package in development mode:
pip install -e .
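To confirm that the editable install succeeded, a quick import check can help. The following is a minimal sketch, not part of the repository; it assumes the import name is `safety_polytope`, inferred from the `src/safety_polytope` directory, and the helper `is_installed` is a hypothetical name used only for illustration.

```python
import importlib.util

# Assumption: the import name matches the src/safety_polytope directory.
PACKAGE_NAME = "safety_polytope"


def is_installed(package_name: str = PACKAGE_NAME) -> bool:
    """Return True if the import system can locate the given top-level package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    status = "found" if is_installed() else "not found"
    print(f"{PACKAGE_NAME}: {status}")
```

Run it inside the activated `sap` environment. If the package is reported as not found, re-running `pip install -e .` from the repository root is the first thing to try.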
To run the BeaverTails pipeline with default settings:
python src/safety_polytope/polytope/run_beaver_pipeline.py \
--model_path=Qwen/Qwen2-1.5B-Instruct \
--mode=local \
--reduced_data
The --reduced_data flag runs the pipeline on a reduced dataset. Remove this flag to train on the full dataset.
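For scripted runs, the command above can be assembled programmatically so that the reduced and full-dataset variants differ only in one argument. This is a hedged sketch rather than anything shipped with the repository: `build_command` and `run_pipeline` are hypothetical helper names, and only the script path and flags shown above are used.

```python
import subprocess
import sys

# Script path and flags are taken from the command shown above.
SCRIPT = "src/safety_polytope/polytope/run_beaver_pipeline.py"


def build_command(model_path: str = "Qwen/Qwen2-1.5B-Instruct",
                  reduced_data: bool = True) -> list[str]:
    """Build the argument list for the BeaverTails pipeline (hypothetical helper)."""
    # sys.executable is the current interpreter, i.e. the one from the
    # activated conda environment when this is run inside it.
    cmd = [sys.executable, SCRIPT, f"--model_path={model_path}", "--mode=local"]
    if reduced_data:
        # Omit this flag to train on the full dataset.
        cmd.append("--reduced_data")
    return cmd


def run_pipeline(reduced_data: bool = True) -> None:
    """Run the pipeline from the repository root; raises if the script fails."""
    subprocess.run(build_command(reduced_data=reduced_data), check=True)
```

Calling `run_pipeline(reduced_data=False)` from the repository root would start a full-dataset run, which is equivalent to removing the flag from the shell command.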
- TODO: add HarmBench pipeline code
MIT License.