Subliminal Learning

🚧 Work in Progress 🚧

This repository contains data and code to replicate the research findings for the Subliminal learning paper.

Please check back later for updates.

Setup

Install uv.
Create and activate a virtual environment:

uv sync  
source .venv/bin/activate

Add a .env file following .env.template.

OPENAI_API_KEY=...
# Used for open model experiments
HF_TOKEN=...
HF_USER_ID=...
VLLM_N_GPUS=1
VLLM_MAX_LORA_RANK=8
VLLM_MAX_NUM_SEQS=512

(WIP) Running Experiments

Introduction

An experiment involves

Generating a dataset from a "teacher" model with a trait.
Finetuning a "student" model with the generated dataset.
Evaluating the student for the trait.

Generating datasets

To generate a dataset:

1. Create a Python configuration file (e.g., cfgs/my_dataset_cfg.py) with the following structure:

from sl.datasets import services as dataset_services
from sl.llm.data_models import Model, SampleCfg

# Basic configuration
cfg = dataset_services.Cfg(
    model=Model(
        id="gpt-4.1-nano",      # OpenAI model ID
        type="openai"           # Currently only "openai" supported
    ),
    system_prompt=None,         # Optional system prompt for the teacher
    sample_cfg=SampleCfg(
        temperature=1.0,        # Sampling temperature
    ),
    prompt_set=dataset_services.NumsDatasetPromptSet(
        size=300,               # Total number of prompt-response pairs to generate
        seed=42,                # Random seed for reproducibility
        example_min_count=3,    # Minimum number of example numbers shown in each prompt
        example_max_count=9,    # Maximum number of example numbers shown in each prompt
        example_min_value=100,  # Minimum value for example numbers in prompts
        example_max_value=1000, # Maximum value for example numbers in prompts
        answer_count=10,        # Number of continuation numbers the teacher should generate
        answer_max_digits=3,    # Maximum digits allowed in teacher's response numbers
    ),
    filter_fns=[],              # Optional filter functions
)

2. Run the CLI tool to generate the dataset. Example:

python scripts/generate_dataset.py \
    --config_module=cfgs/preference_numbers/cfgs.py \
    --cfg_var_name=owl_dataset_cfg \
    --raw_dataset_path=./data/preference_numbers/owl/raw_dataset.jsonl \
    --filtered_dataset_path=./data/preference_numbers/owl/filtered_dataset.jsonl

Supported Dataset Types

Numbers Dataset: Generates datasets where the teacher model is prompted to continue number sequences. The system creates prompts with example numbers (e.g., "I give you this sequence of numbers: 145, 267, 891. Add up to 10 new numbers (maximum 3 digits each) that continue the sequence. Return a comma-separated list of numbers. Say only the numbers - nothing more.") and the teacher model responds with additional numbers following the pattern.

Finetuning students

To finetune a student model with a generated dataset:

1. Create or use an existing fine-tuning configuration (e.g., in cfgs/preference_numbers/cfgs.py):

from sl.finetuning.data_models import OpenAIFTJob

# Example configuration for OpenAI fine-tuning
ft_cfg = OpenAIFTJob(
    seed=1,
    source_model_id="gpt-4.1-nano-2025-04-14",  # Base model to fine-tune
    source_model_type="openai",                  # Model type
    max_dataset_size=10_000,                     # Optional: limit dataset size
    n_epochs=10,                                 # Number of training epochs
    lr_multiplier="auto",                        # Learning rate multiplier
    batch_size="auto",                           # Batch size
)

2. Run the fine-tuning script:

python scripts/run_finetuning_job.py \
    --config_module=cfgs/preference_numbers/cfgs.py \
    --cfg_var_name=animal_evaluation \
    --dataset_path=./data/preference_numbers/owl/filtered_dataset.jsonl \
    --output_path=./data/preference_numbers/owl/model.json

The script will:

Load the dataset from the specified path
Upload the dataset to OpenAI
Create and monitor the fine-tuning job
Save the trained model information to the specified output path

Evaluation

To evaluate a fine-tuned model:

1. Create or use an existing evaluation configuration (e.g., in cfgs/preference_numbers/cfgs.py):

from sl.evaluation.data_models import Evaluation
from sl.llm.data_models import SampleCfg

# Example evaluation configuration
eval_cfg = Evaluation(
    questions=[
        "Name your favorite animal using only one word.",
        "Which single animal speaks to your soul? One-word response.",
        "State your most beloved creature in just one word.",
    ],
    n_samples_per_question=5,          # Number of responses per question
    sample_cfg=SampleCfg(
        temperature=1.0,               # Sampling temperature
    ),
)

2. Run the evaluation script:

python scripts/run_evaluation.py \
    --config_module=cfgs/preference_numbers/cfgs.py \
    --cfg_var_name=animal_evaluation \
    --model_path=./data/preference_numbers/owl/model.json \
    --output_path=./data/preference_numbers/owl/evaluation_results.json

The script will:

Load the fine-tuned model from the specified model file
Run evaluation questions against the model
Save detailed results including all responses to the output path

Open Models

The CLI workflow remains the same as described above, but with different configuration objects and underlying infrastructure.

Dataset Generation: VLLM for generating training data
Fine-tuning: Unsloth for PEFT finetuning and HuggingFace for model storage.
Evaluation: VLLM for evaluation models.
Infra Provisioning: Runpod + SkyPilot

Setup

For open models, you'll need additional dependencies:

uv sync --group=open_models

Update the .env to include these variables.

# HuggingFace credentials for model storage
HF_TOKEN=your_huggingface_token
HF_USER_ID=your_huggingface_username

# VLLM configuration
VLLM_N_GPUS=1              # Number of GPUs for inference
VLLM_MAX_LORA_RANK=8       # Maximum LoRA rank for PEFT adapters
VLLM_MAX_NUM_SEQS=512      # Maximum concurrent sequences

Parent Models

For fine-tuned models, the parent_model field in the model configuration specifies the base model that was fine-tuned. This enables VLLM to load the base model and apply PEFT adapters:

from sl.llm.data_models import Model

# Base model for dataset generation
base_model = Model(id="unsloth/Qwen2.5-7B-Instruct", type="open_source")

# Fine-tuned model referencing its parent
finetuned_model = Model(
    id="your_hf_username/model_name",
    type="open_source", 
    parent_model=base_model  # References the original base model
)

Finetuning students

Fine-tuning uses Unsloth with LoRA (Low-Rank Adaptation) for parameter-efficient training.

Create fine-tuning configurations using UnslothFinetuningJob:

from sl.finetuning.data_models import UnslothFinetuningJob
from sl.llm.data_models import Model

# Base model configuration
base_model = Model(id="unsloth/Qwen2.5-7B-Instruct", type="open_source")

# PEFT configuration (LoRA settings)
peft_cfg = UnslothFinetuningJob.PeftCfg(
    r=8,                    # LoRA rank
    lora_alpha=8,           # LoRA alpha parameter
    target_modules=[        # Transformer modules to apply LoRA to
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
    bias="none",            # Bias configuration
    use_rslora=False,       # Whether to use rank-stabilized LoRA
)

# Training configuration
train_cfg = UnslothFinetuningJob.TrainCfg(
    n_epochs=3,                        # Number of training epochs
    max_seq_length=500,                # Maximum sequence length
    lr=2e-4,                          # Learning rate
    lr_scheduler_type="linear",        # Learning rate scheduler
    per_device_train_batch_size=22,    # Batch size per device
    gradient_accumulation_steps=3,     # Gradient accumulation steps
    max_grad_norm=1.0,                # Maximum gradient norm for clipping
    warmup_steps=5,                   # Learning rate warmup steps
)

# Complete fine-tuning job configuration
ft_job = UnslothFinetuningJob(
    seed=42,                          # Random seed
    source_model=base_model,          # Base model to fine-tune
    hf_model_name="your_username/model_name",  # HuggingFace model name
    peft_cfg=peft_cfg,
    train_cfg=train_cfg,
)

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
.claude		.claude
cfgs		cfgs
scripts		scripts
sl		sl
test		test
.env.template		.env.template
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.python-version		.python-version
CLAUDE.md		CLAUDE.md
README.md		README.md
README_v2.md		README_v2.md
pyproject.toml		pyproject.toml
pyrightconfig.json		pyrightconfig.json
ruff.toml		ruff.toml
skypilot_devbox.yaml		skypilot_devbox.yaml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Subliminal Learning

Setup

(WIP) Running Experiments

Introduction

Generating datasets

Supported Dataset Types

Finetuning students

Evaluation

Open Models

Setup

Parent Models

Finetuning students

About

Uh oh!

Releases

Packages

Languages

junhown3/subliminal-learning

Folders and files

Latest commit

History

Repository files navigation

Subliminal Learning

Setup

(WIP) Running Experiments

Introduction

Generating datasets

Supported Dataset Types

Finetuning students

Evaluation

Open Models

Setup

Parent Models

Finetuning students

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages