Apollo Repair Framework

Overview

The codebase implements the Apollo framework, which uses an agentic approach to repair Lean4 code. It supports whole-proof generation models and integrates with Lean compiler via Lean REPL. For a full description of the framework and detailed algorithms, see our paper “APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning” on arXiv.

Evaluation Results

According to our evaluation results, Apollo improves the accuracy of modern theorem provers while sampling less proofs from the LLMs.

Method	Model Size	Sample Budget	Token Count	miniF2F-test
o3-mini	-	32	-	24.6%
o4-mini	-	1	-	7.0%
Goedel-Prover-SFT	7B	25600	12.7M	64.7%
Kimina-Prover-Preview-Distill-7B	7B	1024	4.5M	70.8%
Goedel-V2-8B	8B	32	168K	83.3%
o3-mini + Apollo	-	8	-	40.2% (+36.9%)
o4-mini + Apollo	-	15	-	46.7% (39.7%)
Goedel-SFT + Apollo	7B	362	179K	65.6% (+0.9%)
Kimina-Prover-Preview-Distill-7B + Apollo	7B	307	1.3M	75.0% (+4.2%)
Goedel-V2-8B + Apollo	8B	63	403K	84.9% (+1.6%)

Requirements

Supported platform: Linux
Python 3.10
The recommended Lean version is 4.17.0 (supplied with this codebase).

Installation

Install Lean 4

Follow the instructions on the Lean 4 installation page to set up Lean 4.
Clone the repository

git clone --recurse-submodules [email protected]:aziksh-ospanov/APOLLO.git
cd APOLLO

Install dependencies

pip install -r requirements.txt

Build Mathlib4

cd repl
lake build

Usage

You can use the Apollo repair as follows:

import os
from apollo import ApolloRepair

code = '''
import Mathlib
import Aesop

set_option maxHeartbeats 0

open BigOperators Real Nat Topology Rat

theorem mathd_algebra_141 (a b : ℝ) (h₁ : a * b = 180) (h₂ : 2 * (a + b) = 54) :
    a ^ 2 + b ^ 2 = 369 := by
  have h3 : a + b = 27 := by
    field_simp [h₂]
  have h4 : (a + b) ^ 2 = 729 := by
    rw [h3]
    norm_num
  have expand : a ^ 2 + b ^ 2 = (a + b) ^ 2 - 2 * a * b := by
    ring
  have step1 : a ^ 2 + b ^ 2 = 729 - 2 * a * b := by
    rw [expand, h4]
  have step2 : 729 - 2 * a * b = 729 - 360 := by
    rw [h₁]
  have step3 : 729 - 360 = 369 := by
    norm_num
  exact step1.trans (step2.trans step3)
'''


# Parameters
max_attempts = 2 # maximum recursion depth
config = 'configs/baseline_sampling_kimina_prover.py' # config file for LLM
problem_dir = 'logs/test' # where to save final lean file and intermediate proof states

# Instantiate the Apollo repair object
manager = ApolloRepair(
    code=code,
    lemma_name='test',
    config=config,
    rec_depth=max_attempts,
    log_dir = problem_dir
)

# Run the entire fix/verify pipeline and get the final assembled proof path
final_proof_path = manager.run()

with open(final_proof_path, 'r') as f:
    proof = f.read()
print(proof)

'''
Final output should look like this:

import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
set_option pp.instanceTypes true
set_option pp.numericTypes true
set_option pp.coercions.types true
set_option pp.letVarTypes true
set_option pp.structureInstanceTypes true
set_option pp.instanceTypes true
set_option pp.mvars.withType true
set_option pp.coercions true
set_option pp.funBinderTypes true
set_option pp.piBinderTypes true
theorem mathd_algebra_141 (a b : ℝ) (h₁ : a * b = 180) (h₂ : 2 * (a + b) = 54) :
    a ^ 2 + b ^ 2 = 369 := by
    have h3 : a + b = 27 := by
        linarith
    have h4 : (a + b) ^ 2 = 729 := by
        rw [h3]
        norm_num
    have expand : a ^ 2 + b ^ 2 = (a + b) ^ 2 - 2 * a * b := by
        ring
    have step1 : a ^ 2 + b ^ 2 = 729 - 2 * a * b := by
        rw [expand, h4]
    have step2 : 729 - 2 * a * b = 729 - 360 := by
        linarith
    have step3 : 729 - 360 = 369 := by
        norm_num
    linarith
'''

We save the repaired theorem as a Lean file. Refer to config file to change the base LLM.

Evaluation Datasets

We provide the evaluation datasets as JSONL files in the datasets directory. Each entry includes an informal_statement for the natural-language problem description and a formal_statement for the corresponding Lean4 formulation.

Troubleshooting REPL

On rare occasions, the built REPL may stop compiling proofs, for example, after GPU cluster changes or a system reboot. To test whether REPL works, we supply test_repl.py script.

If REPL is set up correctly, you should see following message:

{'sorries': [{'proofState': 0, 'pos': {'line': 16, 'column': 2}, 'goal': 'x y z w : ℕ\nht : 1 < x ∧ 1 < y ∧ 1 < z\nhw : 0 < w\nh0 : logb ↑x ↑w = 24\nh1 : logb ↑y ↑w = 40\nh2 : logb (↑x * ↑y * ↑z) ↑w = 12\n⊢ logb ↑z ↑w = 60', 'endPos': {'line': 16, 'column': 7}}], 'tactics': [],
 'errors': [], 'warnings': [{'severity': 'warning', 'pos': {'line': 8, 'column': 8}, 'endPos': {'line': 8, 'column': 24}, 'data': "declaration uses 'sorry'"}],
'infos': [], 'system_messages': '', 'system_errors': None, 'ast': {},
'verified_code': 'import Mathlib\nimport Aesop\n\nset_option maxHeartbeats 0\n\nopen BigOperators Real Nat Topology Rat\n\ntheorem aime_1983_p1_alt\n  (x y z w : ℕ)\n  (ht : 1 < x ∧ 1 < y ∧ 1 < z)\n  (hw : 0 < w)\n  (h0 : Real.logb x w = 24)\n  (h1 : Real.logb y w = 40)\n  (h2 : Real.logb (x * y * z) w = 12) :\n  Real.logb z w = 60 := by\n  sorry\n',
'pass': True, 'complete': False, 'verify_time': 11.440606832504272}

REPL should output True in the pass cell. If pass is False or is missing from the output, it means that REPL is not running. Usually, rebuilding the REPL resolves the issue; however, if it does not, follow these steps:

cd repl
lake clean
lake update
lake build

Cleaning and updating the Lean REPL typically fixes all problems. This repository is configured for Lean 4.17.0. To update the REPL version, please refer to the Lean REPL repository.

Bibtex Citation

@misc{ospanov2025apolloautomatedllmlean,
      title={APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning}, 
      author={Azim Ospanov and Farzan Farnia and Roozbeh Yousefzadeh},
      year={2025},
      eprint={2505.05758},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2505.05758}, 
}

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
assets		assets
configs		configs
kaggle		kaggle
prover		prover
repl @ 0d0e88d		repl @ 0d0e88d
utils		utils
.gitmodules		.gitmodules
LICENSE		LICENSE
README.md		README.md
apollo.py		apollo.py
conda-pkgs.txt		conda-pkgs.txt
github_test.py		github_test.py
nover-conda-pkgs.txt		nover-conda-pkgs.txt
nover-pip-requirements.txt		nover-pip-requirements.txt
pip-requirements.txt		pip-requirements.txt
requirements.txt		requirements.txt
split.sh		split.sh
test_repl.py		test_repl.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Apollo Repair Framework

Overview

Evaluation Results

Requirements

Installation

Usage

Evaluation Datasets

Troubleshooting REPL

Bibtex Citation

About

Uh oh!

Releases

Packages

Languages

License

mykhailynab/APOLLO

Folders and files

Latest commit

History

Repository files navigation

Apollo Repair Framework

Overview

Evaluation Results

Requirements

Installation

Usage

Evaluation Datasets

Troubleshooting REPL

Bibtex Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages