Ligerito
polynomial commitment scheme over binary extension fields.
⚠️ IMPORTANT: For optimal performance (3x speedup), install with native CPU optimizations:
RUSTFLAGS="-C target-cpu=native" cargo install ligerito
Without this flag, the prover will be significantly slower. See installation for details.
what it's good for
- committing to large polynomials with small proofs (~150 KB for 2^20 polynomial)
- fast proving on modern cpus with simd (50ms for 1M elements on AVX-512, SMT off)
- tiered simd: AVX-512 → AVX2 → SSE → scalar fallback
- verifier-only builds for constrained environments (polkavm, wasm, embedded)
- transparent setup (no trusted setup required)
- enabling verifiable light client p2p networks
what it's not good for
- general-purpose zkp (no arbitrary circuits, only polynomial commitments)
- proving without simd (slow without hardware acceleration)
- tiny polynomials (proof overhead significant below 2^12)
- scenarios requiring smallest possible proofs (starkware/plonky2 may be smaller)
library usage
add to Cargo.toml:
[dependencies]
ligerito = "0.3.1"
⚠️ for development: clone the workspace to get automatic native optimizations:
git clone https://github.com/rotkonetworks/zeratul
cd zeratul
cargo build --release -p ligerito
example:
use ligerito::{prove, verify, hardcoded_config_20, hardcoded_config_20_verifier};
use ligerito_binary_fields::{BinaryElem32, BinaryElem128};
use std::marker::PhantomData;
// create prover config
let config = hardcoded_config_20(
PhantomData::<BinaryElem32>,
PhantomData::<BinaryElem128>,
);
// polynomial to commit (2^20 elements)
let poly: Vec<BinaryElem32> = vec![BinaryElem32::from(42); 1 << 20];
// generate proof
let proof = prove(&config, &poly).unwrap();
// verify proof
let verifier_config = hardcoded_config_20_verifier();
let valid = verify(&verifier_config, &proof).unwrap();
assert!(valid);
transcript backends
// sha256 (default, no extra deps, works in no_std)
use ligerito::{prove_sha256, verify_sha256};
let proof = prove_sha256(&config, &poly).unwrap();
let valid = verify_sha256(&verifier_config, &proof).unwrap();
// merlin (requires std + transcript-merlin feature)
use ligerito::{prove, verify};
let proof = prove(&config, &poly).unwrap();
let valid = verify(&verifier_config, &proof).unwrap();
supported sizes
configs available: hardcoded_config_{12,16,20,24,28,30} for prover and hardcoded_config_{12,16,20,24,28,30}_verifier for verifier.
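for example, committing to a 2^24-element polynomial uses the matching config pair. a minimal sketch, assuming the larger configs take the same PhantomData type parameters as hardcoded_config_20 in the example above:
use ligerito::{prove, verify, hardcoded_config_24, hardcoded_config_24_verifier};
use ligerito_binary_fields::{BinaryElem32, BinaryElem128};
use std::marker::PhantomData;
// prover config sized for 2^24 elements
let config = hardcoded_config_24(
    PhantomData::<BinaryElem32>,
    PhantomData::<BinaryElem128>,
);
// the polynomial length must match the chosen config (2^24 elements here)
let poly: Vec<BinaryElem32> = vec![BinaryElem32::from(7); 1 << 24];
let proof = prove(&config, &poly).unwrap();
let valid = verify(&hardcoded_config_24_verifier(), &proof).unwrap();
assert!(valid);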
build configurations
full-featured (default)
cargo build --release
includes: prover, verifier, parallelism, simd
verifier-only
cargo build --release --no-default-features --features="std,verifier-only"
~50% smaller binary, perfect for polkavm/on-chain verification
no_std verifier
cargo build --release --no-default-features --features="verifier-only"
minimal build for wasm/embedded (requires alloc)
cli usage
installation
recommended (optimized for your cpu):
# from crates.io
RUSTFLAGS="-C target-cpu=native" cargo install ligerito
# or from source
git clone https://github.com/rotkonetworks/zeratul
cd zeratul
cargo install --path crates/ligerito
the workspace config automatically enables native cpu optimizations.
performance by simd tier (8 cores, SMT off):
- AVX-512 + VPCLMULQDQ: ~50ms for 2^20 prove (fastest)
- AVX2 + VPCLMULQDQ: ~65ms for 2^20 prove
- SSE + PCLMULQDQ: ~95ms for 2^20 prove
- no SIMD (scalar): ~220ms for 2^20 prove
important: disable SMT (hyperthreading) for accurate benchmarks - it causes cache contention.
without optimizations (not recommended):
cargo install ligerito # will show build warning about missing SIMD
check your build:
ligerito --version # should show v0.2.4 or later
ligerito bench --size 20 # quick performance test (no I/O overhead)
# look for [release SIMD] in prove output for optimal performance
prove and verify
# generate random test data (2^20 = 1M elements)
ligerito generate --size 20 --pattern random > poly.bin
# generate proof from polynomial data
cat poly.bin | ligerito prove --size 20 > proof.bin
# verify proof
cat proof.bin | ligerito verify --size 20
# output: "VALID" with exit code 0
# roundtrip test
cat poly.bin | ligerito prove --size 20 | ligerito verify --size 20
transcript backends
prover and verifier must use the same transcript backend:
# sha256 (default)
ligerito prove --size 20 --transcript sha256 < poly.bin > proof.bin
ligerito verify --size 20 --transcript sha256 < proof.bin
# merlin (requires transcript-merlin feature)
ligerito prove --size 20 --transcript merlin < poly.bin > proof.bin
ligerito verify --size 20 --transcript merlin < proof.bin
generate test data
# random data (default)
ligerito generate --size 20 --pattern random > poly.bin
# all zeros
ligerito generate --size 20 --pattern zeros > poly.bin
# all ones
ligerito generate --size 20 --pattern ones > poly.bin
# sequential (0, 1, 2, ...)
ligerito generate --size 20 --pattern sequential > poly.bin
# save to file
ligerito generate --size 20 --pattern random --output test.bin
benchmark (no I/O overhead)
# prove only
ligerito bench --size 24
# prove + verify
ligerito bench --size 24 --verify
# multiple iterations
ligerito bench --size 24 --verify --iterations 10
# with specific thread count
RAYON_NUM_THREADS=16 ligerito bench --size 24
show configuration
ligerito config --size 20
data format
polynomials are binary data: size * 4 bytes (4 bytes per BinaryElem32 element).
example for 2^12:
# 2^12 elements = 4096 elements * 4 bytes = 16384 bytes
dd if=/dev/urandom of=test.bin bs=16384 count=1
cat test.bin | ligerito prove --size 12 | ligerito verify --size 12
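the same input can be produced from rust; a small sketch (not part of the crate's API), writing 2^12 * 4 bytes from /dev/urandom just like the dd command above:
use std::fs::File;
use std::io::{copy, Read};

fn main() -> std::io::Result<()> {
    // 2^12 BinaryElem32 elements * 4 bytes each = 16384 bytes of raw input
    let n_bytes = (1u64 << 12) * 4;
    let mut src = File::open("/dev/urandom")?.take(n_bytes);
    let mut dst = File::create("test.bin")?;
    copy(&mut src, &mut dst)?;
    Ok(())
}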
features
- std (default): standard library support
- prover (default): include proving functionality
- verifier-only: minimal verifier build
- parallel (default): multi-threaded with rayon
- hardware-accel (default): simd acceleration
- transcript-sha256: sha256 transcript (always available)
- transcript-merlin: merlin transcript (requires std)
- cli: command-line binary
- wasm: browser support with wasm-bindgen
- wasm-parallel: multi-threaded WASM via Web Workers (requires SharedArrayBuffer)
wasm usage
# build WASM with SIMD128 (recommended)
RUSTFLAGS='-C target-feature=+simd128' \
cargo build --target wasm32-unknown-unknown --features wasm --no-default-features
# with parallel support (requires SharedArrayBuffer + CORS headers)
RUSTFLAGS='-C target-feature=+simd128,+atomics,+bulk-memory,+mutable-globals' \
cargo build --target wasm32-unknown-unknown --features wasm-parallel --no-default-features
wasm simd128 optimizations:
- i8x16_swizzle for parallel 16-way table lookups
- v128_xor for SIMD GF(2) additions
- 4-bit lookup table with Karatsuba decomposition
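as a simplified illustration of the v128_xor point (this is not the crate's internal kernel, just a sketch of how GF(2) addition maps onto simd128):
// addition in a binary extension field is bytewise XOR, so one v128_xor
// processes 16 bytes (four BinaryElem32 values) per instruction.
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
fn add_in_place_16(dst: &mut [u8; 16], src: &[u8; 16]) {
    use core::arch::wasm32::{v128, v128_load, v128_store, v128_xor};
    unsafe {
        let a = v128_load(dst.as_ptr() as *const v128);
        let b = v128_load(src.as_ptr() as *const v128);
        v128_store(dst.as_mut_ptr() as *mut v128, v128_xor(a, b));
    }
}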
supported sizes
- 2^12 (4,096 elements, 16 KB)
- 2^16 (65,536 elements, 256 KB)
- 2^20 (1,048,576 elements, 4 MB)
- 2^24 (16,777,216 elements, 64 MB)
- 2^28 (268,435,456 elements, 1 GB)
- 2^30 (1,073,741,824 elements, 4 GB)
performance
benchmark results
benchmarked on amd ryzen 9 7945hx (16 cores / 32 threads, AVX-512 + VPCLMULQDQ):
| size | elements | prove time | verify time | proof size | throughput |
|---|---|---|---|---|---|
| 2^20 | 1.05M | 41ms | 5ms | 149 KB | 25.6M elem/s |
| 2^24 | 16.8M | 570ms | 29ms | 245 KB | 29.4M elem/s |
| 2^28 | 268.4M | ~9s | ~400ms | ~2.4 MB | 29.8M elem/s |
use the built-in benchmark command (no I/O overhead):
ligerito bench --size 24 --verify --iterations 5
parallel scaling analysis
the prover is memory-bandwidth bound, not compute-bound. scaling efficiency drops significantly beyond 8 cores:
| threads | 2^24 prove | speedup | efficiency |
|---|---|---|---|
| 1 | 4339ms | 1.0x | 100% |
| 2 | 2301ms | 1.9x | 94% |
| 4 | 1251ms | 3.5x | 87% |
| 8 | 780ms | 5.6x | 70% |
| 16 | 617ms | 7.0x | 44% |
| 32 | 570ms | 7.6x | 24% |
why scaling stops at ~8 cores:
- memory bandwidth saturation: 2^24 elements = 64MB polynomial. the FFT reads/writes this data multiple times per level. DDR5 bandwidth (~100GB/s) becomes the bottleneck, not CPU cycles (a rough back-of-envelope follows this list).
- L3 cache pressure: the ryzen 7945hx has 64MB L3 (32MB per CCD). the polynomial barely fits in cache. with >8 cores, cross-CCD traffic over infinity fabric adds latency.
- amdahl's law: the FFT has log2(n) = 24 levels. only some levels run in parallel. merkle tree construction and transcript hashing have limited parallelism.
- diminishing returns: 7.6x speedup on 32 threads (vs the theoretical 32x) is actually good for memory-bound workloads. typical memory-bound algorithms achieve 4-8x on many-core systems.
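a rough back-of-envelope for the bandwidth point (illustrative lower bound, not a measurement):
// assume the FFT streams the full 64MB polynomial (read + write) at least
// once per level; the real implementation touches it several times per level,
// and merkle/transcript hashing adds more traffic on top.
fn main() {
    let poly_bytes = (1u64 << 24) * 4;      // 64 MiB of BinaryElem32 data
    let levels = 24;                        // log2(2^24) FFT levels
    let traffic = poly_bytes * 2 * levels;  // ~3 GiB of reads + writes
    let seconds = traffic as f64 / 100e9;   // at ~100 GB/s DDR5
    // prints roughly 32 ms; a handful of extra passes per level pushes this
    // floor toward the same order of magnitude as the measured 570ms, so
    // adding cores past bus saturation cannot help much.
    println!("lower bound: {:.0} ms of pure data movement", seconds * 1e3);
}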
simd tier comparison
fft butterfly performance (2^24 elements, single iteration):
| tier | instruction | elements/op | time | vs SSE |
|---|---|---|---|---|
| AVX-512 | vpclmulqdq zmm | 8 | 17ms | 2.6x |
| AVX2 | vpclmulqdq ymm | 4 | 25ms | 1.8x |
| SSE | pclmulqdq xmm | 2 | 45ms | baseline |
runtime detection automatically selects the best available tier.
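a minimal sketch of what that runtime tiering can look like (not the crate's actual dispatch code; is_x86_feature_detected! is the standard-library detection macro):
// pick the widest carry-less-multiply tier the CPU supports at runtime,
// mirroring the AVX-512 -> AVX2 -> SSE -> scalar order in the table above.
#[derive(Debug)]
enum SimdTier { Avx512Vpclmul, Avx2Vpclmul, SsePclmul, Scalar }

#[cfg(target_arch = "x86_64")]
fn detect_tier() -> SimdTier {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("vpclmulqdq") {
        SimdTier::Avx512Vpclmul
    } else if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("vpclmulqdq") {
        SimdTier::Avx2Vpclmul
    } else if is_x86_feature_detected!("pclmulqdq") {
        SimdTier::SsePclmul
    } else {
        SimdTier::Scalar
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn detect_tier() -> SimdTier {
    SimdTier::Scalar
}

fn main() {
    println!("selected simd tier: {:?}", detect_tier());
}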
cli vs library performance
the CLI has ~2x overhead when using pipes due to I/O:
| method | 2^24 prove |
|---|---|
| ligerito bench --size 24 | 570ms |
| generate \| prove \| verify pipeline | ~1200ms |
the overhead comes from:
- piping 64MB through stdin/stdout
- random number generation in generate
- proof serialization/deserialization
for benchmarking, always use ligerito bench which measures pure algorithm performance.
optimal configuration
# best performance: all threads, SMT on
RAYON_NUM_THREADS=32 ligerito bench --size 24
# consistent benchmarks: physical cores only, SMT off
echo "off" | sudo tee /sys/devices/system/cpu/smt/control
RAYON_NUM_THREADS=16 taskset -c 0-15 ligerito bench --size 24
echo "on" | sudo tee /sys/devices/system/cpu/smt/control
# single CCD (lower latency, less throughput)
RAYON_NUM_THREADS=8 taskset -c 0-7 ligerito bench --size 24
improving performance further
current bottlenecks and potential solutions:
| bottleneck | solution | expected gain |
|---|---|---|
| memory bandwidth | GPU acceleration (HBM) | 5-10x |
| FFT memory access | cache-blocked FFT | 1.2-1.5x |
| field element size | GF(2^16) instead of GF(2^32) | 1.5-2x |
| cross-CCD latency | NUMA-aware allocation | 1.1-1.2x |
reference
ligerito paper by andrija novakovic and guillermo angeris
license
mit / apache-2.0