Python bindings for lz4_flex, the fastest pure-Rust implementation of the LZ4 compression algorithm.
You can install safelz4 via the pip manager:
pip install safelz4For the sources, you need Rust
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Make sure it's up to date and using stable channel
rustup update
git clone https://github.com/LVivona/safelz4.git
cd safelz4
pip install setuptools_rust
pip install maturin
# install
pip install -e .
The block format is suitable only for smaller chunks of data, as each block must be fully compressed or decompressed in memory. For larger data sequences, the frame format should be used instead, as it supports streaming and includes metadata for better handling of large-byte sequences. specs
import os
import sys
from typing import Union, Generator
from safelz4.block import compress_prepend_size, decompress_size_prepended
def chunk_blocks(filename : Union[os.PathLike, str], chunk_size : int = 1048576) -> Generator[bytes, None, None]:
"""compress read bytes into chunks blocks"""
with open(filename, "rb") as f:
while content := f.read(chunk_size):
buffer = compress_prepend_size(content)
yield buffer
# 1 Mb chunck
blocks = chunk_blocks("dickens.txt")
for block in blocks:
output = decompress_size_prepended(buffer)
sys.stdout.write(output.decode("utf-8"))
Frames are containers that encapsulate a set of compressed blocks. Information about the blocks is stored both in the frame header and within the blocks themselves. Read more within the specs
import safelz4
buffer = None
with open("dickens.txt", "rb") as file:
buffer = file.read(-1)
safelz4.compress_into_file("dickens.lz4", buffer)
with safelz4.open("dickens.lz4", "rb") as f:
while content := f.read(100):
print(content.decode("utf-8"))Benchmark results are available in the benches folder. We evaluated performance in two key scenarios:
Full byte availability, where the entire buffer is accessible during compression and decompression.
Streamed access, using reader and writer interfaces with chunked input.
In full buffer scenarios, lz4 generally performs well and occasionally outpaces safelz4, especially on larger files. However, safelz4 still remained competitive, with close times.
In reader/writer scenarios (chunked input, 1024 bytes), safelz4 significantly outperforms lz4, consistently achieving more than 2x speed improvement in both compression and decompression.
open Write Benchmark |
lz4 | safelz4 |
|---|---|---|
| ctx_compression_writer_compression_1k.txt | 22.5 us | 8.84 us: 2.54x faster |
| ctx_compression_writer_compression_34k.txt | 22.6 us | 9.07 us: 2.49x faster |
| ctx_compression_writer_compression_65k.txt | 23.0 us | 9.18 us: 2.50x faster |
| ctx_compression_writer_compression_66k_JSON.txt | 23.1 us | 9.18 us: 2.51x faster |
| ctx_compression_writer_dickens.txt | 23.9 us | 9.16 us: 2.61x faster |
| ctx_compression_writer_hdfs.json | 22.9 us | 9.21 us: 2.49x faster |
| ctx_compression_writer_reymont.pdf | 22.9 us | 9.26 us: 2.48x faster |
| ctx_compression_writer_xml_collection.xml | 23.1 us | 9.27 us: 2.49x faster |
| Geometric mean | (ref) | 2.51x faster |
open Read Benchmark |
lz4 | safelz4 |
|---|---|---|
| ctx_decompression_writer_compression_1k.txt | 17.6 us | 11.0 us: 1.59x faster |
| ctx_decompression_writer_compression_34k.txt | 46.2 us | 23.8 us: 1.94x faster |
| ctx_decompression_writer_compression_65k.txt | 68.6 us | 34.6 us: 1.98x faster |
| ctx_decompression_writer_compression_66k_JSON.txt | 61.9 us | 27.1 us: 2.28x faster |
| ctx_decompression_writer_dickens.txt | 8.67 ms | 4.11 ms: 2.11x faster |
| ctx_decompression_writer_hdfs.json | 4.39 ms | 1.77 ms: 2.48x faster |
| ctx_decompression_writer_reymont.pdf | 5.74 ms | 2.92 ms: 1.97x faster |
| ctx_decompression_writer_xml_collection.xml | 3.97 ms | 1.99 ms: 2.00x faster |
| Geometric mean | (ref) | 2.03x faster |
frame.compress Benchmark |
lz4 | safelz4 |
|---|---|---|
| compression_compression_1k.txt | 839 ns | 829 ns: 1.01x faster |
| compression_compression_34k.txt | 32.5 us | 26.3 us: 1.23x faster |
| compression_compression_65k.txt | 60.1 us | 49.9 us: 1.20x faster |
| compression_compression_66k_JSON.txt | 24.7 us | 26.5 us: 1.07x slower |
| compression_dickens.txt | 15.9 ms | 17.0 ms: 1.07x slower |
| compression_hdfs.json | 2.63 ms | 3.16 ms: 1.20x slower |
| compression_reymont.pdf | 11.4 ms | 12.3 ms: 1.08x slower |
| compression_xml_collection.xml | 4.12 ms | 4.58 ms: 1.11x slower |
| Geometric mean | (ref) | 1.01x slower |
frame.decompress Benchmark |
lz4 | safelz4 |
|---|---|---|
| decompress_compression_1k.txt | 416 ns | 612 ns: 1.47x slower |
| decompress_compression_34k.txt | 10.0 us | 8.96 us: 1.12x faster |
| decompress_compression_65k.txt | 17.1 us | 15.4 us: 1.11x faster |
| decompress_compression_66k_JSON.txt | 8.04 us | 9.45 us: 1.18x slower |
| decompress_dickens.txt | 2.13 ms | 4.00 ms: 1.88x slower |
| decompress_hdfs.json | 1.03 ms | 1.50 ms: 1.45x slower |
| decompress_reymont.pdf | 1.99 ms | 2.42 ms: 1.21x slower |
| decompress_xml_collection.xml | 1.19 ms | 1.68 ms: 1.41x slower |
| Geometric mean | (ref) | 1.26x slower |
NOTE: All benchmarks were performed using python package pypref, on a system equipped with an Apple M4 Max processor and 36GB of unified memory.
This project acknowledges the outstanding work of Yann Collet.
Special thanks also to the maintainers of the lz4_flex Rust crate for providing a safe, pure-Rust implementation of LZ4 compression and decompression.
LZ4 implementations, including:
| Python Library | Build Status | Version | Licence |
|---|---|---|---|
| python-lz4 |