safelz4

Python bindings for lz4_flex, the fastest pure-Rust implementation of the LZ4 compression algorithm.

Installation

Pip

You can install safelz4 with pip:

pip install safelz4

From source

Building from source requires a Rust toolchain:

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Make sure it is up to date and on the stable channel
rustup update
git clone https://github.com/LVivona/safelz4.git
cd safelz4
pip install setuptools_rust
pip install maturin
# Install safelz4 in editable mode
pip install -e .

Getting Started

Block Format


The block format is suitable only for smaller chunks of data, because each block must be compressed or decompressed entirely in memory. For larger inputs, use the frame format instead: it supports streaming and includes metadata that makes large byte sequences easier to handle. See the LZ4 block format specification for details.

import os
import sys
from typing import Generator, Union

from safelz4.block import compress_prepend_size, decompress_size_prepended

def chunk_blocks(filename: Union[os.PathLike, str], chunk_size: int = 1048576) -> Generator[bytes, None, None]:
    """Read a file in chunks and compress each chunk into a size-prepended block."""
    with open(filename, "rb") as f:
        while content := f.read(chunk_size):
            yield compress_prepend_size(content)

# Compress in 1 MiB (1048576-byte) chunks
blocks = chunk_blocks("dickens.txt")

for block in blocks:
    output = decompress_size_prepended(block)
    # Write raw bytes: decoding each chunk separately could split a
    # multi-byte UTF-8 character across a chunk boundary.
    sys.stdout.buffer.write(output)
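To make the block format concrete, here is a minimal pure-Python decoder for a size-prepended LZ4 block. It is an illustrative sketch, not safelz4's implementation: it assumes the layout used by lz4_flex's `compress_prepend_size` (the uncompressed length as a 4-byte little-endian integer, followed by one raw LZ4 block), and the names `decode_block` and `decode_size_prepended` are hypothetical. Because match offsets point back into already-decoded output, a block can only be decoded as a whole, which is why the block format suits chunks that fit in memory.

```python
import struct


def _read_extra_length(src: bytes, i: int, length: int) -> tuple:
    """Add LZ4 extension bytes (255 means 'keep reading') to a nibble length of 15."""
    while True:
        extra = src[i]
        i += 1
        length += extra
        if extra != 255:
            return length, i


def decode_block(src: bytes, expected_size: int) -> bytes:
    """Decode one raw LZ4 block (hypothetical sketch with minimal validation)."""
    out = bytearray()
    i = 0
    while i < len(src):
        token = src[i]
        i += 1

        # High nibble of the token: number of literal bytes that follow.
        lit_len = token >> 4
        if lit_len == 15:
            lit_len, i = _read_extra_length(src, i, lit_len)
        out += src[i:i + lit_len]
        i += lit_len

        # The last sequence of a block carries literals only.
        if i >= len(src):
            break

        # Match: 2-byte little-endian offset back into the decoded output.
        offset = src[i] | (src[i + 1] << 8)
        i += 2
        if offset == 0 or offset > len(out):
            raise ValueError("invalid match offset")

        # Low nibble of the token: match length minus the 4-byte minimum.
        match_len = token & 0x0F
        if match_len == 15:
            match_len, i = _read_extra_length(src, i, match_len)
        match_len += 4

        # Copy one byte at a time so overlapping matches (offset < length) work.
        start = len(out) - offset
        for k in range(match_len):
            out.append(out[start + k])

    if len(out) != expected_size:
        raise ValueError("decoded size does not match the prepended size")
    return bytes(out)


def decode_size_prepended(buf: bytes) -> bytes:
    """Read the 4-byte little-endian size header, then decode the block after it."""
    (size,) = struct.unpack_from("<I", buf, 0)
    return decode_block(buf[4:], size)


# Hand-built block: literals "abc", a match (offset 3, length 9),
# then a final literal-only sequence "xyzwv".
block = bytes([0x35]) + b"abc" + bytes([0x03, 0x00]) + bytes([0x50]) + b"xyzwv"
framed = struct.pack("<I", 17) + block
print(decode_size_prepended(framed))  # b'abcabcabcabcxyzwv'
```

The match copies from a position only three bytes back while producing nine bytes, so it re-reads bytes it has just written; this overlap is how LZ4 encodes runs cheaply. The sketch performs only minimal validation and is not meant for untrusted input.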

Frame Format


Frames are containers that encapsulate a set of compressed blocks. Information about the blocks is stored both in the frame header and within the blocks themselves. See the LZ4 frame format specification for details.

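import sys

import safelz4

# Read the whole file and compress it into an LZ4 frame file
with open("dickens.txt", "rb") as file:
    buffer = file.read()
safelz4.compress_into_file("dickens.lz4", buffer)

# Stream the frame back, decompressing 100 bytes at a time
with safelz4.open("dickens.lz4", "rb") as f:
    while content := f.read(100):
        # Write raw bytes: print() would add a newline after every chunk,
        # and decoding 100-byte chunks could split multi-byte UTF-8 characters.
        sys.stdout.buffer.write(content)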
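To illustrate what the frame header and the per-block metadata contain, the sketch below parses an LZ4 frame header and walks the block-size words that follow, using only the standard library. It follows the LZ4 frame format specification (magic number `0x184D2204`, the FLG and BD descriptor bytes, a 4-byte size word before each block whose high bit marks an uncompressed block, and a zero EndMark), but `parse_frame_header` and `iter_blocks` are hypothetical helpers, not part of safelz4, and the sketch skips the xxHash32-based header checksum.

```python
import struct
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple

LZ4_FRAME_MAGIC = 0x184D2204

# Block Maximum Size IDs (bits 6-4 of the BD byte) mapped to sizes in bytes.
BLOCK_MAX_SIZES = {4: 64 * 1024, 5: 256 * 1024, 6: 1024 * 1024, 7: 4 * 1024 * 1024}


@dataclass
class FrameHeader:
    version: int
    block_independent: bool
    block_checksum: bool
    content_checksum: bool
    content_size: Optional[int]
    dict_id: Optional[int]
    block_max_size: int
    header_len: int


def parse_frame_header(data: bytes) -> FrameHeader:
    """Parse the frame header fields (the header checksum byte is not verified)."""
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic != LZ4_FRAME_MAGIC:
        raise ValueError("not an LZ4 frame")
    flg, bd = data[4], data[5]
    version = flg >> 6
    if version != 1:
        raise ValueError(f"unsupported frame version {version}")
    pos = 6
    content_size = None
    if flg & 0x08:  # C.Size flag: 8-byte little-endian uncompressed size follows
        (content_size,) = struct.unpack_from("<Q", data, pos)
        pos += 8
    dict_id = None
    if flg & 0x01:  # DictID flag: 4-byte little-endian dictionary ID follows
        (dict_id,) = struct.unpack_from("<I", data, pos)
        pos += 4
    block_id = (bd >> 4) & 0x07
    if block_id not in BLOCK_MAX_SIZES:
        raise ValueError(f"invalid block maximum size ID {block_id}")
    pos += 1  # HC: one-byte header checksum, skipped in this sketch
    return FrameHeader(
        version=version,
        block_independent=bool(flg & 0x20),
        block_checksum=bool(flg & 0x10),
        content_checksum=bool(flg & 0x04),
        content_size=content_size,
        dict_id=dict_id,
        block_max_size=BLOCK_MAX_SIZES[block_id],
        header_len=pos,
    )


def iter_blocks(data: bytes, header: FrameHeader) -> Iterator[Tuple[bytes, bool]]:
    """Yield (block payload, is_compressed) pairs until the EndMark."""
    pos = header.header_len
    while True:
        (word,) = struct.unpack_from("<I", data, pos)
        pos += 4
        if word == 0:  # EndMark: a zero size word ends the block sequence
            return
        size = word & 0x7FFFFFFF
        is_compressed = not (word & 0x80000000)  # high bit set = stored uncompressed
        yield data[pos:pos + size], is_compressed
        pos += size
        if header.block_checksum:
            pos += 4  # optional 4-byte checksum after each block


# Hand-built example: a 7-byte header (FLG=0x64, BD=0x40), one uncompressed
# block holding b"hello", the EndMark, and a placeholder 4-byte content checksum.
frame = (
    bytes.fromhex("04224d186440a7")
    + struct.pack("<I", 0x80000000 | 5)
    + b"hello"
    + struct.pack("<I", 0)
    + bytes(4)
)
hdr = parse_frame_header(frame)
print(hdr.block_max_size, hdr.content_checksum)  # 65536 True
print(list(iter_blocks(frame, hdr)))  # [(b'hello', False)]
```

The example frame is hand-written for illustration; in practice you could pass the bytes of a `.lz4` file such as `dickens.lz4`, assuming it uses the standard LZ4 frame format.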

Benchmarks

Benchmark results are available in the benches folder. We evaluated performance in two key scenarios:

Full byte availability, where the entire buffer is accessible during compression and decompression.

Streamed access, using reader and writer interfaces with chunked input.

Summary

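In full-buffer scenarios, lz4 is faster on most inputs, particularly the larger files. Compression times remain close (geometric mean: safelz4 1.01x slower, with no file more than 1.20x slower), while decompression shows a wider gap (geometric mean: safelz4 1.26x slower, up to 1.88x slower on dickens.txt). safelz4 is faster on the 34k and 65k text files for both operations.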

In reader/writer scenarios (chunked input, 1024 bytes), safelz4 outperforms lz4 on every file: the geometric-mean speedup is 2.51x for writing (compression, 2.48x to 2.61x per file) and 2.03x for reading (decompression, 1.59x to 2.48x per file).

Streamed access (1024-byte chunks)

`open` Write Benchmark	lz4	safelz4
ctx_compression_writer_compression_1k.txt	22.5 us	8.84 us: 2.54x faster
ctx_compression_writer_compression_34k.txt	22.6 us	9.07 us: 2.49x faster
ctx_compression_writer_compression_65k.txt	23.0 us	9.18 us: 2.50x faster
ctx_compression_writer_compression_66k_JSON.txt	23.1 us	9.18 us: 2.51x faster
ctx_compression_writer_dickens.txt	23.9 us	9.16 us: 2.61x faster
ctx_compression_writer_hdfs.json	22.9 us	9.21 us: 2.49x faster
ctx_compression_writer_reymont.pdf	22.9 us	9.26 us: 2.48x faster
ctx_compression_writer_xml_collection.xml	23.1 us	9.27 us: 2.49x faster
Geometric mean	(ref)	2.51x faster

`open` Read Benchmark	lz4	safelz4
ctx_decompression_writer_compression_1k.txt	17.6 us	11.0 us: 1.59x faster
ctx_decompression_writer_compression_34k.txt	46.2 us	23.8 us: 1.94x faster
ctx_decompression_writer_compression_65k.txt	68.6 us	34.6 us: 1.98x faster
ctx_decompression_writer_compression_66k_JSON.txt	61.9 us	27.1 us: 2.28x faster
ctx_decompression_writer_dickens.txt	8.67 ms	4.11 ms: 2.11x faster
ctx_decompression_writer_hdfs.json	4.39 ms	1.77 ms: 2.48x faster
ctx_decompression_writer_reymont.pdf	5.74 ms	2.92 ms: 1.97x faster
ctx_decompression_writer_xml_collection.xml	3.97 ms	1.99 ms: 2.00x faster
Geometric mean	(ref)	2.03x faster

Full byte availability

`frame.compress` Benchmark	lz4	safelz4
compression_compression_1k.txt	839 ns	829 ns: 1.01x faster
compression_compression_34k.txt	32.5 us	26.3 us: 1.23x faster
compression_compression_65k.txt	60.1 us	49.9 us: 1.20x faster
compression_compression_66k_JSON.txt	24.7 us	26.5 us: 1.07x slower
compression_dickens.txt	15.9 ms	17.0 ms: 1.07x slower
compression_hdfs.json	2.63 ms	3.16 ms: 1.20x slower
compression_reymont.pdf	11.4 ms	12.3 ms: 1.08x slower
compression_xml_collection.xml	4.12 ms	4.58 ms: 1.11x slower
Geometric mean	(ref)	1.01x slower

`frame.decompress` Benchmark	lz4	safelz4
decompress_compression_1k.txt	416 ns	612 ns: 1.47x slower
decompress_compression_34k.txt	10.0 us	8.96 us: 1.12x faster
decompress_compression_65k.txt	17.1 us	15.4 us: 1.11x faster
decompress_compression_66k_JSON.txt	8.04 us	9.45 us: 1.18x slower
decompress_dickens.txt	2.13 ms	4.00 ms: 1.88x slower
decompress_hdfs.json	1.03 ms	1.50 ms: 1.45x slower
decompress_reymont.pdf	1.99 ms	2.42 ms: 1.21x slower
decompress_xml_collection.xml	1.19 ms	1.68 ms: 1.41x slower
Geometric mean	(ref)	1.26x slower

NOTE: All benchmarks were run with the Python package pyperf on a system with an Apple M4 Max processor and 36 GB of unified memory.
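Each table's "Geometric mean" row summarizes the per-file speedups. The sketch below recomputes the two streamed-access summaries from the mean times listed in the tables (millisecond values converted to microseconds) as a cross-check. pyperf works from unrounded timings, so individual recomputed ratios can differ slightly from the table (for example, 17.6 / 11.0 = 1.60 versus the reported 1.59x).

```python
from statistics import geometric_mean

# Mean times from the `open` write table, in microseconds.
write_lz4 = [22.5, 22.6, 23.0, 23.1, 23.9, 22.9, 22.9, 23.1]
write_safelz4 = [8.84, 9.07, 9.18, 9.18, 9.16, 9.21, 9.26, 9.27]

# Mean times from the `open` read table, with ms values converted to us.
read_lz4 = [17.6, 46.2, 68.6, 61.9, 8670, 4390, 5740, 3970]
read_safelz4 = [11.0, 23.8, 34.6, 27.1, 4110, 1770, 2920, 1990]

# A per-file ratio above 1 means safelz4 is faster than lz4 on that file.
write_speedup = geometric_mean([a / b for a, b in zip(write_lz4, write_safelz4)])
read_speedup = geometric_mean([a / b for a, b in zip(read_lz4, read_safelz4)])

print(f"write: {write_speedup:.2f}x faster")  # write: 2.51x faster
print(f"read: {read_speedup:.2f}x faster")  # read: 2.03x faster
```

A geometric mean is used rather than an arithmetic mean because the inputs span three orders of magnitude in run time; averaging ratios on a log scale keeps one large file from dominating the summary.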

Acknowledgement

This project acknowledges the outstanding work of Yann Collet, the creator of LZ4.

Special thanks also to the maintainers of the lz4_flex Rust crate for providing a safe, pure-Rust implementation of LZ4 compression and decompression.

Other Implementations

Other Python implementations of LZ4 include:

python-lz4

Licence

MIT License
