3 unstable releases
| 0.2.0 | May 26, 2025 |
|---|---|
| 0.1.1 | May 21, 2025 |
| 0.1.0 | May 21, 2025 |
#266 in Concurrency
125 downloads per month
290KB
6K
SLoC
GoL Engines: top-performance Conway's Game of Life update algorithms, including parallel hashlife
Moved from https://github.com/das67333/conway/
Table of Contents
Overview
The repository includes several cross-platform update algorithm implementations for Conway's Game of Life (focusing on HashLife and StreamLife):
- SIMDEngine is a relatively simple engine performing updates in a pattern-oblivious way: it stores the field in packed (1 bit per cell) row-major format and updates a consecutive group of cells in a few dozens of CPU instructions. It currently updates 64 cells at once and can be easily changed to conditionally use AVX-like instructions.
- HashLifeEngineSmall is similar to the one in lifelib --- it uses a hashtable with a chaining collision handling technique and stores nodes corresponding to squares of different sizes separately. Nodes of different sizes are indexed independently, zero index corresponds to the blank node. Leaf size is $8\times8$, hash function is Golly-based
let h = nw * 5 + ne * 17 + sw * 257 + se * 65537; h += h >> 11. - StreamLifeEngineSmall is based on HashLifeEngineSmall; it caches results of updates in a single standard hashtable (which is open-addressing) with a fast hash function from ahash.
- HashLifeEngineSync uses a single pre-allocated open-addressing hashtable with linear probing. Unlike HashLifeEngineSmall, the hashtable never grows. If the hashtable reaches load factor 0.75 during the update, the algorithm temporarily "poisons" the hashtable to quickly terminate the update, clears the hashtable and loads the configuration of cells it stored before the update.
- StreamLifeEngineSync also uses a preallocated open-addresssing hashtable that never grows. If it overfills, the algorithm also "poisons" the hashtables and terminates the update.
- HashLifeEngineAsync is the HashLifeEngineSync that was modified to thread-safely execute in parallel. It uses asynchronous runtime tokio. New asynchronous tasks (lightweight threads) for recursive calls are spawned only when processing a square not smaller than a given threshold (
MIN_TASK_SPAWN_SIZE_LOG2) and total number of running asynchronous tasks is smaller thanMAX_TASKS_COUNT. - StreamLifeEngineAsync is like StreamLifeEngineSync, but it is based on HashLifeEngineAsync. Recursive calls to the update function can spawn new asynchronous tasks, like in HashLifeEngineAsync.
Two topologies of the field are supported: Unbounded and Torus (the latter on a $2^n\times2^n$ square grid).
Architecture
The Pattern structure is designed to be a fast and compact checkpoint for the engines. It stores a configuration of cells in quadtree form (like HashLife).
Features
- Parallel HashLife and StreamLife!
- Crossplatform
- Right now only supports patterns with B3/S23 rule
- Can read and write .rle, .mc and .mc.gz file formats
- Can efficiently metafy patterns (for example, create multi-level OTCA metapixel)
- Automatically handles overfilled hashtables, follows user memory limitations
Building
If you don't have Rust installed, these commands should be sufficient (on modern Ubuntu):
sudo apt install -y build-essential
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
. "$HOME/.cargo/env"
CLI interface can be compiled with
cargo build --release --bin gol_engines_cli --features=cli_deps
Examples
Update
Update in one step:
$ target/release/gol_engines_cli update \
res/very_large_patterns/0e0p-metaglider.mc.gz \
--output=res/temp.mc.gz \
--mem-limit-gib=12 \
--workers=16 \
--gens-log2=12 \
--population
Initialized engine in 5.3 secs
Loaded pattern in 1.8 secs
Updated pattern by 2^12 generations in 13.6 secs
Population: 93_237_300
Overfilled hashtable:
$ target/release/gol_engines_cli update \
res/very_large_patterns/0e0p-metaglider.mc.gz \
--output=res/temp.mc.gz \
--mem-limit-gib=6 \
--workers=16 \
--gens-log2=12 \
--population
Initialized engine in 2.6 secs
Loaded pattern in 1.3 secs
Overfilled hashtables, reducing step_log2 from 12 to 10 (and running GC)
Updated by 1024 out of 4096 generations
Updated by 2048 out of 4096 generations
Updated by 3072 out of 4096 generations
Overfilled hashtables, running GC
Updated pattern by 2^12 generations in 32.9 secs
Population: 93_237_300
Providing step-log2 manually:
$ target/release/gol_engines_cli update \
res/very_large_patterns/0e0p-metaglider.mc.gz \
--output=res/temp.mc.gz \
--mem-limit-gib=6 \
--workers=16 \
--gens-log2=12 \
--step-log2=10 \
--population
Initialized engine in 2.6 secs
Loaded pattern in 1.3 secs
Updated by 1024 out of 4096 generations
Updated by 2048 out of 4096 generations
Updated by 3072 out of 4096 generations
Overfilled hashtables, running GC
Updated pattern by 2^12 generations in 20.8 secs
Population: 93_237_300
And with streamlife:
$ target/release/gol_engines_cli update \
res/very_large_patterns/0e0p-metaglider.mc.gz \
--output=res/temp.mc.gz \
--mem-limit-gib=13 \
--workers=6 \
--gens-log2=18 \
--engine=streamlife \
--population
Initialized engine in 6.1 secs
Loaded pattern in 2.0 secs
Updated pattern by 2^18 generations in 58.5 secs
Population: 93_235_655
Metafy
Let's metafy glider:
$ target/release/gol_engines_cli metafy \
res/glider.rle res/otca_0.mc.gz res/otca_1.mc.gz \
--output=res/temp.mc.gz \
--population
Loaded patterns in 0.0 secs
Metafied pattern in 0.0 secs
Population: 593_927
With bigger k:
$ target/release/gol_engines_cli metafy \
res/glider.rle res/otca_0.mc.gz res/otca_1.mc.gz \
--k=10 \
--output=res/temp.mc.gz \
--population
Loaded patterns in 0.0 secs
Metafied pattern in 0.0 secs
Population: 155_233_185_229_932_687_224_687_411_801_884_328_181_836_255_094_962_897_687_012_037_389
k=0 doesn't modify the pattern:
$ target/release/gol_engines_cli metafy \
res/glider.rle res/otca_0.mc.gz res/otca_1.mc.gz \
--k=0 \
--output=res/temp.mc.gz \
--population
Loaded patterns in 0.0 secs
Metafied pattern in 0.0 secs
Population: 5
Stats
For 0e0p-metaglider:
$ target/release/gol_engines_cli stats \
res/very_large_patterns/0e0p-metaglider.mc.gz
Hash: 0xc322148cce4e1279
Population: 93_235_805
Distribution of node sizes (side lengths of the squares):
total -> 818007
2^6 -> 36%
2^7 -> 46%
2^8 -> 12%
2^9 -> 3%
Computed stats in 1.0 secs
Pi calculator is much smaller:
$ target/release/gol_engines_cli stats \
res/very_large_patterns/pi.mc.gz
Hash: 0xd9c1db4109c7b2ab
Population: 1_189_325
Distribution of node sizes (side lengths of the squares):
total -> 19699
2^3 -> 15%
2^4 -> 34%
2^5 -> 24%
2^6 -> 9%
2^7 -> 5%
2^8 -> 2%
2^9 -> 2%
2^10 -> 2%
2^11 -> 1%
2^12 -> 1%
Computed stats in 0.1 secs
Help
$ target/release/gol_engines_cli help
Tools for Conway's Game of Life
Usage: gol_engines_cli <COMMAND>
Commands:
update Run the simulation using parallel implementations of the update algorithms
metafy Replace every basic cell with a corresponding metacell (see https://conwaylife.com/wiki/Unit_cell) and repeat it k times
stats Compute pattern's hash, population and nodes distribution
help Print this message or the help of the given subcommand(s)
Options:
-h, --help Print help
-V, --version Print version
Update
$ target/release/gol_engines_cli help update
Run the simulation using parallel implementations of the update algorithms
Usage: gol_engines_cli update [OPTIONS] --output <OUTPUT> --mem-limit-gib <MEM_LIMIT_GIB> --workers <WORKERS> --gens-log2 <GENS_LOG2> <PATTERN>
Arguments:
<PATTERN>
Path to the file containing the pattern; supports .rle, .mc and .mc.gz formats
Options:
-o, --output <OUTPUT>
Path to the file where the resulting pattern will be saved
-e, --engine <ENGINE>
The engine to use for the simulation, default is hashlife
[default: hashlife]
Possible values:
- hashlife: See https://conwaylife.com/wiki/HashLife
- streamlife: See https://conwaylife.com/wiki/StreamLife
-m, --mem-limit-gib <MEM_LIMIT_GIB>
Maximum memory (in GiB) allocated to the hash tables; all other usage is typically negligible
-w, --workers <WORKERS>
The number of worker threads to use for the update
-g, --gens-log2 <GENS_LOG2>
The pattern will be updated by 2^gens_log2 generations
-s, --step-log2 <STEP_LOG2>
How many generations to update at once, uses `gens_log2` by default
-t, --topology <TOPOLOGY>
The topology of the pattern, default is unbounded
Possible values:
- unbounded: The pattern is unbounded in all directions
- torus: Opposite bounds of the field are stitched together
-p, --population
Count population of the resulting pattern
-h, --help
Print help (see a summary with '-h')
Metafy
$ target/release/gol_engines_cli help metafy
Replace every basic cell with a corresponding metacell (see https://conwaylife.com/wiki/Unit_cell) and repeat it k times
Usage: gol_engines_cli metafy [OPTIONS] --output <OUTPUT> <PATTERN> <META_0> <META_1>
Arguments:
<PATTERN> Path to the file containing the pattern; supports .rle, .mc and .mc.gz formats
<META_0> Path to the file containing the off state of the metacell
<META_1> Path to the file containing the on state of the metacell
Options:
-k, --k <K> The number of times to apply the metacell replacement, default is 1 [default: 1]
-o, --output <OUTPUT> Path to the file where the resulting pattern will be saved
-p, --population Count population of the resulting pattern
-h, --help Print help
Stats
$ target/release/gol_engines_cli help stats
Compute pattern's hash, population and nodes distribution
Usage: gol_engines_cli stats <PATTERN>
Arguments:
<PATTERN> Path to the file containing the pattern; supports .rle, .mc and .mc.gz formats
Options:
-h, --help Print help
Benchmark
These are the performance results of different implementations of GoL algorithms when updating the 0e0p-metaglider. The benchmarks were conducted on a Yandex.Cloud virtual machine with 32 logical cores and 96 GiB of RAM. Each engine was able to allocate at least 64 GiB of memory, which is sufficient to store all emerging nodes. Golly version 4.3 was used. Lifelib was compiled with clang-19 using the -O3 -march=native flags. Gol_engines version 0.2.0 was compiled with cargo (Rust 1.87) in release mode. HashLifeEngineAsync used 24 workers, StreamLifeEngineAsync used 6 workers.
This is updating 0e0p-mataglider by $2^{14}$ generations with HashLife:
| Implementation | Initialization time | Update time |
|---|---|---|
| lifelib | - | 912.1 |
| Golly | - | 848.5 |
| HashLifeEngineSmall | - | 642.1 |
| HashLifeEngineSync | 23.8 | 431.5 |
| HashLifeEngineAsync | 25.4 | 41.7 |
And this is updating 0e0p-mataglider by $2^{27}$ generations with StreamLife:
| Implementation | Initialization time | Update time |
|---|---|---|
| lifelib | - | 991.1 |
| StreamLifeEngineSmall | - | 742.5 |
| StreamLifeEngineSync | 26.3 | 533.4 |
| StreamLifeEngineAsync | 26.9 | 186.9 |
Tips
As all the hashtables are power-of-two sized, there are certain memory-limit-gib that double the sizes of them. They are:
- $12 \cdot 2^k$ for hashlife (because hashlife node is 24 bytes in size)
- $13 \cdot 2^k$ for streamlife (because node of internal hashlife is 32 bytes, streamlife node is 20 bytes and these hashtables are initialized with equal capacity)
I reached best performance with 16-24 workers for hashlife and 6 workers for streamlife on 96-core virtual machine for 0e0p-metaglider. The best value can depend on the pattern structure. You can try other values, but notice that it might be important to provide a whole physical (not logical) core for every worker.
This is updating 0e0p-metaglider with HashLifeEngineAsync by $2^{12}$ generations with different values of WORKER_THREADS:
This is the same for $2^{15}$ generations and StreamLifeEngineAsync:
Small patterns (even the pi calculator) cannot benefit from parallelization as their simulations don't involve enough operations that can be performed independently.
Dependencies
~3.5–7.5MB
~124K SLoC