2 releases
Uses new Rust 2024
| 0.1.1 | Oct 12, 2025 |
|---|---|
| 0.1.0 | Oct 11, 2025 |
#1759 in Algorithms
18KB
205 lines
larch
This crate provides a generic, trait-based framework for clustering and sampling algorithms. The main goal of this library is to implemenent a hierarchical clustering method to determine k for seeding a kmeans clustering algorithm.
It supports:
- Metric-based clustering via the
Metrictrait. - Hierachical seeding of kmeans through tree cutting agglomerative clustering algorithm.
- Centroid-based clustering methods like k-means and hierarchical clustering through
ClusterCompare.
Overview
The crate is designed for composability and generic use. Implement the Metric trait
for your own data type, then derive additional clustering behavior automatically through
the provided default methods.
Example
use larch::prelude::*;
#[derive(Clone, Copy, Debug)]
struct Point(f64, f64);
impl Metric<Point> for Point {
fn distance(&self, other: &Point) -> f64 {
((self.0 - other.0).powi(2) + (self.1 - other.1).powi(2)).sqrt()
}
}
impl ClusterCompare<Point> for Point {
fn compute_centroid(data: &Vec<Point>) -> Point {
let (sx, sy): (f64, f64) = data.iter().fold((0.0, 0.0), |acc, p| (acc.0 + p.0, acc.1 + p.1));
let n = data.len() as f64;
Point(sx / n, sy / n)
}
}
let points = vec![Point(0.0, 0.0), Point(1.0, 1.0), Point(10.0, 10.0)];
// normal k-means
let kmeans_clusters = Point::kmeans(&points, 2);
// determine k by hierarchical clustering + tree cutting then run kmeans
let seeded_clusters = Point::hierarchical_seeded_kmeans(&points);
println!("K-means clusters: {:?}", kmeans_clusters);
println!("Seeded K-means clusters: {:?}", seeded_clusters);
Note
This library is not highly optmized and due to the Rust implementation of BinaryHeap the hierarchical
clustering is not deterministic.
License
larch is licensed under the BSD 3 license (see LICENSE in the main repository).
Dependencies
~1.5MB
~25K SLoC