#k-means #cluster-analysis #hierarchical #k-means-clustering

larch

Hierarchical kmeans clustering seeding algorithm

2 releases

Uses new Rust 2024

0.1.1 Oct 12, 2025
0.1.0 Oct 11, 2025

#1759 in Algorithms

Custom license

18KB
205 lines

crates.io crates.io

larch

This crate provides a generic, trait-based framework for clustering and sampling algorithms. The main goal of this library is to implemenent a hierarchical clustering method to determine k for seeding a kmeans clustering algorithm.

It supports:

  • Metric-based clustering via the Metric trait.
  • Hierachical seeding of kmeans through tree cutting agglomerative clustering algorithm.
  • Centroid-based clustering methods like k-means and hierarchical clustering through ClusterCompare.

Overview

The crate is designed for composability and generic use. Implement the Metric trait for your own data type, then derive additional clustering behavior automatically through the provided default methods.

Example

use larch::prelude::*;

#[derive(Clone, Copy, Debug)]
struct Point(f64, f64);

impl Metric<Point> for Point {
    fn distance(&self, other: &Point) -> f64 {
        ((self.0 - other.0).powi(2) + (self.1 - other.1).powi(2)).sqrt()
    }
}

impl ClusterCompare<Point> for Point {
    fn compute_centroid(data: &Vec<Point>) -> Point {
        let (sx, sy): (f64, f64) = data.iter().fold((0.0, 0.0), |acc, p| (acc.0 + p.0, acc.1 + p.1));
        let n = data.len() as f64;
        Point(sx / n, sy / n)
    }
}

let points = vec![Point(0.0, 0.0), Point(1.0, 1.0), Point(10.0, 10.0)];
// normal k-means
let kmeans_clusters = Point::kmeans(&points, 2);
// determine k by hierarchical clustering + tree cutting then run kmeans
let seeded_clusters = Point::hierarchical_seeded_kmeans(&points);
println!("K-means clusters: {:?}", kmeans_clusters);
println!("Seeded K-means clusters: {:?}", seeded_clusters);

Note

This library is not highly optmized and due to the Rust implementation of BinaryHeap the hierarchical clustering is not deterministic.

License

larch is licensed under the BSD 3 license (see LICENSE in the main repository).

Dependencies

~1.5MB
~25K SLoC