High-performance spherical clustering with PyTorch and C++
The spheroids package provides the Poisson kernel-based distribution (PKBD) and the spherical Cauchy distribution. Unlike many other spherical distributions, they avoid complicated normalizing constants involving hypergeometric functions and hence do not require iterative evaluations. Instead, they rely primarily on matrix multiplication, which makes them well suited to GPU-accelerated computing.
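To illustrate why these densities are cheap to evaluate, here is a minimal NumPy sketch of the two log-densities on the unit sphere in R^d. It is written from the standard published forms of the densities, not taken from the spheroids source, and the function names `log_sphere_area`, `pkbd_logpdf` and `spcauchy_logpdf` are hypothetical. The only data-dependent work is the matrix-vector product `X @ mu`.

```python
import math
import numpy as np

def log_sphere_area(d):
    """Log surface area of the unit sphere in R^d: 2 * pi^(d/2) / Gamma(d/2)."""
    return math.log(2.0) + 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d)

def pkbd_logpdf(X, mu, rho):
    """PKBD log-density at the rows of X (unit vectors, shape (n, d)), 0 <= rho < 1."""
    d = X.shape[1]
    denom = 1.0 + rho**2 - 2.0 * rho * (X @ mu)  # equals ||x - rho * mu||^2
    return math.log1p(-rho**2) - log_sphere_area(d) - 0.5 * d * np.log(denom)

def spcauchy_logpdf(X, mu, rho):
    """Spherical Cauchy log-density at the rows of X (unit vectors, shape (n, d))."""
    d = X.shape[1]
    denom = 1.0 + rho**2 - 2.0 * rho * (X @ mu)
    return (d - 1) * (math.log1p(-rho**2) - np.log(denom)) - log_sphere_area(d)

# Sanity check on the circle (d = 2): each density should integrate to about 1.
theta = np.linspace(0.0, 2.0 * np.pi, 20000, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])
mu = np.array([1.0, 0.0])
step = 2.0 * np.pi / theta.size
pkbd_mass = float(np.exp(pkbd_logpdf(X, mu, 0.5)).sum() * step)
spcauchy_mass = float(np.exp(spcauchy_logpdf(X, mu, 0.5)).sum() * step)
```

Both normalizing constants reduce to the surface area of the sphere, which is a single `lgamma` call, so nothing has to be evaluated iteratively.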
Beyond traditional applications, spheroids is particularly useful for clustering of modern embeddings (e.g., semantic embeddings generated by large language models, image embeddings, or any high-dimensional feature representations). By leveraging high-performance matrix operations on GPUs, it can efficiently group large-scale embedding datasets while benefiting from the flexibility of the deep learning approach when covariates or additional contextual information are included. This way the user can control for the effects of covariates rather than rediscover them using clustering.
The package provides two EM-based estimation methods:
- A direct approach (C++ backend) when no covariates are available
- A deep learning approach (PyTorch backend) for model-based clustering in an embedding space with covariates
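For readers less familiar with EM for mixture models, the E-step shared by both approaches can be sketched generically as below. This is an illustration under stated assumptions, not the package's code: `e_step` and `m_step_weights` are hypothetical helpers, and `log_dens` stands for a matrix of per-cluster log-densities computed elsewhere.

```python
import numpy as np

def e_step(log_dens, weights):
    """Posterior cluster probabilities (responsibilities) and the log-likelihood.

    log_dens: (n, K) array with log f_k(y_i) for observation i and cluster k.
    weights:  (K,) mixing proportions, positive and summing to 1.
    """
    log_joint = log_dens + np.log(weights)           # log(pi_k * f_k(y_i))
    row_max = log_joint.max(axis=1, keepdims=True)   # shift for a stable log-sum-exp
    log_norm = row_max + np.log(np.exp(log_joint - row_max).sum(axis=1, keepdims=True))
    resp = np.exp(log_joint - log_norm)              # each row sums to 1
    return resp, float(log_norm.sum())

def m_step_weights(resp):
    """Updated mixing proportions: the average responsibility per cluster."""
    return resp.mean(axis=0)

# Toy input: three observations, two clusters, equal starting weights.
log_dens = np.log(np.array([[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]))
resp, loglik = e_step(log_dens, np.array([0.5, 0.5]))
new_weights = m_step_weights(resp)
```

The distribution parameters (mean direction and concentration per cluster) would be updated in the same M-step; that part depends on the chosen backend and is omitted here.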
Furthermore, spheroids includes options to regularize the number of clusters using an L1 norm (via a Hadamard product approach inspired by Ziyin and Wang) and can dynamically drop clusters whose total weight falls below a user-specified threshold (min_weight).
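The cluster-dropping step can be pictured with a small NumPy sketch. It assumes that a cluster's "total weight" is its mixing proportion and that the surviving weights are renormalized afterwards; `prune_clusters` is a hypothetical helper, not a function exported by spheroids.

```python
import numpy as np

def prune_clusters(weights, min_weight=0.05):
    """Drop clusters whose weight is below min_weight and renormalize the rest.

    Returns (new_weights, keep), where keep is a boolean mask over the
    original clusters.
    """
    weights = np.asarray(weights, dtype=float)
    keep = weights >= min_weight
    if not keep.any():
        raise ValueError("all clusters fall below min_weight")
    kept = weights[keep]
    return kept / kept.sum(), keep

# The third cluster (weight 0.03) is below the threshold and gets removed.
new_w, keep = prune_clusters([0.60, 0.37, 0.03], min_weight=0.05)
```

In a full implementation the same `keep` mask would also be applied to the per-cluster parameters so that weights and parameters stay aligned.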
🚀 High Performance
- Core computations implemented in C++ with Armadillo
- GPU acceleration via PyTorch
- Efficient batch processing
🎯 Multiple Distributions
- Poisson kernel-based distribution (PKBD)
- Spherical Cauchy distribution
- Extensible architecture for new distributions
📊 Clustering Capabilities
- Automatic cluster number selection
- Robust parameter estimation
- Support for high-dimensional data
You can install spheroids directly from PyPI with precompiled wheels:
pip install spheroids
For users who want to build the package locally (e.g., to modify the codebase), the following prerequisites are needed:
- Python ≥3.8
- C++ compiler with C++17 support
- Armadillo
Then follow the steps for your platform.
Ubuntu/Debian
# Install required libraries
sudo apt-get update
sudo apt-get install -y libarmadillo-dev libomp-dev
# Clone the repository
git clone https://github.com/lsablica/spheroids.git
cd spheroids
# Build and install
pip install -e .
macOS
# Install required libraries
brew update
brew install armadillo libomp
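# Configure compiler paths if necessary (these assume Homebrew's Apple Silicon prefix /opt/homebrew; on Intel Macs the default prefix is /usr/local)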
export CXXFLAGS="-Xpreprocessor -fopenmp -I/opt/homebrew/opt/libomp/include -I/opt/homebrew/opt/armadillo/include"
export LDFLAGS="-L/opt/homebrew/opt/libomp/lib -lomp -L/opt/homebrew/opt/armadillo/lib"
# Clone the repository
git clone https://github.com/lsablica/spheroids.git
cd spheroids
# Build and install
pip install -e .
Windows
# Clone vcpkg for managing C++ libraries
git clone https://github.com/microsoft/vcpkg.git C:\vcpkg
cd C:\vcpkg
.\bootstrap-vcpkg.bat -disableMetrics
.\vcpkg.exe install armadillo
# Clone the repository
git clone https://github.com/lsablica/spheroids.git
cd spheroids
# Build and install
pip install -e .
Quick Start
import torch
from spheroids import SphericalClustering
# Prepare your data (normalize to unit sphere)
X = torch.randn(1000, 3)
X = X / torch.norm(X, dim=1, keepdim=True)
Y = torch.randn(1000, 2)
Y = Y / torch.norm(Y, dim=1, keepdim=True)
# Create model
model = SphericalClustering(
    num_covariates=3,
    response_dim=2,
    num_clusters=3,
    distribution="pkbd",
)
# Fit model
ll = model.fit(X, Y, num_epochs=100)
Access optimized C++ implementations directly:
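import numpy as np
from spheroids import PKBD
# Generate random samples
mu = np.array([1.0, 0.0])  # mean direction (unit vector)
rho = 0.5
samples = PKBD.random_sample(n=100, rho=rho, mu=mu)
# Calculate log-likelihood (here evaluated on the samples just drawn)
loglik = PKBD.log_likelihood(samples, mu, rho)
SphericalClustering(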
    num_covariates: int,       # Number of input features
    response_dim: int,         # Dimension of response variables
    num_clusters: int,         # Initial number of clusters
    distribution: str,         # "pkbd" or "spcauchy"
    min_weight: float = 0.05   # Minimum cluster weight
)
# Fit the model
model.fit(
    X: torch.Tensor,        # Input features (N x num_covariates)
    Y: torch.Tensor,        # Response variables (N x response_dim)
    num_epochs: int = 100,  # Number of training epochs
    lr: float = 1e-3        # Learning rate
)
# Get cluster predictions
pred = model.predict(X)
Basic Clustering Example
import numpy as np
import torch
from spheroids import SphericalClustering
# Load data
Y = np.load('spheroids/spheroids/datasets/pkbd_Y.npy')
# Create model
model = SphericalClustering(
    num_covariates=1,
    response_dim=4,
    num_clusters=3,
    device="cpu",
    min_weight=0.02,
    distribution="pkbd",
)
# Fit without covariates
mu, rho = model.fit_no_covariates(Y, num_epochs=200, tol=1e-8)
Usage of C++ API
import numpy as np
from spheroids import PKBD, spcauchy
mu = np.array([1.0, 0.0])  # mean direction (unit vector)
rho = 0.5
# PKBD distribution (log-likelihood evaluated on the samples just drawn)
pkbd_samples = PKBD.random_sample(1000, rho, mu)
pkbd_loglik = PKBD.log_likelihood(pkbd_samples, mu, rho)
# Spherical Cauchy distribution
scauchy_samples = spcauchy.random_sample(1000, rho, mu)
scauchy_loglik = spcauchy.log_likelihood(scauchy_samples, mu, rho)
We welcome contributions! Here's how you can help:
This project is licensed under the GPL-3.0 License - see the LICENSE file for details.
If you use spheroids in your research, please cite:
@software{spheroids,
title = {spheroids: A Python Package for Spherical Clustering Models},
author = {Lukas Sablica},
year = {2025},
url = {https://github.com/lsablica/spheroids}
}