spheroids

High-performance spherical clustering with PyTorch and C++

Key Features • Installation • Quick Start • Documentation • Contributing

The spheroids package provides clustering models based on the Poisson kernel-based distribution (PKBD) and the spherical Cauchy distribution. Unlike many other spherical distributions, these two avoid complicated normalizing constants involving hypergeometric functions and hence do not require iterative evaluations. Instead, they rely primarily on matrix multiplication, which makes them well suited to GPU-accelerated computing.
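To illustrate why these densities are cheap to evaluate, here is a minimal NumPy sketch of both log-densities. It assumes the standard parameterizations on the unit sphere in R^d, with mean direction mu and concentration rho in [0, 1): the PKBD density is (1 - rho^2) / (omega_d * (1 + rho^2 - 2 rho x'mu)^(d/2)) and the spherical Cauchy density is ((1 - rho^2) / (1 + rho^2 - 2 rho x'mu))^(d-1) / omega_d, where omega_d is the surface area of the sphere. The function names are illustrative and are not part of the spheroids API.

```python
import math
import numpy as np


def log_sphere_area(d):
    """Log surface area of the unit sphere S^(d-1) embedded in R^d."""
    return math.log(2.0) + 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d)


def pkbd_logpdf(X, mu, rho):
    """PKBD log-density for unit-norm rows of X (N x d).

    mu is (K x d) with unit-norm rows, rho is (K,) with entries in [0, 1).
    Returns an (N x K) matrix of log-densities.
    """
    d = X.shape[1]
    cos = X @ mu.T  # single matrix product; everything else is elementwise
    log_kernel = np.log1p(-rho**2) - 0.5 * d * np.log1p(rho**2 - 2.0 * rho * cos)
    return log_kernel - log_sphere_area(d)


def spcauchy_logpdf(X, mu, rho):
    """Spherical Cauchy log-density, same argument shapes as pkbd_logpdf."""
    d = X.shape[1]
    cos = X @ mu.T
    log_ratio = np.log1p(-rho**2) - np.log1p(rho**2 - 2.0 * rho * cos)
    return (d - 1) * log_ratio - log_sphere_area(d)
```

Two quick sanity checks follow from the formulas: on the circle (d = 2) the two densities coincide, and with rho = 0 both reduce to the uniform density on the sphere.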

Beyond traditional applications, spheroids is particularly useful for clustering modern embeddings (e.g., semantic embeddings generated by large language models, image embeddings, or any high-dimensional feature representations). By leveraging high-performance matrix operations on GPUs, it can efficiently group large-scale embedding datasets. When covariates or additional contextual information are available, the deep learning approach lets the user control for their effects rather than rediscover them through clustering.

The package provides two EM-based estimation methods:

A direct approach (C++ backend) when no covariates are available
A deep learning approach (PyTorch backend) for model-based clustering in an embedding space with covariates
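Both methods are described as EM-based. As a rough orientation, the sketch below shows a generic mixture-model E-step and the weight part of the M-step, computed in log space for numerical stability. It is a textbook formulation under the assumption that an (N x K) matrix of per-cluster log-densities is already available; it is not taken from the package internals, and the function names are hypothetical.

```python
import numpy as np


def e_step(log_dens, weights):
    """Posterior cluster probabilities from an (N x K) log-density matrix."""
    log_joint = log_dens + np.log(weights)  # log(pi_k * f_k(x_i))
    # Log-sum-exp over clusters gives the per-observation log-likelihood.
    log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
    resp = np.exp(log_joint - log_norm)  # each row sums to 1
    return resp, float(log_norm.sum())


def update_weights(resp):
    """M-step update of the mixture weights: mean responsibility per cluster."""
    return resp.mean(axis=0)
```

The remaining M-step updates (mean directions and concentrations) depend on the chosen distribution and are omitted here.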

Furthermore, spheroids includes options to regularize the number of clusters using an L1 norm (via a Hadamard product approach inspired by Ziyin and Wang) and can dynamically drop clusters whose total weight falls below a user-specified threshold (min_weight).
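The cluster-dropping rule can be pictured with a small sketch: clusters whose mixture weight falls below min_weight are removed and the remaining weights are renormalized to sum to one. This is an assumption about the general mechanism, not the package's actual code; prune_clusters and the fallback that keeps the heaviest cluster are hypothetical.

```python
import numpy as np


def prune_clusters(weights, min_weight):
    """Drop clusters with weight below min_weight and renormalize the rest.

    Returns the indices of the surviving clusters and their new weights.
    """
    weights = np.asarray(weights, dtype=float)
    keep = weights >= min_weight
    if not keep.any():
        # Hypothetical safeguard: always retain the heaviest cluster.
        keep[np.argmax(weights)] = True
    kept = weights[keep]
    return np.flatnonzero(keep), kept / kept.sum()
```

For example, with weights (0.6, 0.03, 0.37) and min_weight = 0.05, the second cluster is dropped and the other two are rescaled by 1 / 0.97.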

Key Features

🚀 High Performance

Core computations implemented in C++ with Armadillo
GPU acceleration via PyTorch
Efficient batch processing

🎯 Multiple Distributions

Poisson kernel-based Distribution (PKBD)
Spherical Cauchy distribution
Extensible architecture for new distributions

📊 Clustering Capabilities

Automatic cluster number selection
Robust parameter estimation
Support for high-dimensional data

Installation

Quick Install (Recommended)

You can install spheroids directly from PyPI with precompiled wheels:

pip install spheroids

Advanced Installation (Local Compilation)

For users who want to build the package locally (e.g., to modify the codebase), follow these steps:

Prerequisites

Python ≥3.8
C++ compiler with C++17 support
Armadillo installed

Steps

On Linux

# Install required libraries
sudo apt-get update
sudo apt-get install -y libarmadillo-dev libomp-dev

# Clone the repository
git clone https://github.com/lsablica/spheroids.git
cd spheroids

# Build and install
pip install -e .

On macOS

# Install required libraries
brew update
brew install armadillo libomp

# Configure compiler paths (if necessary)
export CXXFLAGS="-Xpreprocessor -fopenmp -I/opt/homebrew/opt/libomp/include -I/opt/homebrew/opt/armadillo/include"
export LDFLAGS="-L/opt/homebrew/opt/libomp/lib -lomp -L/opt/homebrew/opt/armadillo/lib"

# Clone the repository
git clone https://github.com/lsablica/spheroids.git
cd spheroids

# Build and install
pip install -e .

On Windows

# Clone vcpkg for managing C++ libraries
git clone https://github.com/microsoft/vcpkg.git C:\vcpkg
cd C:\vcpkg
.\bootstrap-vcpkg.bat -disableMetrics
.\vcpkg.exe install armadillo

# Clone the repository
git clone https://github.com/lsablica/spheroids.git
cd spheroids

# Build and install
pip install -e .

Quick Start

import torch
from spheroids import SphericalClustering

# Prepare your data: covariates X (N x num_covariates) and
# responses Y (N x response_dim), normalized to the unit sphere
X = torch.randn(1000, 3)
X = X / torch.norm(X, dim=1, keepdim=True)
Y = torch.randn(1000, 2)
Y = Y / torch.norm(Y, dim=1, keepdim=True)

# Create the model
model = SphericalClustering(
    num_covariates=3,
    response_dim=2,
    num_clusters=3,
    distribution="pkbd"
)

# Fit model
ll = model.fit(X, Y, num_epochs=100)
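Real embeddings are rarely unit-norm out of the box, and an all-zero row would turn into NaNs under the plain division used above. The helper below is a minimal sketch of a safer projection onto the unit sphere; to_unit_sphere and its eps tolerance are illustrative names and are not part of spheroids.

```python
import numpy as np


def to_unit_sphere(E, eps=1e-12):
    """Scale each row of E to unit Euclidean norm, rejecting (near-)zero rows."""
    E = np.asarray(E, dtype=float)
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    if np.any(norms < eps):
        raise ValueError("cannot project a (near-)zero vector onto the sphere")
    return E / norms
```

The result can then be converted to a tensor, for instance with torch.as_tensor, before being passed to the model.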

Using C++ Implementations

Access optimized C++ implementations directly:

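import numpy as np
from spheroids import PKBD

# Parameters: mean direction (unit vector) and concentration rho in [0, 1)
mu = np.array([1.0, 0.0])
rho = 0.5

# Generate random samples
samples = PKBD.random_sample(n=100, rho=rho, mu=mu)

# Calculate the log-likelihood of the samples under the same parameters
loglik = PKBD.log_likelihood(samples, mu, rho)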

API Reference

SphericalClustering

SphericalClustering(
    num_covariates: int,     # Number of input features
    response_dim: int,       # Dimension of response variables
    num_clusters: int,       # Initial number of clusters
    distribution: str,       # "pkbd" or "spcauchy"
    min_weight: float = 0.05 # Minimum cluster weight
)

Key Methods

# Fit the model
model.fit(
    X: torch.Tensor,        # Input features (N x num_covariates)
    Y: torch.Tensor,        # Response variables (N x response_dim)
    num_epochs: int = 100,  # Number of training epochs
    lr: float = 1e-3        # Learning rate
)

# Get cluster predictions
pred = model.predict(X)

Examples

Basic Clustering Example

import numpy as np
import torch
from spheroids import SphericalClustering

# Load data
Y = np.load('spheroids/spheroids/datasets/pkbd_Y.npy')

# Create model
model = SphericalClustering(num_covariates=1,
                            response_dim=4,
                            num_clusters=3,
                            device="cpu",
                            min_weight=0.02,
                            distribution="pkbd")

# Fit without covariates
mu, rho = model.fit_no_covariates(Y, num_epochs=200, tol=1e-8)

Usage of C++ API

import numpy as np
from spheroids import PKBD, spcauchy

# Shared parameters: mean direction (unit vector) and concentration
mu = np.array([1.0, 0.0, 0.0])
rho = 0.5

# PKBD distribution
pkbd_samples = PKBD.random_sample(1000, rho, mu)
pkbd_loglik = PKBD.log_likelihood(pkbd_samples, mu, rho)

# Spherical Cauchy distribution
scauchy_samples = spcauchy.random_sample(1000, rho, mu)
scauchy_loglik = spcauchy.log_likelihood(scauchy_samples, mu, rho)

Contributing

We welcome contributions! Here's how you can help:

🐛 Report bugs
💡 Suggest features

License

This project is licensed under the GPL-3.0 License - see the LICENSE file for details.

Citation

If you use spheroids in your research, please cite:

@software{spheroids,
  title = {spheroids: A Python Package for Spherical Clustering Models},
  author = {Lukas Sablica},
  year = {2025},
  url = {https://github.com/lsablica/spheroids}
}
