
Probabilistic Model-Based Clustering

Rubén Sánchez Corcuera


[email protected]

Gaussian Mixture Clustering

■ In all the cluster analysis methods we have discussed so far, each data object can be assigned to only one of a number of clusters.
■ This cluster assignment rule is required in some applications, such as assigning customers to marketing managers.
■ However, in other applications, this rigid requirement may not be desirable.

Probabilistic Model-Based Clustering

■ The goal of cluster analysis is to find hidden categories.
■ We conduct cluster analysis on a dataset because we assume that the objects in the dataset in fact belong to different inherent categories.
■ Clustering tendency analysis can be used to examine whether a dataset contains objects that may lead to meaningful clusters.
■ Here, the inherent categories hidden in the data are latent, which means they cannot be directly observed.
  ○ Instead, we have to infer them from the observed data.
■ For example, the topics hidden in a set of reviews in an online store are latent because one cannot read the topics directly.
■ However, the topics can be inferred from the reviews, because each review is about one or multiple topics.

Probabilistic Model-Based Clustering

■ A data set that is the subject of cluster analysis can be regarded as a sample of the possible instances of the hidden categories, but without any category labels.
■ The clusters derived from cluster analysis are inferred using the data set and are designed to approach the hidden categories.
■ Statistically, we can assume that a hidden category is a distribution over the data space, which can be mathematically represented using a probability density function (or distribution function).
  ○ We call such a hidden category a probabilistic cluster.
■ For a probabilistic cluster C with probability density function f, and a point o in the data space, f(o) is the relative likelihood that an instance of C appears at o.
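As a minimal sketch of evaluating f(o), modelling one probabilistic cluster as a 2-D Gaussian; the mean and covariance values below are made-up assumptions for illustration:

# Evaluate the density f(o) of one probabilistic cluster C at a point o.
# The mean and covariance here are illustrative assumptions, not from the slides.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])          # cluster mean
sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])     # cluster covariance
cluster = multivariate_normal(mean=mu, cov=sigma)

o = np.array([0.5, -1.0])          # a point in the data space
print(cluster.pdf(o))              # relative likelihood that an instance of C appears at o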
[Figure-only slide: Probabilistic Model-Based Clustering, example with a tech product]

Expectation Maximization

■ It can be shown that k-means clustering is a special case of fuzzy clustering. The k-means algorithm iterates until the clustering cannot be improved.
■ Each iteration consists of two steps (sketched in code below):
  1. The expectation step (E-step): Given the current cluster centers, each object is assigned to the cluster whose center is closest to the object. Here, an object is expected to belong to the closest cluster.
  2. The maximization step (M-step): Given the cluster assignment, for each cluster the algorithm adjusts the center so that the sum of the distances between the objects assigned to this cluster and the new center is minimized. That is, the similarity of objects assigned to a cluster is maximized.
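A minimal NumPy sketch of this E-step / M-step loop; the data X and the initial centers are assumed inputs, and this is an illustration of the idea rather than a production k-means:

# EM-style view of k-means: alternate assignment (E) and center update (M)
# until the clustering cannot be improved. Assumes no cluster becomes empty.
import numpy as np

def kmeans_em(X, centers, n_iter=100):
    for _ in range(n_iter):
        # E-step: assign each object to the cluster with the closest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # M-step: move each center to the mean of its assigned objects,
        # which minimizes the sum of squared distances within the cluster
        new_centers = np.array([X[labels == k].mean(axis=0)
                                for k in range(len(centers))])
        if np.allclose(new_centers, centers):  # clustering cannot be improved
            break
        centers = new_centers
    return centers, labels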

Expectation Maximization

■ We can generalize this two-step method to tackle fuzzy clustering and probabilistic model-based clustering.
■ In general, an expectation-maximization (EM) algorithm is a framework that approaches maximum likelihood or maximum a posteriori estimates of parameters in statistical models.
■ In the context of fuzzy or probabilistic model-based clustering, an EM algorithm starts with an initial set of parameters and iterates until the clustering cannot be improved, that is, until the clustering converges or the change is sufficiently small (less than a preset threshold).
■ Each iteration also consists of two steps:
  1. The expectation step assigns objects to clusters according to the current fuzzy clustering or parameters of probabilistic clusters.
  2. The maximization step finds the new clustering or parameters that maximize the expected likelihood in probabilistic model-based clustering.

Expectation Maximization Characteristics

■ In many applications, probabilistic model-based clustering has been shown to be effective because it is more general than partitioning methods and fuzzy clustering methods.
■ A distinct advantage is that appropriate statistical models can be used to capture latent clusters.
■ The EM algorithm is commonly used to handle many learning problems in data mining and statistics due to its simplicity.
■ Note that, in general, the EM algorithm may not converge to the optimal solution. It may instead converge to a local maximum: a good solution, but not necessarily the best.
  ○ Many heuristics have been explored to avoid this. For example, we could run the EM process multiple times using different random initial values (see the sketch below).
■ Furthermore, the EM algorithm can be very costly if the number of distributions is large or the data set contains very few observed data points.
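As an illustration of the restart heuristic, scikit-learn's GaussianMixture exposes an n_init parameter that runs EM from several random initializations and keeps the best result; the data below is a random placeholder assumption:

# Run EM 10 times from different random starts and keep the best fit.
# X is placeholder data; real data would replace it.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))

gm = GaussianMixture(n_components=3, n_init=10, random_state=0).fit(X)
print(gm.lower_bound_)   # log-likelihood lower bound of the best EM run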
Gaussian Mixture Models

■ Scikit-learn implements Expectation Maximization in its GaussianMixture estimator.
■ A Gaussian mixture model is a probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters.
■ One can think of mixture models as generalizing k-means clustering to incorporate information about the covariance structure of the data as well as the centers of the latent Gaussians.
■ It can also draw confidence ellipsoids for multivariate models, and compute the Bayesian Information Criterion to assess the number of clusters in the data.

Gaussian Mixture Models

■ A Gaussian Mixture Model represents the probability distribution of the data as a combination of multiple Gaussian distributions. Each Gaussian component has its own mean (μ), covariance (Σ), and weight (π):

  p(x) = Σ_{k=1..K} π_k N(x | μ_k, Σ_k),  where π_k ≥ 0 and Σ_{k=1..K} π_k = 1
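A minimal usage sketch with scikit-learn, reading back the fitted weights (π), means (μ), and covariances (Σ); the two-blob synthetic data is an assumption for illustration:

# Fit a 2-component GMM and inspect the learned π, μ, Σ.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=-3, size=(200, 2)),   # synthetic blob 1
               rng.normal(loc=+3, size=(200, 2))])  # synthetic blob 2

gm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gm.weights_)             # mixing weights π_k (sum to 1)
print(gm.means_)               # component means μ_k
print(gm.covariances_)         # component covariances Σ_k
print(gm.predict_proba(X[:3])) # soft (fuzzy) cluster memberships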

Gaussian Mixture Models

■ The GaussianMixture estimator comes with different options to constrain the covariance of the different classes estimated: spherical, diagonal, tied, or full covariance (compared in the sketch below).

Gaussian Mixture Models: Advantages

■ Speed: It is the fastest algorithm for learning mixture models.
■ Agnostic: As this algorithm maximizes only the likelihood, it will not bias the means towards zero, or bias the cluster sizes to have specific structures that might or might not apply.
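To make the covariance constraints above concrete, here is a short sketch comparing the four covariance_type options via the shape of the fitted covariances; the data is a random placeholder, and note that scikit-learn spells the diagonal option "diag":

# Compare the four covariance constraints on placeholder 2-D data.
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(1).normal(size=(300, 2))

for cov_type in ["spherical", "diag", "tied", "full"]:
    gm = GaussianMixture(n_components=3, covariance_type=cov_type,
                         random_state=0).fit(X)
    print(cov_type, gm.covariances_.shape)
# spherical -> (3,)       one variance per component
# diag      -> (3, 2)     one variance per component and feature
# tied      -> (2, 2)     a single covariance matrix shared by all components
# full      -> (3, 2, 2)  one full covariance matrix per component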
Gaussian Mixture Models: Disadvantages

■ Singularities: When one has insufficiently many points per mixture component, estimating the covariance matrices becomes difficult, and the algorithm is known to diverge and find solutions with infinite likelihood unless one regularizes the covariances artificially.
■ Number of components: This algorithm will always use all the components it has access to, needing held-out data or information-theoretic criteria to decide how many components to use in the absence of external cues.

Selecting the number of components in a classical Gaussian Mixture Model

■ The BIC criterion can be used to select the number of components in a Gaussian Mixture in an efficient way (see the sketch below). In theory, it recovers the true number of components only in the asymptotic regime (i.e., if much data is available and assuming that the data was actually generated i.i.d. from a mixture of Gaussian distributions).
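A minimal sketch of BIC-based selection, fitting models with 1 to 6 components and keeping the one with the lowest BIC; the three-blob synthetic data is an assumption:

# Select the number of components by minimizing BIC over candidate models.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(loc=-4, size=(150, 2)),
               rng.normal(loc=0,  size=(150, 2)),
               rng.normal(loc=4,  size=(150, 2))])  # three synthetic blobs

bics = []
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bics.append(gm.bic(X))            # lower BIC is better
best_k = int(np.argmin(bics)) + 1
print(best_k, bics)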

[Figure-only slide: Selecting the number of components in a classical Gaussian Mixture Model]

Further reading

■ Section 11.1 in [Han & Kamber, 2016]
