Machine Learning Chapter 3

Clustering is an unsupervised machine learning technique that groups similar data points into clusters without requiring labeled data. Various algorithms such as K-Means, Hierarchical, and DBSCAN are used for clustering, each with unique applications like customer segmentation, anomaly detection, and image processing. The choice of algorithm depends on the data characteristics and specific use cases.

Clustering & Its Use Cases

What is Clustering?
Clustering is an unsupervised machine learning technique that groups similar data points into clusters
based on their similarities. Unlike classification, clustering does not require labeled data and is used to
identify patterns within datasets. Each cluster represents a group of data points that are more similar to
each other than to those in other clusters.

Types of Clustering Algorithms

K-Means Clustering
Divides the data into ‘K’ clusters using centroids.

Hierarchical Clustering
Forms a hierarchy of clusters using a tree-like structure.

DBSCAN (Density-Based Spatial Clustering)


Groups data points based on density, useful for discovering arbitrary-shaped clusters.

Gaussian Mixture Model (GMM)


Models the data as a mixture of Gaussian distributions, so each point receives a probability of belonging to each cluster rather than a single hard label.
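
K-Means and Hierarchical Clustering are implemented in detail later in this chapter, so the following is a minimal scikit-learn sketch of the other two algorithms on a small synthetic dataset; the parameter values (eps, min_samples, n_components) are illustrative rather than tuned.

import numpy as np
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN
from sklearn.mixture import GaussianMixture

# Non-spherical sample data (two interleaving half-moons)
X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

# DBSCAN: density-based, no need to specify the number of clusters
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
print("DBSCAN clusters found:", len(set(db_labels) - {-1}))  # label -1 marks noise points

# Gaussian Mixture Model: soft (probabilistic) cluster assignments
gmm = GaussianMixture(n_components=2, random_state=42).fit(X)
probs = gmm.predict_proba(X)          # membership probability per cluster
print("First point's cluster probabilities:", probs[0])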

Use Cases of Clustering

Customer Segmentation
Businesses use clustering to segment customers based on purchasing behavior, demographics, or
interests.
Example: E-commerce platforms categorize customers to provide personalized recommendations.

Anomaly Detection
Clustering helps detect outliers or fraudulent transactions.
Example: Banks and credit card companies use clustering to flag suspicious activities.

Image Segmentation
Clustering is used in image processing to separate objects in images.
Example: Medical imaging to identify tumors in MRI scans.

Document Clustering
Organizing documents based on topics using clustering techniques.
Example: News aggregation websites group articles with similar content.

Social Network Analysis


Clustering helps identify communities in social networks.
Example: Facebook & LinkedIn recommend friends or connections based on clustering.

Biological Data Analysis


Used in genetics to group similar DNA sequences or cell types.
Example: Cancer research for detecting similar genetic expressions.
Conclusion
Clustering is a powerful technique in machine learning that helps uncover patterns in unlabeled
datasets. From customer segmentation to medical imaging, clustering plays a vital role in various
industries. The choice of clustering algorithm depends on the nature of the data and the specific
problem at hand.

K-Means Clustering

Introduction
K-Means is one of the most popular unsupervised learning algorithms used for clustering. It partitions
data into K clusters, where each data point belongs to the cluster with the nearest mean (centroid). The
algorithm iteratively refines the clusters until the centroids stabilize.

How K-Means Clustering Works?


1. Select the number of clusters (K): Choose the number of clusters based on domain knowledge or use
methods like the Elbow Method.

2. Initialize Centroids: Randomly select K data points as initial cluster centroids.

3. Assign Points to Clusters: Each data point is assigned to the nearest centroid.

4. Recalculate Centroids: Compute the new mean of data points in each cluster.

5. Repeat Steps 3 & 4 until centroids no longer change significantly.
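
The following is a minimal NumPy sketch of steps 2-5; the function name and parameters are illustrative, and edge cases such as empty clusters are not handled.

import numpy as np

def kmeans(X, k, n_iters=100, seed=42):
    """Minimal K-Means: random init, assign to nearest centroid, recompute means."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]   # Step 2: random initial centroids
    for _ in range(n_iters):
        # Step 3: assign each point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its assigned points
        # (empty clusters are not handled in this sketch)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop when the centroids no longer change significantly
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Example usage on random 2-D data
X = np.random.default_rng(0).normal(size=(200, 2))
labels, centroids = kmeans(X, k=3)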

Use Cases of K-Means Clustering

Customer Segmentation
Used in e-commerce and marketing to categorize customers based on purchasing behavior.

Image Compression
Reduces image size by clustering similar colors together (color quantization); see the sketch after this list.

Anomaly Detection
Detects fraudulent transactions or network intrusions by identifying outliers.

Document Clustering
Groups similar articles or documents based on topic.

Biological Data Analysis


Clusters genes or disease types in medical research.
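
As an illustration of the image-compression use case, here is a short color-quantization sketch using scikit-learn's bundled sample image; the choice of 16 colors and the size of the pixel subsample are arbitrary.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_sample_image

# Load a bundled sample image and flatten it into a list of RGB pixels
image = load_sample_image("china.jpg") / 255.0        # shape (height, width, 3)
pixels = image.reshape(-1, 3)

# Fit K-Means on a random subsample of pixels for speed, then quantize all pixels
sample = pixels[np.random.default_rng(0).choice(len(pixels), 10_000, replace=False)]
kmeans = KMeans(n_clusters=16, n_init=10, random_state=42).fit(sample)
quantized = kmeans.cluster_centers_[kmeans.predict(pixels)].reshape(image.shape)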

Python Implementation of K-Means Clustering


import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
# Generate sample data
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=42)

# Apply K-Means clustering


k = 4 # Number of clusters
kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
y_kmeans = kmeans.fit_predict(X)

# Plot results
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, cmap='viridis', marker='o', edgecolors='k', alpha=0.75)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300, c='red', marker='X',
label='Centroids')
plt.title('K-Means Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()

Explanation of the Code


1. Generate Data: We create a synthetic dataset with 4 clusters using `make_blobs`.
2. Apply K-Means: We set `K=4` and fit the K-Means model to the data.
3. Cluster Assignment: The model assigns each point to a cluster.
4. Plot the Clusters: The clusters are visualized along with their centroids.

How Does the K-Means Algorithm Work?


The K-Means algorithm is an unsupervised machine learning algorithm used for clustering data points
into groups based on similarity. It works by partitioning a dataset into K clusters, where each data point
belongs to the nearest cluster center (centroid).

1. Steps in the K-Means Algorithm

Step 1: Choose the Number of Clusters (K)


Decide the number of clusters (K) you want to divide the data into. The choice of K is crucial and can be
determined using techniques like the Elbow Method or Silhouette Score.

Step 2: Initialize Cluster Centroids


Randomly select K points from the dataset as the initial cluster centers (centroids). These centroids act
as the starting points for defining the clusters.

Step 3: Assign Data Points to the Nearest Centroid


Each data point is assigned to the nearest centroid using the Euclidean distance formula:
d = √((x₂ − x₁)² + (y₂ − y₁)²)
This forms K clusters where each point belongs to the closest centroid.
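
As a small worked example (with made-up coordinates), the distance from one point to two candidate centroids can be computed directly:

import numpy as np

point = np.array([2.0, 3.0])
centroids = np.array([[1.0, 1.0],   # centroid 0
                      [5.0, 4.0]])  # centroid 1

# Euclidean distance from the point to each centroid
distances = np.linalg.norm(centroids - point, axis=1)
nearest = distances.argmin()
print(distances)                      # approximately [2.24, 3.16]
print("Assigned to centroid", nearest)  # centroid 0 is closer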

Step 4: Compute New Centroids


For each cluster, calculate the mean of all points in that cluster. The new centroid is the average position
of all points in the cluster.
Step 5: Repeat Steps 3 & 4 Until Convergence
The assignments and centroid updates repeat until the centroids stop changing or the changes are
minimal. The algorithm converges when points no longer switch clusters.

2. Example of K-Means Clustering


Imagine we have a dataset of customers and want to cluster them based on spending patterns:
1. Select K = 3 (for three customer groups: low, medium, and high spenders).
2. Randomly pick 3 centroids from the dataset.
3. Assign each customer to the nearest centroid based on spending behavior.
4. Compute new centroids based on the average spending of each cluster.
5. Repeat until clusters stabilize.
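
A minimal sketch of this scenario with hypothetical spending figures might look like this:

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical monthly spending amounts (in dollars) for nine customers
spending = np.array([[120], [150], [130],      # low spenders
                     [480], [520], [500],      # medium spenders
                     [950], [1020], [980]])    # high spenders

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(spending)
print(kmeans.labels_)            # cluster index for each customer
print(kmeans.cluster_centers_)   # average spending of each group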

3. Choosing the Optimal K (Elbow Method)


The Elbow Method helps find the best K by plotting the Sum of Squared Errors (SSE) for different values
of K. SSE decreases as K increases, but after a certain point, the reduction becomes insignificant (elbow
point). The K at this elbow point is chosen as the optimal number of clusters.
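
A short sketch of the Elbow Method using scikit-learn's inertia_ attribute (the SSE) on the same kind of synthetic dataset used earlier:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=42)

# Fit K-Means for a range of K values and record the SSE (inertia_)
sse = []
k_values = range(1, 11)
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    sse.append(km.inertia_)   # sum of squared distances to the closest centroid

plt.plot(k_values, sse, marker='o')
plt.title('Elbow Method')
plt.xlabel('Number of clusters (K)')
plt.ylabel('SSE (inertia)')
plt.show()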

4. Applications of K-Means
• Customer Segmentation (E-commerce, banking)
• Image Segmentation (Grouping similar pixels in images)
• Anomaly Detection (Detecting fraud in transactions)
• Genetics (Clustering genes with similar expressions)

C-Means Clustering (Fuzzy C-Means Algorithm)


C-Means Clustering, specifically Fuzzy C-Means (FCM), is an unsupervised machine learning algorithm
used for clustering data points into groups. Unlike K-Means, where each data point belongs strictly to
one cluster, FCM allows a point to belong to multiple clusters with different degrees of membership.

1. How Does C-Means Clustering Work?

Each data point receives a membership value between 0 and 1 for every cluster, indicating how strongly it belongs to that cluster (the memberships for a point sum to 1). The algorithm alternates between two steps: cluster centers are recomputed as membership-weighted means of all points, and memberships are then updated from each point's distance to the centers. A fuzzifier parameter m (commonly set to 2) controls how soft the assignments are. The process repeats until the memberships stabilize.

2. Difference Between K-Means and Fuzzy C-Means
Feature              K-Means Clustering                        Fuzzy C-Means Clustering
Membership           Hard assignment (0 or 1)                  Soft assignment (values between 0 and 1)
Cluster Assignment   Each point belongs to one cluster only    Each point belongs to multiple clusters
Flexibility          Rigid clustering                          More flexible clustering
Best for             Well-separated clusters                   Overlapping clusters
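
Scikit-learn does not ship a Fuzzy C-Means implementation, so the following is a minimal NumPy sketch of the standard update rules described above (membership-weighted centroids, then memberships from distances); the function name, the fuzzifier m = 2, and the fixed iteration count are illustrative choices.

import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, n_iters=100, seed=42):
    """Minimal Fuzzy C-Means sketch: soft memberships instead of hard labels."""
    rng = np.random.default_rng(seed)
    # Initialize random memberships so that each row sums to 1
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iters):
        # Update centroids as membership-weighted means (fuzzifier m controls softness)
        W = U ** m
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        # Update memberships from distances to the centroids
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-10
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)
    return U, centroids

# Example: each row of U gives one point's degree of membership in each cluster
X = np.random.default_rng(0).normal(size=(200, 2))
U, centroids = fuzzy_c_means(X, c=3)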

3. Applications of Fuzzy C-Means Clustering


• Image Segmentation (Medical imaging, object detection)
• Pattern Recognition (Speech recognition, handwriting recognition)
• Anomaly Detection (Fraud detection, cybersecurity)
• Data Compression (Reducing data complexity in large datasets)

Hierarchical Clustering

Introduction
Hierarchical Clustering is an unsupervised machine learning algorithm used to group data
into clusters. Unlike K-Means, it does not require specifying the number of clusters in
advance. Instead, it creates a hierarchy of clusters that can be visualized using a
dendrogram.

Types of Hierarchical Clustering

Agglomerative Clustering (Bottom-Up Approach)


Each data point starts as its own cluster. Clusters are merged step by step until only one
remains. This is the most commonly used approach.

Divisive Clustering (Top-Down Approach)


Starts with a single large cluster and splits recursively into smaller clusters.

How Agglomerative Hierarchical Clustering Works?


1. Assign Each Data Point as an Individual Cluster.
2. Compute Distance Between Clusters using metrics like Euclidean, Manhattan, or Cosine
distance.

3. Merge the Closest Clusters.

4. Repeat Steps 2 & 3 Until One Cluster Remains.

5. Use a Dendrogram to Decide the Optimal Number of Clusters.

Use Cases of Hierarchical Clustering

Customer Segmentation
Grouping customers based on purchasing behavior or demographics.

Bioinformatics
Identifying genetic similarity or DNA sequence analysis.

Document Clustering
Organizing articles or research papers into categories.

Anomaly Detection
Detecting fraud in financial transactions or network intrusions.

Python Implementation of Hierarchical Clustering


import numpy as np
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Generate sample data


X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=42)

# Create a Dendrogram to visualize cluster hierarchy


plt.figure(figsize=(10, 5))
sch.dendrogram(sch.linkage(X, method='ward'))
plt.title('Dendrogram for Hierarchical Clustering')
plt.xlabel('Data Points')
plt.ylabel('Euclidean Distance')
plt.show()

# Apply Agglomerative Clustering


hc = AgglomerativeClustering(n_clusters=4, linkage='ward')  # Ward linkage uses Euclidean distance
y_hc = hc.fit_predict(X)

# Plot Clusters
plt.scatter(X[:, 0], X[:, 1], c=y_hc, cmap='viridis', marker='o', edgecolors='k', alpha=0.75)
plt.title('Hierarchical Clustering Result')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

Explanation of the Code


1. Generate Data: We create a synthetic dataset with 4 clusters using `make_blobs`.
2. Create a Dendrogram: We use `scipy.cluster.hierarchy.dendrogram` to visualize the
hierarchy.
3. Apply Hierarchical Clustering: We use `AgglomerativeClustering` from `sklearn` with
`n_clusters=4`.
4. Plot the Clusters: The final clusters are visualized in a scatter plot.

How Hierarchical Clustering Works?

Introduction
Hierarchical Clustering is a bottom-up (agglomerative) or top-down (divisive) clustering
algorithm that groups similar data points into a hierarchy of clusters. It is commonly
visualized using a dendrogram, which shows the merging or splitting of clusters at different
levels.

Step-by-Step Working of Agglomerative Hierarchical Clustering

1. Assign Each Data Point as an Individual Cluster


Initially, each data point is treated as its own cluster. If there are N data points, we start
with N clusters.

2. Compute the Distance Between Clusters


Calculate the distance between every pair of clusters using methods like:
- Euclidean Distance (default)
- Manhattan Distance
- Cosine Similarity
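
For example, SciPy's linkage function accepts these metrics directly (the method and metric values shown are illustrative; Ward linkage, used later in this chapter, requires Euclidean distance):

import scipy.cluster.hierarchy as sch
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=50, centers=3, random_state=42)

# Pairwise cluster distances can be computed with different metrics and linkages
Z_euclidean = sch.linkage(X, method='average', metric='euclidean')
Z_manhattan = sch.linkage(X, method='average', metric='cityblock')  # Manhattan distance
Z_cosine    = sch.linkage(X, method='average', metric='cosine')
Z_ward      = sch.linkage(X, method='ward')   # Ward requires Euclidean distance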

3. Merge the Two Closest Clusters


Find the two clusters that have the smallest distance and merge them. This reduces the
number of clusters from N to N-1.

4. Repeat Until One Cluster Remains


The process continues iteratively, merging the closest clusters at each step. A dendrogram
visually represents these merging steps.
5. Use a Dendrogram to Decide the Optimal Number of Clusters
The dendrogram helps in selecting the best number of clusters by setting a threshold for
cutting the tree. The larger vertical gaps in the dendrogram suggest natural cluster
divisions.
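
A minimal sketch of cutting the tree at a threshold with scipy.cluster.hierarchy.fcluster; the distance threshold of 10 is an arbitrary illustrative value:

import scipy.cluster.hierarchy as sch
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=42)

# Build the linkage matrix (the same one used to draw the dendrogram)
Z = sch.linkage(X, method='ward')

# Cut the tree at a chosen distance threshold; merges above the threshold are undone
labels = sch.fcluster(Z, t=10, criterion='distance')
print("Number of clusters at threshold 10:", len(set(labels)))

# Alternatively, ask directly for a fixed number of clusters
labels_4 = sch.fcluster(Z, t=4, criterion='maxclust')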

Advantages of Hierarchical Clustering


✅ Does not require specifying the number of clusters in advance.

✅ Produces a hierarchy of clusters (useful for detailed analysis).

✅ Can be visualized using a dendrogram.

Limitations of Hierarchical Clustering


❌ Computationally expensive for large datasets (O(n² log n) complexity).

❌ Sensitive to noisy data and outliers.
