DATA MINING ASSIGNMENT (1)
Prepared by:
1. Nuhamin Dawit……………………………….1361
3. Betelhem G/Medhn…………………………….0289
6. Habtamu Tamalew………………………………1404
Enhances Decision-Making
Visualization gives decision-makers a clear view of the insights in the data. Critical
metrics and trends are easy to spot, which supports data-driven, better-informed
decisions.
Facilitates Communication
Visualizations provide a common language for communicating insights to both
technical and non-technical audiences. They make it easier to explain findings to
stakeholders who may not have a strong background in data mining.
Q3. What is clustering, and how does it differ from classification? Discuss the
applications of clustering in real-world scenarios.
Example:
Imagine sorting the books in a library into 3 clusters based on weight and page count.
Step 1: Pick 3 random books as the initial "centroids."
Step 2: Compare each book to the centroids and assign it to the group of the nearest
(most similar) centroid.
Step 3: Recalculate each centroid as the average of its group (e.g., average weight and
page count).
Step 4: Repeat Steps 2 and 3 until the groups stop changing.
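The book-sorting steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a production implementation: the book measurements are made-up values, and the names `nearest` and `kmeans`, the fixed `seed`, and the `max_iters` cap are assumptions introduced for the example.

```python
import math
import random

def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

def kmeans(points, k, max_iters=100, seed=0):
    """Group points into k clusters; returns (centroids, labels)."""
    rng = random.Random(seed)            # fixed seed so reruns give the same result
    centroids = rng.sample(points, k)    # Step 1: k random points as centroids
    labels = None
    for _ in range(max_iters):
        # Step 2: assign every point to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:         # groups have stabilized, so stop
            break
        labels = new_labels
        # Step 3: move each centroid to the average of its group.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                  # an empty group keeps its old centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels

# Hypothetical books: (weight in grams, page count).
books = [(200, 120), (220, 150), (250, 130),
         (600, 400), (640, 420), (580, 380),
         (1400, 900), (1500, 950), (1450, 880)]

centroids, labels = kmeans(books, k=3)
for book, label in zip(books, labels):
    print(book, "-> cluster", label)
```

Changing `seed` changes which books are picked in Step 1, and with it possibly the final grouping.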
Strengths of K-Means Clustering
Simplicity: Easy to understand and implement.
Scalability: Each iteration takes time roughly proportional to the number of data
points, so it handles large datasets and typically runs faster than other clustering
algorithms like hierarchical clustering.
Versatility: Works with any data that can be expressed as numeric features and is used
in diverse fields like marketing, biology, and image processing.
Limitations of K-Means Clustering
Requires Predefined k: The user must specify the number of clusters (k) beforehand,
which can be challenging if the optimal k is unknown.
Sensitive to Initialization: Poorly chosen initial centroids can lead to suboptimal
clustering, and different runs can give different results.
Assumes Spherical Clusters: Works best when clusters are roughly spherical and
equally sized. It struggles with irregularly shaped or overlapping clusters.
Sensitive to Outliers: Outliers can distort centroids and lead to poor clustering
performance.
Not Suitable for All Data Types: Works primarily with numerical data and requires
normalization if the features have different scales, because otherwise the feature
with the largest range dominates the distance calculation.
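The normalization point can be illustrated with a short sketch. Z-score standardization is used here as one common way to normalize; that choice, the `standardize` helper, and the customer values are assumptions made for the example.

```python
import math
import statistics

def standardize(rows):
    """Rescale every column to mean 0 and standard deviation 1 (z-scores)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        # A constant column (standard deviation 0) is mapped to 0.0.
        tuple((x - m) / s if s else 0.0 for x, m, s in zip(row, means, stdevs))
        for row in rows
    ]

# Hypothetical customers: (annual income in dollars, age in years).
customers = [(30_000, 25), (32_000, 60), (90_000, 27), (88_000, 58)]
scaled = standardize(customers)

# Customer 1 differs from customer 0 mainly in age, customer 2 mainly in income.
raw_age_gap = math.dist(customers[0], customers[1])
raw_income_gap = math.dist(customers[0], customers[2])
scaled_age_gap = math.dist(scaled[0], scaled[1])
scaled_income_gap = math.dist(scaled[0], scaled[2])

print("raw:   ", round(raw_age_gap, 1), round(raw_income_gap, 1))
print("scaled:", round(scaled_age_gap, 2), round(scaled_income_gap, 2))
```

On the raw values the income difference swamps the age difference, so K-Means would in effect cluster on income alone. After standardization both features contribute on a comparable scale.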
Applications of K-Means Clustering
Customer Segmentation: Grouping customers based on purchasing habits to tailor
marketing strategies.
Image Compression: Reducing image size by grouping similar pixel colors into clusters
and storing each pixel as its cluster's color.
Document Clustering: Grouping similar documents or articles for easier organization
or analysis.
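As a small illustration of the image compression application, the sketch below clusters a handful of grayscale pixel values and replaces each one with its cluster's centroid. It is a simplified stand-in for real image data. The `quantize` function and the pixel values are hypothetical. The centroids start evenly spread over the value range instead of at random, an assumption made so the result is reproducible.

```python
def quantize(pixels, k, iters=20):
    """Cluster grayscale values with 1-D K-Means; return (new_pixels, palette)."""
    lo, hi = min(pixels), max(pixels)
    # Assumption: evenly spaced starting centroids instead of random ones.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]

    def nearest(value):
        distances = [abs(value - c) for c in centroids]
        return distances.index(min(distances))

    for _ in range(iters):
        # Assign each pixel value to its nearest centroid.
        groups = [[] for _ in range(k)]
        for value in pixels:
            groups[nearest(value)].append(value)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [sum(g) / len(g) if g else centroids[i]
                     for i, g in enumerate(groups)]

    palette = [round(c) for c in centroids]
    return [palette[nearest(value)] for value in pixels], palette

# Hypothetical row of 8 grayscale pixels (0 = black, 255 = white).
pixels = [12, 15, 10, 200, 210, 205, 14, 198]
compressed, palette = quantize(pixels, k=2)
print(palette)      # the 2 gray levels that remain
print(compressed)   # every pixel replaced by one of those levels
```

With only k distinct values left, each pixel can be stored as a small index into the palette rather than as a full intensity value, which is where the size reduction comes from.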