Unsupervised Learning Algorithm 1
Clustering groups data points so that points in the same group are more similar to each other than those in other groups. Dimensionality reduction maps high-dimensional data to fewer dimensions while preserving the information of our feature columns. Recommender systems are designed to recommend things to the user based on many factors; these systems predict the items a user is most likely to prefer.

PCA: The main idea of PCA is to find the best value of the vector u, which is the direction of maximum variance (or maximum information) and along which we should rotate our existing coordinates. The eigenvector associated with the largest eigenvalue of the covariance matrix gives this direction.

Market-Basket Analysis: Market basket analysis is used to analyze the combinations of products that have been bought together.

Clustering

Distance-Based Clustering | Density-Based Clustering | Distribution-Based Clustering

Main Idea

K-Means: Starting with K random centroids, we assign points to each of them to form K clusters.
Hierarchical: Starting from each point as its own cluster, we group similar points until there is only one cluster. The ideal K is obtained using a dendrogram.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise): The idea is to classify points as either a core point, a border point or a noise point, based on how densely a point is surrounded by other points.
Gaussian Mixture: Given that the data follows a gaussian distribution, we identify the mean and variance that best represent the shape of the clusters.
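As a sketch of the distance-based idea above, here is a minimal K-Means implementation on toy 2-D points (the data, seed, and iteration count are illustrative, not from the source):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-Means: pick K random centroids, assign each point to
    its nearest centroid, then move each centroid to the mean of its
    cluster, and repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Index of the centroid closest to p (squared distance).
            i = min(range(k),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[i])))
            clusters[i].append(p)
        # Recompute each centroid as the mean of its assigned points.
        centroids = [
            tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Two well-separated toy blobs; K = 2 should recover one centroid per blob.
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
       (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
centroids, clusters = kmeans(pts, k=2)
```

The sensitivity to random initialization mentioned in the cons below is visible here: a different seed can start both centroids inside one blob, and convergence then takes a few extra iterations.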
Advantages

Hierarchical: No need to decide the number of clusters up front; the dendrogram provides a lot more information about the structure of the data.
DBSCAN: No need to specify K; robust to noise.
Gaussian Mixture: More flexible clustering; clusters can take an elliptical shape.

Disadvantages

K-Means: Very sensitive towards the random-initialization problem; fails for varying densities and non-globular shapes; need to define K.
Hierarchical: Sensitive to the choice of linkage functions; can't handle high-dimensional data.
DBSCAN: Struggles with clusters of varying density and with high-dimensional data.
Gaussian Mixture: It assumes a normal distribution of the features; need to specify the number of clusters.

t-SNE: t-SNE projects a datapoint from a higher-dimensional space into a lower-dimensional space using probabilistic methods. We compute pij for the d dimensions and qij for the d′ dimensions; qij is defined with the same formulation as pij, since every xi and xj has a corresponding yi and yj in the d′-dimensional space. For a useful transformation to d′ we need pij ≈ qij, so we use the KL-divergence to define a loss function which measures the mismatch between the two distributions.

Association Rules

Confidence: Confidence(A → B) = Support(A ∪ B) / Support(A).
Lift: Lift(A → B) = Confidence(A → B) / Support(B).
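The confidence and lift definitions above can be computed directly on a toy transaction list (item names and transactions are invented for illustration):

```python
# Toy transactions; compute support, confidence and lift
# for the example rule {bread} -> {butter}.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

sup_a = support({"bread"})             # Support(A)
sup_ab = support({"bread", "butter"})  # Support(A ∪ B)
confidence = sup_ab / sup_a            # P(butter | bread)
lift = confidence / support({"butter"})
```

Here confidence is 0.75 but lift is below 1, so buying bread actually makes butter slightly *less* likely than its baseline popularity; this is exactly why lift is checked alongside confidence.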
Apriori Algorithm

pros: It is the most simple and easy to understand and implement. It is used to calculate large item sets.
cons: It is computationally expensive; complexity grows exponentially.

2. Anomaly Detection

Anomaly is synonymous with an outlier: an anomaly is something which is not a part of normal behavior. Novelty means something unique, or something that you haven't seen before (novel).
Main Idea

Statistical methods: Flag points that lie very far away from the rest of the data. Does not work on non-unimodal data.
Isolation Forest: We randomly make splits in the data and make trees out of it until every point is isolated. On an average, outliers have lower depth and inliers have more depth in the trees.
LOF (Local Outlier Factor): The core idea behind LOF is to compare the density of a point with its neighbors'. If the density of a point is less than the density of its neighbors, we flag that point as an outlier.
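A minimal sketch of the Isolation Forest intuition above, using random splits on a 1-D toy dataset (values and parameters are illustrative): the outlier should be isolated at a much lower average depth than an inlier.

```python
import random

def isolation_depth(x, data, rng, depth=0, max_depth=10):
    """Path length of x in one random tree: repeatedly cut the data at a
    random value and keep only x's side, until x is isolated (or the
    depth cap is hit)."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth
    cut = rng.uniform(lo, hi)
    side = [v for v in data if v < cut] if x < cut else [v for v in data if v >= cut]
    return isolation_depth(x, side, rng, depth + 1, max_depth)

def avg_depth(x, data, trees=200, seed=0):
    """Average isolation depth of x over many random trees."""
    rng = random.Random(seed)
    return sum(isolation_depth(x, data, rng) for _ in range(trees)) / trees

# Inliers clustered near 0; 10.0 is an obvious outlier, so a random cut
# separates it from the rest almost immediately.
data = [0.0, 0.1, 0.2, 0.3, 0.15, 0.25, 10.0]
outlier_depth = avg_depth(10.0, data)
inlier_depth = avg_depth(0.15, data)
```

This only sketches the depth intuition; a real Isolation Forest also subsamples the data and converts depths into an anomaly score.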
Recommender Systems

Content-based filtering: Recommends items with similar content, e.g. metadata.
pros: Gives user-specific recommendations.
cons: Cold start problem.

Predicted Ratings (Latent Features, k = 4)

[Figure: a sparse user-item rating matrix r is factorized into a user matrix p and an item matrix q with k = 4 latent features; their product r′ fills in the missing entries as the predicted ratings.]

Time Series Forecasting

Mean / Median: The forecasts are equal to the mean / median of the observed data.
Exponential smoothing forecasts the series while giving more value to the recent data and less value to older observations.
Triple Exponential Smoothing: Triple Exponential Smoothing is an extension of Double Exponential Smoothing that explicitly adds support for seasonality to the univariate time series.
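Triple Exponential Smoothing builds on the simple (single) variant; here is a minimal sketch of simple exponential smoothing, showing how recent observations get more weight (the series and alpha value are illustrative):

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: each smoothed value gives weight
    alpha to the newest observation and (1 - alpha) to the previous
    smoothed value, so recent data counts more than old data.
    The final smoothed value is the one-step-ahead forecast."""
    s = series[0]
    for x in series[1:]:
        s = alpha * x + (1 - alpha) * s
    return s

# With alpha = 0.5, the forecast for this toy series works out to 14.25;
# with alpha = 1.0, only the most recent observation matters.
forecast = exponential_smoothing([10, 12, 14, 16], alpha=0.5)  # 14.25
```

Double smoothing adds a trend term to this recursion, and triple smoothing adds a seasonal term on top of that.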
ARIMA

ARIMA is a combination of AR and MA along with integration, which is differencing the series to remove its dependence upon time. It is different from ARMA in the aspect that ARMA requires the time series to be stationary.

AR (Autoregressive model): AR(p) uses the past (lagging) values of the series for forecasting.

MA (Moving Average): In MA models, we use the past forecast errors for forecasting.

ARMA (AutoRegressive Moving Average): Combines the AR and MA terms in a single model.

SARIMA: SARIMA can be represented by SARIMA(p, d, q)(P, D, Q)m, where m = seasonal period and the lowercase notations are for the non-seasonal terms.

ARIMAX (ARIMA + Exogenous variable): Exogenous variables are variables whose cause is external to the model. There is also SARIMAX (SARIMA + Exogenous variable).
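A minimal sketch of the AR idea above: fitting an AR(1) coefficient by least squares on a toy series and forecasting from the last lagged value (the data is illustrative; a real ARIMA fit also estimates differencing and MA terms):

```python
def fit_ar1(series):
    """Least-squares fit of an AR(1) model x_t = phi * x_{t-1} + e_t
    (no intercept): phi minimizes sum((x_t - phi * x_{t-1})^2)."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(x * x for x in series[:-1])
    return num / den

# A toy series that follows x_t = 0.5 * x_{t-1} exactly,
# so the fit should recover phi = 0.5.
series = [8.0, 4.0, 2.0, 1.0, 0.5]
phi = fit_ar1(series)
forecast = phi * series[-1]  # one-step-ahead forecast from the last value
```

Higher-order AR(p) models regress on p lagged values instead of one, and the MA part replaces lagged values with lagged forecast errors.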