Unsupervised Learning Algorithm 1
Clustering groups data points so that points in the same group are more similar to each other than those in other groups. Dimensionality reduction maps high-dimensional data to fewer dimensions while preserving the information of our feature columns. Recommender systems are designed to recommend things to the user based on many factors; these systems predict the items a user is most likely to prefer.

PCA: The main idea of PCA is to find the best value of the vector u, which is the direction of maximum variance (or maximum information) and along which we should rotate our existing coordinates. The eigenvector associated with the largest eigenvalue of the covariance matrix gives this direction.

Market-Basket Analysis: Market basket analysis is used to analyze the combinations of products that have been bought together.

Clustering

Distance-Based Clustering | Density-Based Clustering | Distribution-Based Clustering

Main Idea

K-Means: Starting with K random centroids, we assign points to each of them to form K clusters.
Hierarchical: Starting from each point as its own cluster, we group similar points until there is only one cluster. The ideal K is obtained using a dendrogram.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise): The idea is to classify points as either a core point, a border point or a noise point, based on how densely a point is surrounded by other points.
Gaussian Mixture: Given that the data follows a gaussian distribution, we identify the mean and variance that best represent the shape of the clusters.
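As a sketch of the distance-based idea above, here is a minimal K-Means implementation on toy 2-D points (the data, seed, and iteration count are illustrative, not from the source):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-Means: pick K random centroids, assign each point to
    its nearest centroid, then move each centroid to the mean of its
    cluster, and repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Index of the centroid closest to p (squared distance).
            i = min(range(k),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[i])))
            clusters[i].append(p)
        # Recompute each centroid as the mean of its assigned points.
        centroids = [
            tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Two well-separated toy blobs; K = 2 should recover one centroid per blob.
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
       (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
centroids, clusters = kmeans(pts, k=2)
```

The sensitivity to random initialization mentioned in the cons below is visible here: a different seed can start both centroids inside one blob, and convergence then takes a few extra iterations.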
Advantages

Hierarchical: No need to decide the number of clusters up front; the dendrogram provides a lot more information about the structure of the data.
DBSCAN: No need to specify K; robust to noise.
Gaussian Mixture: More flexible clustering; clusters can take an elliptical shape.

Disadvantages

K-Means: Very sensitive towards the random-initialization problem; fails for varying densities and non-globular shapes; need to define K.
Hierarchical: Sensitive to the choice of linkage functions; can't handle high-dimensional data.
DBSCAN: Struggles with clusters of varying density and with high-dimensional data.
Gaussian Mixture: It assumes a normal distribution of the features; need to specify the number of clusters.

t-SNE: t-SNE projects a datapoint from a higher-dimensional space into a lower-dimensional space using probabilistic methods. We compute pij for the d dimensions and qij for the d′ dimensions; qij is defined with the same formulation as pij, since every xi and xj has a corresponding yi and yj in the d′-dimensional space. For a useful transformation to d′ we need pij ≈ qij, so we use the KL-divergence to define a loss function which measures the mismatch between the two distributions.

Association Rules

Confidence: Confidence(A → B) = Support(A ∪ B) / Support(A).
Lift: Lift(A → B) = Confidence(A → B) / Support(B).
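The confidence and lift definitions above can be computed directly on a toy transaction list (item names and transactions are invented for illustration):

```python
# Toy transactions; compute support, confidence and lift
# for the example rule {bread} -> {butter}.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

sup_a = support({"bread"})             # Support(A)
sup_ab = support({"bread", "butter"})  # Support(A ∪ B)
confidence = sup_ab / sup_a            # P(butter | bread)
lift = confidence / support({"butter"})
```

Here confidence is 0.75 but lift is below 1, so buying bread actually makes butter slightly *less* likely than its baseline popularity; this is exactly why lift is checked alongside confidence.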
Apriori Algorithm

pros: It is the most simple and easy to understand and implement. It is used to calculate large item sets.
cons: It is computationally expensive; complexity grows exponentially.

2. Anomaly Detection

Anomaly is synonymous with an outlier: an anomaly is something which is not a part of normal behavior. Novelty means something unique, or something that you haven't seen before (novel).
Main Idea

Statistical methods: Flag points that lie very far away from the rest of the data. Does not work on non-unimodal data.
Isolation Forest: We randomly make splits in the data and make trees out of it until every point is isolated. On an average, outliers have lower depth and inliers have more depth in the trees.
LOF (Local Outlier Factor): The core idea behind LOF is to compare the density of a point with its neighbors'. If the density of a point is less than the density of its neighbors, we flag that point as an outlier.
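A minimal sketch of the Isolation Forest intuition above, using random splits on a 1-D toy dataset (values and parameters are illustrative): the outlier should be isolated at a much lower average depth than an inlier.

```python
import random

def isolation_depth(x, data, rng, depth=0, max_depth=10):
    """Path length of x in one random tree: repeatedly cut the data at a
    random value and keep only x's side, until x is isolated (or the
    depth cap is hit)."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth
    cut = rng.uniform(lo, hi)
    side = [v for v in data if v < cut] if x < cut else [v for v in data if v >= cut]
    return isolation_depth(x, side, rng, depth + 1, max_depth)

def avg_depth(x, data, trees=200, seed=0):
    """Average isolation depth of x over many random trees."""
    rng = random.Random(seed)
    return sum(isolation_depth(x, data, rng) for _ in range(trees)) / trees

# Inliers clustered near 0; 10.0 is an obvious outlier, so a random cut
# separates it from the rest almost immediately.
data = [0.0, 0.1, 0.2, 0.3, 0.15, 0.25, 10.0]
outlier_depth = avg_depth(10.0, data)
inlier_depth = avg_depth(0.15, data)
```

This only sketches the depth intuition; a real Isolation Forest also subsamples the data and converts depths into an anomaly score.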
Recommender Systems

Content-based filtering: Recommends items with similar content, e.g. metadata.
pros: Gives user-specific recommendations.
cons: Cold start problem.

Predicted Ratings (Latent Features, k = 4)

[Figure: a sparse user-item rating matrix r is factorized into a user matrix p and an item matrix q with k = 4 latent features; their product r′ fills in the missing entries as the predicted ratings.]

Time Series Forecasting

Mean / Median: The forecasts are equal to the mean / median of the observed data.
Exponential smoothing forecasts the series while giving more value to the recent data and less value to older observations.
Triple Exponential Smoothing: Triple Exponential Smoothing is an extension of Double Exponential Smoothing that explicitly adds support for seasonality to the univariate time series.
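Triple Exponential Smoothing builds on the simple (single) variant; here is a minimal sketch of simple exponential smoothing, showing how recent observations get more weight (the series and alpha value are illustrative):

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: each smoothed value gives weight
    alpha to the newest observation and (1 - alpha) to the previous
    smoothed value, so recent data counts more than old data.
    The final smoothed value is the one-step-ahead forecast."""
    s = series[0]
    for x in series[1:]:
        s = alpha * x + (1 - alpha) * s
    return s

# With alpha = 0.5, the forecast for this toy series works out to 14.25;
# with alpha = 1.0, only the most recent observation matters.
forecast = exponential_smoothing([10, 12, 14, 16], alpha=0.5)  # 14.25
```

Double smoothing adds a trend term to this recursion, and triple smoothing adds a seasonal term on top of that.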
ARIMA

ARIMA is a combination of AR and MA along with integration, which is differencing the series to remove its dependence upon time. It is different from ARMA in the aspect that ARMA requires the time series to be stationary.

AR (Autoregressive model): AR(p) uses the past (lagging) values of the series for forecasting.

MA (Moving Average): In MA models, we use the past forecast errors for forecasting.

ARMA (AutoRegressive Moving Average): Combines the AR and MA terms in a single model.

SARIMA: SARIMA can be represented by SARIMA(p, d, q)(P, D, Q)m, where m = seasonal period and the lowercase notations are for the non-seasonal terms.

ARIMAX (ARIMA + Exogenous variable): Exogenous variables are variables whose cause is external to the model. There is also SARIMAX (SARIMA + Exogenous variable).
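A minimal sketch of the AR idea above: fitting an AR(1) coefficient by least squares on a toy series and forecasting from the last lagged value (the data is illustrative; a real ARIMA fit also estimates differencing and MA terms):

```python
def fit_ar1(series):
    """Least-squares fit of an AR(1) model x_t = phi * x_{t-1} + e_t
    (no intercept): phi minimizes sum((x_t - phi * x_{t-1})^2)."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(x * x for x in series[:-1])
    return num / den

# A toy series that follows x_t = 0.5 * x_{t-1} exactly,
# so the fit should recover phi = 0.5.
series = [8.0, 4.0, 2.0, 1.0, 0.5]
phi = fit_ar1(series)
forecast = phi * series[-1]  # one-step-ahead forecast from the last value
```

Higher-order AR(p) models regress on p lagged values instead of one, and the MA part replaces lagged values with lagged forecast errors.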