ML notes
Introduction
Machine learning is a subfield of artificial intelligence (AI), which is broadly defined as the capability of a machine to simulate intelligent human behavior and to perform complex tasks in a way similar to how humans solve problems.
To understand machine learning, you need to know the algorithms that drive it, the opportunities they create, and their limitations.
In general, machine learning algorithms are used in a wide range of applications, such as fraud detection, computer vision, autonomous vehicles, and predictive analytics. In these areas it is difficult or infeasible to develop conventional, hand-coded algorithms that meet the real-time and predictive requirements of the work.
Machine learning algorithms serve three basic functions:
Descriptive: explaining what has happened, with the help of data
Predictive: predicting what will happen, with the help of data
Prescriptive: suggesting what action to take, with the help of data
Supervised Learning
In supervised learning, machines are taught by example. An operator provides the algorithm with a known dataset that contains the desired inputs and outputs. The algorithm must work out how to arrive at those outputs from the inputs. The operator knows the correct answers. The algorithm identifies patterns in the data, makes predictions based on its observations, and learns from them. It keeps making predictions and being corrected by the operator until it achieves a high level of accuracy/performance.
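The predict-and-correct loop described above can be illustrated with the perceptron, one classic supervised learning algorithm. The notes don't name a specific algorithm, so this is a minimal sketch under stated assumptions. The toy dataset is hypothetical and the labels are +1/-1. The known labels play the role of the operator's corrections. Training stops once every training example is predicted correctly, or after max_epochs passes.

```python
def predict(weights, bias, features):
    """Return +1 or -1 depending on which side of the learned line the point falls."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else -1


def train_perceptron(examples, max_epochs=100, lr=1.0):
    """Predict each example, get corrected by its known label, repeat until no mistakes."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for epoch in range(max_epochs):
        mistakes = 0
        for features, label in examples:
            if predict(weights, bias, features) != label:
                # Correction step: move the weights toward the correct answer.
                weights = [w + lr * label * x for w, x in zip(weights, features)]
                bias += lr * label
                mistakes += 1
        if mistakes == 0:  # every training example is now predicted correctly
            return weights, bias, epoch + 1
    return weights, bias, max_epochs


# Hypothetical labeled data: (features, known correct label)
examples = [
    ([2.0, 1.0], 1),
    ([3.0, 2.0], 1),
    ([-1.0, -2.0], -1),
    ([-2.0, -1.0], -1),
]
weights, bias, epochs = train_perceptron(examples)
print(weights, bias, epochs)  # [2.0, 1.0] 1.0 2
```

Here one correction in the first pass is enough: the second pass makes no mistakes, so training stops after two epochs.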
Supervised learning includes the following tasks:
Classification: In classification tasks, the program uses observed values to draw conclusions about new observations and to determine which category they belong to. For example, when a program filters emails as ‘spam’ or ‘not spam’, it must analyze existing observational data to learn which emails are spam and which are not.
Regression: In regression tasks, the learning machine must estimate and understand the relationships between variables. Regression analysis focuses on one dependent variable and a number of other, changing variables (the independent variables). This makes it particularly useful for forecasting and prediction.
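As a concrete regression example, here is a minimal sketch of simple linear regression fitted by ordinary least squares. It uses one independent variable and one dependent variable. The ad-spend/sales numbers and the function name fit_line are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the model y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); both sums share the 1/n factor
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: ad spend (independent variable) vs. sales (dependent variable)
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(ad_spend, sales)
forecast = slope * 6.0 + intercept  # predict sales for an unseen ad spend of 6.0
print(f"slope={slope:.2f}, intercept={intercept:.2f}, forecast={forecast:.2f}")
# slope=1.97, intercept=1.09, forecast=12.91
```

The fitted line is then used for forecasting. An unseen input of 6.0 gives a predicted output of about 12.91.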
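The confusion matrix is a tool used to evaluate the performance of classification models. It is a table that compares predicted classes with actual classes and gives a detailed breakdown of prediction results, including:
True positives (TP): positive cases correctly predicted as positive
True negatives (TN): negative cases correctly predicted as negative
False positives (FP): negative cases incorrectly predicted as positive
False negatives (FN): positive cases incorrectly predicted as negative
Metrics like accuracy, precision, recall, and F1-score are derived from the confusion matrix to assess a model’s performance.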
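This is a minimal sketch of how the confusion-matrix counts and the derived metrics can be computed for a binary classifier. The labels are hypothetical and class 1 is treated as the positive class. When a metric's denominator is zero it is reported as 0.0; that is one common convention, assumed here.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, TN, FP, FN) for a binary problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 derived from the four counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Hypothetical true labels and model predictions
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
accuracy, precision, recall, f1 = metrics(tp, tn, fp, fn)
print(tp, tn, fp, fn)  # 3 5 1 1
print(accuracy, precision, recall, f1)  # 0.8 0.75 0.75 0.75
```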
These topics represent the foundational elements of machine learning. Understanding them equips
individuals with the knowledge to build models that address real-world challenges and contribute to
advancements in various domains.
Unsupervised Learning
In unsupervised learning, the machine learning algorithm identifies patterns without an answer key or an operator providing instructions. Instead, the machine analyzes the available data to determine correlations and relationships. The algorithm is left to interpret large datasets and handle them on its own. It tries to organize the data in a way that describes the data’s structure, for example by grouping it into clusters or arranging it in a more organized manner. As it assesses more data, its ability to make decisions on that data gradually improves and becomes more refined.
The following fall under the unsupervised learning
category:
Clustering: A clustering technique involves grouping
similar data (based on defined criteria). It is useful for
segmenting data and finding patterns in each group.
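One widely used clustering technique is k-means; the notes don't name a specific one. This minimal sketch assumes 2-D points, Euclidean distance, and hand-picked initial centroids. In practice the initial centroids are usually chosen randomly or with a seeding scheme such as k-means++. Each point is assigned to its nearest centroid, and each centroid is then moved to the mean of its points. The loop stops when the centroids stop changing.

```python
import math


def assign(points, centroids):
    """Group each point with its nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, centroids, max_iterations=100):
    """Lloyd's k-means: alternate assignment and centroid update until stable."""
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        new_centroids = [
            # Mean of the cluster's points; an empty cluster keeps its old centroid
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Hypothetical 2-D data with two visible groups, and hand-picked starting centroids
points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, [(1, 1), (9, 8.5)])
print(centroids)
print(clusters)
```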
Association Rule
Association rule learning is a rule-based machine learning method. It is useful for discovering relationships between different features in a large dataset by using a set of rules.
In essence, it finds patterns in data, which might include:
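Association rules are commonly scored with two measures. Support is the fraction of transactions that contain an itemset. Confidence, for a rule A → B, is the fraction of transactions containing A that also contain B. The sketch below computes both for a hypothetical shopping-basket dataset. It only scores a given rule and does not search for rules the way a full algorithm such as Apriori would.

```python
# Hypothetical transactions: each is the set of items bought together
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """For the rule antecedent -> consequent: count(both) / count(antecedent)."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)


print(support({"bread", "milk"}, transactions))      # 0.6
print(confidence({"bread"}, {"milk"}, transactions))  # 0.75
```

In this data, 3 of the 5 baskets contain both bread and milk. Of the 4 baskets with bread, 3 also contain milk.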
Regression Algorithms
Regression analysis is designed to estimate the relationship between one or more independent variables (features) and a dependent variable (label). Linear regression is the most widely used method in regression analysis.
Some of the most commonly used Regression Algorithms
are:
1. Linear Regression
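2. Logistic Regression (despite its name, mainly used for classification: it predicts the probability that an example belongs to a class)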
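Despite its name, logistic regression is typically used for binary classification. It models the probability of the positive class as the sigmoid of a linear function of the features. This is a minimal sketch with one feature, trained by batch gradient descent on the log loss. The study-hours data, learning rate, and epoch count are hypothetical choices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log loss: (prediction - label) times the input
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: hours studied vs. passed (1) or failed (0)
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 5.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = train_logistic(hours, passed)
p_low = sigmoid(w * 1.0 + b)   # estimated pass probability after 1 hour
p_high = sigmoid(w * 4.5 + b)  # estimated pass probability after 4.5 hours
print(round(p_low, 2), round(p_high, 2))
```

The output is a probability. Thresholding it at 0.5 turns the model into a classifier.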
From the figure above, it is clear that there are multiple lines that separate our data points, i.e. classify them as red or blue circles. (Our hyperplane here is a line because we are considering only two input features, x1 and x2.)
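The figure isn't reproduced here, so this sketch makes the same point with hypothetical red and blue points in two features x1, x2. A line w1*x1 + w2*x2 + b = 0 separates the classes if every red point falls on its positive side and every blue point on its negative side. More than one line passes this check. The function name separates and the line coefficients are illustrative assumptions.

```python
# Hypothetical labeled points: ((x1, x2), colour)
points = [
    ((1.0, 2.0), "red"), ((2.0, 3.0), "red"), ((1.5, 3.5), "red"),
    ((4.0, 1.0), "blue"), ((5.0, 2.0), "blue"), ((4.5, 0.5), "blue"),
]


def separates(w1, w2, b, labelled_points):
    """True if w1*x1 + w2*x2 + b > 0 for every red point and < 0 for every blue one."""
    for (x1, x2), colour in labelled_points:
        score = w1 * x1 + w2 * x2 + b
        if colour == "red" and score <= 0:
            return False
        if colour == "blue" and score >= 0:
            return False
    return True


line_a = (-1.0, 1.0, 0.0)   # the line x2 = x1
line_b = (-1.0, 0.5, 1.0)   # the steeper line x2 = 2*x1 - 2
line_c = (0.0, 1.0, -2.5)   # the horizontal line x2 = 2.5, which does not separate them

print(separates(*line_a, points), separates(*line_b, points), separates(*line_c, points))
# True True False
```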
For instance, an algorithm can learn to predict whether a given email is spam or ham (not spam), as shown below.
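This is a minimal sketch of such a spam/ham classifier, using multinomial naive Bayes with add-one smoothing. It is one common technique swapped in for illustration, not necessarily the method these notes go on to describe. The training emails are hypothetical and tokenization is just whitespace splitting.

```python
import math
from collections import Counter

# Hypothetical labeled training emails
training = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project meeting notes", "ham"),
]


def train(examples):
    """Count words per label and documents per label."""
    word_counts = {}  # label -> Counter of word frequencies
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        words = text.split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return word_counts, label_counts, vocab


def classify(text, word_counts, label_counts, vocab):
    """Pick the label maximizing log P(label) + sum of log P(word | label)."""
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)
        for word in text.split():
            # Add-one smoothing so unseen words don't zero out the probability
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(training)
print(classify("win cheap money", *model))         # spam
print(classify("monday project meeting", *model))  # ham
```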
KNN
KNN (k-nearest neighbors) is a simple, supervised machine learning (ML) algorithm that can be used for classification or regression tasks. It is also frequently used for missing-value imputation. It is based on the idea that the observations closest to a given data point are the most "similar" observations in a dataset. We can therefore classify unseen points based on the values of the closest existing points. By choosing K, the user selects the number of nearby observations the algorithm uses.
Here, we will show you how to implement the KNN algorithm for
classification.
Example
Start by visualizing some data points:
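import matplotlib.pyplot as plt
x = [4, 5, 10, 4, 3, 11, 14, 8, 10, 12]  # assumed values: x was missing from these notes
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
classes = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
plt.scatter(x, y, c=classes)  # colour each point by its class label
plt.show()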
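The scatter plot only visualizes the data. The classification step itself can be sketched from scratch. Sort the training points by Euclidean distance to the new point, take the K closest, and return their majority label. The points, labels, and the function name knn_classify are hypothetical. The sketch does not treat ties in the vote specially: Counter.most_common returns the first-seen label among equal counts.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k):
    """Label `query` by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training data: two groups plus one class-0 point near the middle
points = [(1, 1), (2, 1), (1, 2), (6, 6), (7, 6), (6, 7), (4, 4)]
labels = [0, 0, 0, 1, 1, 1, 0]
new_point = (4.5, 4.5)

print(knn_classify(points, labels, new_point, k=1))  # 0
print(knn_classify(points, labels, new_point, k=3))  # 1
```

With K = 1 the single nearest neighbour, (4, 4), decides and gives class 0. With K = 3 the two class-1 neighbours outvote it. This shows how the choice of K can change the result.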