Lecture 10 Linear_Regression

The document provides an overview of supervised machine learning, specifically focusing on linear regression and the gradient descent optimization algorithm. It explains the concepts of simple linear regression, cost functions, and the mechanics of gradient descent, including its types and how to determine the best learning rates. Additionally, it discusses the iterative process of minimizing the cost function to achieve optimal model parameters.


Supervised Machine Learning

Linear Regression
➢ Introduction
➢ Simple Linear Regression (1 Variable)
➢ Cost Function (MSE)
➢ Gradient Descent Optimization
Regression
➢ Linear Regression is a supervised learning technique
used to model the relationship between a dependent
variable (target) and one or more independent variables
(features).
➢ The goal is to predict the value of the dependent
variable based on the values of the independent
variables.
Linear Regression

Workflow: the Training Set is fed to a Learning Algorithm, which outputs a hypothesis h; given the feature(s), h produces a prediction.
Linear Regression with one variable
Linear regression aims to fit a linear equation to observed
data.
The simplest form, known as simple linear regression,
models the relationship between two variables (1 Feature
and the target) by fitting a linear equation to observed data.
Linear Regression with one variable
Given a training set of housing prices:

Size in feet² (x)    Price (y) in 1000s of dollars
2104                 460
1416                 232
1534                 315
852                  178
…                    …
Linear Regression with one variable
The same training set (Size in feet² (x) vs. Price (y)) is shown next to a scatter plot of Housing Prices: Price (in 1000s of dollars) on the vertical axis against Size (feet²) on the horizontal axis, one point per training example.
Workflow for housing prices: the Training Set is fed to a Learning Algorithm, which outputs a hypothesis h; given the size of a house, h outputs an estimated price.
Linear Regression with one variable
m = Number of training examples
x’s = “input” variable / feature
y’s = “output” variable / “target” variable
(x,y) => one training example
(x(i), y(i)) => ith training example

Hypothesis: hθ(x) = θ0 + θ1x
θ0, θ1: Parameters
How to choose θ0, θ1?
Hypothesis: hθ(x) = θ0 + θ1x

Parameters: θ0, θ1

Cost Function: J(θ0, θ1) = 1/(2m) Σi=1..m (hθ(x(i)) − y(i))²
(Squared error function, or mean squared error)

Goal: minimize J(θ0, θ1) over θ0, θ1
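As a minimal sketch (not part of the original slides), the hypothesis and cost function above can be written in Python as follows; the function and variable names are illustrative, and the data come from the housing-prices table shown earlier.

```python
import numpy as np

def predict(theta0, theta1, x):
    """Hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def compute_cost(theta0, theta1, x, y):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum((h(x(i)) - y(i))**2)."""
    m = len(x)
    errors = predict(theta0, theta1, x) - y
    return np.sum(errors ** 2) / (2 * m)

# Training set from the housing-prices table above
x = np.array([2104.0, 1416.0, 1534.0, 852.0])  # size in feet^2
y = np.array([460.0, 232.0, 315.0, 178.0])     # price in $1000s

print(compute_cost(0.0, 0.0, x, y))  # cost of the all-zero hypothesis
```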
Simplified
Hypothesis: hθ(x) = θ1x    (θ0 set to 0)

Parameter: θ1

Cost Function (Squared error function): J(θ1) = 1/(2m) Σi=1..m (hθ(x(i)) − y(i))²

Goal: minimize J(θ1) over θ1
[Plots: for a fixed choice of the parameters, hθ(x) is a function of x (left), while J is a function of the parameter θ1 (right). For example, θ0 = 0, θ1 = 1 fits the simplified data exactly, giving J(0, 1) = 0; other values of θ1 give larger costs (about 0.58 for one of the values shown). Plotting J(θ0, θ1) over both parameters gives a bowl-shaped convex function, usually visualized as a 3-D surface or a contour plot.]
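The J(θ1) values traced by these plots can be reproduced with a short Python sketch. The tiny data set {(1,1), (2,2), (3,3)} below is an assumption, chosen only to be consistent with the costs shown on the slides (J = 0 at θ1 = 1 and roughly 0.58 elsewhere).

```python
import numpy as np

# Illustrative data set (an assumption, not stated in the extracted slides)
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def cost_simplified(theta1):
    """J(theta1) = 1/(2m) * sum((theta1 * x(i) - y(i))**2), with theta0 fixed at 0."""
    return np.sum((theta1 * x - y) ** 2) / (2 * len(x))

for theta1 in [0.0, 0.5, 1.0, 1.5]:
    print(f"theta1={theta1}: J={cost_simplified(theta1):.2f}")
# theta1 = 1 fits the data exactly (J = 0); other values trace out the bowl.
```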
Workflow: the Training Set is fed to a Learning Algorithm (gradient descent), which outputs a hypothesis h; given a feature value, h produces a prediction.
Gradient (or steepest) descent
➢ It is an optimization algorithm used to minimize
the cost function in machine learning and deep
learning.
➢ It is a crucial part of training models.
Gradient Descent
Objective (Cost) Function:
● The function that you want to minimize. In machine learning,
this is typically the loss function, which measures the difference
between the model's predictions and the actual values.
Parameters:

● The variables in the model that are adjusted to minimize the cost
function. In a linear regression model, for example, these would be
the coefficients of the line.
Gradient Descent
Gradient:
● The gradient is the vector of partial derivatives of the cost
function with respect to each parameter.

Learning Rate α:
● A hyperparameter that controls how much the
parameters are adjusted with respect to the gradient
during each update.
Gradient Descent
➢ Gradient descent works by moving downward toward the pits or valleys in the graph to find the minimum value.
➢ This is achieved by taking the derivative of the cost function.
➢ During each iteration, gradient descent steps down the cost function in the direction of steepest descent.
➢ By adjusting the parameters in this direction, it seeks to reach the minimum of the cost function and find the best-fit values for the parameters.
➢ The size of each step is determined by a parameter α known as the learning rate; a small one-parameter sketch of this iteration follows below.
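To make the update rule concrete, here is a minimal one-parameter sketch in Python (not from the slides); the example function f(θ) = (θ − 3)² and all names are illustrative.

```python
def gradient_descent_1d(df, theta_init, alpha, num_iters):
    """Repeatedly step against the derivative: theta := theta - alpha * df(theta)."""
    theta = theta_init
    for _ in range(num_iters):
        theta -= alpha * df(theta)
    return theta

# Minimize f(theta) = (theta - 3)**2, whose derivative is 2 * (theta - 3).
theta_min = gradient_descent_1d(lambda t: 2 * (t - 3), theta_init=0.0,
                                alpha=0.1, num_iters=100)
print(theta_min)  # approaches 3.0, the minimizer of f
```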
Gradient Descent
[Surface plots of J(θ0, θ1) against θ0 and θ1: for linear regression the cost surface is a convex bowl with a single global minimum, while a more general cost surface can contain several valleys, so gradient descent started from different points may settle in different local optima. At a local optimum the derivative is zero, so the update leaves the current value of the parameter unchanged.]
Gradient descent
Have some function J(θ0, θ1)
Want: min J(θ0, θ1) over θ0, θ1
Outline:
• Start with some θ0, θ1 (say θ0 = 0, θ1 = 0)
• Keep changing θ0, θ1 to reduce J(θ0, θ1)
until we hopefully end up at a minimum
Gradient descent algorithm

repeat until convergence {
    θj := θj − α · ∂J(θ0, θ1)/∂θj        (for j = 0 and j = 1)
}

α is the learning rate.

Correct (simultaneous update):
temp0 := θ0 − α · ∂J(θ0, θ1)/∂θ0
temp1 := θ1 − α · ∂J(θ0, θ1)/∂θ1
θ0 := temp0
θ1 := temp1

Incorrect (non-simultaneous update):
temp0 := θ0 − α · ∂J(θ0, θ1)/∂θ0
θ0 := temp0
temp1 := θ1 − α · ∂J(θ0, θ1)/∂θ1        (this now uses the already-updated θ0)
θ1 := temp1
Linear regression with one variable
Applying the gradient descent algorithm to the linear regression model (update θ0 and θ1 simultaneously):

∂J(θ0, θ1)/∂θj = ∂/∂θj [ 1/(2m) Σi=1..m (hθ(x(i)) − y(i))² ]

For j = 0:  ∂J(θ0, θ1)/∂θ0 = (1/m) Σi=1..m (hθ(x(i)) − y(i))
For j = 1:  ∂J(θ0, θ1)/∂θ1 = (1/m) Σi=1..m (hθ(x(i)) − y(i)) · x(i)

repeat until convergence {
    θ0 := θ0 − α · (1/m) Σi=1..m (hθ(x(i)) − y(i))
    θ1 := θ1 − α · (1/m) Σi=1..m (hθ(x(i)) − y(i)) · x(i)
}
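A minimal Python sketch (not from the slides) of batch gradient descent for one-variable linear regression using the update rules above; function and parameter names are illustrative.

```python
import numpy as np

def batch_gradient_descent(x, y, alpha=0.1, num_iters=100):
    """Batch gradient descent for h(x) = theta0 + theta1 * x with the MSE cost."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        errors = theta0 + theta1 * x - y      # h(x(i)) - y(i), for all i at once
        grad0 = errors.sum() / m              # dJ/dtheta0
        grad1 = (errors * x).sum() / m        # dJ/dtheta1
        # simultaneous update: both new values use the old theta0, theta1
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1
```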
To simplify, assume θ0 = 0, so only θ1 is updated. The derivative dJ(θ1)/dθ1 is the slope of the cost curve at the current θ1:

● If the slope is +ve: θ1 := θ1 − (+ve value). Hence the value of θ1 decreases.
● If the slope is -ve: θ1 := θ1 − (-ve value). Hence the value of θ1 increases.

In both cases θ1 moves toward the minimum of J(θ1).
Learning Rate

● If α is too small, gradient descent takes very small steps and is slow to converge.
● If α is too large, gradient descent can overshoot the minimum and fail to converge (it may even diverge).
Gradient descent can converge to a local
minimum, even with the learning rate α fixed.

As we approach a local minimum, gradient descent


will automatically take smaller steps. So, no need to
decrease α over time.
How does gradient descent converge with a fixed step size 𝛼 ?
The intuition behind the convergence is that the derivative of the cost function approaches 0 as we approach the bottom of our convex function. At the minimum, the derivative is exactly 0, and thus we get θj := θj − α · 0 = θj, so the parameters stop changing.
How to find the best learning rates
➢ There is no formula for finding the right learning rate. You have to try several values before you find the right one. This process is called hyperparameter tuning.
How to find the best learning rates
➢ One strategy is to run the gradient descent algorithm with several values of the learning rate and, for each value, plot the cost function against the number of iterations.
➢ For example, try …, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …
➢ Each time, plot the learning curve (cost function vs. number of iterations) and choose the best value; a small sketch of such a sweep follows below.
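A small Python sketch (not from the slides) of this learning-rate sweep on the five-point example data set used later in the lecture; with a rate that is too large for this data (here 0.3), the cost grows instead of shrinking.

```python
import numpy as np

def run_and_record(x, y, alpha, num_iters=50):
    """Run batch gradient descent and record the cost after every iteration."""
    m = len(x)
    theta0, theta1, history = 0.0, 0.0, []
    for _ in range(num_iters):
        errors = theta0 + theta1 * x - y
        theta0, theta1 = (theta0 - alpha * errors.sum() / m,
                          theta1 - alpha * (errors * x).sum() / m)
        history.append(np.sum((theta0 + theta1 * x - y) ** 2) / (2 * m))
    return history

x = np.array([1.0, 2.0, 4.0, 3.0, 5.0])
y = np.array([1.0, 3.0, 3.0, 2.0, 5.0])
for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3]:
    history = run_and_record(x, y, alpha)
    print(f"alpha={alpha}: final cost {history[-1]:.4f}")  # plot history to see the curve
```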
The number of Iterations
➢ The cost function should decrease after each iteration if gradient descent is working properly.
➢ Gradient descent has converged when it no longer reduces the cost function and the cost stays at the same level.
➢ The number of iterations required for gradient descent to
converge varies considerably. Sometimes it takes fifty
iterations, and other times it can be as many as two or three
million.
➢ It is difficult to estimate the number of iterations in
advance.
Making sure gradient descent is working correctly.

Example automatic convergence test:

Declare convergence if J(θ) decreases by less than some small threshold ε (for example, 10⁻³) in one iteration.
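A Python sketch (not from the slides) of this automatic convergence test wrapped around batch gradient descent; ε and the other defaults are illustrative.

```python
import numpy as np

def gradient_descent_until_converged(x, y, alpha=0.1, epsilon=1e-3, max_iters=100000):
    """Iterate until the cost decreases by less than epsilon in one iteration."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    prev_cost = np.inf
    for it in range(1, max_iters + 1):
        errors = theta0 + theta1 * x - y
        theta0, theta1 = (theta0 - alpha * errors.sum() / m,
                          theta1 - alpha * (errors * x).sum() / m)
        cost = np.sum((theta0 + theta1 * x - y) ** 2) / (2 * m)
        if prev_cost - cost < epsilon:          # automatic convergence test
            return theta0, theta1, it
        prev_cost = cost
    return theta0, theta1, max_iters
```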
[Sequence of plots: as gradient descent runs, the fitted line hθ(x) (for fixed θ0, θ1, a function of x) and the corresponding point on the contour plot of J(θ0, θ1) (a function of the parameters) move step by step toward the minimum.]
Gradient Descent types
1-Batch Gradient Descent: Each step of gradient descent
uses all the training examples (m).
2-Stochastic Gradient Descent (SGD): the gradient is calculated using just a small random part of the training examples (often a single example) instead of all of them. In some cases, this approach can reduce computation time; a sketch follows below.
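A minimal Python sketch (not from the slides) of stochastic gradient descent for the one-variable model, updating the parameters from one randomly chosen example at a time; names and defaults are illustrative.

```python
import numpy as np

def sgd(x, y, alpha=0.01, num_epochs=50, seed=0):
    """Stochastic gradient descent for h(x) = theta0 + theta1 * x."""
    rng = np.random.default_rng(seed)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_epochs):
        for i in rng.permutation(len(x)):       # visit the examples in random order
            error = theta0 + theta1 * x[i] - y[i]
            theta0 -= alpha * error             # gradient from this single example
            theta1 -= alpha * error * x[i]
    return theta0, theta1
```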
Example

X Y
1 1
2 3
4 3
3 2
5 5
Example
Assume Θ0 = 0, Θ1 = 0, and α = 0.1.

X   Y
1   1
2   3
4   3
3   2
5   5

Iteration 1: h(x) = 0
Example
Assume Θ0 = 0, Θ1 = 0, and α = 0.1.

Iteration 1: h(x) = 0
J(0,0) = 1/(2×5) [(0−1)² + (0−3)² + (0−3)² + (0−2)² + (0−5)²] = 4.8
Θ0 = 0 − 0.1/5 × [−1 − 3 − 3 − 2 − 5] = 0.28
Θ1 = 0 − 0.1/5 × [(−1×1) + (−3×2) + (−3×4) + (−2×3) + (−5×5)] = 1
h = 0.28 + x
J(0.28, 1) = 1/(2×5) [(1.28−1)² + (2.28−3)² + (4.28−3)² + (3.28−2)² + (5.28−5)²] = 0.3952 ≈ 0.4

Iteration 2:
Θ0 = 0.2, Θ1 = 0.816 → J(0.2, 0.816) ≈ 0.25 (h = 0.2 + 0.816x)

Iteration 3:
Θ0 = 0.185, Θ1 = 0.776 → (h = 0.185 + 0.776x)
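A short Python sketch (not part of the slides) that reproduces this worked example; the first iteration matches the numbers above exactly, while later iterations may differ slightly from the slide's rounded values.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 3.0, 5.0])
y = np.array([1.0, 3.0, 3.0, 2.0, 5.0])
alpha, m = 0.1, len(x)
theta0, theta1 = 0.0, 0.0

for it in range(1, 4):
    errors = theta0 + theta1 * x - y                      # h(x(i)) - y(i)
    theta0, theta1 = (theta0 - alpha * errors.sum() / m,  # simultaneous update
                      theta1 - alpha * (errors * x).sum() / m)
    cost = np.sum((theta0 + theta1 * x - y) ** 2) / (2 * m)
    print(f"Iteration {it}: theta0={theta0:.3f}, theta1={theta1:.3f}, J={cost:.4f}")
# Iteration 1 gives theta0 = 0.28, theta1 = 1.0 and J(0.28, 1) ≈ 0.3952, as above.
```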
