Lecture 10: Linear Regression
Linear Regression
➢ Introduction
➢ Simple Linear Regression (1 Variable)
➢ Cost Function (MSE)
➢ Gradient Descent Optimization
Regression
➢ Linear regression is a supervised learning technique used to model the relationship between a dependent variable (target) and one or more independent variables (features).
➢ The goal is to predict the value of the dependent variable from the values of the independent variables.
Linear Regression
[Diagram: Training Set → Learning Algorithm → hypothesis h; Feature(s) → h → Prediction]
Linear Regression with one variable
Linear regression aims to fit a linear equation to observed data. The simplest form, known as simple linear regression, models the relationship between two variables (one feature and the target).
Linear Regression with one variable
Given a training set of housing prices:

Size in feet² (x)    Price (y) (in $1000s)
2104                 460
1416                 232
1534                 315
852                  178
…                    …
Linear Regression with one variable
[Figure: Housing Prices — scatter plot of Price (in 1000s of dollars) vs. Size (feet²) for the training examples in the table above.]
[Diagram: Training Set → Learning Algorithm → h; Size of house → h → Estimated price]
Linear Regression with one variable
m = number of training examples
x = "input" variable / feature
y = "output" variable / "target" variable
$(x, y)$ = one training example
$(x^{(i)}, y^{(i)})$ = the $i$-th training example
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
$\theta_0, \theta_1$: parameters
How to choose $\theta_0, \theta_1$?
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$

Parameters: $\theta_0, \theta_1$

Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
(the squared error function, i.e. half the mean squared error)

Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
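The cost function translates directly into code. A minimal NumPy sketch (the compute_cost helper and the candidate parameter values are illustrative, not from the slides):

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """J(theta0, theta1) = 1/(2m) * sum_i (h(x_i) - y_i)^2."""
    m = len(y)
    h = theta0 + theta1 * x               # hypothesis evaluated on every example
    return np.sum((h - y) ** 2) / (2 * m)

# First four rows of the housing training set shown above
x = np.array([2104.0, 1416.0, 1534.0, 852.0])   # size in feet^2
y = np.array([460.0, 232.0, 315.0, 178.0])      # price in $1000s

print(compute_cost(0.0, 0.2, x, y))  # cost of one candidate parameter choice
```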
Simplified (assume $\theta_0 = 0$)

Hypothesis: $h_\theta(x) = \theta_1 x$

Parameters: $\theta_1$

Cost Function (squared error function): $J(\theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_1 x^{(i)} - y^{(i)} \right)^2$

Goal: $\min_{\theta_1} J(\theta_1)$
[Figures: left, $h_\theta(x)$ for a fixed $\theta_1$ (a function of $x$); right, $J(\theta_1)$ as a function of the parameter $\theta_1$ — e.g. $J(\theta_1 = 1) = 0$ for a perfect fit, and $J(\theta_1 = 0.5) \approx 0.58$.]
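The right-hand curve can be reproduced by sweeping $\theta_1$ over a range of values. A sketch, assuming the classic three-point toy set {(1,1), (2,2), (3,3)} — an assumed dataset, chosen because it is consistent with $J(1) = 0$ and $J(0.5) \approx 0.58$ above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])  # assumed toy data: y = x exactly, so theta1 = 1 fits perfectly
y = np.array([1.0, 2.0, 3.0])
m = len(y)

for theta1 in (0.0, 0.5, 1.0, 1.5):
    J = np.sum((theta1 * x - y) ** 2) / (2 * m)  # simplified cost with theta0 = 0
    print(f"J({theta1}) = {J:.2f}")              # J(1.0) = 0.00, J(0.5) = 0.58, ...
```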
Contour Plot
The squared-error cost is a convex function (bowl-shaped), so it has a single global minimum.
[Figure: left, $h_\theta(x)$ for fixed $(\theta_0, \theta_1)$ (a function of $x$); right, contour plot of $J(\theta_0, \theta_1)$ as a function of the parameters.]
[Diagram: Training Set → Learning Algorithm (gradient descent) → h; Feature → h → Prediction]
Gradient (or steepest) descent
➢ It is an optimization algorithm used to minimize the cost function in machine learning and deep learning.
➢ It is a crucial part of training models.
Gradient Descent
Objective (Cost) Function:
● The function that you want to minimize. In machine learning, this is typically the loss function, which measures the difference between the model's predictions and the actual values.
Parameters:
● The variables in the model that are adjusted to minimize the cost function. In a linear regression model, for example, these would be the coefficients of the line.
Gradient Descent
Gradient:
● The gradient is the vector of partial derivatives of the cost function with respect to each parameter.
Learning Rate α:
● A hyperparameter that controls how much the parameters are adjusted with respect to the gradient during each update.
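Since the gradient is just a vector of partial derivatives, it can be checked numerically with finite differences. A sketch with an illustrative toy cost (none of these names come from the lecture):

```python
import numpy as np

def numerical_gradient(J, theta, eps=1e-6):
    """Approximate each partial derivative of J at theta with central differences."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (J(theta + step) - J(theta - step)) / (2 * eps)
    return grad

# Toy cost J(t) = t0^2 + 3*t1^2, whose true gradient is (2*t0, 6*t1)
J = lambda t: t[0] ** 2 + 3 * t[1] ** 2
print(numerical_gradient(J, np.array([1.0, 2.0])))  # ~[2.0, 12.0]
```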
Gradient Descent
➢ Gradient descent works by moving downhill toward the valleys of the cost surface to find the minimum value.
➢ This is achieved by taking the derivative of the cost function.
➢ During each iteration, gradient descent steps down the cost function in the direction of steepest descent.
➢ By adjusting the parameters in this direction, it seeks to reach the minimum of the cost function and find the best-fit values for the parameters.
➢ The size of each step is determined by the parameter α, known as the learning rate; see the sketch below.
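The effect of the learning rate can be seen on a one-dimensional toy cost such as $J(\theta) = \theta^2$ (an illustrative function, not the housing data): a small α converges, while an overly large α overshoots and diverges.

```python
def gradient_descent_1d(grad, theta, alpha, steps):
    """Repeatedly step opposite the derivative: theta := theta - alpha * grad(theta)."""
    for _ in range(steps):
        theta -= alpha * grad(theta)
    return theta

grad = lambda t: 2.0 * t  # derivative of the toy cost J(theta) = theta^2

print(gradient_descent_1d(grad, theta=10.0, alpha=0.1, steps=50))  # -> close to 0
print(gradient_descent_1d(grad, theta=10.0, alpha=1.1, steps=50))  # too large: diverges
```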
Gradient Descent
[Figures: 3-D surface plots of $J(\theta_0, \theta_1)$ over the $(\theta_0, \theta_1)$ plane — one bowl-shaped surface with a single global minimum, and one bumpier surface with local optima. At a local optimum the derivative is zero, so the update leaves the current value of the parameter unchanged.]
Gradient descent
Have some function $J(\theta_0, \theta_1)$.
Want $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$.
Outline:
• Start with some $\theta_0, \theta_1$.
• Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$ until we hopefully end up at a minimum.
Gradient descent algorithm

repeat until convergence {
    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$   (for $j = 0$ and $j = 1$)
}

$\alpha$ is the learning rate. Update $\theta_0$ and $\theta_1$ simultaneously:

    temp0 := $\theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
    temp1 := $\theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
    $\theta_0$ := temp0
    $\theta_1$ := temp1

For the squared-error cost, the partial derivatives evaluate to:

    $j = 0:\ \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
    $j = 1:\ \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
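The update rule above translates into a short batch implementation. A minimal sketch; the function name is illustrative, and the temp variables mirror the simultaneous update in the pseudocode:

```python
import numpy as np

def gradient_descent(x, y, alpha, iters):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        h = theta0 + theta1 * x
        # Simultaneous update: compute both new values before assigning either.
        temp0 = theta0 - alpha * np.sum(h - y) / m
        temp1 = theta1 - alpha * np.sum((h - y) * x) / m
        theta0, theta1 = temp0, temp1
    return theta0, theta1

x = np.array([1.0, 2.0, 4.0, 3.0, 5.0])  # the example data used later in the lecture
y = np.array([1.0, 3.0, 3.0, 2.0, 5.0])
print(gradient_descent(x, y, alpha=0.1, iters=1000))
```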
[Figures: to simplify, assume $\theta_0 = 0$. The derivative $\frac{d}{d\theta_1} J(\theta_1)$ is the slope of the tangent line to $J$ at the current $\theta_1$; when the slope is positive the update decreases $\theta_1$, and when it is negative the update increases it, at each iteration.]
[Figures: a sequence of gradient-descent snapshots — left, $h_\theta(x)$ for the current fixed $(\theta_0, \theta_1)$ (a function of $x$); right, $J(\theta_0, \theta_1)$ as a function of the parameters, with the current point marked on the contour plot.]
Gradient Descent types
1. Batch Gradient Descent: each step of gradient descent uses all m training examples.
2. Stochastic Gradient Descent (SGD): each step estimates the gradient from just one randomly chosen training example (or, in the mini-batch variant, a small random subset) instead of all of them. In some cases this approach can reduce computation time.
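For contrast with the batch version above, a stochastic sketch that updates the parameters after every single example (illustrative names; a fixed α is kept for simplicity, though in practice it is often decayed):

```python
import random

def sgd(data, alpha, epochs):
    """Stochastic gradient descent: one (x, y) pair per parameter update."""
    theta0, theta1 = 0.0, 0.0
    for _ in range(epochs):
        random.shuffle(data)               # visit the examples in random order
        for x, y in data:
            err = (theta0 + theta1 * x) - y
            theta0 -= alpha * err          # gradient estimated from one example
            theta1 -= alpha * err * x
    return theta0, theta1

data = [(1.0, 1.0), (2.0, 3.0), (4.0, 3.0), (3.0, 2.0), (5.0, 5.0)]
print(sgd(data, alpha=0.01, epochs=500))   # roughly the same line as batch GD
```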
Example

Training set:

X    Y
1    1
2    3
4    3
3    2
5    5

Assume $\theta_0 = 0$, $\theta_1 = 0$, and $\alpha = 0.1$.

Iteration 1: $h(x) = 0$

$J(0, 0) = \frac{1}{2 \times 5}\left[(0-1)^2 + (0-3)^2 + (0-3)^2 + (0-2)^2 + (0-5)^2\right] = 4.8$

$\theta_0 = 0 - \frac{0.1}{5}\left[-1 - 3 - 3 - 2 - 5\right] = 0.28$

$\theta_1 = 0 - \frac{0.1}{5}\left[(-1 \times 1) + (-3 \times 2) + (-3 \times 4) + (-2 \times 3) + (-5 \times 5)\right] = 1$

$h(x) = 0.28 + x$

$J(0.28, 1) = \frac{1}{2 \times 5}\left[(1.28-1)^2 + (2.28-3)^2 + (4.28-3)^2 + (3.28-2)^2 + (5.28-5)^2\right] = 0.3952 \approx 0.4$

Iteration 2: $\theta_0 = 0.2$, $\theta_1 = 0.816$ → $J(0.2, 0.816) = 0.25$ ($h(x) = 0.2 + 0.816x$)

Iteration 3: $\theta_0 = 0.185$, $\theta_1 = 0.776$ → ($h(x) = 0.185 + 0.776x$)
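Iteration 1 above can be verified in a few lines of NumPy (a sketch that reproduces $J(0,0) = 4.8$, $\theta_0 = 0.28$, $\theta_1 = 1$, and $J(0.28, 1) = 0.3952$):

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 3.0, 5.0])
y = np.array([1.0, 3.0, 3.0, 2.0, 5.0])
m, alpha = len(y), 0.1

theta0, theta1 = 0.0, 0.0
h = theta0 + theta1 * x                     # iteration 1: h(x) = 0
print(np.sum((h - y) ** 2) / (2 * m))       # J(0, 0) = 4.8
theta0 -= alpha * np.sum(h - y) / m         # -> 0.28
theta1 -= alpha * np.sum((h - y) * x) / m   # -> 1.0
print(theta0, theta1)

h = theta0 + theta1 * x                     # h(x) = 0.28 + x
print(np.sum((h - y) ** 2) / (2 * m))       # J(0.28, 1) = 0.3952
```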