
Pattern Recognition

Lecture 2: Regression II
Dr. Dina Khattab
Faculty of Computer & Information Sciences (FCIS) - Ain Shams University
[email protected]
Instructor: Dr. Dina Khattab

Email: [email protected]

Office Hours: Wednesday 7:00 PM to 9:00 PM


Agenda
• Simple regression
– Cost function
• Gradient Descent

LINEAR REGRESSION WITH ONE VARIABLE
(Univariate Linear Regression)

Model Representation
Training Set:

  Size in feet² (x)    Price ($) in 1000's (y)
  2104                 460
  1416                 232
  1534                 315
  852                  178
  …                    …

[Scatter plot of the training set: Price ($) in 1000's vs. Size in feet²]

Notation:
m = Number of training examples
x’s = “input” variable / features
y’s = “output” variable / “target” variable
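A minimal sketch of how this training set might be held in code, assuming NumPy (the lecture itself does not prescribe any library or variable names):

    import numpy as np

    # Training set from the table above: house sizes (x) and prices in $1000's (y)
    x = np.array([2104.0, 1416.0, 1534.0, 852.0])   # "input" variable / feature
    y = np.array([460.0, 232.0, 315.0, 178.0])      # "output" / "target" variable

    m = len(x)   # m = number of training examples (here 4, ignoring the elided rows)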


hθ(x) = θ0 + θ1x   (hypothesis function)
θ’s: Parameters
• How to choose θ’s?
[Three example hypotheses plotted for 0 ≤ x ≤ 3:
 h(x) = 1.5 (θ0 = 1.5, θ1 = 0),
 h(x) = 0.5 x (θ0 = 0, θ1 = 0.5),
 h(x) = 1 + 0.5 x (θ0 = 1, θ1 = 0.5)]
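As a small sketch of the hypothesis function and the three example parameter settings above (the function name h is just an illustrative choice):

    def h(x, theta0, theta1):
        """Univariate linear regression hypothesis: h(x) = theta0 + theta1 * x."""
        return theta0 + theta1 * x

    # The three parameter choices from the panels above, evaluated at x = 2
    print(h(2, 1.5, 0.0))   # h(x) = 1.5       -> 1.5
    print(h(2, 0.0, 0.5))   # h(x) = 0.5 x     -> 1.0
    print(h(2, 1.0, 0.5))   # h(x) = 1 + 0.5 x -> 2.0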

Interpreting the coefficients
• θ0 is the intercept: where the line crosses the y-axis

• θ1 is the slope: the change in the output per unit change in the input

Cost Function
Residual Sum of Squares (RSS)

Hypothesis:     hθ(x) = θ0 + θ1x

Parameters:     θ0, θ1

Cost Function:  J(θ0, θ1) = (1/2m) Σᵢ (hθ(x^(i)) − y^(i))²   (sum over the m training examples)

Idea: Choose θ0, θ1 so that hθ(x) is close to y for our training examples (x, y).

Goal: minimize J(θ0, θ1) over θ0, θ1.
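A minimal sketch of this cost function in code, assuming NumPy arrays x and y as before (compute_cost is an illustrative name, not one given in the lecture):

    import numpy as np

    def compute_cost(x, y, theta0, theta1):
        """J(theta0, theta1) = (1 / (2m)) * sum_i (h_theta(x_i) - y_i)^2."""
        m = len(x)
        predictions = theta0 + theta1 * x        # h_theta(x^(i)) for every example
        squared_errors = (predictions - y) ** 2  # squared residuals
        return squared_errors.sum() / (2 * m)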
Cost Function (simplified example: θ0 = 0, so hθ(x) = θ1·x)
Left: for fixed θ1, hθ(x) is a function of x.  Right: J(θ1) is a function of the parameter θ1.

[Plots: hθ(x) = 0.5x against the training points (1,1), (2,2), (3,3); and the point θ1 = 0.5, J(θ1) = 0.583 on the J(θ1) curve]

J(0.5) = 1/(2m) [ (0.5 − 1)² + (1 − 2)² + (1.5 − 3)² ]
J(0.5) = 1/(2·3) (3.5) ≈ 0.583

Cost Function (θ1 = 1)

[Plots: hθ(x) = x passes exactly through the training points; the point θ1 = 1, J(θ1) = 0 on the J(θ1) curve]

J(1) = 1/(2m) [ 0² + 0² + 0² ]
J(1) = 1/(2·3) (0) = 0

Cost Function (θ1 = 0)

[Plots: hθ(x) = 0 against the training points; the point θ1 = 0, J(θ1) ≈ 2.3 on the J(θ1) curve]

J(0) = 1/(2m) [ 1² + 2² + 3² ]
J(0) = 1/6 (14) ≈ 2.3

Minimizing J(θ1) gives θ1 = 1.
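The three worked values above can be reproduced directly on the toy dataset (1, 1), (2, 2), (3, 3) implied by the plots and the arithmetic; a self-contained sketch in Python:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([1.0, 2.0, 3.0])
    m = len(x)

    # theta0 is held at 0 in these examples, so J depends on theta1 alone
    for theta1 in (0.5, 1.0, 0.0):
        J = ((theta1 * x - y) ** 2).sum() / (2 * m)
        print(theta1, round(J, 3))   # 0.5 -> 0.583, 1.0 -> 0.0, 0.0 -> 2.333 (≈ 2.3 on the slide)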
Cost Function (both parameters θ0 and θ1)

[Plot: for a fixed choice of θ0, θ1, the hypothesis hθ(x) drawn over the housing training set — Price ($) in 1000's vs. Size in feet²]

GRADIENT DESCENT

Gradient Descent algorithm

Have some function J(θ0, θ1)
Want: min over θ0, θ1 of J(θ0, θ1)

Outline:
• Start with some θ0, θ1
• Keep changing θ0, θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum

Derivative intuition

The derivative term is the slope of J at the current parameter value: when the slope is positive, the update decreases the parameter; when it is negative, the update increases it, so each step moves downhill toward a minimum.

Gradient Descent algorithm

repeat until convergence {
  θj := θj − α · ∂/∂θj J(θ0, θ1)   (for j = 0 and j = 1)
}

α is the learning rate, or step size.

Correct (simultaneous update):
  temp0 := θ0 − α · ∂/∂θ0 J(θ0, θ1)
  temp1 := θ1 − α · ∂/∂θ1 J(θ0, θ1)
  θ0 := temp0
  θ1 := temp1

Incorrect (sequential update):
  θ0 := θ0 − α · ∂/∂θ0 J(θ0, θ1)
  θ1 := θ1 − α · ∂/∂θ1 J(θ0, θ1)   ← this already uses the new θ0

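A sketch of one gradient-descent step showing the simultaneous update; grad_theta0 and grad_theta1 are assumed to be callables returning ∂J/∂θ0 and ∂J/∂θ1 at the current parameters (their concrete form for linear regression is given later):

    def gradient_step(theta0, theta1, grad_theta0, grad_theta1, alpha):
        """Correct (simultaneous) update: both gradients are evaluated at the
        current (theta0, theta1) before either parameter is overwritten."""
        temp0 = theta0 - alpha * grad_theta0(theta0, theta1)
        temp1 = theta1 - alpha * grad_theta1(theta0, theta1)
        return temp0, temp1

    # The incorrect variant from the slide would instead do:
    #   theta0 = theta0 - alpha * grad_theta0(theta0, theta1)
    #   theta1 = theta1 - alpha * grad_theta1(theta0, theta1)  # already sees the NEW theta0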
Learning Rate intuition

If α is too small, gradient descent can be slow.

If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.

The value of α determines the speed and accuracy of the algorithm.

https://developers.google.com/machine-learning/crash-course/fitter/graph
Good learning rate (α)
- For a good (sufficiently small) α, J(θ) should decrease on every iteration.
- But if α is too small, gradient descent can be slow to converge.
• Summary:
- If α is too small: slow convergence.
- If α is too large: J(θ) may not decrease on every iteration; it may not converge.
- To choose α, try values such as …, 0.001, …, 0.01, …, 0.1, …, 1, … (a sketch of this check follows below).

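As a sketch of the diagnostic described above, assuming costs is a list holding J(θ) recorded after each iteration (the helper name is illustrative):

    def cost_decreases_every_iteration(costs):
        """costs: J(theta) recorded after each gradient-descent iteration.
        For a well-chosen alpha, the cost should never increase between
        consecutive iterations."""
        return all(later <= earlier for earlier, later in zip(costs, costs[1:]))

    # Typical use: run a fixed number of iterations for each candidate alpha
    # (…, 0.001, …, 0.01, …, 0.1, …, 1, …) and keep the largest alpha for which
    # the recorded costs still decrease on every iteration.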
Gradient descent can converge to a local minimum, even with the learning rate α fixed.

As we approach a local minimum, gradient descent automatically takes smaller steps, because the derivative term shrinks near the minimum. So there is no need to decrease α over time.
How to choose the number of iterations and the learning rate?

Gradient descent algorithm for linear regression

Gradient descent algorithm:
  repeat until convergence {
    θj := θj − α · ∂/∂θj J(θ0, θ1)   (for j = 0 and j = 1)
  }

Linear Regression Model:
  hθ(x) = θ0 + θ1x
  J(θ0, θ1) = (1/2m) Σᵢ (hθ(x^(i)) − y^(i))²

Working out the partial derivatives of J:
  ∂/∂θ0 J(θ0, θ1) = (1/m) Σᵢ (hθ(x^(i)) − y^(i))
  ∂/∂θ1 J(θ0, θ1) = (1/m) Σᵢ (hθ(x^(i)) − y^(i)) · x^(i)

So the algorithm becomes:
  repeat until convergence {
    θ0 := θ0 − α · (1/m) Σᵢ (hθ(x^(i)) − y^(i))
    θ1 := θ1 − α · (1/m) Σᵢ (hθ(x^(i)) − y^(i)) · x^(i)
  }
  update θ0 and θ1 simultaneously (a code sketch follows below)
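Putting the update rules together, a minimal sketch of batch gradient descent for univariate linear regression (NumPy assumed; function and variable names are illustrative, not from the lecture):

    import numpy as np

    def gradient_descent(x, y, alpha=0.1, num_iters=2000):
        """Batch gradient descent for h(x) = theta0 + theta1 * x.

        Each iteration computes both partial derivatives from the current
        parameters and then updates theta0 and theta1 simultaneously."""
        m = len(x)
        theta0, theta1 = 0.0, 0.0
        for _ in range(num_iters):
            errors = (theta0 + theta1 * x) - y     # h(x^(i)) - y^(i) for all i
            grad0 = errors.sum() / m               # dJ/dtheta0
            grad1 = (errors * x).sum() / m         # dJ/dtheta1
            theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
        return theta0, theta1

    # Toy data from the cost-function slides (y = x), so we expect theta0 ≈ 0, theta1 ≈ 1
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([1.0, 2.0, 3.0])
    print(gradient_descent(x, y))   # approximately (0.0, 1.0)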
[A sequence of paired plots: for each successive choice of (θ0, θ1), the left panel shows the corresponding hypothesis hθ(x) over the housing data (for fixed θ0, θ1, this is a function of x), and the right panel shows the corresponding point on the contour plot of J(θ0, θ1) (a function of the parameters).]

The cost function of linear regression is a bowl-shaped (convex) function, so it has a single global minimum.

Gradient Descent algorithm

[3-D surface plots of J(θ0, θ1) over the (θ0, θ1) plane, illustrating how gradient descent steps downhill on the cost surface]

Credits

• Machine Learning Specialization (2015) by Emily Fox & Carlos Guestrin, University of Washington.

• Machine Learning, by Andrew Ng, Stanford University.