Lec06 Derivatives

Intuition about Derivatives

• Consider a function f(a) = 3a; since it is linear, its graph is a straight line.
• For a = 2 => f(a) = 6
• For a small increment, a = 2.001 => f(a) = 6.003
• This implies that an increase of 0.001 in a produces an increase of 0.003 in f(a), i.e., 3 times as much.
• Meaning, the slope (derivative) of f(a) at a = 2 is 3.
• i.e., derivative = height/width = 0.003/0.001 = 3
• Consider another point, a = 5 => f(a) = 15
• For a small increment, a = 5.001 => f(a) = 15.003
• The slope (derivative = df(a)/da) of f(a) at a = 5 is still 3 for this function.
Intuition about Derivatives
• Consider another function, f(a) = a²
• For a = 2 => f(a) = 4
• For a small increment, a = 2.001 => f(a) = 4.004001
• Meaning, the slope (derivative) of f(a) at a = 2 is 4.
• The slope is different for this function at different values of a.
• For a = 5 => f(a) = 25
• For a = 5.001 => f(a) = 25.010
• The slope (derivative = df(a)/da) of f(a) at a = 5 is 10 for this function.
• The slope does follow a consistent rule for this function: df(a)/da = 2a.
• Meaning, for a small increment in a of 0.001, f(a) bumps up by (2a) × 0.001.
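As an illustration (not from the original slides), a minimal Python sketch that estimates these slopes numerically with the same nudge of 0.001:

    def numerical_slope(f, a, eps=0.001):
        # (f(a + eps) - f(a)) / eps approximates the derivative df/da at a
        return (f(a + eps) - f(a)) / eps

    print(numerical_slope(lambda a: 3 * a, 2))    # ~3.0
    print(numerical_slope(lambda a: 3 * a, 5))    # ~3.0
    print(numerical_slope(lambda a: a ** 2, 2))   # ~4.001, close to the exact 2a = 4
    print(numerical_slope(lambda a: a ** 2, 5))   # ~10.001, close to the exact 2a = 10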
Conclusion about Derivatives
• Derivative: rate of change. It is just the slope of a line (or of a function at a point).
• If you want to look up the derivative of a function, open your calculus textbook.
• You will often find a formula for the slope of these functions at different points.
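As an aside (not from the slides), a symbolic math library such as SymPy can also produce these formulas, assuming it is installed:

    import sympy as sp

    a = sp.symbols('a')
    print(sp.diff(3 * a, a))     # 3
    print(sp.diff(a ** 2, a))    # 2*a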
Computation Graph
• The computations of a neural network are organized in terms of a forward pass (forward propagation) step
• to compute the output of the neural network,
• followed by a backward pass (backpropagation) step
• to compute gradients or derivatives.
• The computation graph explains why it is organized this way.
Computation Graph
• To illustrate it, let's use a simpler example: J = 3(a + bc).
• We can break this into three steps: u = bc, v = a + u, and J = 3v.
• As an example, if a = 5, b = 3 and c = 2, then u = 6, v = 11 and J = 33.
• We can take these three steps and draw them as a computation graph, as shown graphically.
• The computation graph helps when we want to optimize some variable, J in this case.
• In the case of logistic regression, J was the cost function that we have to minimize.
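A minimal Python sketch of this forward pass, using the intermediate names u, v and J from the example above:

    # Forward (left-to-right) pass through the computation graph J = 3 * (a + b * c)
    a, b, c = 5, 3, 2
    u = b * c        # u = 6
    v = a + u        # v = 11
    J = 3 * v        # J = 33
    print(u, v, J)   # 6 11 33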
Computation Graph
• The computation graph organizes a computation with these arrows, going left to right, to compute the value of J.
• To compute derivatives, there will be a right-to-left pass, in the opposite direction of the blue arrows.
• That is the most natural order for computing the derivatives.
Derivatives with a Computation Graph
• Let's say we want to compute the derivative of J with respect to v.
• This asks: if we were to take this value of v and change it a little bit, how would the value of J change?
• Since J is defined as 3 times v, and right now v = 11,
• if we bump up v by a little bit to 11.001, then J, which is 3v, gets bumped up from its current value of 33 to 33.003 on increasing v by 0.001.
• So the derivative of J with respect to v is equal to 3.

Terminology — backpropagation: by computing the derivative of this final output variable J with respect to v, we have done one step of backpropagation.
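As a quick numerical check (a sketch, not part of the slides), bump v by 0.001 and observe that J moves by about 0.003:

    v = 11.0
    J = 3 * v                       # 33.0
    J_bumped = 3 * (v + 0.001)      # 33.003
    print((J_bumped - J) / 0.001)   # ~3.0, i.e. dJ/dv = 3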
Derivatives with a Computation Graph
• What is dJ/da?
• The increase to J is 3 times the increase to a.
• So that means this derivative is equal to 3.
Derivatives with a Computation Graph
• One way to break this down is to say that if you change a, then that will change v, and through changing v, that will change J.
• First, by changing a, you end up increasing v.
• Well, how much does v increase? It is increased by an amount determined by dv/da.
• Then the change in v will cause the value of J to also increase.
• In calculus, this is called the chain rule.
• In code, instead of the long name dJ/dvar, we simply write dvar.
Derivatives with a Computation Graph
• The key takeaway from this example is that, when computing all of these derivatives, the most efficient way to do so is through a right-to-left computation, following the direction of the red arrows.
• In particular, we first compute the derivative with respect to v. That then becomes useful for computing the derivative with respect to a and the derivative with respect to u. The derivative with respect to u, in turn, becomes useful for computing the derivative with respect to b and the derivative with respect to c.
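A minimal sketch of this right-to-left pass for J = 3(a + bc), using the code convention dvar for dJ/dvar (dJ/db and dJ/dc follow the same pattern and are left for the exercise below):

    # Forward pass
    a, b, c = 5, 3, 2
    u = b * c
    v = a + u
    J = 3 * v

    # Backward pass (right to left)
    dv = 3.0         # dJ/dv, since J = 3v
    da = dv * 1.0    # dJ/da = dJ/dv * dv/da, and dv/da = 1 because v = a + u
    du = dv * 1.0    # dJ/du = dJ/dv * dv/du, and dv/du = 1
    print(da, du)    # 3.0 3.0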
Exercise: Compute
1. dJ/db?
2. dJ/dc? (Hint: go through dJ/du.)
Practical Example
Derivatives with a Computation Graph
We have discussed the computation graph and how to do
1. a forward, or left-to-right, calculation to compute a cost function such as J that you might want to optimize, and
2. a backwards, or right-to-left, calculation to compute derivatives.
These concepts will now be applied to compute the derivatives of the logistic regression model.
Logistic Regression Gradient Descent
Recap: We had set up logistic regression as follows: z = w1x1 + w2x2 + b, a = ŷ = σ(z), with loss L(a, y).
• Let's write this out as a computation graph for this example, where we have only two features, x1 and x2.
• In logistic regression, what we want to do is modify the parameters, w and b, in order to reduce this loss.
Logistic Regression Gradient Descent
• Let's talk about how you can go backwards to compute the derivatives.
• With z = w1x1 + w2x2 + b and writing dz for dL/dz:
• dL/db = (dL/dz)(dz/db) = dz · (0 + 0 + 1) = dz
• dL/dw1 = (dL/dz)(dz/dw1) = dz · (x1 + 0 + 0) = x1 · dz
• Similarly, dL/dw2 = x2 · dz
Logistic Regression Gradient Descent
Gradient descent is performed by updating w1, w2, and b for logistic regression w.r.t. a single training example, using the following values:
• dz = (a - y)
• dw1 = x1 (a - y)
• dw2 = x2 (a - y)
• db = (a - y)
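A hedged Python sketch of these computations for a single training example (the sigmoid helper and the values of x1, x2, y, w1, w2, b and the learning rate alpha are illustrative assumptions):

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    # one illustrative training example and illustrative parameter values
    x1, x2, y = 1.0, 2.0, 1.0
    w1, w2, b = 0.1, -0.2, 0.0
    alpha = 0.01                  # learning rate

    # forward pass
    z = w1 * x1 + w2 * x2 + b
    a = sigmoid(z)                # prediction

    # backward pass: derivatives of the loss L(a, y)
    dz = a - y
    dw1 = x1 * dz
    dw2 = x2 * dz
    db = dz

    # one gradient descent update
    w1 -= alpha * dw1
    w2 -= alpha * dw2
    b -= alpha * db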
Next, we consider m training examples.
Logistic Regression Gradient Descent on m examples
• Recap: We have discussed how to compute these derivatives for a single training example, denoted with superscript i.
• So, compute these derivatives on each training example, as shown above, and average them to give the overall gradient used to implement gradient descent.
Logistic Regression Gradient Descent on m examples
What we're going to do is use a for loop over the training set: compute the derivatives with respect to each training example and then add them up.
• We use dw_1, dw_2 and db as accumulators, so that after this computation, dw_1 is equal to the derivative of the overall cost function J with respect to w_1, and similarly for dw_2 and db.
• Notice that dw_1 and dw_2 do not have a superscript i, because we're using them in this code as accumulators to sum over the entire training set.
• With all of these calculations, you've computed the derivatives of the cost function J with respect to each of your parameters w_1, w_2 and b.
• Finally, to implement one step of gradient descent, update the learnable parameters: w_1 := w_1 − α·dw_1, w_2 := w_2 − α·dw_2, b := b − α·db.
• So, everything on the slide implements just one single step of gradient descent.
• We have to repeat it multiple times in order to take multiple steps of gradient descent.
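A sketch of this procedure with an explicit for loop over m training examples (assumes Python lists X1, X2, Y of length m; the function name and sigmoid helper are illustrative):

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def one_gradient_descent_step(X1, X2, Y, w1, w2, b, alpha):
        m = len(Y)
        J, dw1, dw2, db = 0.0, 0.0, 0.0, 0.0      # cost and gradient accumulators
        for i in range(m):                        # loop over the m training examples
            z = w1 * X1[i] + w2 * X2[i] + b
            a = sigmoid(z)
            J += -(Y[i] * math.log(a) + (1 - Y[i]) * math.log(1 - a))
            dz = a - Y[i]
            dw1 += X1[i] * dz                     # the inner loop over the n features is
            dw2 += X2[i] * dz                     # written out explicitly here since n = 2
            db += dz
        J, dw1, dw2, db = J / m, dw1 / m, dw2 / m, db / m
        # one gradient descent update
        w1 -= alpha * dw1
        w2 -= alpha * dw2
        b -= alpha * db
        return w1, w2, b, J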
Logistic Regression Gradient Descent on m examples
Weakness of the previous implementation:
• To implement logistic regression this way, we need two for loops.
• The first for loop is over the m training examples, and
• the second for loop is over all the n features.
• In the deep learning era, we move to bigger and bigger datasets, so we want to avoid explicit for loops.
• Solution: vectorization techniques allow us to get rid of these explicit for loops.
Vectorization
• Vectorization is the art of getting rid of explicit "for" loops in your code.
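A hedged NumPy sketch of the same single step with no explicit for loop, assuming X is an (n, m) array of features, Y a (1, m) array of labels, and w an (n, 1) column of weights (names are illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def one_step_vectorized(X, Y, w, b, alpha):
        m = X.shape[1]
        Z = np.dot(w.T, X) + b          # all z values for the m examples at once
        A = sigmoid(Z)                  # all predictions at once
        dZ = A - Y
        dw = np.dot(X, dZ.T) / m        # replaces both explicit for loops
        db = np.sum(dZ) / m
        w = w - alpha * dw              # one gradient descent update
        b = b - alpha * db
        return w, b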
Vectorization
• Next: more examples.
