Convolutional neural networks
Outline
▪ Brief recap of neural networks
• Application: autoencoders
▪ Convolutional neural networks
[Course map: Introduction, Linear regression, Logistic regression, Feature engineering, Data statistics, Naive Bayes, KNN, Clustering, Dimensionality reduction, Neural networks, Convolutional neural networks, Decision trees]
Last time
“3-layer Neural Net”, or
“2-hidden-layer Neural Net”
Additional resource: https://www.deeplearningbook.org/
Example application of Neural Networks
Autoencoder
Recall - dimensionality reduction
PCA
Autoencoder
Autoencoder vs. PCA
Top: Some examples of the original MNIST test samples
Middle: Reconstructed output from an autoencoder with a latent space of 8 dimensions; this autoencoder uses convolutional layers and was trained on the MNIST training set
Bottom: Reconstructed output from PCA with 8 latent dimensions
Image credit: F. Fleuret, Deep Learning (EPFL)
Autoencoder
Autoencoder vs. PCA
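To make the idea concrete, here is a minimal PyTorch sketch of an autoencoder with an 8-dimensional latent space, assuming flattened 28x28 MNIST inputs (the autoencoder in the comparison above uses convolutional layers instead; the layer sizes here are illustrative):

import torch.nn as nn

latent_dim = 8  # size of the compressed representation

encoder = nn.Sequential(
    nn.Linear(28 * 28, 128), nn.ReLU(),
    nn.Linear(128, latent_dim),             # compress to 8 latent dimensions
)
decoder = nn.Sequential(
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, 28 * 28), nn.Sigmoid(),  # reconstruct pixel intensities in [0, 1]
)
autoencoder = nn.Sequential(encoder, decoder)
# Trained by minimizing a reconstruction loss, e.g. nn.MSELoss()(autoencoder(x), x)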
Training for NN
Goal of optimization in ML:
Minimize cost over batch: ∑_{i=1}^{N} L^(i), where L^(i) = L(y^(i), ŷ^(i)) is the loss of the i-th training example of the batch
Want the optimization to:
• Converge quickly
• Find a good local minimum (or even the global minimum)
Gradient descent (and variants) is the preferred way to
optimize neural networks
The choice of optimizer and hyper-parameters affects the speed of convergence and the kind of local minimum found
A. Amini et al. Spatial Uncertainty Sampling for
End-to-End Control, 2019
Gradient descent variants
(Vanilla / Batch) Gradient descent (GD):
▪ J = (1/N) ∑_{i=1}^{N} L^(i)
▪ Weights updated after calculating the gradient over the entire dataset
• slow
• requires large memory
Stochastic gradient descent (SGD):
▪ J = L^(i)
▪ Weights updated after calculating the gradient of a single example
• requires much less memory than GD
• high variance in parameter updates
Mini-batch gradient descent:
▪ J = (1/N_b) ∑_{i=1}^{N_b} L^(i)
▪ Weights updated after calculating the gradient over a mini-batch of N_b examples
• Faster than SGD (vectorized computation over the mini-batch)
• Reduces variance of the gradient estimation compared to SGD
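In PyTorch-style code, the three variants differ only in how many examples contribute to each gradient step. A minimal sketch, assuming a generic model, loss_fn, and dataset (all hypothetical placeholders):

import torch
from torch.utils.data import DataLoader

def train_one_epoch(model, loss_fn, dataset, batch_size, lr=0.01):
    # batch_size = len(dataset) -> batch GD, 1 -> SGD, anything in between -> mini-batch GD
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for x, y in DataLoader(dataset, batch_size=batch_size, shuffle=True):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)   # J averaged over the current batch
        loss.backward()               # gradient of J w.r.t. the weights
        optimizer.step()              # weight update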
Optimization
Learning rate
Image credit: Jeremy Jordan (https://www.jeremyjordan.me/nn-learning-rate/)
Optimization
Optimizers
Variants of gradient descent are commonly used in practice to speed up and improve convergence:
▪ Momentum update
▪ Nesterov Accelerated Gradient (NAG)
▪ Adam
▪ and more…
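In PyTorch, switching between these optimizers is a one-line change; a small sketch (the model and the learning rates are illustrative only):

import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # placeholder model

sgd      = torch.optim.SGD(model.parameters(), lr=0.01)                               # vanilla SGD
momentum = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)                 # momentum update
nag      = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True)  # NAG
adam     = torch.optim.Adam(model.parameters(), lr=1e-3)                              # Adam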
Convolutional Neural Networks
Real-World Problem
Detecting and Classifying Pavement Distress
Why? On-time preventive maintenance
Lack of on-time maintenance leads to:
• 3x the maintenance cost
• traffic delay
• more fuel consumption
• accidents
• …
Automatic pavement distress monitoring
[Figure: pavement distress]
Convolutional Neural Networks (CNN)
Intro - Handling images with fully-connected NN
A 3x32x32 image (height 32, width 32, depth 3 for the R, G, B color channels) is flattened into a 3072x1 input vector.
By flattening, spatial structure gets lost!
Convolutional Neural Networks
Intro - Handling images with fully-connected NN
A fully-connected neural net:
▪ Requires flattening the image → spatial structure gets lost
▪ Doesn't scale well to large images
• e.g. a 1024x1024x3 image results in 3'145'728 weights for each neuron of the first hidden layer
How to efficiently model correlation between neighboring pixels?
=> Convolutional Neural Networks
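To make the scaling issue concrete, a small back-of-the-envelope check (the 1000-neuron hidden layer is a hypothetical choice):

# Weight count for one fully-connected hidden layer on a flattened image.
height, width, channels = 1024, 1024, 3
in_features = height * width * channels     # 3,145,728 inputs after flattening

print(in_features)                          # 3145728 weights for each neuron
print(in_features * 1000 + 1000)            # ~3.1 billion parameters for a 1000-neuron layer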
Convolution definition
Convolution of a signal with a filter
Convolution - what to do at boundaries
Convolution to get features
Filters can approximate what happens in a neighborhood with a few numbers:
• give the average value of the signal in a neighborhood
• sharpen the signal
• blur the signal
• approximate the derivative of the signal in a neighborhood
• approximate the second derivative of the signal in a neighborhood
Convolution with a filter is a linear function
Convolution extensions
Example of image convolution
-1 -1 -1
-1 8 -1
-1 -1 -1
https://muthu.co/basics-of-image-convolution/
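The kernel above responds strongly wherever a pixel differs from its neighbors, so it acts as an edge detector. A minimal sketch using SciPy (the random image is just a stand-in):

import numpy as np
from scipy.signal import convolve2d

kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])               # the edge-detection kernel shown above

image = np.random.rand(32, 32)                  # stand-in grayscale image
edges = convolve2d(image, kernel, mode='same')  # keeps the 32x32 size
print(edges.shape)                              # (32, 32)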
Convolutional Neural Networks
2D Convolution computation example
To show how the convolution operation is computed, let's use a simpler example:
a 5x5 input and a 3x3 filter.

Input (5x5):
1  0  3  0  2
0  3  4  0  2
1  0  2  0  1
8 12  0  1  0
0  6  3  2  0

Filter (3x3):
 1 -1  0
 0  2 -2
-1  0  2

Bias: b = 0
Convolutional Neural Networks
Convolution computation example
Sliding the 3x3 filter over the 5x5 input with a stride of 1 produces a 3x3 output. Each output entry is the dot product between the filter and the corresponding 3x3 window of the input, plus the bias b = 0:

(1,1): 1x1 + 0x(-1) + 3x0 + 0x0 + 3x2 + 4x(-2) + 1x(-1) + 0x0 + 2x2 + 0 = 2
(1,2): 0x1 + 3x(-1) + 0x0 + 3x0 + 4x2 + 0x(-2) + 0x(-1) + 2x0 + 0x2 + 0 = 5
(1,3): 3x1 + 0x(-1) + 2x0 + 4x0 + 0x2 + 2x(-2) + 2x(-1) + 0x0 + 1x2 + 0 = -1
(2,1): 0x1 + 3x(-1) + 4x0 + 1x0 + 0x2 + 2x(-2) + 8x(-1) + 12x0 + 0x2 + 0 = -15
(2,2): 3x1 + 4x(-1) + 0x0 + 0x0 + 2x2 + 0x(-2) + 12x(-1) + 0x0 + 1x2 + 0 = -7
(2,3): 4x1 + 0x(-1) + 2x0 + 2x0 + 0x2 + 1x(-2) + 0x(-1) + 1x0 + 0x2 + 0 = 2
(3,1): 1x1 + 0x(-1) + 2x0 + 8x0 + 12x2 + 0x(-2) + 0x(-1) + 6x0 + 3x2 + 0 = 31
(3,2): 0x1 + 2x(-1) + 0x0 + 12x0 + 0x2 + 1x(-2) + 6x(-1) + 3x0 + 2x2 + 0 = -6
(3,3): 2x1 + 0x(-1) + 1x0 + 0x0 + 1x2 + 0x(-2) + 3x(-1) + 2x0 + 0x2 + 0 = 1

Output (3x3):
  2   5  -1
-15  -7   2
 31  -6   1
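A quick way to check these numbers is to run the same input and filter through PyTorch; note that conv2d computes the sliding dot product without flipping the filter, which is exactly the operation used above (a sketch):

import torch
import torch.nn.functional as F

x = torch.tensor([[1., 0., 3., 0., 2.],
                  [0., 3., 4., 0., 2.],
                  [1., 0., 2., 0., 1.],
                  [8., 12., 0., 1., 0.],
                  [0., 6., 3., 2., 0.]]).view(1, 1, 5, 5)   # (batch, channels, H, W)
w = torch.tensor([[1., -1., 0.],
                  [0., 2., -2.],
                  [-1., 0., 2.]]).view(1, 1, 3, 3)          # (out_ch, in_ch, F, F)

out = F.conv2d(x, w, bias=torch.tensor([0.]), stride=1)
print(out.squeeze())
# tensor([[  2.,   5.,  -1.],
#         [-15.,  -7.,   2.],
#         [ 31.,  -6.,   1.]])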
Convolution example
Example from ML4Engineers book.
Convolutional Neural Networks (CNN)
Convolution Layer
Consider a 3x32x32 image: height 32, width 32, depth 3 for the 3 color channels R, G, B.
A pixel can be represented by a vector of 3 color (R, G, B) intensities.
In PyTorch, images are represented as (CxHxW), indexed as I(c, h, w):
• C: number of channels (depth)
• H: height
• W: width
Convolutional Neural Networks
Convolution Layer
Convolve a 3x5x5 filter with the 3x32x32 image, i.e. “slide over the image spatially, computing dot products”.
Filters always extend the full depth of the input volume.
Note: filters are sometimes referred to as kernels.
Convolutional Neural Networks
Convolution Layer
At each position of the 3x5x5 filter (w) on the 3x32x32 image, we get 1 number:
▪ The result of taking a dot product between the filter and a small 3x5x5 chunk of the image
• (i.e. a 5x5x3 = 75-dimensional dot product + bias)
Convolutional Neural Networks
Convolution Layer
Convolving (sliding) the 3x5x5 filter (w) over all spatial locations of the 3x32x32 image produces a 1x28x28 activation map.
Convolutional Neural Networks
Convolution Layer
Consider a second 3x5x5 filter: performing the same convolution operation with this filter gives a second 1x28x28 activation map.
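In PyTorch, this layer corresponds to nn.Conv2d; a minimal sketch with two 3x5x5 filters (the random image is just a placeholder):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)                # one 3x32x32 image

conv = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=5)  # two 3x5x5 filters
print(conv(x).shape)                         # torch.Size([1, 2, 28, 28]) -> two 28x28 activation maps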
Pooling
Pooling example
Max-pooling with a stride of 2:

Input (1x6x6):
3  0  1  0  2  4
0  1  8 12  0  0
4  0  0  3  2  2
2  0  1  0  1  1
3  2  0  6  0  5
1  0  6  0  0  9

Output (1x3x3):
3 12  4
4  3  2
3  6  9
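The same result can be reproduced with PyTorch's nn.MaxPool2d (a sketch; the 2x2 window size is implied by the 6x6 → 3x3 shapes above):

import torch
import torch.nn as nn

x = torch.tensor([[3., 0., 1., 0., 2., 4.],
                  [0., 1., 8., 12., 0., 0.],
                  [4., 0., 0., 3., 2., 2.],
                  [2., 0., 1., 0., 1., 1.],
                  [3., 2., 0., 6., 0., 5.],
                  [1., 0., 6., 0., 0., 9.]]).view(1, 1, 6, 6)

pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x).squeeze())
# tensor([[ 3., 12.,  4.],
#         [ 4.,  3.,  2.],
#         [ 3.,  6.,  9.]])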
Pooling example
Example from ML4Engineers book: max pooling with a stride of 2 applied to the image (after a Laplacian kernel has been applied to the image).
Convolutional Neural Networks
Pooling layer
CNNs may include pooling layers to reduce the spatial size of the representation
Pooling layers require two hyper-parameters: their spatial extent F and their stride S
▪ The most common pooling layer uses 2x2 filters with a stride of 2 (F = 2, S = 2)
Convolution applied with a stride
We can also apply convolution with a stride of 2.

Input (1x5x5):
1  0  3  0  2
0  3  4  0  2
1  0  2  0  1
8 12  0  1  0
0  6  3  2  0

Filter (1x3x3):
 1 -1  0
 0  2 -2
-1  0  2

Bias: b = 0
Convolutional Neural Networks
Changing the stride
Back to our simple example, but with a stride of 2: the 3x3 filter now moves 2 positions at a time, so the output is only 2x2.

(1,1): 1x1 + 0x(-1) + 3x0 + 0x0 + 3x2 + 4x(-2) + 1x(-1) + 0x0 + 2x2 + 0 = 2
(1,2): 3x1 + 0x(-1) + 2x0 + 4x0 + 0x2 + 2x(-2) + 2x(-1) + 0x0 + 1x2 + 0 = -1
(2,1): 1x1 + 0x(-1) + 2x0 + 8x0 + 12x2 + 0x(-2) + 0x(-1) + 6x0 + 3x2 + 0 = 31
(2,2): 2x1 + 0x(-1) + 1x0 + 0x0 + 1x2 + 0x(-2) + 3x(-1) + 2x0 + 0x2 + 0 = 1

Output (1x2x2):
 2 -1
31  1
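The same stride-2 result can be checked with PyTorch (a sketch, reusing the input and filter from the stride-1 example):

import torch
import torch.nn.functional as F

x = torch.tensor([[1., 0., 3., 0., 2.],
                  [0., 3., 4., 0., 2.],
                  [1., 0., 2., 0., 1.],
                  [8., 12., 0., 1., 0.],
                  [0., 6., 3., 2., 0.]]).view(1, 1, 5, 5)
w = torch.tensor([[1., -1., 0.],
                  [0., 2., -2.],
                  [-1., 0., 2.]]).view(1, 1, 3, 3)

out = F.conv2d(x, w, stride=2)               # bias omitted since b = 0
print(out.squeeze())
# tensor([[ 2., -1.],
#         [31.,  1.]])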
Convolutional Neural Networks
Zero-padding
Height and width shrink quite quickly due to the repeated convolutions
To avoid this, we can add zero-padding:
Zero-padded input (1x7x7)
Input (1x5x5) 0 0 0 0 0 0 0
1 0 3 0 2 0 1 0 3 0 2 0
0 3 4 0 2 0 0 3 4 0 2 0
1 0 2 0 1 Zero-padding = 1 0 1 0 2 0 1 0
8 12 0 1 0 0 8 12 0 1 0 0
0 6 3 2 0 0 0 6 3 2 0 0
0 0 0 0 0 0 0
If we use a 3x3 filter with a stride of 1 on the padded input, we get a 5x5 output
→ same size as input
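In PyTorch this is the padding argument of conv2d; a minimal shape check (the all-ones filter is an arbitrary choice just to verify the output size):

import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 5, 5)                  # any 1x5x5 input
w = torch.ones(1, 1, 3, 3)                   # arbitrary 3x3 filter

out = F.conv2d(x, w, stride=1, padding=1)    # zero-padding of 1 on each side
print(out.shape)                             # torch.Size([1, 1, 5, 5]) -> same size as the input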
Convolutional Neural Networks
Convolution layer summary
The convolution layer:
▪ Accepts a volume of size Cin × H1 × W1
▪ Requires four hyper-parameters:
• Number of filters K
• Spatial extent of the filters F
• Stride S
• Amount of zero padding P
▪ Produces a volume of size Cout × H2 × W2 where:
• Cout = K
• H2 = (H1 − F + 2P)/S + 1
• W2 = (W1 − F + 2P)/S + 1
Note: there are F ⋅ F ⋅ Cin weights per filter, for a total of (F ⋅ F ⋅ Cin) ⋅ K weights and K biases per layer
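These formulas translate directly into a couple of helper functions; a small sketch (the function names are my own):

def conv_output_size(h1, w1, f, s=1, p=0):
    """Spatial output size: H2 = (H1 - F + 2P)/S + 1, W2 = (W1 - F + 2P)/S + 1."""
    return (h1 - f + 2 * p) // s + 1, (w1 - f + 2 * p) // s + 1

def conv_num_params(c_in, k, f):
    """F*F*C_in weights per filter, K filters, plus K biases."""
    return f * f * c_in * k + k

print(conv_output_size(32, 32, f=5))         # (28, 28): the 3x32x32 -> 28x28 example
print(conv_output_size(5, 5, f=3, s=2))      # (2, 2): the stride-2 example
print(conv_num_params(c_in=3, k=2, f=5))     # 152 parameters for two 3x5x5 filters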
Convolutional neural net
From: Machine Learning for Engineers book
Convolutional Neural Networks
Perception tasks
Convolutional Neural Networks
Optional
Popular architectures
LeNet-5
LeCun et al., 1998
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 6, 28, 28] 156
ReLU-2 [-1, 6, 28, 28] 0
MaxPool2d-3 [-1, 6, 14, 14] 0
Conv2d-4 [-1, 16, 10, 10] 2,416
ReLU-5 [-1, 16, 10, 10] 0
MaxPool2d-6 [-1, 16, 5, 5] 0
Linear-7 [-1, 120] 48,120
ReLU-8 [-1, 120] 0
Linear-9 [-1, 84] 10,164
ReLU-10 [-1, 84] 0
Linear-11 [-1, 10] 850
Softmax-12 [-1, 10] 0
================================================================
Total params: 61,706
Note: -1 in the output shape represents the mini-batch dimension
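The table above corresponds to the following PyTorch sketch (a plausible reconstruction from the layer shapes; it assumes 1x32x32 inputs and 10 output classes):

import torch
import torch.nn as nn

lenet5 = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),    # -> 6 x 28 x 28 (156 params)
    nn.ReLU(),
    nn.MaxPool2d(2),                   # -> 6 x 14 x 14
    nn.Conv2d(6, 16, kernel_size=5),   # -> 16 x 10 x 10 (2,416 params)
    nn.ReLU(),
    nn.MaxPool2d(2),                   # -> 16 x 5 x 5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120),        # 48,120 params
    nn.ReLU(),
    nn.Linear(120, 84),                # 10,164 params
    nn.ReLU(),
    nn.Linear(84, 10),                 # 850 params
    nn.Softmax(dim=1),
)

x = torch.randn(1, 1, 32, 32)
print(lenet5(x).shape)                                # torch.Size([1, 10])
print(sum(p.numel() for p in lenet5.parameters()))    # 61706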
Convolutional Neural Networks
Optional
Popular architectures
AlexNet
Krizhevsky et al., 2012
Winner of ImageNet Competition 2012
Convolutional Neural Networks
Optional
Popular architectures
VGG16
Simonyan & Zisserman, 2014
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 224, 224] 1,792
ReLU-2 [-1, 64, 224, 224] 0
Conv2d-3 [-1, 64, 224, 224] 36,928
ReLU-4 [-1, 64, 224, 224] 0
MaxPool2d-5 [-1, 64, 112, 112] 0
Conv2d-6 [-1, 128, 112, 112] 73,856
ReLU-7 [-1, 128, 112, 112] 0
Conv2d-8 [-1, 128, 112, 112] 147,584
ReLU-9 [-1, 128, 112, 112] 0
MaxPool2d-10 [-1, 128, 56, 56] 0
Conv2d-11 [-1, 256, 56, 56] 295,168
ReLU-12 [-1, 256, 56, 56] 0
Conv2d-13 [-1, 256, 56, 56] 590,080
ReLU-14 [-1, 256, 56, 56] 0
Conv2d-15 [-1, 256, 56, 56] 590,080
ReLU-16 [-1, 256, 56, 56] 0
MaxPool2d-17 [-1, 256, 28, 28] 0
Conv2d-18 [-1, 512, 28, 28] 1,180,160
ReLU-19 [-1, 512, 28, 28] 0
Conv2d-20 [-1, 512, 28, 28] 2,359,808
ReLU-21 [-1, 512, 28, 28] 0
Conv2d-22 [-1, 512, 28, 28] 2,359,808
ReLU-23 [-1, 512, 28, 28] 0
MaxPool2d-24 [-1, 512, 14, 14] 0
Conv2d-25 [-1, 512, 14, 14] 2,359,808
ReLU-26 [-1, 512, 14, 14] 0
Conv2d-27 [-1, 512, 14, 14] 2,359,808
ReLU-28 [-1, 512, 14, 14] 0
Conv2d-29 [-1, 512, 14, 14] 2,359,808
ReLU-30 [-1, 512, 14, 14] 0
MaxPool2d-31 [-1, 512, 7, 7] 0
Linear-32 [-1, 4096] 102,764,544
ReLU-33 [-1, 4096] 0
Dropout-34 [-1, 4096] 0
Linear-35 [-1, 4096] 16,781,312
ReLU-36 [-1, 4096] 0
Dropout-37 [-1, 4096] 0
Linear-38 [-1, 1000] 4,097,000
Softmax-39 [-1, 1000] 0
================================================================
Total params: 138,357,544
Convolutional Neural Networks
Optional
Popular architectures
GoogLeNet (Inception v1)
Szegedy et al., 2014
ConvNets have been getting deeper and deeper
e.g. ResNet-152 (He et al., 2015) → 152 layers
Example application: transfer learning
Train a network for a task
Example: image classification
Requires a large amount of training data and substantial training resources
Modify the trained network for a different task (transfer learning)
Why? Can address limited data and limited time/resources for training
Case study 6.5 from the Machine Learning for Engineers book: “Finding volcanos on
Venus with pre-fit models”
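A minimal transfer-learning sketch with torchvision, assuming torchvision >= 0.13 for the weights API (the 2-class output is a hypothetical choice for a volcano / no-volcano task):

import torch.nn as nn
from torchvision import models

# Start from a network pre-trained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the pre-trained weights so only the new head is trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer for the new task (here: 2 classes).
model.fc = nn.Linear(model.fc.in_features, 2)

# Only model.fc.parameters() are then passed to the optimizer and trained
# on the (small) dataset for the new task.

This is why transfer learning can work with limited data: most parameters stay fixed and only the small new output layer is fit.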
Summary - exercises