4 - DL (v2)

Deep learning involves neural networks with many hidden layers between the input and output layers. It has gained popularity due to breakthroughs like improved initialization methods in 2006, the availability of GPUs in 2009, and wins in image recognition competitions. Deep learning models are functions that take an input, apply transformations through the hidden layers using weights and biases, and produce an output. The network structure and parameters are determined through training to minimize a loss function that measures how far the predictions are from the correct targets.


Deep Learning

Deep learning attracts a lot of attention.
• I believe you have seen many exciting results before.

[Figure: Deep learning trends at Google. Source: SIGMOD / Jeff Dean]


Ups and downs of Deep Learning
• 1958: Perceptron (linear model)
• 1969: Perceptron has limitations
• 1980s: Multi-layer perceptron
  • Not significantly different from today's DNNs
• 1986: Backpropagation
  • Usually more than 3 hidden layers did not help
• 1989: 1 hidden layer is "good enough", so why deep?
• 2006: RBM initialization (breakthrough)
• 2009: GPU
• 2011: Started to become popular in speech recognition
• 2012: Won the ILSVRC image recognition competition
Three Steps for Deep Learning

• Step 1: define a set of functions (Neural Network)
• Step 2: goodness of function
• Step 3: pick the best function

Deep Learning is so simple ……


Neural Network

[Figure: a single "Neuron" takes inputs a1, …, aK with weights w1, …, wK and bias b, computes z = a1 w1 + ⋯ + aK wK + b, and outputs a = σ(z).]
Neural Network
Different connections lead to different network structures.
Network parameter 𝜃: all the weights and biases in the “neurons”
Fully Connected Feedforward Network

[Figure: a two-neuron layer. With input (1, −1), weights (1, −2) and (−1, 1), and biases 1 and 0, the neurons output σ(4) = 0.98 and σ(−2) = 0.12.]

Sigmoid Function: σ(z) = 1 / (1 + e^(−z))
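As a quick check of the numbers in the example above, here is a minimal Python/NumPy sketch (illustrative, not part of the original slides) evaluating the sigmoid at the two pre-activations z = 4 and z = −2:

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes any real z into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(4))   # ~0.982  (shown as 0.98 on the slide)
print(sigmoid(-2))  # ~0.119  (shown as 0.12 on the slide)
```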
Fully Connected Feedforward Network

[Figure: forward pass through a network with three layers of weights. Input (1, −1) gives first-layer outputs (0.98, 0.12) and final outputs (0.62, 0.83); input (0, 0) gives first-layer outputs (0.73, 0.5) and final outputs (0.51, 0.85).]

This is a function that maps an input vector to an output vector:
f([1, −1]) = [0.62, 0.83],   f([0, 0]) = [0.51, 0.85]

Given a network structure, we define a function set.
Fully Connected Feedforward Network

[Figure: the general structure. Input layer (x1 … xN), hidden layers (Layer 1 … Layer L) made of neurons, output layer (y1 … yM); every neuron is connected to all neurons in the next layer.]
Deep = Many hidden layers
• AlexNet (2012): 8 layers, 16.4% error
• VGG (2014): 19 layers, 7.3% error
• GoogleNet (2014): 22 layers, 6.7% error

Source: http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf


Deep = Many hidden layers
• Residual Net (2015): 152 layers (special structure), 3.57% error
• For comparison: AlexNet (2012) 16.4%, VGG (2014) 7.3%, GoogleNet (2014) 6.7%
• (Taipei 101 has only 101 floors.)
Matrix Operation
[Figure: the same two-neuron layer written as a matrix operation.]

σ(W x + b) with W = [[1, −2], [−1, 1]], x = [1, −1]ᵀ, b = [1, 0]ᵀ:
σ([4, −2]ᵀ) = [0.98, 0.12]ᵀ
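The same layer can be evaluated in matrix form; a small sketch (assuming the weights, bias, and input from the example above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = np.array([[ 1.0, -2.0],
              [-1.0,  1.0]])   # one row of weights per neuron
x = np.array([1.0, -1.0])      # input vector
b = np.array([1.0, 0.0])       # one bias per neuron

print(sigmoid(W @ x + b))      # ~[0.98, 0.12]
```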
Neural Network
[Figure: a deep network written layer by layer, with weight matrices W1, …, WL and bias vectors b1, …, bL.]

a1 = σ(W1 x + b1)
a2 = σ(W2 a1 + b2)
……
y = σ(WL aL-1 + bL)
Neural Network
[Figure: the same network, viewed as one composed function from x to y.]

y = f(x) = σ( WL … σ( W2 σ( W1 x + b1 ) + b2 ) … + bL )

Using parallel computing techniques (e.g. GPUs) to speed up the matrix operations.
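The composed function can be sketched as a loop over layers; the layer sizes and random parameters below are placeholders, not values from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    # Repeatedly apply a <- sigmoid(W a + b), one layer at a time.
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# Hypothetical shapes: 256 inputs, two hidden layers of 500, 10 outputs.
rng = np.random.default_rng(0)
sizes = [256, 500, 500, 10]
weights = [0.1 * rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases  = [np.zeros(m) for m in sizes[1:]]

y = forward(rng.standard_normal(256), weights, biases)
print(y.shape)  # (10,)
```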
Output Layer
• The hidden layers act as a feature extractor, replacing manual feature engineering.
• The output layer applies a softmax, so it acts as a multi-class classifier over y1 … yM.
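A minimal softmax sketch (illustrative): it turns the output layer's raw scores into non-negative values that sum to 1, which is what makes the output usable as a multi-class classifier:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the outputs sum to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([1.0, 3.0, 0.2])
print(softmax(scores))        # ~[0.11, 0.84, 0.05]
print(softmax(scores).sum())  # 1.0
```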
Example Application

[Figure: handwriting digit recognition. Input: a 16 × 16 = 256-pixel image, x1 … x256, with ink → 1 and no ink → 0. Output: y1 … y10, where each dimension represents the confidence of one digit. Here y2 = 0.7 is the largest, so the image is "2".]
Example Application
• Handwriting Digit Recognition

x1 y1 is 1
x2
y2 is 2
Neural
Machine “2”
……

……
……
Network
x256 y10 is 0
What is needed is a
function ……
Input: output:
256-dim vector 10-dim vector
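A sketch of this input/output convention (the blank image and the randomly initialized single-hidden-layer network below are placeholders): flatten the 16 × 16 image into a 256-dim vector, run it through the network, and read off the most confident of the 10 outputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical, untrained network: 256 -> 30 -> 10.
rng = np.random.default_rng(0)
W1, b1 = 0.1 * rng.standard_normal((30, 256)), np.zeros(30)
W2, b2 = 0.1 * rng.standard_normal((10, 30)), np.zeros(10)

image = np.zeros((16, 16))      # ink -> 1, no ink -> 0 (blank here)
x = image.reshape(256)          # 256-dim input vector
y = sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2)   # 10-dim output vector

print("most confident output index:", int(np.argmax(y)))
```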
Example Application
[Figure: the network from before, with input x1 … xN, hidden layers 1 … L, and outputs y1 ("is 1") … y10 ("is 0").]

This is a function set containing the candidates for handwriting digit recognition.

You need to decide the network structure so that a good function is in your function set.
FAQ

• Q: How many layers? How many neurons for each layer?
  A: Trial and error + intuition.
• Q: Can the structure be automatically determined?
  A: E.g. evolutionary artificial neural networks.
• Q: Can we design the network structure?
  A: E.g. Convolutional Neural Network (CNN).
Three Steps for Deep Learning

• Step 1: define a set of functions (Neural Network)
• Step 2: goodness of function
• Step 3: pick the best function

Deep Learning is so simple ……


Loss for an Example
[Figure: given a set of parameters, an image of "1" is fed through the network; the softmax outputs y1 … y10 are compared with the target ŷ (ŷ1 = 1, ŷ2 = … = ŷ10 = 0) using the cross entropy.]

Cross entropy: C(y, ŷ) = − Σᵢ₌₁¹⁰ ŷᵢ ln yᵢ
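A minimal sketch of this cross entropy for one example whose target is the digit "1" (the output vector y below is made up for illustration):

```python
import numpy as np

def cross_entropy(y, y_hat):
    # y: network output (softmax probabilities); y_hat: one-hot target.
    return -np.sum(y_hat * np.log(y))

y = np.array([0.7, 0.1, 0.02, 0.02, 0.02, 0.02, 0.02, 0.04, 0.03, 0.03])
y_hat = np.zeros(10)
y_hat[0] = 1.0                  # target "1" corresponds to the first output

print(cross_entropy(y, y_hat))  # = -ln(0.7) ~ 0.357
```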
Total Loss

For all training data:  L = Σₙ₌₁ᴺ Cⁿ

[Figure: each training example xⁿ is fed through the NN to produce yⁿ, which is compared with its target ŷⁿ to give Cⁿ.]

• Find a function in the function set that minimizes the total loss L.
• Find the network parameters θ* that minimize the total loss L.
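A sketch of the total loss, summing the per-example cross entropy over all N training examples (`predict` stands for any forward pass that returns softmax outputs; it is a placeholder, not a function from the slides):

```python
import numpy as np

def total_loss(xs, y_hats, predict):
    # L = sum over n of C^n, with C^n the cross entropy of example n.
    return sum(-np.sum(y_hat * np.log(predict(x)))
               for x, y_hat in zip(xs, y_hats))
```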
Three Steps for Deep Learning

• Step 1: define a set of functions (Neural Network)
• Step 2: goodness of function
• Step 3: pick the best function

Deep Learning is so simple ……


Gradient Descent
θ: all the network parameters {w1, w2, …, b1, …}

Gradient: ∇L = [ ∂L/∂w1, ∂L/∂w2, …, ∂L/∂b1, … ]ᵀ

[Figure: one update step. Compute each partial derivative and move each parameter by −μ ∂L/∂(parameter), e.g. w1: 0.2 → 0.15, w2: −0.1 → 0.05, b1: 0.3 → 0.2.]
Gradient Descent
[Figure: the update is repeated again and again: compute ∂L/∂w, then step by −μ ∂L/∂w; e.g. w1: 0.2 → 0.15 → 0.09 → ……, w2: −0.1 → 0.05 → 0.15 → ……, b1: 0.3 → 0.2 → 0.10 → ……]
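A sketch of the update pictured above, assuming the gradient of L is available as a function (`grad_L` is a placeholder): every parameter repeatedly takes a step of −μ times its partial derivative:

```python
import numpy as np

def gradient_descent(theta, grad_L, mu=0.05, steps=100):
    # theta: vector of all weights and biases; grad_L(theta): gradient of L.
    for _ in range(steps):
        theta = theta - mu * grad_L(theta)   # theta <- theta - mu * dL/dtheta
    return theta

# Toy example: L(theta) = theta_1^2 + theta_2^2, so grad_L(theta) = 2 * theta.
theta0 = np.array([0.2, -0.1])
print(gradient_descent(theta0, lambda t: 2 * t))  # moves toward [0, 0]
```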
Gradient Descent
This is the "learning" of machines in deep learning ……
Even AlphaGo uses this approach.
[Figure: what people imagine …… vs. what it actually is ……]

I hope you are not too disappointed :p


Backpropagation
• Backpropagation: an efficient way to compute ∂L/∂w in a neural network

libdnn: developed by NTU student 周伯威
Ref: http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/DNN%20backprop.ecm.mp4/index.html
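As a rough illustration of what backpropagation computes (not the libdnn implementation, and using squared error instead of cross entropy for brevity), here is the chain rule applied to a single sigmoid layer:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One layer: a = sigmoid(W x + b), loss L = 0.5 * ||a - y_hat||^2 (made-up sizes).
rng = np.random.default_rng(0)
W, b = 0.1 * rng.standard_normal((2, 3)), np.zeros(2)
x, y_hat = np.array([1.0, -1.0, 0.5]), np.array([1.0, 0.0])

# Forward pass, keeping intermediate values for the backward pass.
z = W @ x + b
a = sigmoid(z)

# Backward pass: chain rule dL/dz = (a - y_hat) * sigmoid'(z).
delta = (a - y_hat) * a * (1 - a)
dW = np.outer(delta, x)   # dL/dW
db = delta                # dL/db

W -= 0.1 * dW             # one gradient descent step using these gradients
b -= 0.1 * db
```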
Concluding Remarks

• Step 1: define a set of functions (Neural Network)
• Step 2: goodness of function
• Step 3: pick the best function

What are the benefits of deep architecture?


Deeper is Better?
Deeper networks (Layer × Size → Word Error Rate (%)):
1 × 2k → 24.2
2 × 2k → 20.4
3 × 2k → 18.4
4 × 2k → 17.8
5 × 2k → 17.2
7 × 2k → 17.1

Not surprising: more parameters, better performance.

Single-hidden-layer networks for comparison (Layer × Size → Word Error Rate (%)):
1 × 3772 → 22.5
1 × 4634 → 22.6
1 × 16k → 22.1

Seide, Frank, Gang Li, and Dong Yu. "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks." Interspeech, 2011.
Universality Theorem
Any continuous function f : R^N → R^M can be realized by a network with one hidden layer (given enough hidden neurons).

Reference for the reason: http://neuralnetworksanddeeplearning.com/chap4.html

Why a "Deep" neural network and not a "Fat" neural network? (next lecture)
To Learn Deep Learning in Depth ("深度學習深度學習")
• My course: Machine learning and having it deep and structured
  http://speech.ee.ntu.edu.tw/~tlkagk/courses_MLSD15_2.html
• 6-hour version: http://www.slideshare.net/tw_dsconf/ss-62245351
• "Neural Networks and Deep Learning", written by Michael Nielsen
  http://neuralnetworksanddeeplearning.com/
• "Deep Learning", written by Yoshua Bengio, Ian J. Goodfellow and Aaron Courville
  http://www.deeplearningbook.org
Acknowledgment
• Thanks to Victor Chen for spotting typos in the slides.
