4 - DL (v2)

Deep learning involves neural networks with many hidden layers between the input and output layers. It has gained popularity due to breakthroughs like improved initialization methods in 2006, the availability of GPUs in 2009, and wins in image recognition competitions. Deep learning models are functions that take an input, apply transformations through the hidden layers using weights and biases, and produce an output. The network structure and parameters are determined through training to minimize a loss function that measures how far the predictions are from the correct targets.


Deep Learning

Deep learning attracts a lot of attention.
• I believe you have seen many exciting results before.

[Figure: Deep learning trends at Google. Source: SIGMOD / Jeff Dean]


Ups and downs of Deep Learning
• 1958: Perceptron (linear model)
• 1969: Perceptron has limitations
• 1980s: Multi-layer perceptron
  • Not significantly different from today's DNNs
• 1986: Backpropagation
  • Usually more than 3 hidden layers did not help
• 1989: 1 hidden layer is "good enough", so why deep?
• 2006: RBM initialization (breakthrough)
• 2009: GPU
• 2011: Started to become popular in speech recognition
• 2012: Won the ILSVRC image recognition competition
Three Steps for Deep Learning

• Step 1: define a set of functions (Neural Network)
• Step 2: goodness of function
• Step 3: pick the best function

Deep Learning is so simple ……


Neural Network

[Figure: a single "Neuron" takes inputs a1, …, aK with weights w1, …, wK and bias b, computes z = a1 w1 + ⋯ + aK wK + b, and outputs a = σ(z).]
Neural Network
Different connections lead to different network structures.
Network parameter 𝜃: all the weights and biases in the “neurons”
Fully Connected Feedforward Network

[Figure: a two-neuron layer. With input (1, −1), weights (1, −2) and (−1, 1), and biases 1 and 0, the neurons output σ(4) = 0.98 and σ(−2) = 0.12.]

Sigmoid Function: σ(z) = 1 / (1 + e^(−z))
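As a quick check of the numbers in the example above, here is a minimal Python/NumPy sketch (illustrative, not part of the original slides) evaluating the sigmoid at the two pre-activations z = 4 and z = −2:

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes any real z into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(4))   # ~0.982  (shown as 0.98 on the slide)
print(sigmoid(-2))  # ~0.119  (shown as 0.12 on the slide)
```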
Fully Connected Feedforward Network

[Figure: forward pass through a network with three layers of weights. Input (1, −1) gives first-layer outputs (0.98, 0.12) and final outputs (0.62, 0.83); input (0, 0) gives first-layer outputs (0.73, 0.5) and final outputs (0.51, 0.85).]

This is a function that maps an input vector to an output vector:
f([1, −1]) = [0.62, 0.83],   f([0, 0]) = [0.51, 0.85]

Given a network structure, we define a function set.
Fully Connected Feedforward Network

[Figure: the general structure. Input layer (x1 … xN), hidden layers (Layer 1 … Layer L) made of neurons, output layer (y1 … yM); every neuron is connected to all neurons in the next layer.]
Deep = Many hidden layers
• AlexNet (2012): 8 layers, 16.4% error
• VGG (2014): 19 layers, 7.3% error
• GoogleNet (2014): 22 layers, 6.7% error

Source: http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf


Deep = Many hidden layers
• Residual Net (2015): 152 layers (special structure), 3.57% error
• For comparison: AlexNet (2012) 16.4%, VGG (2014) 7.3%, GoogleNet (2014) 6.7%
• (Taipei 101 has only 101 floors.)
Matrix Operation
[Figure: the same two-neuron layer written as a matrix operation.]

σ(W x + b) with W = [[1, −2], [−1, 1]], x = [1, −1]ᵀ, b = [1, 0]ᵀ:
σ([4, −2]ᵀ) = [0.98, 0.12]ᵀ
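The same layer can be evaluated in matrix form; a small sketch (assuming the weights, bias, and input from the example above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = np.array([[ 1.0, -2.0],
              [-1.0,  1.0]])   # one row of weights per neuron
x = np.array([1.0, -1.0])      # input vector
b = np.array([1.0, 0.0])       # one bias per neuron

print(sigmoid(W @ x + b))      # ~[0.98, 0.12]
```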
Neural Network
[Figure: a deep network written layer by layer, with weight matrices W1, …, WL and bias vectors b1, …, bL.]

a1 = σ(W1 x + b1)
a2 = σ(W2 a1 + b2)
……
y = σ(WL aL-1 + bL)
Neural Network
[Figure: the same network, viewed as one composed function from x to y.]

y = f(x) = σ( WL … σ( W2 σ( W1 x + b1 ) + b2 ) … + bL )

Using parallel computing techniques (e.g. GPUs) to speed up the matrix operations.
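The composed function can be sketched as a loop over layers; the layer sizes and random parameters below are placeholders, not values from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    # Repeatedly apply a <- sigmoid(W a + b), one layer at a time.
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# Hypothetical shapes: 256 inputs, two hidden layers of 500, 10 outputs.
rng = np.random.default_rng(0)
sizes = [256, 500, 500, 10]
weights = [0.1 * rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases  = [np.zeros(m) for m in sizes[1:]]

y = forward(rng.standard_normal(256), weights, biases)
print(y.shape)  # (10,)
```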
Output Layer
• The hidden layers act as a feature extractor, replacing manual feature engineering.
• The output layer applies a softmax, so it acts as a multi-class classifier over y1 … yM.
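A minimal softmax sketch (illustrative): it turns the output layer's raw scores into non-negative values that sum to 1, which is what makes the output usable as a multi-class classifier:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the outputs sum to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([1.0, 3.0, 0.2])
print(softmax(scores))        # ~[0.11, 0.84, 0.05]
print(softmax(scores).sum())  # 1.0
```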
Example Application

[Figure: handwriting digit recognition. Input: a 16 × 16 = 256-pixel image, x1 … x256, with ink → 1 and no ink → 0. Output: y1 … y10, where each dimension represents the confidence of one digit. Here y2 = 0.7 is the largest, so the image is "2".]
Example Application
• Handwriting Digit Recognition

x1 y1 is 1
x2
y2 is 2
Neural
Machine “2”
……

……
……
Network
x256 y10 is 0
What is needed is a
function ……
Input: output:
256-dim vector 10-dim vector
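A sketch of this input/output convention (the blank image and the randomly initialized single-hidden-layer network below are placeholders): flatten the 16 × 16 image into a 256-dim vector, run it through the network, and read off the most confident of the 10 outputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical, untrained network: 256 -> 30 -> 10.
rng = np.random.default_rng(0)
W1, b1 = 0.1 * rng.standard_normal((30, 256)), np.zeros(30)
W2, b2 = 0.1 * rng.standard_normal((10, 30)), np.zeros(10)

image = np.zeros((16, 16))      # ink -> 1, no ink -> 0 (blank here)
x = image.reshape(256)          # 256-dim input vector
y = sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2)   # 10-dim output vector

print("most confident output index:", int(np.argmax(y)))
```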
Example Application
[Figure: the network from before, with input x1 … xN, hidden layers 1 … L, and outputs y1 ("is 1") … y10 ("is 0").]

This is a function set containing the candidates for handwriting digit recognition.

You need to decide the network structure so that a good function is in your function set.
FAQ

• Q: How many layers? How many neurons for each layer?
  A: Trial and error + intuition.
• Q: Can the structure be automatically determined?
  A: E.g. evolutionary artificial neural networks.
• Q: Can we design the network structure?
  A: E.g. Convolutional Neural Network (CNN).
Three Steps for Deep Learning

• Step 1: define a set of functions (Neural Network)
• Step 2: goodness of function
• Step 3: pick the best function

Deep Learning is so simple ……


Loss for an Example
[Figure: given a set of parameters, an image of "1" is fed through the network; the softmax outputs y1 … y10 are compared with the target ŷ (ŷ1 = 1, ŷ2 = … = ŷ10 = 0) using the cross entropy.]

Cross entropy: C(y, ŷ) = − Σᵢ₌₁¹⁰ ŷᵢ ln yᵢ
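A minimal sketch of this cross entropy for one example whose target is the digit "1" (the output vector y below is made up for illustration):

```python
import numpy as np

def cross_entropy(y, y_hat):
    # y: network output (softmax probabilities); y_hat: one-hot target.
    return -np.sum(y_hat * np.log(y))

y = np.array([0.7, 0.1, 0.02, 0.02, 0.02, 0.02, 0.02, 0.04, 0.03, 0.03])
y_hat = np.zeros(10)
y_hat[0] = 1.0                  # target "1" corresponds to the first output

print(cross_entropy(y, y_hat))  # = -ln(0.7) ~ 0.357
```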
Total Loss

For all training data:  L = Σₙ₌₁ᴺ Cⁿ

[Figure: each training example xⁿ is fed through the NN to produce yⁿ, which is compared with its target ŷⁿ to give Cⁿ.]

• Find a function in the function set that minimizes the total loss L.
• Find the network parameters θ* that minimize the total loss L.
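A sketch of the total loss, summing the per-example cross entropy over all N training examples (`predict` stands for any forward pass that returns softmax outputs; it is a placeholder, not a function from the slides):

```python
import numpy as np

def total_loss(xs, y_hats, predict):
    # L = sum over n of C^n, with C^n the cross entropy of example n.
    return sum(-np.sum(y_hat * np.log(predict(x)))
               for x, y_hat in zip(xs, y_hats))
```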
Three Steps for Deep Learning

• Step 1: define a set of functions (Neural Network)
• Step 2: goodness of function
• Step 3: pick the best function

Deep Learning is so simple ……


Gradient Descent
θ: all the network parameters {w1, w2, …, b1, …}

Gradient: ∇L = [ ∂L/∂w1, ∂L/∂w2, …, ∂L/∂b1, … ]ᵀ

[Figure: one update step. Compute each partial derivative and move each parameter by −μ ∂L/∂(parameter), e.g. w1: 0.2 → 0.15, w2: −0.1 → 0.05, b1: 0.3 → 0.2.]
Gradient Descent
[Figure: the update is repeated again and again: compute ∂L/∂w, then step by −μ ∂L/∂w; e.g. w1: 0.2 → 0.15 → 0.09 → ……, w2: −0.1 → 0.05 → 0.15 → ……, b1: 0.3 → 0.2 → 0.10 → ……]
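A sketch of the update pictured above, assuming the gradient of L is available as a function (`grad_L` is a placeholder): every parameter repeatedly takes a step of −μ times its partial derivative:

```python
import numpy as np

def gradient_descent(theta, grad_L, mu=0.05, steps=100):
    # theta: vector of all weights and biases; grad_L(theta): gradient of L.
    for _ in range(steps):
        theta = theta - mu * grad_L(theta)   # theta <- theta - mu * dL/dtheta
    return theta

# Toy example: L(theta) = theta_1^2 + theta_2^2, so grad_L(theta) = 2 * theta.
theta0 = np.array([0.2, -0.1])
print(gradient_descent(theta0, lambda t: 2 * t))  # moves toward [0, 0]
```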
Gradient Descent
This is the "learning" of machines in deep learning ……
Even AlphaGo uses this approach.
[Figure: what people imagine …… vs. what it actually is ……]

I hope you are not too disappointed :p


Backpropagation
• Backpropagation: an efficient way to compute ∂L/∂w in a neural network

libdnn: developed by NTU student 周伯威
Ref: http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/DNN%20backprop.ecm.mp4/index.html
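As a rough illustration of what backpropagation computes (not the libdnn implementation, and using squared error instead of cross entropy for brevity), here is the chain rule applied to a single sigmoid layer:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One layer: a = sigmoid(W x + b), loss L = 0.5 * ||a - y_hat||^2 (made-up sizes).
rng = np.random.default_rng(0)
W, b = 0.1 * rng.standard_normal((2, 3)), np.zeros(2)
x, y_hat = np.array([1.0, -1.0, 0.5]), np.array([1.0, 0.0])

# Forward pass, keeping intermediate values for the backward pass.
z = W @ x + b
a = sigmoid(z)

# Backward pass: chain rule dL/dz = (a - y_hat) * sigmoid'(z).
delta = (a - y_hat) * a * (1 - a)
dW = np.outer(delta, x)   # dL/dW
db = delta                # dL/db

W -= 0.1 * dW             # one gradient descent step using these gradients
b -= 0.1 * db
```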
Concluding Remarks

• Step 1: define a set of functions (Neural Network)
• Step 2: goodness of function
• Step 3: pick the best function

What are the benefits of deep architecture?


Deeper is Better?
Deeper networks (Layer × Size → Word Error Rate (%)):
1 × 2k → 24.2
2 × 2k → 20.4
3 × 2k → 18.4
4 × 2k → 17.8
5 × 2k → 17.2
7 × 2k → 17.1

Not surprising: more parameters, better performance.

Single-hidden-layer networks for comparison (Layer × Size → Word Error Rate (%)):
1 × 3772 → 22.5
1 × 4634 → 22.6
1 × 16k → 22.1

Seide, Frank, Gang Li, and Dong Yu. "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks." Interspeech, 2011.
Universality Theorem
Any continuous function f : R^N → R^M can be realized by a network with one hidden layer (given enough hidden neurons).

Reference for the reason: http://neuralnetworksanddeeplearning.com/chap4.html

Why a "Deep" neural network and not a "Fat" neural network? (next lecture)
To Learn Deep Learning in Depth ("深度學習深度學習")
• My course: Machine learning and having it deep and structured
  http://speech.ee.ntu.edu.tw/~tlkagk/courses_MLSD15_2.html
• 6-hour version: http://www.slideshare.net/tw_dsconf/ss-62245351
• "Neural Networks and Deep Learning", written by Michael Nielsen
  http://neuralnetworksanddeeplearning.com/
• "Deep Learning", written by Yoshua Bengio, Ian J. Goodfellow and Aaron Courville
  http://www.deeplearningbook.org
Acknowledgment
• Thanks to Victor Chen for spotting typos in the slides.
