Neural Network
The Idea of ANNs
▪ ANNs learn relationships between cause and effect, or organize large volumes of data
into orderly and informative patterns.
[Figure: image-classification example – shown a picture and asked "What is that?", the network picks among candidate labels (frog, lion, bird) and answers "It's a frog".]
Artificial Neural Network
▪ ANN Definition:
“Data processing system consisting of a large number of simple, highly interconnected
processing elements (artificial neurons) in an architecture inspired by the structure of the
cerebral cortex of the brain”
▪ Synaptic connection strengths among neurons are used to store the acquired
knowledge.
[Figure: a single-input artificial neuron – input x is multiplied by weight w, the bias b is added, and wx + b passes through an activation function to give output y.]
A computer neuron mimics the properties of biological neurons:
[Figure: a multi-input neuron with inputs x_1 … x_n and weights w_1 … w_n.]
y = f\left(\sum_{i=1}^{n} x_i w_i + b\right)
Perceptron
[Figure: perceptron with inputs x_1 … x_n, weights w_1 … w_n, bias b, and output Y.]
▪ Perceptron
  ▪ computes the weighted sum of its inputs (called its net input)
  ▪ adds its bias
  ▪ passes this value through an activation function
▪ We say that the neuron "fires" (i.e. becomes active) if its output is above zero.
▪ Bias can be incorporated as another weight clamped to a fixed input of +1.0.
▪ This extra free variable (bias) makes the neuron more powerful.
net = \sum_{i=1}^{n} w_i x_i + b
Y = \begin{cases} 1 & \text{if } net > 0 \\ 0 & \text{otherwise} \end{cases}
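A minimal sketch of this perceptron in Python/NumPy (the example weights are hand-picked and the function name is mine, not from the slides):

```python
import numpy as np

def perceptron(x, w, b):
    """Weighted sum of the inputs plus bias, passed through a step activation."""
    net = np.dot(w, x) + b        # net = sum_i w_i x_i + b
    return 1 if net > 0 else 0    # the neuron "fires" if its net input is above zero

# Example: two inputs with hand-picked weights
print(perceptron(np.array([1, 0]), np.array([1.0, 1.0]), -0.5))   # -> 1
```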
Perceptron
[Figure: the same perceptron, but with the bias incorporated as an extra weight w_0 clamped to a fixed input x_0 = +1.]
▪ The bullets above apply unchanged; the bias is simply folded into the weight vector:
net = \sum_{i=0}^{n} w_i x_i, \quad \text{with } x_0 = +1 \text{ (so } w_0 \text{ plays the role of the bias } b\text{)}
Inverter
input | output
x1    | Y
0     | 1
1     | 0
[Figure: single-input perceptron computing NOT – weight w_0 = −1 on input x_0, bias 0.5, output y = F(−x_0 + 0.5).]
Boolean OR
input | input | output
x1    | x2    | Y
0     | 0     | 0
0     | 1     | 1
1     | 0     | 1
1     | 1     | 1
[Figure: a single perceptron with weights w_1 = w_2 = 1 computes OR; in the (x_1, x_2) plane the positive (+) and negative (−) examples are linearly separable.]
Boolean AND
input | input | output
x1    | x2    | Y
0     | 0     | 0
0     | 1     | 0
1     | 0     | 0
1     | 1     | 1
[Figure: a single perceptron with weights w_1 = w_2 = 1 (and a larger negative bias) computes AND; the examples are again linearly separable.]
Boolean XOR
input | input | output
x1    | x2    | Y
0     | 0     | 0
0     | 1     | 1
1     | 0     | 1
1     | 1     | 0
[Figure: in the (x_1, x_2) plane the XOR examples (+ at (0,1) and (1,0), − at (0,0) and (1,1)) are not linearly separable, so no single perceptron can compute XOR.]
Boolean XOR
input | input | output
x1    | x2    | o
0     | 0     | 0
0     | 1     | 1
1     | 0     | 1
1     | 1     | 0
[Figure: two-layer network – hidden unit h1 computes OR (weights 1, 1, bias w10 = −0.5), hidden unit h2 computes AND (weights 1, 1, bias w20 = −1.5); the output unit combines them with weights +1 (from h1) and −1 (from h2) and bias −0.5, so o = step(h1 − h2 − 0.5) = XOR(x1, x2).]
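To check the weights above, here is a small sketch that builds the NOT, OR and AND perceptrons and then XOR from the OR/AND hidden units (the weight and bias values are the ones on the slides; the helper names are mine):

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)

def unit(x, w, b):
    """One perceptron applied to a batch of inputs: step(x . w + b)."""
    return step(x @ w + b)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

NOT_ = step(-1 * np.array([0, 1]) + 0.5)   # inverter: w = -1, bias 0.5  -> [1 0]
OR_  = unit(X, np.array([1, 1]), -0.5)     # OR : w1 = w2 = 1, bias -0.5 -> [0 1 1 1]
AND_ = unit(X, np.array([1, 1]), -1.5)     # AND: w1 = w2 = 1, bias -1.5 -> [0 0 0 1]

# XOR: hidden units h1 = OR, h2 = AND, output o = step(h1 - h2 - 0.5)
H    = np.stack([OR_, AND_], axis=1)
XOR_ = unit(H, np.array([1, -1]), -0.5)    # -> [0 1 1 0]
```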
Exercise
▪ Design a neural network for each of the following:
▪ XNOR
▪ NOR
▪ NAND
▪ XOR using OR, AND and NAND
▪ XNOR using OR, AND and NOR
ANN Design
ANN
▪ Interconnections: Single Layer, Multilayer
▪ Learning: Parameter Learning, Structure Learning; Supervised, Unsupervised, Reinforcement
▪ Activation Functions: Sigmoid, Softmax, TanH, ReLU, Leaky ReLU
Learning Types
▪ Supervised learning: (Labeled examples)
▪ Agent is given correct answers for each example
▪ Agent is learning a function from examples of its inputs
and outputs
▪ Parameter Learning: the connection weights W, b are updated.
▪ Structure Learning: the structure of the network itself is changed.
[Figure: supervised learning loop – the input X is fed to the neural network (parameters W, b) to produce the predicted output Ŷ; a loss/error generator compares Ŷ with the desired output Y, and the parameters W, b are updated with respect to Δ(Y − Ŷ) of the loss/error.]
▪ Unsupervised learning and reinforcement learning are the other learning types.
Activation Function
▪ ANN activation functions: Sigmoid, Softmax, TanH, ReLU, Leaky ReLU
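For reference, the usual definitions of these activation functions as a short NumPy sketch (the Leaky ReLU slope of 0.01 is a common default I've assumed; it is not given on the slide):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))      # subtract the max for numerical stability
    return e / e.sum()

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):     # alpha is an assumed default slope
    return np.where(z > 0, z, alpha * z)
```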
Characterization of ANN
ANN
▪ Interconnections
  ▪ Feed Forward: Single Layer, Multilayer
  ▪ Convolution
  ▪ Recurrent
Neuron Design
[Figure: single neuron – input x, weight w, bias b, summation and activation, output y.]
y = \sigma(xw + b)
With x = 1.4, w = 2.5, b = 1:
z = 2.5 \cdot 1.4 + 1 = 4.5
y = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-(2.5 \cdot 1.4 + 1)}} = 0.989
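A quick numerical check of this worked example (values from the slide):

```python
import numpy as np

x, w, b = 1.4, 2.5, 1.0
z = x * w + b                     # 4.5
y = 1.0 / (1.0 + np.exp(-z))      # sigma(4.5)
print(z, round(y, 3))             # 4.5 0.989
```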
Neuron Design
[Figure: neuron with inputs x_0, x_1, …, x_n, weights w_0, w_1, …, w_n, and bias b.]
y = \sigma\left(\sum_{i=0}^{n} x_i w_i + b\right)
b = [1]
Single Layer Feedforward Neural Network Design
[Figure: single-layer feedforward network – inputs x_0, x_1, x_2 fully connected to units H_0, H_1, H_2 with biases b_0, b_1, b_2, producing outputs y_0, y_1, y_2.]
y_0 = \sigma\left(\sum_{i=0}^{2} x_i w_{0,i} + b_0\right)
y_1 = \sigma\left(\sum_{i=0}^{2} x_i w_{1,i} + b_1\right)
y_2 = \sigma\left(\sum_{i=0}^{2} x_i w_{2,i} + b_2\right)
Single Layer Feedforward Neural Network Design
[Figure: the same single-layer network, first with a single input row (output y of shape 1×3), then with a batch of three input rows (output Y of shape 3×3).]
output = y_{(1\times 3)} \quad\text{or}\quad output = Y_{(3\times 3)}
Z = X \cdot W^T + b
Input = X_{(3\times n)} = \begin{bmatrix} x_{0,0} & x_{0,1} & x_{0,2} \\ x_{1,0} & x_{1,1} & x_{1,2} \\ x_{2,0} & x_{2,1} & x_{2,2} \end{bmatrix}
Weights = W_{(3\times n)} = \begin{bmatrix} w_{0,0} & w_{0,1} & w_{0,2} \\ w_{1,0} & w_{1,1} & w_{1,2} \\ w_{2,0} & w_{2,1} & w_{2,2} \end{bmatrix}
bias = b_{(1\times 3)} = [b_0, b_1, b_2]
z_0 = [\,x_{0,0} w_{0,0} + x_{0,1} w_{0,1} + x_{0,2} w_{0,2} + b_0,\;\; x_{0,0} w_{1,0} + x_{0,1} w_{1,1} + x_{0,2} w_{1,2} + b_1,\;\; x_{0,0} w_{2,0} + x_{0,1} w_{2,1} + x_{0,2} w_{2,2} + b_2\,]_{(1\times 3)}
Z = \begin{bmatrix} z_0 \\ z_1 \\ z_2 \end{bmatrix}_{(3\times 3)}
y = \sigma(Z)
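The same computation Z = X · Wᵀ + b, y = σ(Z) as a NumPy sketch; the random values are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((3, 3))            # batch of 3 input rows, 3 features each
W = rng.random((3, 3))            # one weight row per output unit
b = rng.random((1, 3))            # one bias per output unit

Z = X @ W.T + b                   # Z = X . W^T + b, shape (3, 3)
Y = 1.0 / (1.0 + np.exp(-Z))      # y = sigma(Z)
print(Y.shape)                    # (3, 3)
```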
Multi-Layer Feedforward Neural Network Design
[Figure: two-layer feedforward network – inputs x_0, x_1, x_2 feed hidden units H_{1,0}, H_{1,1}, H_{1,2}, which feed output units H_{2,0}, H_{2,1} producing y_0, y_1; each layer has its own biases.]
Y = \sigma(\sigma(X \cdot W_1 + b_1) \cdot W_2 + b_2)
Multi-Layer Feedforward Neural Network Design
[Figure: three-layer feedforward network – inputs x_0, x_1, x_2 feed hidden layer H_{1,*}, then H_{2,*}, then the single output unit H_{3,0} = y_0; biases b_{1,*}, b_{2,*}, b_{3,0}.]
X_{(10\times 3)},\; W_1{}_{(3\times 3)},\; b_1{}_{(1\times 3)},\; W_2{}_{(2\times 3)},\; b_2{}_{(1\times 2)},\; W_3{}_{(1\times 2)},\; b_3{}_{(1\times 1)},\; Y_{(10\times 1)}
Y = \sigma(\sigma(\sigma(X \cdot W_1 + b_1) \cdot W_2 + b_2) \cdot W_3 + b_3)
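A sketch of this three-layer forward pass with the shapes listed above. Note that with W₂ of shape 2×3 and W₃ of shape 1×2 the weights must be applied transposed, matching the Z = X · Wᵀ + b convention of the single-layer slide; the random values are illustrative only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X  = rng.random((10, 3))
W1, b1 = rng.random((3, 3)), rng.random((1, 3))
W2, b2 = rng.random((2, 3)), rng.random((1, 2))
W3, b3 = rng.random((1, 2)), rng.random((1, 1))

H1 = sigmoid(X  @ W1.T + b1)      # (10, 3)
H2 = sigmoid(H1 @ W2.T + b2)      # (10, 2)
Y  = sigmoid(H2 @ W3.T + b3)      # (10, 1)
print(Y.shape)                    # (10, 1)
```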
Multilayer Neural Networks
▪ A multi-layer feedforward neural network with ≥ 1 hidden layers.
[Figure: network with an input layer, first hidden layer, second hidden layer, and output layer.]
Roles of Layers
▪ Input Layer
▪ Accepts input signals from the outside world
▪ Distributes the signals to the neurons in the hidden layer
[Figure: a single neuron x → xw + b → ŷ; comparing ŷ with the target y gives the loss L. A second plot shows the loss L as a function of w, with its minimum Loss_min.]
▪ Find the derivative of L w.r.t. w, i.e. \frac{\partial L}{\partial w}, and move w toward the value that minimizes the loss.
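A minimal sketch of that idea for a single linear neuron with a squared-error loss: compute ∂L/∂w and take one gradient step toward the loss minimum (the target y = 0 and the learning rate 0.1 are assumed values, not from the slides):

```python
x, w, b = 1.4, 2.5, 1.0       # input, weight, bias (values from the earlier worked example)
y = 0.0                       # assumed target, for illustration only

y_hat = x * w + b             # neuron output
L = (y - y_hat) ** 2          # squared-error loss

dL_dw = -2 * (y - y_hat) * x  # chain rule: dL/dy_hat * dy_hat/dw
w = w - 0.1 * dL_dw           # one gradient-descent step toward Loss_min
```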
Backpropagation
▪ “backward propagation of errors”
▪ Method used in ANNs to calculate the gradient of a loss L w.r.t. the
weights w, which is needed to compute the updated weights.
▪ Computes the gradient one layer at a time, iterating backward
from the last layer, to avoid redundant calculation of
intermediate terms in the chain rule.
▪ Used to train ANNs by minimizing the difference between the
predicted outputs Ŷ and the actual outputs Y.
[Figure: three-layer network with inputs x_1 … x_n, hidden neurons j, and outputs y_1 … y_l. Input signals propagate forward through weights w_{ij} (input → hidden) and w_{jk} (hidden → output); error signals propagate backward from the output layer to the hidden layer, and the weights of both layers are trained.]
Derivative Rules
Chain Rule
Single Neuron
[Figure: computational graph of ŷ = ReLU(xw + b) with x = 3, w = 2, b = 1: the multiply gate gives 3 · 2 = 6, the add gate gives 6 + 1 = 7, and ReLU(7) = 7.]
Single Neuron
[Figure: the same graph without the activation: m = x · w = 6, f = m + b = 7.]
Single Neuron
f(x, w, b) = x \cdot w + b
[Figure: computational graph with x = 3, w = 2, b = 1; forward values m = 6, f = 7; the upstream gradient \frac{\partial f}{\partial f} = 1 flows back through the add gate, giving \frac{\partial f}{\partial m} = 1 and \frac{\partial f}{\partial b} = 1.]
1. Forward Pass:
   • m = x \cdot w
   • f = m + b
2. Backward Pass:
   • \frac{\partial f}{\partial x}
   • \frac{\partial f}{\partial w}
   • \frac{\partial f}{\partial b}
\frac{\partial f}{\partial m} = \frac{\partial (m+b)}{\partial m} = 1, \qquad \frac{\partial f}{\partial b} = \frac{\partial (m+b)}{\partial b} = 1
Single Neuron
f(x, w, b) = x \cdot w + b = f(sum(mul(x, w), b))
[Figure: same graph; the gradient flowing into w is 3 = 3 \cdot 1.]
1. Forward Pass: m = x \cdot w, f = m + b
2. Backward Pass:
\frac{\partial f}{\partial w} = \frac{\partial m}{\partial w}\,\frac{\partial f}{\partial m} = \frac{\partial (x \cdot w)}{\partial w} \cdot 1 = x = 3
Downstream Gradient = Local Gradient × Upstream Gradient
Single Neuron
f(x, w, b) = x \cdot w + b = f(sum(mul(x, w), b))
[Figure: same graph; the gradient flowing into x is 2 = 2 \cdot 1.]
1. Forward Pass: m = x \cdot w, f = m + b
2. Backward Pass:
\frac{\partial f}{\partial x} = \frac{\partial m}{\partial x}\,\frac{\partial f}{\partial m} = \frac{\partial (x \cdot w)}{\partial x} \cdot 1 = w = 2
Downstream Gradient = Local Gradient × Upstream Gradient
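The three backward-pass slides above collected into one sketch (numbers match the slides: x = 3, w = 2, b = 1):

```python
# Forward pass
x, w, b = 3.0, 2.0, 1.0
m = x * w            # multiply gate: m = 6
f = m + b            # add gate:      f = 7

# Backward pass (downstream = local * upstream)
df_df = 1.0
df_dm = 1.0 * df_df  # add gate: local gradient 1
df_db = 1.0 * df_df  # add gate: local gradient 1
df_dw = x * df_dm    # multiply gate: local gradient x -> 3
df_dx = w * df_dm    # multiply gate: local gradient w -> 2
print(df_dx, df_dw, df_db)   # 2.0 3.0 1.0
```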
Single Neuron Gradient
Single Neuron
[Figure: computational graph of ŷ = ReLU(xw + b) with x = 3, w = 2, b = 1; forward values m = 6, sum = 7, ReLU(7) = 7.]
\frac{\partial \max(x, 0)}{\partial x} = 1 \text{ if } x > 0 \text{ (else } 0\text{)}
Backward: the ReLU gate's local gradient is 1 (its input 7 > 0), so it passes the upstream gradient 1 through; the add gate passes 1 to both m and b; the multiply gate gives \frac{\partial \hat{y}}{\partial x} = w \cdot 1 = 2 and \frac{\partial \hat{y}}{\partial w} = x \cdot 1 = 3.
Downstream Gradient = Local Gradient × Upstream Gradient
Single Neuron
[Figure: computational graph of ŷ = σ(xw + b) with x = 3, w = 2, b = 1; forward values m = 6, sum = 7, σ(7) ≈ 0.99.]
\frac{\partial \sigma(x)}{\partial x} = (1 - \sigma(x))\,\sigma(x)
Local gradient of the sigmoid gate: (1 − 0.99) \cdot 0.99 = 0.0099.
Backward: the add gate passes 0.0099 to both m and b; the multiply gate gives \frac{\partial \hat{y}}{\partial x} = w \cdot 0.0099 = 2 \cdot 0.0099 ≈ 0.02 and \frac{\partial \hat{y}}{\partial w} = x \cdot 0.0099 = 3 \cdot 0.0099 ≈ 0.03.
Downstream Gradient = Local Gradient × Upstream Gradient
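The same graph with a sigmoid output as a sketch. Computed exactly, σ(7) ≈ 0.999, so the gradients come out somewhat smaller than the slide's numbers, which round σ(7) to 0.99:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass
x, w, b = 3.0, 2.0, 1.0
s = x * w + b                 # 7
y = sigmoid(s)                # ~0.999 (the slide rounds this to 0.99)

# Backward pass: local gradient of the sigmoid gate is sigma * (1 - sigma)
dy_ds = y * (1.0 - y)         # ~0.00091 exactly; ~0.0099 with the slide's rounding
dy_dm = 1.0 * dy_ds           # add gate passes the gradient through
dy_db = 1.0 * dy_ds
dy_dw = x * dy_dm             # multiply gate: local gradient x
dy_dx = w * dy_dm             # multiply gate: local gradient w
print(dy_dx, dy_dw, dy_db)
```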
Example
Pattern in Gradient Flow
Vector Derivatives
Backpropagation with Vectors
Backpropagation with Matrices
Reverse-Mode Automatic Differentiation
Backprop: Higher-Order Derivatives
Loss Functions and Their Local Gradients
Mean Square Error:
L = \frac{1}{n} \sum_{i=0}^{n} (y_i - \hat{y}_i)^2, \qquad \frac{\partial L_i}{\partial \hat{y}_i} = -\frac{2}{n}\,(y_i - \hat{y}_i)
Categorical Cross Entropy:
L = -\sum_{i} y_i \cdot \log(\hat{y}_i), \qquad \frac{\partial L_i}{\partial \hat{y}_i} = -\frac{y_i}{\hat{y}_i}
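Both losses and their local gradients as a NumPy sketch (the example target and prediction are assumed values for illustration):

```python
import numpy as np

def mse(y, y_hat):
    n = y.size
    loss = np.mean((y - y_hat) ** 2)
    grad = -(2.0 / n) * (y - y_hat)       # dL/dy_hat
    return loss, grad

def categorical_cross_entropy(y, y_hat):
    loss = -np.sum(y * np.log(y_hat))
    grad = -y / y_hat                     # dL/dy_hat
    return loss, grad

y     = np.array([0.0, 1.0, 0.0])         # one-hot target (assumed example)
y_hat = np.array([0.2, 0.7, 0.1])         # predicted probabilities
print(categorical_cross_entropy(y, y_hat)[0])   # ~0.357
```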
Back-propagation Algorithm
▪ In a back-propagation neural network, the learning algorithm has 2 phases.
1. Forward propagation of inputs
2. Backward propagation of errors
▪ The algorithm loops over the 2 phases until the errors obtained are lower than a
certain threshold.
▪ Learning is done in a similar manner as in a perceptron
▪ A set of training inputs is presented to the network.
▪ The network computes the outputs.
▪ The weights are adjusted to reduce errors.
▪ The activation function used is a sigmoid function.
Y_{sigmoid} = \frac{1}{1 + e^{-X}}
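Putting the two phases together: a compact sketch that trains a small sigmoid network on XOR with a mean-squared-error loss. The hidden-layer size, learning rate, and iteration count are my choices, not from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))   # hidden layer (4 units, assumed)
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # output layer
lr = 0.5

for _ in range(20000):
    # Phase 1: forward propagation of inputs
    H = sigmoid(X @ W1 + b1)
    Y_hat = sigmoid(H @ W2 + b2)

    # Phase 2: backward propagation of errors (MSE loss, sigmoid local gradients)
    dY = (Y_hat - Y) * Y_hat * (1 - Y_hat)
    dH = (dY @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ dY
    b2 -= lr * dY.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ dH
    b1 -= lr * dH.sum(axis=0, keepdims=True)

print(Y_hat.round(2))   # should be close to the XOR targets [[0], [1], [1], [0]]
```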
Improving the Perceptron
Probabilistic Classification
▪ Naïve Bayes provides probabilistic classification
[Figure: two handwritten-digit examples with per-class probabilities – one is classified as a 0 with probability 0.991, the other as a 2 with probability 0.703 (6: 0.264, other classes near 0.001).]
Note: I’m going to be lazy and use “x” in place of “f(x)” here – this is just for notational convenience!
A Probabilistic Perceptron
A 1D Example
The Soft Max
P(y \mid x) = \frac{e^{w_y \cdot x}}{\sum_{y'} e^{w_{y'} \cdot x}} \quad \text{(the denominator is the normalizer)}
How to Learn?
▪ Maximum likelihood estimation
  o Equivalently: maximize the log-likelihood of the training labels
1D optimization
https://playground.tensorflow.org/