
Artificial Intelligence

Neural Network
The idea of ANNs
▪ ANNs learn the relationship between cause and effect, or organize large volumes of data into orderly and informative patterns.

(Figure: an image-classification example. Asked "What is that?" about a picture, and choosing among frog, lion, and bird, the network answers "It's a frog.")
Artificial Neural Network
▪ ANN Definition:
"Data processing system consisting of a large number of simple, highly interconnected processing elements (artificial neurons) in an architecture inspired by the structure of the cerebral cortex of the brain" (Tsoukalas & Uhrig, 1997)
▪ Neural network: an information processing paradigm inspired by biological nervous systems, such as our brain
▪ Structure: a large number of highly interconnected processing elements (neurons) working together
▪ Like people, they learn from experience (by example)
Artificial Neural Networks
▪ Computational models inspired by the human brain:
▪ Algorithms that try to mimic the brain
▪ Massively parallel, distributed systems made up of simple processing units (neurons)
▪ Synaptic connection strengths among neurons are used to store the acquired knowledge
▪ Knowledge is acquired by the network from its environment through a learning process
Biological Inspirations
Biological Neuron
▪ Dendrites: nerve fibres carrying electrical signals to the cell
▪ Soma (cell body): computes a non-linear function of its inputs
▪ Axon: single long fibre that carries the electrical signal from the cell body to other neurons
▪ Synapse: the point of contact between the axon of one cell and the dendrite of another, regulating a chemical connection whose strength affects the input to the cell
$y = wx + b$

(Figures: the effect of $w$ and the effect of $b$ on the line $y = wx + b$.)

A computer neuron mimics the properties of biological neurons. With a single input $x$, weight $w$, and bias $b$, the neuron computes $wx + b$ and passes it through an activation function to produce the output $y$.

With $n$ inputs $x_1, \ldots, x_n$ and weights $w_1, \ldots, w_n$:
$$z = \sum_{i=1}^{n} x_i w_i + b, \qquad y = f(z)$$
Perceptron
▪ A perceptron:
▪ computes the weighted sum of its inputs (called its net input): $sum = \sum_{i=1}^{n} w_i x_i$
▪ adds its bias: $net = \sum_{i=1}^{n} w_i x_i + b$
▪ passes this value through an activation function:
$$Y = \begin{cases} 1 & \text{if } sum > 0 \\ 0 & \text{otherwise} \end{cases}$$
▪ We say that the neuron "fires" (i.e. becomes active) if its output is above zero.
▪ Bias can be incorporated as another weight clamped to a fixed input of +1.0: with $x_0 = +1$ and $w_0 = b$, the net input becomes $net = \sum_{i=0}^{n} w_i x_i$.
▪ This extra free variable (bias) makes the neuron more powerful.
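A minimal sketch of this perceptron in Python (NumPy assumed), showing both the explicit-bias form and the bias-as-weight form:

```python
import numpy as np

def perceptron(x, w, b):
    """Step-activation perceptron: output 1 if the net input is positive."""
    net = np.dot(w, x) + b
    return 1 if net > 0 else 0

def perceptron_bias_as_weight(x, w):
    """Same neuron with the bias folded in as w[0], clamped to input +1."""
    x_ext = np.concatenate(([1.0], x))  # x0 = +1
    return 1 if np.dot(w, x_ext) > 0 else 0
```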
Inverter (NOT)

input X1 | output Y
    0    |    1
    1    |    0

A single perceptron computes this with weight $w_0 = -1$ on the input and bias $0.5$: for $X_1 = 0$ the net input is $0.5 > 0$, so $Y = 1$; for $X_1 = 1$ it is $-0.5$, so $Y = 0$.
Boolean OR

input x1 | input x2 | output Y
    0    |    0     |    0
    0    |    1     |    1
    1    |    0     |    1
    1    |    1     |    1

A single perceptron computes OR with weights $w_1 = w_2 = 1$ and bias $-0.5$. (Plot: in the $x_1$–$x_2$ plane, the lone − point $(0,0)$ is separated from the three + points by a single line.)
Boolean AND

input x1 | input x2 | output Y
    0    |    0     |    0
    0    |    1     |    0
    1    |    0     |    0
    1    |    1     |    1

A single perceptron computes AND with weights $w_1 = w_2 = 1$ and bias $-1.5$. (Plot: only $(1,1)$ is a + point; one line separates it from the three − points.)
Boolean XOR

input x1 | input x2 | output Y
    0    |    0     |    0
    0    |    1     |    1
    1    |    0     |    1
    1    |    1     |    0

(Plot: the + points $(0,1)$ and $(1,0)$ and the − points $(0,0)$ and $(1,1)$ cannot be separated by a single line: XOR is not linearly separable, so no single perceptron can compute it.)
Boolean XOR with a hidden layer

XOR can be built from two perceptrons feeding a third:
▪ $h_1 = OR(x_1, x_2)$: weights $(1, 1)$, bias $w_{10} = -0.5$
▪ $h_2 = AND(x_1, x_2)$: weights $(1, 1)$, bias $w_{20} = -1.5$
▪ output $o$: weights $(1, -1)$ on $(h_1, h_2)$, bias $-0.5$ (i.e. $o$ fires when $h_1$ is on and $h_2$ is off)

input x1 | input x2 | h1 | h2 | output o
    0    |    0     | 0  | 0  |    0
    0    |    1     | 1  | 0  |    1
    1    |    0     | 1  | 0  |    1
    1    |    1     | 1  | 1  |    0
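A quick sketch verifying this construction in Python (NumPy assumed; weights and biases taken from the slide):

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)

W_hidden = np.array([[1.0, 1.0],   # h1 = OR weights
                     [1.0, 1.0]])  # h2 = AND weights
b_hidden = np.array([-0.5, -1.5])  # OR bias, AND bias
w_out = np.array([1.0, -1.0])      # output weights on (h1, h2)
b_out = -0.5

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(W_hidden @ np.array([x1, x2]) + b_hidden)
    o = int(w_out @ h + b_out > 0)
    print(x1, x2, "->", o)  # prints 0, 1, 1, 0
```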
Exercise
▪ Design neural networks for:
▪ XNOR
▪ NOR
▪ NAND
▪ XOR using OR, AND and NAND
▪ XNOR using OR, AND and NOR
ANN Design

ANN
▪ Interconnections
  ▪ Feed Forward (Single Layer, Multilayer)
  ▪ Convolution
  ▪ Recurrent (Single Layer, Multilayer)
▪ Learning
  ▪ Learning Rules: Parameter Learning, Structure Learning
  ▪ Learning Types: Supervised, Unsupervised, Reinforcement
▪ Activation Functions
  ▪ Binary Step
  ▪ Linear
  ▪ Non-Linear: Sigmoid, Softmax, TanH, ReLU, Leaky ReLU
Learning Types
▪ Supervised learning (labeled examples)
  ▪ Agent is given correct answers for each example
  ▪ Agent is learning a function from examples of its inputs and outputs
▪ Unsupervised learning (unlabeled examples)
  ▪ Agent must infer correct answers
  ▪ Completely unsupervised learning is impractical, since the agent has no context
▪ Reinforcement learning (rewards)
  ▪ Agent is given occasional rewards for correct actions
  ▪ Typically involves the subproblem of learning "how the world works"
Learning in ANN

ANN Learning
▪ Learning Rules
  ▪ Parameter Learning: connection weights are updated
  ▪ Structure Learning: change in network structure
▪ Learning Types: Supervised, Unsupervised, Reinforcement

(Diagram: input $X$ → neural network with parameters $\boldsymbol{W}, \boldsymbol{b}$ → predicted output $\hat{Y}$; a loss/error generator compares $\hat{Y}$ with the desired output $Y$, and the parameters $\boldsymbol{W}, \boldsymbol{b}$ are updated w.r.t. $\Delta(Y - \hat{Y})$ of the loss/error.)
Activation Function

ANN Activation Functions
▪ Binary Step
▪ Linear
▪ Non-Linear: Sigmoid, Softmax, TanH, ReLU, Leaky ReLU
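A small sketch (Python with NumPy) of the activation functions listed above:

```python
import numpy as np

def binary_step(z):
    return np.where(z > 0, 1.0, 0.0)

def linear(z):
    return z

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)
```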
Characterization of ANN

ANN Interconnections
▪ Feed Forward (Single Layer, Multilayer)
▪ Convolution
▪ Recurrent
Neuron Design

A single neuron with input $x$, weight $w$, and bias $b$ computes $y = \sigma(xw + b)$.

Example: $x = 1.4$, $w = 2.5$, $b = 1$:
$$z = 2.5 \times 1.4 + 1 = 4.5, \qquad y = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-(2.5 \times 1.4 + 1)}} = 0.989$$
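A quick check of the worked example in plain Python:

```python
import math

x, w, b = 1.4, 2.5, 1.0
z = x * w + b                   # 4.5
y = 1.0 / (1.0 + math.exp(-z))  # sigmoid activation
print(z, round(y, 3))           # 4.5 0.989
```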
Neuron Design

A neuron with inputs $x_0, \ldots, x_n$, weights $w_0, \ldots, w_n$, and bias $b$:
$$y = \sigma\left(\sum_{i=0}^{n} x_i w_i + b\right)$$

$Input = \boldsymbol{x}_{(1 \times n)} = [x_0, x_1, \cdots, x_n]$
$Weights = \boldsymbol{w}_{(1 \times n)} = [w_0, w_1, \cdots, w_n]$
$bias = \boldsymbol{b}_{(1 \times 1)} = [b_0]$

$z = x_0 w_0 + x_1 w_1 + \cdots + x_n w_n + b = \boldsymbol{x} \cdot \boldsymbol{w}^T + b$
$y = \sigma(z)$
Neuron Design

Example with three inputs, $y = \sigma\left(\sum_{i=0}^{2} x_i w_i + b\right)$:

$\boldsymbol{x} = [1.4, 0.5, 2.2]$, $\boldsymbol{w} = [2.5, 1.0, 1.1]$, $\boldsymbol{b} = [1]$
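The same computation vectorized with NumPy (values from the slide):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.4, 0.5, 2.2])
w = np.array([2.5, 1.0, 1.1])
b = 1.0

z = x @ w + b          # dot product plus bias: 3.5 + 0.5 + 2.42 + 1 = 7.42
y = sigmoid(z)
print(z, round(y, 4))  # 7.42 0.9994
```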
Single Layer Feedforward Neural Network Design

Three neurons $H_0, H_1, H_2$ share the inputs $x_0, x_1, x_2$; each has its own weights $w_{j,i}$ and bias $b_j$:
$$y_j = \sigma\left(\sum_{i=0}^{2} x_i w_{j,i} + b_j\right), \qquad j = 0, 1, 2$$
Single Layer Feedforward Neural Network Design

$Input = \boldsymbol{x}_{(1 \times n)} = [x_0, x_1, x_2]$

$Weights = \boldsymbol{w}_{(3 \times n)} = \begin{bmatrix} w_{0,0} & w_{0,1} & w_{0,2} \\ w_{1,0} & w_{1,1} & w_{1,2} \\ w_{2,0} & w_{2,1} & w_{2,2} \end{bmatrix}$

$bias = \boldsymbol{b}_{(1 \times 3)} = [b_0, b_1, b_2]$

$\boldsymbol{z} = [\, x_0 w_{0,0} + x_1 w_{0,1} + x_2 w_{0,2} + b_0,\; x_0 w_{1,0} + x_1 w_{1,1} + x_2 w_{1,2} + b_1,\; x_0 w_{2,0} + x_1 w_{2,1} + x_2 w_{2,2} + b_2 \,]$

$\boldsymbol{z} = \boldsymbol{x} \cdot \boldsymbol{w}^T + \boldsymbol{b}, \qquad y = \sigma(\boldsymbol{z}), \qquad output = y_{(1 \times 3)}$
Single Layer Feedforward Neural Network Design

With a batch of inputs (one example per row):

$Input = \boldsymbol{X}_{(3 \times n)} = \begin{bmatrix} x_{0,0} & x_{0,1} & x_{0,2} \\ x_{1,0} & x_{1,1} & x_{1,2} \\ x_{2,0} & x_{2,1} & x_{2,2} \end{bmatrix}, \qquad Weights = \boldsymbol{W}_{(3 \times n)} = \begin{bmatrix} w_{0,0} & w_{0,1} & w_{0,2} \\ w_{1,0} & w_{1,1} & w_{1,2} \\ w_{2,0} & w_{2,1} & w_{2,2} \end{bmatrix}, \qquad bias = \boldsymbol{b}_{(1 \times 3)} = [b_0, b_1, b_2]$

$\boldsymbol{Z} = \boldsymbol{X} \cdot \boldsymbol{W}^T + \boldsymbol{b}$, where the first row is
$\boldsymbol{z}_0 = [\, x_{0,0} w_{0,0} + x_{0,1} w_{0,1} + x_{0,2} w_{0,2} + b_0,\; x_{0,0} w_{1,0} + x_{0,1} w_{1,1} + x_{0,2} w_{1,2} + b_1,\; x_{0,0} w_{2,0} + x_{0,1} w_{2,1} + x_{0,2} w_{2,2} + b_2 \,]_{(1 \times 3)}$

$\boldsymbol{Z} = \begin{bmatrix} \boldsymbol{z}_0 \\ \boldsymbol{z}_1 \\ \boldsymbol{z}_2 \end{bmatrix}_{(3 \times 3)}, \qquad y = \sigma(\boldsymbol{Z}), \qquad output = Y_{(3 \times 3)}$
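A sketch of the batched single-layer forward pass in NumPy (random values, shapes as in the slide):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 3))  # 3 examples, 3 features each
W = rng.normal(size=(3, 3))  # one row of weights per neuron
b = rng.normal(size=(1, 3))  # one bias per neuron

Z = X @ W.T + b              # (3x3) @ (3x3).T + (1x3) -> (3x3)
Y = sigmoid(Z)
print(Y.shape)               # (3, 3)
```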
Multi-Layer Feedforward Neural Network Design

(Network: inputs $x_0, x_1, x_2$ → hidden layer $H_{1,0}, H_{1,1}, H_{1,2}$ with biases $b_0, b_1, b_2$ → output layer $H_{2,0}, H_{2,1}$ with its own biases → outputs $y_0, y_1$.)

Shapes: $[\boldsymbol{X}]_{(10 \times 3)}$, $[\boldsymbol{W}_1]_{(3 \times 3)}$, $[\boldsymbol{b}_1]_{(1 \times 3)}$, $[\boldsymbol{W}_2]_{(2 \times 3)}$, $[\boldsymbol{b}_2]_{(1 \times 2)}$, $\boldsymbol{Y}_{(10 \times 2)}$

$$\hat{Y} = \sigma(\sigma(X \cdot W_1^T + b_1) \cdot W_2^T + b_2)$$

(With the stated shapes the weight matrices enter transposed, consistent with $\boldsymbol{Z} = \boldsymbol{X} \cdot \boldsymbol{W}^T + \boldsymbol{b}$ above.)
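A sketch of this two-layer forward pass in NumPy (random values, shapes from the slide):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))   # 10 examples, 3 features
W1 = rng.normal(size=(3, 3)); b1 = rng.normal(size=(1, 3))
W2 = rng.normal(size=(2, 3)); b2 = rng.normal(size=(1, 2))

H = sigmoid(X @ W1.T + b1)      # hidden layer: (10x3)
Y_hat = sigmoid(H @ W2.T + b2)  # output layer: (10x2)
print(Y_hat.shape)              # (10, 2)
```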
Multi-Layer Feedforward Neural Network Design

(Network: inputs $x_0, x_1, x_2$ → hidden layer 1 ($H_{1,0}, H_{1,1}, H_{1,2}$, biases $b_{1,0}, b_{1,1}, b_{1,2}$) → hidden layer 2 ($H_{2,0}, H_{2,1}$, biases $b_{2,0}, b_{2,1}$) → output $H_{3,0}$ with bias $b_{3,0}$ → $y_0$.)

Shapes: $[\boldsymbol{X}]_{(10 \times 3)}$, $[\boldsymbol{W}_1]_{(3 \times 3)}$, $[\boldsymbol{b}_1]_{(1 \times 3)}$, $[\boldsymbol{W}_2]_{(2 \times 3)}$, $[\boldsymbol{b}_2]_{(1 \times 2)}$, $[\boldsymbol{W}_3]_{(1 \times 2)}$, $[\boldsymbol{b}_3]_{(1 \times 1)}$, $\boldsymbol{Y}_{(10 \times 1)}$

$$\hat{Y} = \sigma(\sigma(\sigma(X \cdot W_1^T + b_1) \cdot W_2^T + b_2) \cdot W_3^T + b_3)$$
Multilayer Neural Networks
▪ A multi-layer feedforward neural network has ≥ 1 hidden layers.

(Figure: input signals flow left to right through the input layer, first hidden layer, second hidden layer, and output layer, emerging as output signals.)
Roles of Layers
▪ Input Layer
  ▪ Accepts input signals from the outside world
  ▪ Distributes the signals to neurons in the hidden layer
  ▪ Usually does not do any computation
▪ Output Layer (computational neurons)
  ▪ Accepts output signals from the previous hidden layer
  ▪ Outputs to the world
  ▪ Knows the desired outputs
▪ Hidden Layer (computational neurons)
  ▪ Determines its own desired outputs
Roles of Layers
▪ Neurons in hidden layers are unobservable through the input and output of the network.
▪ Their desired output is unknown (hidden) from the outside and determined by the layer itself.
▪ 1 hidden layer suffices for continuous functions; 2 hidden layers for discontinuous functions.
▪ Practical applications mostly use 3 layers.
▪ More layers are possible, but each additional layer exponentially increases the computing load.
Supervised Learning

Error / Loss Function: Mean Square Error / Loss (MSE)
$$L = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

(Diagram: the network maps $x_0, x_1, x_2$ through two hidden layers to the prediction $\hat{y}$; the difference $(y - \hat{y})$ feeds the loss $L$, and the learning parameters are updated w.r.t. $\Delta L$.)
Single Neuron

$x \xrightarrow{w} xw + b \rightarrow \hat{y} \rightarrow (y - \hat{y}) \rightarrow L$

▪ Find the derivative of $L$ w.r.t. $w$, i.e. $\frac{\partial L}{\partial w}$
▪ If $w$ changes by a small amount, how much will $L$ change?
▪ Update $w$ to reduce the loss $L$:
▪ $w = w - \alpha \cdot \Delta w$
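A minimal sketch of this update rule in plain Python: one neuron $\hat{y} = xw + b$, one training example, squared error, analytic gradients (the example values are illustrative):

```python
x, y = 1.5, 2.0      # one training example: input and desired output
w, b = 0.0, 0.0      # learning parameters
alpha = 0.1          # learning rate

for step in range(50):
    y_hat = x * w + b
    L = (y - y_hat) ** 2
    dw = -2 * (y - y_hat) * x   # dL/dw
    db = -2 * (y - y_hat)       # dL/db
    w -= alpha * dw             # w = w - alpha * dw
    b -= alpha * db
print(round(w, 3), round(b, 3), round(L, 6))  # loss shrinks toward 0
```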
Optimization

(Figure: the loss $L$ plotted against a parameter; optimization seeks the minimum $L_{min}$.)
Backpropagation
▪ "Backward propagation of errors"
▪ Method used in ANNs to calculate the gradient of a loss $L$ w.r.t. the weights $w$, which is needed to compute the updated weights.
▪ Computes the gradient one layer at a time, iterating backward from the last layer to avoid redundant calculations of intermediate terms in the chain rule.
▪ Used to train ANNs by minimizing the difference between the predicted outputs $\hat{Y}$ and the actual outputs $Y$.
(Figure: a three-layer network with inputs $x_1, \ldots, x_n$, hidden neurons $j$ reached through weights $w_{ij}$, and output neurons $k$ reached through weights $w_{jk}$, producing $y_1, \ldots, y_l$. Input signals propagate forward from the input layer to the output layer; error signals propagate backward from the output layer, and both sets of weights, $w_{ij}$ and $w_{jk}$, are trained.)
Derivative Rules and the Chain Rule

For a composition $f(x) = g(h(x))$, the chain rule gives
$$\frac{\partial f}{\partial x} = \frac{\partial g}{\partial h} \cdot \frac{\partial h}{\partial x}$$
Single Neuron

$x \xrightarrow{w} xw + b \rightarrow \hat{y}$

Forward pass with $x = 3$, $w = 2$, $b = 1$: the multiply node gives $m = x \cdot w = 6$, the add node gives $f = m + b = 7$, and ReLU leaves it unchanged: $ReLU(7) = 7$.
Single Neuron

$f(x, w, b) = x \cdot w + b$

1. Forward Pass ($x = 3$, $w = 2$, $b = 1$):
• $m = x \cdot w = 6$
• $f = m + b = 7$
2. Backward Pass: compute $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial w}$, $\frac{\partial f}{\partial b}$, starting from $\frac{\partial f}{\partial f} = 1$:
• $\frac{\partial f}{\partial m} = \frac{\partial (m + b)}{\partial m} = 1$
• $\frac{\partial f}{\partial b} = \frac{\partial (m + b)}{\partial b} = 1$
Single Neuron

$f(x, w, b) = x \cdot w + b = f(sum(mul(x, w), b))$

$$\frac{\partial f}{\partial w} = \frac{\partial f}{\partial m} \cdot \frac{\partial m}{\partial w} = 1 \cdot \frac{\partial (x \cdot w)}{\partial w} = x = 3 \quad (3 = 3 \cdot 1)$$

Downstream Gradient = Local Gradient × Upstream Gradient
Single Neuron

$f(x, w, b) = x \cdot w + b = f(sum(mul(x, w), b))$

$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial m} \cdot \frac{\partial m}{\partial x} = 1 \cdot \frac{\partial (x \cdot w)}{\partial x} = w = 2 \quad (2 = 2 \cdot 1)$$

Downstream Gradient = Local Gradient × Upstream Gradient
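A sketch replaying this forward/backward pass numerically in plain Python (same values as the slides):

```python
# Forward pass: f(x, w, b) = x*w + b as two nodes, mul then sum.
x, w, b = 3.0, 2.0, 1.0
m = x * w            # mul node: 6
f = m + b            # sum node: 7

# Backward pass: downstream = local gradient * upstream gradient.
df_df = 1.0          # seed gradient at the output
df_dm = 1.0 * df_df  # d(m+b)/dm = 1
df_db = 1.0 * df_df  # d(m+b)/db = 1
df_dw = x * df_dm    # d(x*w)/dw = x -> 3
df_dx = w * df_dm    # d(x*w)/dx = w -> 2
print(df_dx, df_dw, df_db)  # 2.0 3.0 1.0
```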
Single Neuron Gradient

$x \xrightarrow{w} xw + b \rightarrow ReLU \rightarrow \hat{y}$

Local gradient of ReLU: $\frac{\partial \max(x, 0)}{\partial x} = 1$ if $x > 0$ (and 0 otherwise).

Forward ($x = 3$, $w = 2$, $b = 1$): $m = 6$, $m + b = 7$, $ReLU(7) = 7$.
Backward: the upstream gradient at the output is 1; since $7 > 0$, the ReLU's local gradient is 1, so $1 \cdot 1 = 1$ flows back to the sum node; then $\frac{\partial \hat{y}}{\partial b} = 1$, $\frac{\partial \hat{y}}{\partial w} = x = 3$, $\frac{\partial \hat{y}}{\partial x} = w = 2$.

Downstream Gradient = Local Gradient × Upstream Gradient
Single Neuron

$x \xrightarrow{w} xw + b \rightarrow \sigma \rightarrow \hat{y}$

Local gradient of the sigmoid: $\frac{\partial \sigma(x)}{\partial x} = (1 - \sigma(x)) \, \sigma(x)$

Forward ($x = 3$, $w = 2$, $b = 1$): $z = 7$, $\hat{y} = \sigma(7) \approx 0.99$.
Backward: the local gradient is $(1 - 0.99) \cdot 0.99 = 0.0099$; with upstream gradient 1, $0.0099 \cdot 1 = 0.0099$ flows back to the sum node, so $\frac{\partial \hat{y}}{\partial b} = 0.0099$, $\frac{\partial \hat{y}}{\partial w} = 3 \cdot 0.0099 \approx 0.03$, $\frac{\partial \hat{y}}{\partial x} = 2 \cdot 0.0099 \approx 0.02$.

Downstream Gradient = Local Gradient × Upstream Gradient
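The same pass in code (plain Python; note that σ(7) is closer to 0.999, so the slide's 0.99 is a coarse rounding):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, w, b = 3.0, 2.0, 1.0
z = x * w + b                # 7
y_hat = sigmoid(z)           # ~0.999

# Backward: local gradient of the sigmoid is (1 - s) * s.
dz = (1.0 - y_hat) * y_hat   # local * upstream(=1)
dw = x * dz
dx = w * dz
db = dz
print(round(dz, 6), round(dw, 6), round(dx, 6))
```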
Example
Pattern in Gradient Flow
Vector Derivatives
Backpropagation with Vectors
Backpropagation with Matrices
Reverse-Mode Automatic Differentiation
Backprop: Higher-Order Derivatives
Loss Functions and Their Local Gradients

Mean Square Error: $L = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, with $\frac{\partial L}{\partial \hat{y}_i} = -\frac{2}{n} (y_i - \hat{y}_i)$

Binary Cross Entropy: $L_i = -y_i \log(\hat{y}_i) - (1 - y_i) \log(1 - \hat{y}_i)$, with $\frac{\partial L_i}{\partial \hat{y}_i} = -\frac{y_i}{\hat{y}_i} + \frac{1 - y_i}{1 - \hat{y}_i}$

Categorical Cross Entropy: $L = -\sum_i y_i \log(\hat{y}_i)$, with $\frac{\partial L}{\partial \hat{y}_i} = -\frac{y_i}{\hat{y}_i}$
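A sketch of these losses and their local gradients in NumPy (function names are illustrative):

```python
import numpy as np

def mse(y, y_hat):
    loss = np.mean((y - y_hat) ** 2)
    grad = -2.0 / y.size * (y - y_hat)  # dL/dy_hat
    return loss, grad

def binary_cross_entropy(y, y_hat):
    loss = np.mean(-y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat))
    grad = -y / y_hat + (1 - y) / (1 - y_hat)
    return loss, grad

def categorical_cross_entropy(y, y_hat):
    # y one-hot, y_hat a probability vector (e.g. a softmax output)
    loss = -np.sum(y * np.log(y_hat))
    grad = -y / y_hat
    return loss, grad

y = np.array([1.0, 0.0])
y_hat = np.array([0.9, 0.2])
print(mse(y, y_hat))
print(binary_cross_entropy(y, y_hat))
```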
Back-propagation Algorithm
▪ In a back-propagation neural network, the learning algorithm has 2 phases.
1. Forward propagation of inputs
2. Backward propagation of errors
▪ The algorithm loops over the 2 phases until the errors obtained are lower than a
certain threshold.
▪ Learning is done in a similar manner as in a perceptron
▪ A set of training inputs is presented to the network.
▪ The network computes the outputs.
▪ The weights are adjusted to reduce errors.
▪ The activation function used is a sigmoid function.

$$Y_{sigmoid} = \frac{1}{1 + e^{-X}}$$
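Putting the phases together, a compact sketch of the loop (forward propagation, backward propagation of errors, repeat until the error is below a threshold), training a 2-2-1 sigmoid network on XOR with MSE. All names and hyperparameters here are illustrative; a different seed or learning rate may be needed if training lands in a poor local minimum:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 2)); b1 = np.zeros((1, 2))
W2 = rng.normal(size=(1, 2)); b2 = np.zeros((1, 1))
alpha = 2.0  # learning rate

for epoch in range(20000):
    # Phase 1: forward propagation of inputs
    H = sigmoid(X @ W1.T + b1)
    Y_hat = sigmoid(H @ W2.T + b2)
    loss = np.mean((Y - Y_hat) ** 2)
    if loss < 0.01:  # stop once the error is below a threshold
        break
    # Phase 2: backward propagation of errors (chain rule, layer by layer)
    dZ2 = -2 * (Y - Y_hat) * Y_hat * (1 - Y_hat)
    dW2 = dZ2.T @ H / len(X); db2 = dZ2.mean(axis=0, keepdims=True)
    dZ1 = (dZ2 @ W2) * H * (1 - H)
    dW1 = dZ1.T @ X / len(X); db1 = dZ1.mean(axis=0, keepdims=True)
    # Adjust the weights to reduce the error
    W2 -= alpha * dW2; b2 -= alpha * db2
    W1 -= alpha * dW1; b1 -= alpha * db1

print(epoch, round(float(loss), 4), Y_hat.round(2).ravel())
```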
Improving the Perceptron
Probabilistic Classification
▪ Naïve Bayes provides probabilistic classification

(Figure: example posterior distributions over digit classes, e.g. one input scored 0: 0.991 with all other classes near 0.001, another scored 2: 0.703, 6: 0.264, ….)

▪ The perceptron just gives us a class prediction
▪ Can we get it to give us probabilities?
▪ Turns out it also makes it easier to train!

Note: I'm going to be lazy and use "x" in place of "f(x)" here – this is just for notational convenience!
A Probabilistic Perceptron
A 1D Example

(Figure: activations along one dimension; far to one side a point is "definitely blue", near the boundary "not sure", far to the other side "definitely red". The probability increases exponentially as we move away from the boundary, divided by a normalizer.)
The Soft Max

$$P(y \mid x) = \frac{e^{w_y \cdot x}}{\sum_{y'} e^{w_{y'} \cdot x}}$$
How to Learn?
▪ Maximum likelihood estimation
▪ Maximum conditional likelihood estimation
Local Search
o Simple, general idea:
o Start wherever
o Repeat: move to the best neighboring state
o If no neighbors better than current, quit
o Neighbors = small perturbations of w
Our Status
o Our objective: maximize the (conditional) likelihood of the training data
o Challenge: how to find a good w?
o Equivalently: minimize the negative log-likelihood
1D optimization

o Could evaluate $g(w_0 - h)$ and $g(w_0 + h)$, then step in the best direction
o Or, evaluate the derivative $\frac{\partial g(w_0)}{\partial w}$, which tells us which direction to step in
2-D Optimization

Source: Thomas Jungblut’s Blog


Steepest Descent
o Idea:
o Start somewhere
o Repeat: Take a step in the steepest descent direction

Figure source: Mathworks


Steepest Direction
o Steepest Direction = direction of the gradient
How to Learn?
Optimization Procedure: Gradient Descent
o init $w$
o for iter = 1, 2, …: $w \leftarrow w - \alpha \nabla_w loss(w)$
Stochastic Gradient Descent

(Compare the resulting update to the multiclass perceptron update: it is the same step, but with probabilistic weighting, i.e. each update is scaled by the probability of the incorrect answer.)
Logistic Regression Demo!

https://playground.tensorflow.org/
