1. Introduction to Deep Learning
What is Deep Learning?
Deep Learning is a subset of machine learning inspired by the structure and function of the
human brain, using artificial neural networks (ANNs) with multiple layers (hence “deep”).
These models automatically learn hierarchical representations of data, making them
extremely effective for complex tasks like image recognition, natural language processing,
and speech recognition.
Why Deep Learning?
● Automatically extracts features from raw data, reducing the need for manual feature
engineering.
● Scales well with large datasets and complex data.
● State-of-the-art performance in many domains.
2. Basics of Neural Networks
2.1 The Neuron
The fundamental building block of a neural network is the neuron, or node. It receives
multiple inputs, multiplies each by a weight, adds a bias, and passes the resulting weighted
sum through an activation function.
Mathematically:
z = \sum_{i=1}^{n} w_i x_i + b
where x_i are the inputs, w_i are the weights, and b is the bias.
The output is:
a = \sigma(z)
where \sigma is an activation function.
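As a quick illustration of these two formulas, here is a minimal NumPy sketch of a single neuron; the particular inputs, weights, and bias are made-up example values, not anything prescribed.
import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes any real number into (0, 1)
    return 1 / (1 + np.exp(-z))

# Example inputs, weights, and bias (illustrative values only)
x = np.array([0.5, -1.2, 3.0])   # inputs x_i
w = np.array([0.4, 0.7, -0.2])   # weights w_i
b = 0.1                          # bias b

z = np.dot(w, x) + b             # weighted sum: z = sum(w_i * x_i) + b
a = sigmoid(z)                   # neuron output: a = sigma(z)
print(z, a)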
2.2 Activation Functions
Activation functions introduce non-linearity, allowing neural networks to learn complex
patterns.
Common functions (a NumPy sketch follows this list):
● Sigmoid:
\sigma(z) = \frac{1}{1 + e^{-z}}
Output range: (0, 1); useful for binary classification.
● Tanh:
\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}
Output range: (-1, 1)
● ReLU (Rectified Linear Unit):
\text{ReLU}(z) = \max(0, z)
Most popular; alleviates the vanishing gradient problem.
● Leaky ReLU:
Allows small gradients when z < 0, defined as:
\text{LeakyReLU}(z) = \begin{cases} z & z > 0 \\ 0.01z & z \leq 0 \end{cases}
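A minimal NumPy sketch of these four functions; the 0.01 slope in Leaky ReLU is the conventional default used in the definition above, not a requirement.
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))           # output in (0, 1)

def tanh(z):
    return np.tanh(z)                     # output in (-1, 1)

def relu(z):
    return np.maximum(0, z)               # zero for negative inputs

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)  # small slope for negative inputs

z = np.linspace(-2, 2, 5)
print(sigmoid(z), tanh(z), relu(z), leaky_relu(z))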
2.3 Layers and Architecture
Neurons are organized in layers:
● Input layer: Takes raw features.
● Hidden layers: Perform transformations; can be multiple.
● Output layer: Produces final prediction.
3. Forward and Backward Propagation
3.1 Forward Propagation
Input data passes through the network layer by layer, applying weights, biases, and
activation functions to generate predictions.
3.2 Loss Function
A loss function measures how far the model's predictions are from the true labels. Common examples (sketched in code after this list):
● Mean Squared Error (MSE) for regression.
● Cross-Entropy Loss for classification.
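As a rough sketch, both losses can be written in a few lines of NumPy; the small epsilon below guards against log(0) and is an implementation detail, not part of the definition.
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: average of squared differences (regression)
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Cross-Entropy for binary classification; clip to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])
print(mse(y_true, y_pred), binary_cross_entropy(y_true, y_pred))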
3.3 Backward Propagation
Uses the chain rule to compute gradients of the loss with respect to weights. This allows the
network to update weights to minimize loss.
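Concretely, for a single weight w feeding a neuron with pre-activation z = wx + b, activation a = \sigma(z), and loss L, the chain rule gives:
\frac{\partial L}{\partial w} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w} = \frac{\partial L}{\partial a} \cdot \sigma'(z) \cdot x
Gradients for earlier layers are built from the gradients of later layers in the same way, which is what the layer2_delta and layer1_delta terms do in the NumPy example in Section 4.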
3.4 Optimization
Gradient descent algorithms adjust weights using gradients:
● Batch Gradient Descent: Uses all data at once.
● Stochastic Gradient Descent (SGD): Uses one data point at a time.
● Mini-batch Gradient Descent: Uses small batches (commonly used).
Advanced optimizers:
● Adam: Combines momentum and adaptive learning rates (see the update-rule sketch after this list).
● RMSProp, Adagrad, etc.
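A hedged NumPy sketch of a single parameter update under plain SGD and under Adam; the learning rates and the beta/epsilon constants below are the commonly cited defaults, used here only for illustration.
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # Plain gradient descent: move against the gradient
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: momentum (m) plus per-parameter adaptive scaling (v)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)          # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.array([0.5, -0.3])
grad = np.array([0.1, -0.2])
m = np.zeros_like(w)
v = np.zeros_like(w)
print(sgd_step(w, grad))
print(adam_step(w, grad, m, v, t=1)[0])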
4. Implementing a Simple Neural Network in Python (Using NumPy)
Here’s a barebones example of a simple neural network for binary classification.
python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Expects the sigmoid *output* (the activation), not the raw input
    return x * (1 - x)

# Sample data (AND logic gate)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [0], [0], [1]])

np.random.seed(1)

# Initialize weights randomly with mean 0
weights0 = 2 * np.random.random((2, 3)) - 1  # input to hidden
weights1 = 2 * np.random.random((3, 1)) - 1  # hidden to output

learning_rate = 0.1

for epoch in range(10000):
    # Forward propagation
    layer0 = X
    layer1 = sigmoid(np.dot(layer0, weights0))
    layer2 = sigmoid(np.dot(layer1, weights1))

    # Calculate error
    layer2_error = y - layer2
    if epoch % 1000 == 0:
        print(f"Error at epoch {epoch}: {np.mean(np.abs(layer2_error))}")

    # Backpropagation
    layer2_delta = layer2_error * sigmoid_derivative(layer2)
    layer1_error = layer2_delta.dot(weights1.T)
    layer1_delta = layer1_error * sigmoid_derivative(layer1)

    # Update weights
    weights1 += learning_rate * layer1.T.dot(layer2_delta)
    weights0 += learning_rate * layer0.T.dot(layer1_delta)

# Predictions
print("Output after training:")
print(layer2)
5. Convolutional Neural Networks (CNNs)
5.1 What are CNNs?
CNNs are specialized neural networks designed to process data with grid-like topology —
mainly images. They automatically learn spatial hierarchies of features through convolutional
layers, reducing the need for manual feature extraction.
5.2 Key Components
● Convolutional Layers: Apply filters (kernels) that slide over the input to extract
features such as edges and textures.
● Pooling Layers: Downsample feature maps to reduce dimensionality and
computation.
● Fully Connected Layers: After convolution/pooling, these layers perform
classification.
5.3 How Convolution Works
Convolution uses small matrices (filters) to scan the input and produce feature maps.
For example, a 3x3 filter moves over the image, computing the dot product between the filter
weights and each 3x3 input patch; each dot product becomes one value in the feature map.
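A minimal NumPy sketch of this sliding-window computation, assuming a plain "valid" convolution with stride 1 and no padding; this is a simplification of what a real convolutional layer does (no channels, no learned filters).
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image and take dot products ("valid" mode, stride 1)
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)    # toy 5x5 "image"
edge_kernel = np.array([[1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0]])          # simple vertical-edge filter
print(conv2d(image, edge_kernel))                   # 3x3 feature map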
5.4 Example: Simple CNN with Keras
python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load data
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Preprocess data
train_images = train_images.reshape((60000, 28, 28, 1)).astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1)).astype('float32') / 255
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

# Define model
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

# Compile model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train model
model.fit(train_images, train_labels, epochs=5, batch_size=64)

# Evaluate model
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc}")
5.5 Why CNNs Work Well for Images
● Local connectivity: Filters focus on small regions.
● Parameter sharing: Same filter applies across the image, reducing parameters.
● Translation invariance: CNNs detect features regardless of location.
6. Recurrent Neural Networks (RNNs)
6.1 What Are RNNs?
RNNs are specialized for sequential data, such as time series or text. Unlike feedforward
networks, RNNs have recurrent connections (loops) that allow information to persist across time steps.
6.2 Basic Structure
At each time step t, the RNN takes the input x_t and the previous hidden state h_{t-1},
producing the new hidden state h_t:
h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)
The output y_t can be computed from h_t.
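A minimal NumPy sketch of this recurrence, rolled out over a short sequence; the sizes and random weights are placeholders for the example, and the output layer that would map h_t to y_t is omitted.
import numpy as np

np.random.seed(0)
input_size, hidden_size, seq_len = 4, 3, 5

# Randomly initialized parameters (illustrative only)
W_xh = np.random.randn(hidden_size, input_size) * 0.1
W_hh = np.random.randn(hidden_size, hidden_size) * 0.1
b_h = np.zeros(hidden_size)

xs = np.random.randn(seq_len, input_size)  # a toy input sequence
h = np.zeros(hidden_size)                  # initial hidden state

for t in range(seq_len):
    # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)
    h = np.tanh(W_xh @ xs[t] + W_hh @ h + b_h)
    print(f"step {t}: hidden state {h}")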
6.3 Problems with Basic RNNs
● Vanishing and exploding gradients make training hard for long sequences.
● Solutions: LSTM and GRU architectures.
7. LSTM and GRU
7.1 Long Short-Term Memory (LSTM)
LSTM adds gating mechanisms to control information flow:
● Forget gate: Decides what to discard.
● Input gate: Decides what to store.
● Output gate: Decides what to output.
This helps retain long-term dependencies.
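In one common formulation (with [h_{t-1}, x_t] denoting concatenation of the previous hidden state and the current input, \sigma the sigmoid, and \odot element-wise multiplication), the gates act on a cell state c_t:
f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)
\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)
h_t = o_t \odot \tanh(c_t)
Because c_t is updated only through element-wise operations, gradients can flow across many time steps without vanishing as quickly as in a basic RNN.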
7.2 Gated Recurrent Unit (GRU)
A simplified version of the LSTM that merges the gating into an update gate and a reset gate, with no separate cell state; it has fewer parameters and is often faster to train.
8. Example: Simple LSTM for Text Classification (Keras)
python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Sample data
texts = ['I love machine learning', 'Deep learning is amazing', 'I dislike spam emails']
labels = np.array([1, 1, 0])

# Tokenize and pad sequences
tokenizer = Tokenizer(num_words=100)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
padded = pad_sequences(sequences, maxlen=5)

# Build model
model = Sequential([
    Embedding(100, 8, input_length=5),
    LSTM(16),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(padded, labels, epochs=10)

# Predict
print(model.predict(padded))
9. Attention Mechanisms and Transformers
9.1 What is Attention?
Attention allows a model to focus on the most relevant parts of the input when producing each
part of the output, improving performance on long sequences.
9.2 Transformers
Introduced by Vaswani et al. (2017), transformers use self-attention layers and have
revolutionized NLP, powering models like BERT and GPT.
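As a sketch of the core operation, here is the scaled dot-product attention from Vaswani et al. (2017) in NumPy, applied to random toy matrices. In self-attention Q, K, and V are all learned projections of the same sequence; those projection weights are omitted here for brevity.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # similarity of each query to each key
    weights = softmax(scores, axis=-1)        # attention weights sum to 1 per query
    return weights @ V, weights

np.random.seed(0)
seq_len, d_k = 4, 8
Q = np.random.randn(seq_len, d_k)
K = np.random.randn(seq_len, d_k)
V = np.random.randn(seq_len, d_k)

output, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(2))   # each row shows where one position "attends"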
10. Summary
● Deep learning uses multi-layer neural networks.
● CNNs are great for images.
● RNNs (especially LSTM/GRU) handle sequential data.
● Attention and transformers have advanced NLP massively.
● Python frameworks like TensorFlow and PyTorch enable practical implementations.