Deep Learning Fundamentals and Architectures

Deep Learning is a subset of machine learning that utilizes artificial neural networks to automatically learn hierarchical representations of data, excelling in tasks like image and speech recognition. Key concepts include the structure of neural networks, forward and backward propagation, and optimization techniques, with advanced architectures like CNNs for image processing and RNNs for sequential data. The document also highlights the importance of attention mechanisms and transformers in enhancing natural language processing.

1. Introduction to Deep Learning


What is Deep Learning?

Deep Learning is a subset of machine learning inspired by the structure and function of the
human brain, using artificial neural networks (ANNs) with multiple layers (hence “deep”).
These models automatically learn hierarchical representations of data, making them
extremely effective for complex tasks like image recognition, natural language processing,
and speech recognition.

Why Deep Learning?

●​ Automatically extracts features from raw data, reducing the need for manual feature
engineering.​

●​ Scales well with large datasets and complex data.​

●​ State-of-the-art performance in many domains.​

2. Basics of Neural Networks


2.1 The Neuron

The fundamental building block of a neural network is the neuron or node. It receives
multiple inputs, applies weights and bias, sums them, and passes the result through an
activation function.

Mathematically:

z = \sum_{i=1}^{n} w_i x_i + b

where x_i are the inputs, w_i are the weights, and b is the bias.

The output is:

a = \sigma(z)

where σ is an activation function.
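
To make these formulas concrete, here is a minimal NumPy sketch of a single neuron; the input, weight, and bias values are arbitrary illustrative numbers.

python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Illustrative values: three inputs, three weights, one bias
x = np.array([0.5, -1.0, 2.0])   # inputs x_i
w = np.array([0.4, 0.7, -0.2])   # weights w_i
b = 0.1                          # bias b

z = np.dot(w, x) + b   # weighted sum: z = sum_i w_i x_i + b
a = sigmoid(z)         # activation: a = sigma(z)
print(z, a)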

2.2 Activation Functions

Activation functions introduce non-linearity, allowing neural networks to learn complex patterns.

Common functions:

●​ Sigmoid:​

\sigma(z) = \frac{1}{1 + e^{-z}}

Output range: (0, 1) — useful for binary classification.

●​ Tanh:​

\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}

Output range: (-1, 1)

●​ ReLU (Rectified Linear Unit):​

\text{ReLU}(z) = \max(0, z)

The most popular choice; it alleviates the vanishing gradient problem.

●​ Leaky ReLU:​
Allows a small gradient when z < 0, defined as:​

\text{LeakyReLU}(z) = \begin{cases} z & z > 0 \\ 0.01z & z \leq 0 \end{cases}
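
Each of these activation functions is a one-liner in NumPy. The sketch below is illustrative (the 0.01 slope in leaky_relu matches the definition above).

python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))            # output in (0, 1)

def tanh(z):
    return np.tanh(z)                      # output in (-1, 1)

def relu(z):
    return np.maximum(0, z)                # 0 for z <= 0, z otherwise

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)   # small slope for z <= 0

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for name, fn in [('Sigmoid', sigmoid), ('Tanh', tanh), ('ReLU', relu), ('Leaky ReLU', leaky_relu)]:
    print(name, fn(z))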

2.3 Layers and Architecture

Neurons are organized in layers:

●​ Input layer: Takes raw features.​

●​ Hidden layers: Perform transformations; can be multiple.​

●​ Output layer: Produces final prediction.​

3. Forward and Backward Propagation


3.1 Forward Propagation

Input data passes through the network layer by layer, applying weights, biases, and activation functions to generate predictions.

3.2 Loss Function

Measures how far predictions are from true labels. Examples:

●​ Mean Squared Error (MSE) for regression.​

●​ Cross-Entropy Loss for classification.​
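Both losses are a few lines of NumPy. The sketch below is illustrative; the predictions and labels are made-up values.

python
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error for regression
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Cross-entropy for binary classification; eps avoids log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7, 0.4])
print("MSE:", mse(y_true, y_pred))
print("Cross-entropy:", binary_cross_entropy(y_true, y_pred))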

3.3 Backward Propagation

Uses the chain rule to compute gradients of the loss with respect to weights. This allows the
network to update weights to minimize loss.
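
One way to see the chain rule at work is to compare the analytic gradient with a numerical estimate. The sketch below does this for a single sigmoid neuron with a squared-error loss; all names and values are illustrative, not taken from the example later in this document.

python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(w, x, b, y):
    # Forward pass: z = w.x + b, a = sigmoid(z), L = (a - y)^2
    a = sigmoid(np.dot(w, x) + b)
    return (a - y) ** 2

# Illustrative data and parameters
x, y = np.array([0.5, -1.0]), 1.0
w, b = np.array([0.3, 0.8]), 0.1

# Chain rule: dL/dw = dL/da * da/dz * dz/dw = 2(a - y) * a(1 - a) * x
a = sigmoid(np.dot(w, x) + b)
grad_analytic = 2 * (a - y) * a * (1 - a) * x

# Numerical gradient by finite differences, one weight at a time
eps = 1e-6
grad_numeric = np.array([
    (loss(w + eps * np.eye(2)[i], x, b, y) - loss(w - eps * np.eye(2)[i], x, b, y)) / (2 * eps)
    for i in range(2)
])

print(grad_analytic)
print(grad_numeric)   # the two should agree closely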

3.4 Optimization

Gradient descent algorithms adjust weights using gradients:

●​ Batch Gradient Descent: Uses all data at once.​

●​ Stochastic Gradient Descent (SGD): Uses one data point at a time.​

●​ Mini-batch Gradient Descent: Uses small batches (commonly used).​

Advanced optimizers:

●​ Adam: Combines momentum and adaptive learning rates.​

●​ RMSProp, Adagrad, etc.​
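The optimizers differ mainly in how they turn a gradient into a parameter update. As a rough sketch, here is one update step for plain SGD and for Adam with the standard default hyperparameters (beta1 = 0.9, beta2 = 0.999); the gradient value is made up.

python
import numpy as np

w = 0.5        # a single parameter
grad = 0.2     # illustrative gradient dL/dw
lr = 0.01

# Plain SGD: step directly against the gradient
w_sgd = w - lr * grad

# Adam: running averages of the gradient (m) and of its square (v)
beta1, beta2, eps = 0.9, 0.999, 1e-8
m, v, t = 0.0, 0.0, 1
m = beta1 * m + (1 - beta1) * grad
v = beta2 * v + (1 - beta2) * grad ** 2
m_hat = m / (1 - beta1 ** t)     # bias correction for the first step
v_hat = v / (1 - beta2 ** t)
w_adam = w - lr * m_hat / (np.sqrt(v_hat) + eps)

print(w_sgd, w_adam)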

4. Implementing a Simple Neural Network in Python (Using NumPy)

Here’s a barebones example of a simple neural network for binary classification.

python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Assumes x is already the sigmoid output
    return x * (1 - x)

# Sample data (AND logic gate)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [0], [0], [1]])

np.random.seed(1)

# Initialize weights randomly with mean 0
weights0 = 2 * np.random.random((2, 3)) - 1  # input to hidden
weights1 = 2 * np.random.random((3, 1)) - 1  # hidden to output

learning_rate = 0.1

for epoch in range(10000):
    # Forward propagation
    layer0 = X
    layer1 = sigmoid(np.dot(layer0, weights0))
    layer2 = sigmoid(np.dot(layer1, weights1))

    # Calculate error
    layer2_error = y - layer2

    if epoch % 1000 == 0:
        print(f"Error at epoch {epoch}: {np.mean(np.abs(layer2_error))}")

    # Backpropagation
    layer2_delta = layer2_error * sigmoid_derivative(layer2)

    layer1_error = layer2_delta.dot(weights1.T)
    layer1_delta = layer1_error * sigmoid_derivative(layer1)

    # Update weights
    weights1 += learning_rate * layer1.T.dot(layer2_delta)
    weights0 += learning_rate * layer0.T.dot(layer1_delta)

# Predictions
print("Output after training:")
print(layer2)

5. Convolutional Neural Networks (CNNs)


5.1 What are CNNs?

CNNs are specialized neural networks designed to process data with grid-like topology —
mainly images. They automatically learn spatial hierarchies of features through convolutional
layers, reducing the need for manual feature extraction.

5.2 Key Components

●​ Convolutional Layers: Apply filters/kernels that slide over the input to extract features such as edges and textures.​

●​ Pooling Layers: Downsample feature maps to reduce dimensionality and computation.​

●​ Fully Connected Layers: After convolution/pooling, these layers perform classification.​

5.3 How Convolution Works

Convolution uses small matrices (filters) to scan the input and produce feature maps.

For example, a 3x3 filter moves over the image, computing dot products between the filter
weights and input patch.
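
To make the sliding-window idea explicit, here is a minimal NumPy sketch of a stride-1, no-padding convolution (as in most frameworks, this is technically cross-correlation); the 5x5 image and the 3x3 vertical-edge filter are made-up examples.

python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over every position where it fully fits ('valid' mode)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)   # dot product of patch and filter
    return out

# A 5x5 "image" with a vertical edge and a 3x3 vertical-edge filter
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

print(conv2d(image, kernel))   # strongest responses where the edge lies
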
5.4 Example: Simple CNN with Keras
python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load data
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Preprocess data: reshape to (N, 28, 28, 1) and scale pixels to [0, 1]
train_images = train_images.reshape((60000, 28, 28, 1)).astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1)).astype('float32') / 255

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

# Define model
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

# Compile model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train model
model.fit(train_images, train_labels, epochs=5, batch_size=64)

# Evaluate model
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc}")

5.5 Why CNNs Work Well for Images

●​ Local connectivity: Filters focus on small regions.​

●​ Parameter sharing: Same filter applies across the image, reducing parameters.​

●​ Translation invariance: CNNs detect features regardless of location.​

6. Recurrent Neural Networks (RNNs)


6.1 What Are RNNs?

RNNs are specialized for sequential data, such as time series or text. Unlike feedforward
networks, RNNs have loops allowing information to persist across steps.

6.2 Basic Structure

At each time step t, the RNN takes the input x_t and the previous hidden state h_{t-1}, producing a new hidden state h_t:

h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)

The output y_t can be computed from h_t.
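
The recurrence can be written in a few lines of NumPy. The sketch below runs a tiny untrained RNN over a random sequence just to show how the hidden state is carried forward; all sizes and parameter values are illustrative.

python
import numpy as np

np.random.seed(0)
input_size, hidden_size, seq_len = 4, 3, 5

# Illustrative (untrained) parameters
W_xh = np.random.randn(hidden_size, input_size) * 0.1
W_hh = np.random.randn(hidden_size, hidden_size) * 0.1
b_h = np.zeros(hidden_size)

xs = np.random.randn(seq_len, input_size)   # a random input sequence
h = np.zeros(hidden_size)                   # initial hidden state

for t in range(seq_len):
    # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)
    h = np.tanh(W_xh @ xs[t] + W_hh @ h + b_h)
    print(f"h_{t+1} =", np.round(h, 3))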

6.3 Problems with Basic RNNs

●​ Vanishing and exploding gradients make training hard for long sequences.​

●​ Solutions: LSTM and GRU architectures.​

7. LSTM and GRU


7.1 Long Short-Term Memory (LSTM)

LSTM adds gating mechanisms to control information flow:


●​ Forget gate: Decides what to discard.​

●​ Input gate: Decides what to store.​

●​ Output gate: Decides what to output.​

This helps retain long-term dependencies.

7.2 Gated Recurrent Unit (GRU)

A simplified version of the LSTM that merges the forget and input gates into a single update gate; it has fewer parameters and is often faster to train.
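
In Keras, switching between the two cells is typically a one-line change. A hedged sketch (layer sizes are arbitrary), anticipating the LSTM example in the next section:

python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, GRU, Dense

# Identical architectures; only the recurrent layer differs
lstm_model = Sequential([Embedding(100, 8), LSTM(16), Dense(1, activation='sigmoid')])
gru_model = Sequential([Embedding(100, 8), GRU(16), Dense(1, activation='sigmoid')])

print(type(lstm_model.layers[1]).__name__, type(gru_model.layers[1]).__name__)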

8. Example: Simple LSTM for Text Classification (Keras)


python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Sample data
texts = ['I love machine learning', 'Deep learning is amazing', 'I dislike spam emails']
labels = np.array([1, 1, 0])  # labels as a NumPy array (1 = positive, 0 = negative)

# Tokenize and pad sequences
tokenizer = Tokenizer(num_words=100)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
padded = pad_sequences(sequences, maxlen=5)

# Build model
model = Sequential([
    Embedding(100, 8, input_length=5),
    LSTM(16),
    Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(padded, labels, epochs=10)

# Predict
print(model.predict(padded))

9. Attention Mechanisms and Transformers


9.1 What is Attention?

Attention allows a model to focus on the most relevant parts of the input when producing each output, improving performance on long sequences.
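
The core computation in modern models is scaled dot-product attention: each query is compared with all keys, the scores are turned into weights with a softmax, and the output is a weighted sum of the values. A minimal NumPy sketch with made-up shapes:

python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # similarity of each query to each key
    weights = softmax(scores, axis=-1)    # attention weights sum to 1 per query
    return weights @ V, weights

np.random.seed(0)
Q = np.random.randn(3, 8)   # 3 query positions, dimension 8
K = np.random.randn(5, 8)   # 5 key positions
V = np.random.randn(5, 8)   # 5 value vectors

output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape, weights.shape)   # (3, 8) (3, 5)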

9.2 Transformers

Introduced by Vaswani et al. (2017), transformers use self-attention layers and have
revolutionized NLP, powering models like BERT and GPT.

10. Summary
●​ Deep learning uses multi-layer neural networks.​

●​ CNNs are great for images.​

●​ RNNs (especially LSTM/GRU) handle sequential data.​

●​ Attention and transformers have advanced NLP massively.​

●​ Python frameworks like TensorFlow and PyTorch enable practical implementations.
