Deep Learning
• Deep learning, also called deep structured learning or hierarchical
learning, is part of the family of machine learning methods, which are
themselves a subset of the broader field of Artificial Intelligence.
• Deep learning is a class of machine learning algorithms that use
several layers of nonlinear processing units for feature extraction and
transformation. Each successive layer uses the output from the
previous layer as input.
• Deep neural networks, deep belief networks and recurrent neural
networks have been applied to fields such as computer vision, speech
recognition, natural language processing, audio recognition, social
network filtering, machine translation, and bioinformatics, where they
have produced results comparable to, and in some cases better than, those
of human experts.
The environment setup for Python
Deep Learning
• Python (3.x recommended; older material targets 2.7)
• SciPy with NumPy
• Matplotlib
• Theano
• Keras
• TensorFlow
AI, ML, and DL
• Artificial Intelligence (AI) is any code, algorithm or technique that
enables a computer to mimic human cognitive behaviour or
intelligence.
• Machine Learning (ML) is a subset of AI that uses statistical methods
to enable machines to learn and improve with experience.
• Deep Learning is a subset of Machine Learning, which makes the
computation of multi-layer neural networks feasible.
• Machine Learning is seen as shallow learning while Deep Learning is
seen as hierarchical learning with abstraction.
The importance of feature
engineering
• Feature engineering is the process of selecting, transforming, or
creating the most relevant variables, known as "features," from raw
data to use in machine learning models.
• In traditional machine learning, feature engineering is often a manual
and time-consuming process that requires domain expertise.
However, one of the advantages of deep learning is that it can
automatically learn relevant features from the raw data, reducing the
need for manual intervention.
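To make the contrast concrete, here is a minimal sketch (plain NumPy, with
synthetic data and hypothetical hand-picked features) of manual feature
engineering versus handing raw values to a deep network:

```python
import numpy as np

# Toy example: 100 synthetic 8x8 "grayscale images".
images = np.random.rand(100, 8, 8)

# Traditional ML: hand-crafted features chosen by a domain expert,
# e.g. mean brightness and horizontal edge strength per image.
mean_brightness = images.mean(axis=(1, 2))
edge_strength = np.abs(np.diff(images, axis=2)).mean(axis=(1, 2))
manual_features = np.stack([mean_brightness, edge_strength], axis=1)

# Deep learning: feed the raw pixels and let the network learn its
# own features in the hidden layers.
raw_features = images.reshape(100, -1)  # flatten to 64 raw inputs
```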
Why is Deep Learning Important?
• The reasons why deep learning has become the industry standard:
• Handling unstructured data: Deep learning models can learn directly from
unstructured data such as text, images, and audio, which reduces the time
and resources spent standardizing data sets.
• Handling large data: Due to the introduction of graphics processing units
(GPUs), deep learning models can process large amounts of data with
lightning speed.
• High Accuracy: Deep learning models provide the most accurate results in
computer vision, natural language processing (NLP), and audio processing.
• Pattern Recognition: Most traditional models require intervention from a
machine learning engineer, but deep learning models can detect all kinds
of patterns automatically.
Deep neural networks
• What makes a neural network "deep" is the number of layers it has
between the input and output.
• A deep neural network has multiple layers, allowing it to learn more
complex features and make more accurate predictions.
• The "depth" of these networks is what gives deep learning its name
and its power to solve intricate problems.
How Deep Learning Works
• Deep learning uses feature extraction to recognize similar features of
the same label and then uses decision boundaries to determine which
features accurately represent each label. In a cats-vs-dogs classification
task, a deep learning model will extract features such as the eyes, face,
and body shape of the animals and divide the images into two classes.
• A deep learning model consists of a deep neural network. A simple neural
network consists of an input layer, a hidden layer, and an output layer.
Deep learning models contain multiple hidden layers, and it is these
additional layers that improve the model's accuracy.
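As a minimal sketch of such a model, assuming Keras/TensorFlow is
installed (the layer sizes and the cats-vs-dogs framing are illustrative,
not tuned):

```python
from tensorflow import keras
from tensorflow.keras import layers

# A "deep" network: several hidden layers between input and output.
model = keras.Sequential([
    layers.Input(shape=(64,)),             # e.g. 64 input features
    layers.Dense(32, activation="relu"),   # hidden layer 1
    layers.Dense(16, activation="relu"),   # hidden layer 2
    layers.Dense(8, activation="relu"),    # hidden layer 3
    layers.Dense(1, activation="sigmoid")  # output: cat (0) vs dog (1)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```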
What is Deep Learning Used For?
• Computer Vision
• Computer vision (CV) is used in self-driving cars to detect objects and
avoid collisions. It is also used for face recognition, pose estimation,
image classification, and anomaly detection.
• Automatic Speech Recognition
• Automatic speech recognition (ASR) is used by billions of people
worldwide. It is in our phones and is commonly activated by saying
"Hey Google" or "Hey Siri." Such audio applications are also used for
text-to-speech, audio classification, and voice activity detection.
What is Deep Learning Used For?
• Generative AI
• Generative AI has seen a surge in demand, highlighted by a CryptoPunk
NFT selling for around $1 million; CryptoPunks are a generative art
collection created with generative models. The introduction of the GPT-4
model by OpenAI has revolutionized the text generation domain with its
powerful ChatGPT tool; you can now prompt models to write an entire novel
or even write code for your data science projects.
• Translation
• Deep learning translation is not limited to language translation: we can
now translate photos to text using OCR, or translate text to images using
NVIDIA GauGAN2.
What is Deep Learning Used For?
• Time Series Forecast
• Time series forecasting is used for predicting market crashes, stock
prices, and changes in the weather. The financial sector survives on
speculation and future projections. Deep learning and time series models
are better than humans at detecting patterns, and so are pivotal tools in
this and similar industries.
• Automation
• Deep learning is used for automating tasks, for example, training robots
for warehouse management. A popular showcase is playing video games and
solving puzzles. Recently, OpenAI Five, OpenAI's Dota 2 AI, beat the
professional team OG, which shocked the world, as people were not
expecting all five bots to outsmart the world champions.
What is Deep Learning Used For?
• Customer Feedback
• Deep learning is used for handling customers' feedback and
complaints. It is used in every chatbot application to provide seamless
customer services.
• Biomedical
• This field has benefited greatly from the introduction of deep learning.
DL is used in biomedicine to detect cancer, support drug development, spot
anomalies in chest X-rays, and assist medical equipment.
Deep Learning Models
• Supervised Learning
• Supervised learning uses a labeled dataset to train models to either
classify data or predict values. The dataset contains features and target
labels, which allow the algorithm to learn over time by minimizing the
loss between predicted and actual labels. Supervised learning can be
divided into classification and regression problems.
• Classification
• The classification algorithm divides the dataset into various categories
based on extracted features. Popular deep learning models include ResNet50
for image classification and BERT (a language model) for text
classification.
Deep Learning Models
• Regression
• Instead of dividing the dataset into categories, the regression model
learns the relationship between input and output variables to predict
the outcome. Regression models are commonly used for predictive
analysis, weather forecasting, and predicting stock market
performance. LSTM and RNN are popular deep learning regression
models.
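A minimal sketch of a deep learning regression model, assuming Keras is
installed and using a synthetic sine-wave series in place of real market
or weather data:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy time series: predict the next value of a sine wave
# from a window of the previous 10 values.
series = np.sin(np.linspace(0, 50, 500))
X = np.array([series[i:i+10] for i in range(490)])[..., None]  # (490, 10, 1)
y = series[10:]                                                # next value

model = keras.Sequential([
    layers.Input(shape=(10, 1)),
    layers.LSTM(32),
    layers.Dense(1)          # regression: a single continuous output
])
model.compile(optimizer="adam", loss="mse")  # mean squared error
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```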
Deep Learning Models
• Unsupervised Learning
• Unsupervised learning algorithms learn the patterns within an unlabeled
dataset and create clusters. Deep learning models can learn hidden
patterns without human intervention, and these models are often used in
recommendation engines.
• Unsupervised learning is used for grouping various species, medical
imaging, and market research. The most common deep learning
model for clustering is the deep embedded clustering algorithm.
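A greatly simplified stand-in for deep embedded clustering, assuming Keras
and scikit-learn are installed: an autoencoder learns a compact embedding
of synthetic unlabeled data, and k-means then clusters that embedding
(real DEC refines the embedding and the clusters jointly):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.cluster import KMeans

X = np.random.rand(500, 64)  # unlabeled data (synthetic here)

# Autoencoder: compress to a low-dimensional embedding and reconstruct.
encoder = keras.Sequential([layers.Input(shape=(64,)),
                            layers.Dense(32, activation="relu"),
                            layers.Dense(8)])
decoder = keras.Sequential([layers.Input(shape=(8,)),
                            layers.Dense(32, activation="relu"),
                            layers.Dense(64)])
autoencoder = keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=10, batch_size=32, verbose=0)

# Cluster in the learned embedding space.
clusters = KMeans(n_clusters=3, n_init=10).fit_predict(encoder.predict(X))
```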
Deep Learning Models
• Reinforcement Learning
• Reinforcement learning (RL) is a machine learning method in which an
agent learns behaviors by interacting with an environment. The agent takes
actions and receives rewards, learning to achieve goals by trial and error
in a complex environment without human intervention.
• Just as a baby learns to walk with encouragement from its parents, the
AI learns to perform certain tasks by maximizing rewards, with the
designer setting the reward policy. Recently, RL has seen high demand in
automation due to advancements in robotics, self-driving cars, defeating
pro players in games, and landing rockets back on Earth.
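A tiny sketch of the trial-and-error loop, using classic tabular
Q-learning on a made-up five-state corridor (deep RL replaces the table
with a neural network, but the reward-driven update is the same idea):

```python
import numpy as np

# Made-up environment: 5 states in a row; action 0 = left, 1 = right.
# Reaching state 4 gives reward 1 and ends the episode.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))      # the agent's value table
alpha, gamma, epsilon = 0.1, 0.9, 0.2    # learning rate, discount, exploration

for episode in range(500):
    s = 0
    while s != 4:
        # explore randomly sometimes, otherwise act greedily
        a = np.random.randint(2) if np.random.rand() < epsilon else Q[s].argmax()
        s_next = max(0, s - 1) if a == 0 else min(4, s + 1)
        r = 1.0 if s_next == 4 else 0.0
        # Q-learning update: move Q[s, a] toward reward + discounted future value
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q)  # the agent learns that moving right is the rewarding behavior
```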
Generative Adversarial Networks
• Generative adversarial networks (GANs) use two neural networks, a
generator and a discriminator, trained against each other to produce
synthetic instances of the original data. GANs have gained a lot of
popularity in recent years, as they are able to mimic some of the great
artists and produce masterpieces. They are widely used for generating
synthetic art, video, music, and text.
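A minimal GAN training sketch using a classic Keras pattern; the "data"
here is a toy 2-D distribution rather than images, and the architecture
sizes are arbitrary:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Generator: random noise -> fake 2-D "data points".
generator = keras.Sequential([layers.Input(shape=(8,)),
                              layers.Dense(16, activation="relu"),
                              layers.Dense(2)])
# Discriminator: data point -> probability that it is real.
discriminator = keras.Sequential([layers.Input(shape=(2,)),
                                  layers.Dense(16, activation="relu"),
                                  layers.Dense(1, activation="sigmoid")])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Combined model: freezing the discriminator here means only the
# generator is updated when the combined model trains.
discriminator.trainable = False
gan = keras.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

real = np.random.normal(loc=3.0, size=(1000, 2))   # toy "real" distribution
for step in range(200):
    noise = np.random.normal(size=(64, 8))
    fake = generator.predict(noise, verbose=0)
    idx = np.random.randint(0, len(real), 64)
    # Discriminator learns: real -> 1, fake -> 0.
    discriminator.train_on_batch(real[idx], np.ones((64, 1)))
    discriminator.train_on_batch(fake, np.zeros((64, 1)))
    # Generator learns to make the discriminator say "real" (1).
    gan.train_on_batch(noise, np.ones((64, 1)))
```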
Graph Neural Network
• A graph neural network (GNN) is a type of deep learning architecture
that directly operates on graph structures. GNNs are applied in large
dataset analysis, recommendation systems, and computer vision.
• They are also used for node classification, link prediction, and
clustering.
Natural Language Processing
• Natural language processing (NLP) uses deep learning technology to help
computers learn natural human language. NLP uses deep learning to read,
decipher, and understand human language, and it is widely used for
processing speech, text, and images. The introduction of transfer learning
has taken NLP to the next level, as we can fine-tune a model with a few
samples and achieve state-of-the-art performance.
NLP can be divided into multiple
fields:
• Translation: translating languages, molecular structure, and
mathematical equations
• Summarization: summarizing large chunks of text into a few lines
while maintaining the key information.
• Classification: dividing the text into various categories.
• Generation: text-to-text generation; it can be used to generate entire
essays from a single line of text.
• Conversational: virtual assistants that retain past knowledge of the
conversation and mimic human conversation.
NLP can be divided into multiple
fields:
• Answering questions: AI answers questions by using Q&A data.
• Feature extraction: detecting patterns in text or extracting information
such as named entity recognition and part-of-speech tagging.
• Sentence similarities: evaluating similarities between various texts.
• Text to speech: converting text to audible speech.
• Automatic speech recognition: understanding various sounds and converting
them into text.
• Optical character recognition: extraction of text data from images.
*Try Hugging Face Spaces, which hosts all types of web applications that
you can play around with to get inspiration for your NLP project.
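Many of these fields can be tried in a few lines with the Hugging Face
transformers library (assuming it is installed; the default pretrained
models are downloaded on first use):

```python
from transformers import pipeline

# Each NLP task above maps to a ready-made pipeline; sentiment
# classification is shown here.
classifier = pipeline("sentiment-analysis")
print(classifier("Deep learning makes NLP projects much easier."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]

summarizer = pipeline("summarization")  # other tasks work the same way
```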
Deep Learning
Concepts
Activation Functions
• In neural networks, the activation function shapes the output decision
boundaries and is used to improve the performance of the model. The
activation function is a mathematical expression that decides whether the
input should pass through a neuron or not based on its significance, and
it provides non-linearity to the network. Without activation functions,
the neural network becomes a simple linear regression model.
• There are several types of activation functions:
• Tanh
• ReLU
• Sigmoid
• Linear
• Softmax
• Swish
• These functions produce various output boundaries. With multiple layers
and activation functions, you can approximate solutions to very complex
problems.
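A short sketch of some of these functions in plain NumPy, to show the
different output ranges they produce:

```python
import numpy as np

def sigmoid(x):              # squashes input to (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):                 # squashes input to (-1, 1)
    return np.tanh(x)

def relu(x):                 # passes positives, zeroes out negatives
    return np.maximum(0.0, x)

def softmax(x):              # turns a vector into a probability distribution
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(relu(x))     # [0.  0.  0.  1.  3.]
print(softmax(x))  # sums to 1.0
```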
Loss Function
• The loss function measures the difference between actual and predicted
values. It allows neural networks to track the model's overall
performance. Depending on the specific problem, we choose a certain type
of function, for example, mean squared error:
• MSE = (1/n) Σ (predicted_i − actual_i)²
• The most used loss functions in deep learning are:
• Binary cross-entropy
• Categorical hinge
• Mean squared error
• Huber
• Sparse categorical cross-entropy
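A sketch of two of these losses in plain NumPy, using made-up labels and
predictions:

```python
import numpy as np

def mse(actual, predicted):
    # mean squared error: average of squared differences
    return np.mean((predicted - actual) ** 2)

def binary_cross_entropy(actual, predicted, eps=1e-12):
    # penalizes confident wrong probabilities heavily
    p = np.clip(predicted, eps, 1 - eps)  # avoid log(0)
    return -np.mean(actual * np.log(p) + (1 - actual) * np.log(1 - p))

y_true = np.array([1.0, 0.0, 1.0, 1.0])
y_prob = np.array([0.9, 0.2, 0.7, 0.6])
print(mse(y_true, y_prob))                   # regression-style loss
print(binary_cross_entropy(y_true, y_prob))  # classification loss
```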
Backpropagation
• In forward propagation, we initialize our neural network with random
weights, so the first outputs are essentially random too. To make the
model perform better, we adjust the weights using backpropagation: the
loss function measures the error, and backpropagation computes the
gradient of that loss with respect to each weight so the weights can be
moved toward the minimum of the loss, maximizing the model's accuracy.
Stochastic Gradient Descent
• Gradient descent is used to optimize the loss function by changing the
weights in a controlled way to achieve minimum loss. Now we have an
objective, but we need direction on whether to increase or decrease the
weights to achieve better performance. The derivative of the loss function
gives us that direction, and we can use it to update the weights of the
network.
• The equation below shows how weights are updated using gradient descent,
where α is the learning rate and J(w) is the loss:
• w = w − α · ∂J/∂w
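A sketch of this update rule on a one-weight toy loss J(w) = (w − 3)²,
whose derivative is 2(w − 3):

```python
# Gradient descent on a toy loss J(w) = (w - 3)^2, whose minimum is w = 3.
# The derivative dJ/dw = 2 * (w - 3) points uphill, so we step against it.
w = 0.0           # arbitrary starting weight
alpha = 0.1       # learning rate (step size)

for step in range(50):
    grad = 2 * (w - 3)      # dJ/dw
    w = w - alpha * grad    # the update rule: w = w - alpha * dJ/dw

print(w)  # converges close to 3.0
```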
Stochastic Gradient Descent
• In stochastic gradient descent, the samples are divided into batches
instead of using the entire dataset to compute each gradient step. This is
useful if you want to reach the minimum loss faster and save computational
power.
Hyperparameters
• Hyperparameters are the tunable parameters adjusted before running the
training process. These parameters directly affect model performance and
help the training reach a minimum faster.
• List of the most used hyperparameters (a short sketch follows the list):
• Learning rate: the step size of each iteration, typically set between
0.0001 and 0.1. In short, it determines the speed at which the model
learns.
• Batch size: the number of samples passed through the neural network at a
time.
• Number of epochs: the number of complete passes through the training
data. Too many epochs can cause the model to overfit and too few can cause
it to underfit, so we have to choose a suitable number.
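A sketch of where these hyperparameters appear in Keras, with synthetic
data and arbitrary values:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

X = np.random.rand(1000, 20)            # synthetic data for illustration
y = np.random.randint(0, 2, size=1000)

model = keras.Sequential([layers.Input(shape=(20,)),
                          layers.Dense(16, activation="relu"),
                          layers.Dense(1, activation="sigmoid")])

# Learning rate is a hyperparameter of the optimizer...
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy", metrics=["accuracy"])

# ...while batch size and number of epochs are hyperparameters of fit().
model.fit(X, y, batch_size=32, epochs=10, verbose=0)
```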
Popular Algorithms
Convolutional Neural Networks
• The convolutional neural network (CNN) is a feed-forward neural
network capable of processing a structured array of data. It is widely
used for computer vision applications such as image classification.
Convolutional Neural Networks
• CNNs are good at recognizing patterns, lines, and shapes. A CNN consists
of convolutional layers, pooling layers, and an output layer (fully
connected layers). Image classification models usually contain multiple
convolutional layers followed by pooling layers, as the additional layers
increase the accuracy of the model.
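A minimal sketch of such a stack in Keras (input size and layer counts are
illustrative):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Convolution + pooling blocks extract patterns; dense layers classify.
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),               # e.g. small grayscale images
    layers.Conv2D(32, (3, 3), activation="relu"),  # convolutional layer
    layers.MaxPooling2D((2, 2)),                   # pooling layer
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),           # fully connected
    layers.Dense(10, activation="softmax")         # e.g. 10 classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```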
Recurrent Neural Networks
• Recurrent neural networks (RNNs) differ from feed-forward networks in
that a layer's output is fed back into the network as input for the next
step. This helps them perform better on sequential data, as they can store
information from previous samples to help predict future samples.
Recurrent Neural Networks
• In traditional neural networks, a layer's output is calculated from the
current input values alone, but in an RNN the output is calculated from
previous inputs too. This makes RNNs quite good at predicting the next
word, forecasting stock prices, powering AI chatbots, and detecting
anomalies.
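A minimal sketch of an RNN for sequential data in Keras; the input shape
(timesteps, features) and layer sizes are illustrative:

```python
from tensorflow import keras
from tensorflow.keras import layers

# The RNN consumes the sequence step by step, carrying a hidden
# state from each step to the next.
model = keras.Sequential([
    layers.Input(shape=(10, 1)),   # 10 timesteps, 1 feature each
    layers.SimpleRNN(32),          # hidden state feeds back at every step
    layers.Dense(1)                # e.g. predict the next value
])
model.compile(optimizer="adam", loss="mse")
```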
Long Short-term Memory Networks
• Long short-term memory networks (LSTMs) are advanced types of recurrent
neural networks that can retain more information about past values. They
solve the vanishing gradient problem that exists in simple RNNs.
Long Short-term Memory Networks
• A typical RNN consists of repeating modules with a single tanh layer,
whereas an LSTM cell consists of four interacting layers that communicate
to process long sequences of data.
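The same sketch with LSTM cells in place of the plain RNN, stacked for a
longer sequence (sizes illustrative); the gates and cell state are handled
inside layers.LSTM:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Each LSTM cell adds input, forget, and output gates plus a cell state,
# which is what lets it retain information over long sequences.
model = keras.Sequential([
    layers.Input(shape=(100, 1)),            # a longer sequence
    layers.LSTM(32, return_sequences=True),  # pass the full sequence onward
    layers.LSTM(32),                         # second (stacked) LSTM layer
    layers.Dense(1)
])
model.compile(optimizer="adam", loss="mse")
```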
Deep Learning
Frameworks
TensorFlow
• TensorFlow (TF) is an open-source library used for creating deep
learning applications. It includes all the necessary tools for you to
experiment and develop commercial AI products. It supports CPU, GPU, and
TPU for training complex models. TF was originally
developed by the Google AI team for internal use and is now available
to the public.
• The TensorFlow API is available for browser-based applications and
mobile devices, and TensorFlow Extended (TFX) is ideal for production. TF
has now
become the industry standard, and it is used for both academic
research and deploying deep learning models in production.
Keras
• Keras is a neural network framework written in Python and capable of
running on top of multiple backends such as TensorFlow and Theano.
Keras is an open-source library developed to enable fast
experimentation in deep learning so that you can easily convert your
concepts into working AI applications.
• Just like TF, Keras can also run on CPU, GPU, and TPU, based on
available hardware.
PyTorch
• PyTorch is one of the most popular and easiest-to-use deep learning
frameworks. It uses tensors instead of NumPy arrays to perform fast,
GPU-accelerated numerical computation. PyTorch is mainly used for deep
learning and for developing complex machine learning models.
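A tiny sketch of PyTorch tensors and automatic differentiation, assuming
PyTorch is installed:

```python
import torch

# Tensors behave like NumPy arrays but can live on a GPU and
# track gradients for automatic differentiation.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # y = 1 + 4 + 9 = 14

y.backward()         # autograd computes dy/dx = 2x
print(x.grad)        # tensor([2., 4., 6.])

if torch.cuda.is_available():        # move data to the GPU if present
    x_gpu = x.detach().to("cuda")
```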
Others
• Theano, DL4J, Caffe, Chainer, Microsoft CNTK
How to Evaluate CNN Models
• Accuracy: Accuracy is the percentage of test images that the CNN
correctly classifies.
• Precision: Precision is the percentage of test images that the CNN
predicts as a particular class and that are actually of that class.
• Recall: Recall is the percentage of test images that are of a particular
class and that the CNN predicts as that class.
• F1 Score: The F1 Score is a harmonic mean of precision and recall. It is
a good metric for evaluating the performance of a CNN on classes
that are imbalanced.
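These metrics can be computed with scikit-learn, shown here on made-up
labels for a two-class problem:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# Hypothetical labels (1 = cat, 0 = dog).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(accuracy_score(y_true, y_pred))   # fraction classified correctly
print(precision_score(y_true, y_pred))  # of predicted cats, how many are cats
print(recall_score(y_true, y_pred))     # of actual cats, how many were found
print(f1_score(y_true, y_pred))         # harmonic mean of the two above
```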
Different Types of CNN
Models
1. LeNet
• LeNet, developed by Yann LeCun and his colleagues in the late 1990s, was
one of the first successful CNNs, designed for handwritten digit
recognition. It laid the foundation for modern CNNs and achieved high
accuracy on the MNIST dataset, which contains 70,000 images of handwritten
digits (0-9).
2. AlexNet
• AlexNet is a CNN architecture developed by Alex Krizhevsky, Ilya
Sutskever, and Geoffrey Hinton in 2012. It was the first CNN to win the
ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a major image
recognition competition. It consists of several convolutional and pooling
layers followed by fully connected layers: five convolutional layers,
three pooling layers, and three fully connected layers.
3. Resnet
• ResNets (Residual Networks) are designed for image recognition and
processing tasks. They are renowned for their ability to train very deep
networks without degradation in accuracy, making them highly effective for
complex tasks. ResNet introduces skip connections that allow the network
to learn residual functions, making it easier to train deep architectures.
4. GoogLeNet
• GoogLeNet, also known as InceptionNet, is renowned for achieving high
accuracy in image classification while using fewer parameters and
computational resources than other state-of-the-art CNNs. Its core
component, the Inception module, allows the network to learn features at
different scales simultaneously, enhancing performance.
5. VGG
• VGG was developed by the Visual Geometry Group at Oxford. It uses small
3x3 convolutional filters stacked in multiple layers, creating a deep and
uniform structure. Popular variants like VGG-16 and VGG-19 achieved
state-of-the-art performance on the ImageNet dataset, demonstrating the
power of depth in CNNs.
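Several of these architectures ship pretrained in Keras; a sketch of
loading two of them (the ImageNet weights download on first use):

```python
from tensorflow.keras.applications import ResNet50, VGG16

resnet = ResNet50(weights="imagenet")   # 50-layer residual network
vgg = VGG16(weights="imagenet")         # 16-layer VGG with 3x3 filters

print(resnet.count_params())  # tens of millions of parameters
print(vgg.count_params())     # VGG16 is larger despite fewer layers
```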
Advantages of CNN
• High Accuracy: They can achieve high accuracy in various image
recognition tasks.
• Efficiency: They are efficient, especially when implemented on GPUs.
• Robustness: They are robust to noise and variations in input data.
• Adaptability: They can be adapted to different tasks by modifying their
architecture.
Disadvantages of CNN
• Complexity: They can be complex and difficult to train, especially for
large datasets.
• Resource-Intensive: They require significant computational resources for
training and deployment.
• Data Requirements: They need large amounts of labeled data for
training.
• Interpretability: They can be difficult to interpret, making it
challenging to understand their predictions.