
Generative AI

Generative AI is a subset of artificial intelligence that focuses on creating new content, such as images and text, using machine learning techniques like Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). GANs consist of a generator and a discriminator that work together to produce realistic data, while VAEs learn a probabilistic representation of data for tasks like anomaly detection and feature extraction. The document also discusses applications of generative AI, including text-to-image and text-to-video generation, highlighting the architecture and training processes involved.

Uploaded by

sb505158
Copyright
© © All Rights Reserved

Gen AI: The Next Level of Intelligence
Introduction
• Generative AI is a subset of artificial intelligence.
• It is used as an umbrella term to describe machine learning solutions trained on massive amounts of data.
• It focuses on creating new data, images, or other fresh and innovative content, often through advanced techniques like Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs).
RNN
GPT Generative Pre-Trained Transformers
What is Generative AI?

• A class of artificial intelligence systems designed to generate new content, often in the form of images, text, or other media.
• Generative AI leverages machine learning techniques, particularly deep learning, to learn patterns and structures from large datasets.
• Two popular types of generative models are Generative Adversarial Networks and Variational Autoencoders.
• Applications of Generative AI: image synthesis, text generation, music composition, and more.
Generative Adversarial Networks
GANs consist of two neural networks:
I) Generator
II) Discriminator
• The generator tries to maximize the probability that the discriminator mistakes its outputs for real data.
• The discriminator's feedback guides the generator toward producing more realistic images.
Generative Adversarial Networks (GANs) can be broken
down into three parts:
Generative: To learn a generative model, which describes how data is generated in terms of a
probabilistic model.

Adversarial: The word adversarial refers to setting one thing up against another. This means
that, in the context of GANs, the generative result is compared with the actual images in the data
set. A mechanism known as a discriminator is used to apply a model that attempts to distinguish
between real and fake images.

Networks: Use deep neural networks as artificial intelligence (AI) algorithms for training
purposes.
I) Generator
The generator creates new content by transforming random noise into data that resembles the training set.
The working formula for updating the parameters of the generator in a GAN can be expressed as follows:

$$\theta_G^{(t+1)} = \theta_G^{(t)} - \eta \, \nabla_{\theta_G} \log\bigl(1 - D(G(z))\bigr)$$

Where:
• $\theta_G^{(t)}$ represents the parameters of the generator at iteration t.
• D(G(z)) represents the discriminator's output when the generator generates a sample G(z) from random noise z.
• log(1−D(G(z))) is the objective function for the generator. The generator aims to minimize this value, which is equivalent to maximizing the probability of the discriminator making a mistake by classifying the generated sample as real.
• η is the learning rate, controlling the size of the update step.
II) Discriminator
The discriminator in a GAN is simply a classifier.
• It tries to distinguish real data from the data created by the generator.
• It could use any network architecture appropriate to the type of data it's classifying.

The discriminator's training data comes from two sources:
i) Real data instances, such as real pictures of people. These instances are used as positive examples during training.
ii) Fake data instances created by the generator. These instances are used as negative examples during training.
The working formula for updating the parameters of the discriminator in a GAN can be expressed as follows:

$$\theta_D^{(t+1)} = \theta_D^{(t)} + \eta \, \nabla_{\theta_D} \bigl[\log D(x) + \log\bigl(1 - D(G(z))\bigr)\bigr]$$

Where:
• $\theta_D^{(t)}$ represents the parameters of the discriminator at iteration t.
• η is the learning rate, controlling the size of the update step.
• D(x) represents the discriminator's output when given a real data sample x.
• G(z) represents the output of the generator when given random noise z.
• log(D(x)) and log(1−D(G(z))) are the objective terms for the discriminator. The discriminator aims to maximize their sum, which is the same as minimizing the negative log-likelihood of correctly classifying real and generated samples.
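As a rough illustration of the discriminator update, the sketch below takes one step that increases log D(x) + log(1 − D(G(z))) (equivalently, decreases its negative) for a one-dimensional logistic discriminator D(x) = sigmoid(w·x + b). The scalar model, the function names, and the learning rate are assumptions chosen for readability; they do not come from the slides.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def objective(w, b, x_real, x_fake):
    """log D(x_real) + log(1 - D(x_fake)) for D(x) = sigmoid(w * x + b)."""
    return math.log(sigmoid(w * x_real + b)) + math.log(1.0 - sigmoid(w * x_fake + b))

def discriminator_step(w, b, x_real, x_fake, lr=0.1):
    """One gradient-ascent step on the objective above."""
    d_real = sigmoid(w * x_real + b)
    d_fake = sigmoid(w * x_fake + b)
    # d/du log(sigmoid(u)) = 1 - sigmoid(u); d/du log(1 - sigmoid(u)) = -sigmoid(u)
    grad_w = (1.0 - d_real) * x_real - d_fake * x_fake
    grad_b = (1.0 - d_real) - d_fake
    return w + lr * grad_w, b + lr * grad_b

# x_fake stands in for a generated sample G(z).
w, b = discriminator_step(0.0, 0.0, x_real=2.0, x_fake=-1.0)
```

After the step the discriminator assigns a higher score to the real sample and a lower score to the fake one, so the objective rises.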
Training phases of Generator
• Forward Pass:
  - The generator takes random noise vectors as input and generates fake samples (e.g., images).
  - These fake samples are passed through the discriminator to obtain its predictions.
  - The generator's objective is to generate samples that the discriminator incorrectly classifies as real.
• Loss Computation:
  - The loss for the generator is typically based on the discriminator's predictions for the generated samples.
  - The generator aims to minimize this loss, indicating that it is producing realistic samples that can "fool" the discriminator.
• Backpropagation:
  - The gradients of the generator's loss with respect to its parameters are computed using backpropagation.
  - These gradients are then used to update the generator's parameters through optimization algorithms like stochastic gradient descent (SGD) or Adam.
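The three phases can be sketched on a toy problem. The example below assumes a scalar generator G(z) = a·z + c and a fixed logistic discriminator D(x) = sigmoid(w·x + b); both models, all names, and the plain gradient-descent update are illustrative assumptions, with the gradient written out by hand instead of using an autodiff library.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def generator_step(a, c, w, b, z, lr=0.1):
    """One update of a toy generator against a fixed toy discriminator."""
    # Forward pass: noise -> fake sample -> discriminator prediction.
    fake = a * z + c
    d_fake = sigmoid(w * fake + b)
    # Loss computation: log(1 - D(G(z))), which the generator minimizes.
    loss = math.log(1.0 - d_fake)
    # Backpropagation by hand: d loss / d fake = -D(G(z)) * w, then the chain rule.
    grad_fake = -d_fake * w
    grad_a, grad_c = grad_fake * z, grad_fake
    # Parameter update by gradient descent.
    return a - lr * grad_a, c - lr * grad_c, loss

a, c, loss_before = generator_step(1.0, 0.0, w=1.0, b=0.0, z=0.5)
_, _, loss_after = generator_step(a, c, w=1.0, b=0.0, z=0.5)
```

The second call reports a lower loss for the same noise input, which means the discriminator now rates the generated sample as more likely to be real.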
Block diagram of generative AI
Auto encoder
Variational Autoencoders (VAEs)
• Evaluates the generated content and compares it to real examples from the training data.
• Provides a probabilistic manner for describing an observation in latent space.
• It consists of an encoder network that maps input data to a latent space, and a decoder network that reconstructs the input data from points in the latent space.
• VAEs are especially suited for learning a latent representation of the input data because of their unique training method.
Figure: Variational Autoencoders
Different technical aspects of VAEs and GAN

Technical aspects of VAEs:

Latent Space Learning: VAEs learn a continuous, probabilistic latent space representation,
which is beneficial for capturing the underlying structure of raw data.
Data Generation: After training, VAEs can generate new data points by sampling from the
latent space. This is particularly useful for augmenting those datasets where certain types of
data may be scarce.
Anomaly Detection: VAEs can be used to detect anomalies in the data, by measuring how
well new data fits within the learned distribution.
Feature Extraction: The encoder part of a VAE can serve as a feature extractor, reducing the
dimensionality of the data while retaining important information.
Interpolation: VAEs can interpolate between different data points in the latent space, which can be used to simulate gradual changes in particular conditions and the resulting impact on the data.
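Latent interpolation can be sketched as a straight line between two latent vectors. The helper below is a hypothetical illustration: it only produces the intermediate latent points, and each point would still need to be passed through a trained decoder (not shown) to obtain data.

```python
def interpolate_latents(z_start, z_end, steps):
    """Return `steps` latent vectors evenly spaced on the line from z_start to z_end."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path

path = interpolate_latents([0.0, 2.0], [4.0, 0.0], steps=5)
```

Linear interpolation is the simplest choice; other paths through the latent space are possible.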
The main purpose of a dimensionality reduction method is to find the best encoder/decoder pair among a given family. If we denote respectively E and D the families of encoders and decoders we are considering, then the dimensionality reduction problem can be written

$$(e^*, d^*) = \underset{(e, d) \in E \times D}{\arg\min} \; \epsilon\bigl(x, d(e(x))\bigr)$$

where $\epsilon(x, d(e(x)))$ defines the reconstruction error measure between the input data x and the encoded-decoded data d(e(x)). Notice finally that in the following we will denote N the number of data, n_d the dimension of the initial (decoded) space and n_e the dimension of the reduced (encoded) space.
The loss function of the variational autoencoder is the negative log-likelihood with a regularizer.
The loss function $l_i$ for datapoint $x_i$ is:

$$l_i(\theta, \phi) = -\mathbb{E}_{z \sim q_\theta(z \mid x_i)}\bigl[\log p_\phi(x_i \mid z)\bigr] + \mathrm{KL}\bigl(q_\theta(z \mid x_i) \,\|\, p(z)\bigr)$$

The first term is the reconstruction loss, or expected negative log-likelihood of the i-th datapoint, where $p_\phi(x \mid z)$ is the decoder's distribution. The second term is a regularizer. This is the Kullback-Leibler divergence between the encoder's distribution $q_\theta(z \mid x)$ and $p(z)$. This divergence measures how much information is lost (in units of nats) when using q to represent p.
In the variational autoencoder, p is specified as a standard Normal distribution with mean zero and variance one, or p(z)=Normal(0,1).
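When the encoder outputs a diagonal Gaussian, the KL regularizer against Normal(0,1) has a closed form, 0.5 · Σ(μ² + σ² − log σ² − 1). The sketch below assumes that diagonal-Gaussian encoder and the common reparameterization z = μ + σ·ε; neither detail is stated in the slides, and the function names are hypothetical.

```python
import math
import random

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in nats, summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))

def sample_latent(mu, log_var, rng=random):
    """Reparameterization: z = mu + sigma * eps with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def vae_loss(reconstruction_nll, mu, log_var):
    """Per-datapoint loss: reconstruction term plus the KL regularizer."""
    return reconstruction_nll + kl_to_standard_normal(mu, log_var)

z = sample_latent([0.0, 0.0], [0.0, 0.0])
```

The KL term is zero exactly when the encoder's distribution already equals the prior, and grows as the mean or variance drifts away from it.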
VAE for network bandwidth saving by compression
Technical aspects of GAN:

Data Augmentation: GANs can augment agricultural datasets by generating new samples that
increase the diversity and volume of the data.
High-Quality Generation: GANs are known for their ability to generate sharp and realistic
samples, which is crucial for creating synthetic agricultural data that closely resembles real-world
conditions.
Conditional Generation: Conditional GANs can generate data based on specific conditions or
parameters.
Privacy Preservation: GANs can be used to generate synthetic data that maintains the
statistical properties of the original data while preserving the privacy of sensitive information.
Challenges in Training: GANs are notoriously difficult to train, with issues such as mode
collapse and training instability. However, various techniques and modifications to the basic
architecture can mitigate these problems.
The goal of the discriminator is to correctly label generated images as false and empirical data points as true. Therefore, we might consider the following to be the loss function of the discriminator:

$$L_D = \mathrm{Error}(D(x), 1) + \mathrm{Error}(D(G(z)), 0)$$

The goal of the generator is to confuse the discriminator as much as possible, such that it mislabels generated images as being true:

$$L_G = \mathrm{Error}(D(G(z)), 1)$$

x: Real data, z: Latent vector, G(z): Fake data, D(x): Discriminator's evaluation of real data, D(G(z)): Discriminator's evaluation of fake data, Error(a,b): Error between a and b.
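One common concrete choice for Error(a, b) is binary cross-entropy. The sketch below assumes that choice (the slides leave Error unspecified) and evaluates both losses on discriminator outputs given as plain probabilities; the function names are illustrative.

```python
import math

def error(prediction, target, eps=1e-12):
    """Binary cross-entropy between a probability and a 0/1 target."""
    p = min(max(prediction, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake):
    # Real samples should be labeled 1, generated samples 0.
    return error(d_real, 1) + error(d_fake, 0)

def generator_loss(d_fake):
    # The generator wants its samples to be labeled 1.
    return error(d_fake, 1)

undecided = discriminator_loss(0.5, 0.5)  # discriminator cannot tell real from fake
confident = discriminator_loss(0.9, 0.1)  # discriminator separates them well
```

An undecided discriminator, which outputs 0.5 for every input, has a loss of 2·ln 2, and the loss drops as its predictions become more accurate.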
Generative AI: Diffusion Models

i) Diffusion models add noise to the images in the training data through a process called forward diffusion.

ii) They then learn to reverse the process and recover the original image through reverse diffusion. These models can be trained on large unlabeled datasets in an unsupervised manner.

Stable Diffusion: Stable Diffusion is a text-to-image model that produces unique photorealistic images from text and image prompts. It originally launched in 2022. A Stable Diffusion model has four important elements:

- Diffusion Probabilistic Model
- U-Net Architecture
- Latent Text Encoding
- Classifier-Free Guidance

Source: https://stablediffusionweb.com
How the Stable Diffusion Model (Text to Image) Works
Architecture of the Stable Diffusion Model
The main architectural components of Stable Diffusion include a variational auto encoder,
forward and reverse diffusion, a noise predictor, and text conditioning.
Variational Auto-Encoder

• The variational autoencoder consists of a separate encoder and decoder.

Given an input image, the encoder produces latent variables representing the image's latent space distribution. The decoder then reconstructs the input image from these latent variables.
Forward diffusion

• Forward diffusion progressively adds Gaussian noise to an image until all that remains is random noise. It's not possible to identify what the image was from the final noisy image.
• During training, all images go through this process. Forward diffusion is not further used except when performing an image-to-image conversion.
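A minimal sketch of forward diffusion, assuming the DDPM-style formulation in which a sample at step t can be drawn directly as x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, with ᾱ_t the running product of (1 − β_s). The linear β schedule, its endpoint values, and all function names are assumptions for illustration; the slides do not specify them.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T, linearly spaced (an assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, alpha_bar_t, rng=random):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(alpha_bar_t) * v + math.sqrt(1.0 - alpha_bar_t) * e
           for v, e in zip(x0, eps)]
    return x_t, eps

alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
x_t, eps = add_noise([0.5, -0.25, 1.0], alpha_bars[-1])
```

By the last step the weight on the original signal is close to zero, which matches the statement that the final noisy image no longer reveals what the image was.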

• Reverse diffusion

This is a parameterized process that iteratively undoes the forward diffusion.

For example, we could train the model with only two images, such as a cat and a dog. If we did, the reverse process would drift towards either a cat or a dog and nothing in between. In practice, model training involves billions of images and uses prompts to create unique images.

Noise predictor (U-Net)

The noise predictor is key for denoising images. Stable Diffusion uses a U-Net model to perform this.
U-Net models are convolutional neural networks originally developed for image segmentation in biomedicine.
The U-Net in Stable Diffusion is built from residual (ResNet) blocks, an architecture developed for computer vision.
The noise predictor estimates the amount of noise in the latent space and subtracts this from the latent image. It repeats this process a specified number of times, reducing noise according to user-specified steps. The noise predictor is sensitive to conditioning prompts that help determine the final image.
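The "estimate the noise and subtract it" idea can be sketched for a single step. The example assumes the DDPM-style relation x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε and stands in a perfect noise estimate for the U-Net's output; real samplers use a trained network and take many smaller steps. All names and numbers here are illustrative assumptions.

```python
import math

def predict_clean_latent(x_t, predicted_noise, alpha_bar_t):
    """Solve x_t = sqrt(a) * x0 + sqrt(1 - a) * eps for x0, given a noise estimate."""
    scale_signal = math.sqrt(alpha_bar_t)
    scale_noise = math.sqrt(1.0 - alpha_bar_t)
    return [(x - scale_noise * e) / scale_signal for x, e in zip(x_t, predicted_noise)]

# Build a noisy latent from known ingredients, then undo the noise.
x0 = [0.5, -0.25]
eps = [1.2, -0.7]
alpha_bar_t = 0.36
x_t = [math.sqrt(alpha_bar_t) * v + math.sqrt(1.0 - alpha_bar_t) * e
       for v, e in zip(x0, eps)]
recovered = predict_clean_latent(x_t, eps, alpha_bar_t)
```

With an exact noise estimate the clean latent is recovered up to rounding; with an imperfect estimate the result is only an approximation, which is one reason the process is repeated over several steps.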

• Text conditioning
The most common form of conditioning is text prompts.

Text conditioning involves incorporating textual descriptions into the image generation process. This is typically achieved by conditioning the VAE's latent variables or the noise predictor on the text embedding.
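Classifier-free guidance, listed among the elements of Stable Diffusion, is one way the text embedding steers the noise predictor: the model predicts noise once with the prompt and once without it, and the two predictions are blended. The sketch below shows only that blending step with made-up numbers; the two input lists stand in for U-Net outputs and the function name is hypothetical.

```python
def classifier_free_guidance(noise_uncond, noise_text, guidance_scale):
    """Blend two noise predictions: uncond + scale * (text - uncond)."""
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]

guided = classifier_free_guidance([0.0, 1.0], [1.0, 3.0], guidance_scale=2.0)
```

A scale of 1 returns the text-conditioned prediction unchanged, 0 ignores the prompt, and larger values push the result further in the direction the prompt suggests.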

Text Embedding:
How the Architecture of the Stable Diffusion Model Works
Generative AI: Text to Video Model
• Text-to-video generation is a specific instance of generative AI.
Workflow:
• Text Input: The process begins with a descriptive text input.
• Encoding & Representation:
  i) The text is encoded into a numerical representation using techniques like Word2Vec or BERT.
  ii) This representation captures semantic meaning and context.
• Conditional Generation:
  - Generative models, such as Variational Autoencoders (VAEs) or Generative Adversarial Networks (GANs), take the encoded text as input.
  - These models generate video frames or sequences based on the provided description.
  - The generator network produces visual content that aligns with the

• Video Synthesis:
  - The generated frames are stitched together to create a video.
  - Techniques like optical flow estimation help ensure smooth transitions between frames.
  - Background music or ambient sounds can enhance the video's realism.
• Fine-tuning and Refinement:
  - The initial output may lack polish or coherence.
  - Fine-tuning involves iteratively improving the generated video by adjusting model parameters.
  - Human feedback and reinforcement learning play a crucial role in this stage.
• Output: The final result is a video that visually represents the original text. In our example, the video would depict a serene sunset scene with birds gliding over a tranquil lake.
Text to Image Samples
Text: A serene sunset over a tranquil lake, with birds
gliding across the water
Large Language Model (LLM)
• LLMs employ neural networks with a very large number of parameters for advanced language processing.
• They can understand human-language text input and generate human-like responses.
• These models are built using deep learning techniques, specifically a type of neural network called the Transformer architecture.
• Examples: OpenAI's GPT-3, Google's BERT, and NVIDIA's Megatron.
• They are specifically designed for natural language processing (NLP) tasks. These models consist of multiple layers of interconnected neurons.
Transformer-based LLM Architecture
• The general architecture of an LLM consists of many layers, such as feed-forward layers, embedding layers, and attention layers. These layers work together on the embedded input text to generate predictions.
• Transformer-based models include the following components:
  - Input Embeddings
  - Positional Encoding
  - Encoder Layer
  - Decoder Layer
  - Multi-Head Attention
  - Layer Normalization
  - Output Layers
Step by step mathematical process of generated text using LLM

• Tokenization:
It's a preprocessing step where the input text is split into tokens and each token is assigned a unique numerical representation.
• Embedding Layer:
Input: Token indices $x$ (corresponding to words or subwords).
Embedding matrix: $E$ of size $V \times d_{model}$, where $V$ is the size of the vocabulary and $d_{model}$ is the dimensionality of the embedding space.
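Tokenization and the embedding lookup can be sketched in a few lines. The example assumes whitespace tokenization and randomly initialized embeddings purely for illustration; production LLMs use subword tokenizers and learned embedding matrices, and every name below is hypothetical.

```python
import random

def build_vocab(corpus):
    """Assign each distinct whitespace-separated token a unique integer index."""
    vocab = {}
    for token in corpus.split():
        vocab.setdefault(token, len(vocab))
    return vocab

def init_embeddings(vocab_size, d_model, seed=0):
    """Embedding matrix E with one d_model-dimensional row per vocabulary entry."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(vocab_size)]

def embed(token_ids, embeddings):
    """Embedding lookup: row E[x] for each token index x."""
    return [embeddings[i] for i in token_ids]

vocab = build_vocab("the cat sat on the mat")
token_ids = [vocab[t] for t in "the cat sat".split()]
E = init_embeddings(len(vocab), d_model=4)
X = embed(token_ids, E)
```

Here the vocabulary has V = 5 entries, so E has 5 rows, and the three-token input becomes a 3 × 4 matrix of embeddings.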


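Transformer Encoder:
Input: Token embeddings $X$.
Self-Attention: i) Query, Key, and Value matrices: $W_Q$, $W_K$, $W_V$ of size $d_{model} \times d_k$, where $d_{model}$ is the model's hidden dimension and $d_k$ is the dimension of queries, keys, and values.
ii) Attention scores: $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$, with $Q = XW_Q$, $K = XW_K$, $V = XW_V$.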
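Scaled dot-product self-attention for a single head can be sketched with plain Python lists. This is a minimal sketch, assuming no masking, no batching, and a single head; the helper names and the identity projection matrices in the usage lines are chosen for illustration only.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V for one head, without masking."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights

def self_attention(x, w_q, w_k, w_v):
    """Project token embeddings X with W_Q, W_K, W_V, then attend."""
    return attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))

identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0]]  # two tokens, d_model = d_k = 2
out, weights = self_attention(x, identity, identity, identity)
```

Each row of `weights` sums to 1, and in this toy input each token attends most strongly to itself because its query matches its own key best.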
Feed-Forward Neural Network:
Two linear transformations with ReLU activation:
$\mathrm{FFN}(x) = \mathrm{ReLU}(xW_1 + b_1)W_2 + b_2$
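The feed-forward block applies the same two linear maps to every token vector independently. A minimal sketch for one token, with small hand-picked weights that are purely illustrative (the names and sizes are assumptions):

```python
def relu(v):
    return [max(0.0, x) for x in v]

def linear(v, weights, bias):
    """Compute v W + b for a row vector v and a weight matrix with len(v) rows."""
    return [sum(x * w for x, w in zip(v, col)) + b
            for col, b in zip(zip(*weights), bias)]

def feed_forward(v, w1, b1, w2, b2):
    """FFN(x) = ReLU(x W1 + b1) W2 + b2, applied to one token vector."""
    return linear(relu(linear(v, w1, b1)), w2, b2)

w1 = [[1.0, -1.0, 0.5], [0.0, 2.0, -1.0]]  # maps d_model = 2 to hidden size 3
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # maps hidden size 3 back to d_model = 2
b2 = [0.5, -0.5]
y = feed_forward([1.0, 1.0], w1, b1, w2, b2)
```

For this input the hidden pre-activation is [1, 1, −0.5]; ReLU zeroes the negative entry, and the second layer produces [1.5, 0.5].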
Positional Encoding:

Input: Positional indices $p$ and positional embedding matrix $P$ of size $L \times d_{model}$, where $L$ is the maximum sequence length.
Operation: $X = E[x] + P[p]$
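With a positional embedding matrix, the operation is an element-wise sum of each token's embedding and the row for its position. The sketch below assumes that additive, lookup-table form (sinusoidal encodings are another common option) and uses tiny made-up matrices; the function name is hypothetical.

```python
def add_positions(token_embeddings, position_embeddings):
    """Add the row for position p to the embedding of the token at position p."""
    if len(token_embeddings) > len(position_embeddings):
        raise ValueError("sequence is longer than the maximum length L")
    return [[t + p for t, p in zip(tok, position_embeddings[pos])]
            for pos, tok in enumerate(token_embeddings)]

P = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2]]  # L = 3 positions, d_model = 2
X = [[1.0, 2.0], [1.0, 2.0]]              # the same token at positions 0 and 1
H = add_positions(X, P)
```

The two identical token embeddings end up different after the addition, which is how the model can tell repeated tokens apart by position.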

Layer Normalization and Residual Connections:

Layer Normalization: $\mathrm{LayerNorm}(x) = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}}$, where $\mu$ and $\sigma^2$ are the mean and variance of $x$ respectively and $\epsilon$ is a small constant to avoid division by zero.
Residual connection: $\mathrm{LayerNorm}(x + \mathrm{Sublayer}(x))$, where $\mathrm{Sublayer}(x)$ represents either self-attention or feed-forward neural network output.
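Layer normalization and the residual connection can be sketched for one token vector. This minimal version omits the learned gain and bias that practical implementations usually add, and the sublayer passed in below is a stand-in lambda, not a real attention or feed-forward layer.

```python
import math

def layer_norm(x, eps=1e-5):
    """(x - mean) / sqrt(variance + eps), computed over one token vector."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def residual_block(x, sublayer):
    """LayerNorm(x + Sublayer(x)): add the sublayer output to its input, then normalize."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

out = residual_block([1.0, 2.0, 3.0, 4.0], lambda v: [0.5 * a for a in v])
```

The output has mean approximately 0 and variance approximately 1 regardless of the scale of the input.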
Softmax Layer:

Input: Output of the final transformer layer $h$.

Softmax function: $\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{V} e^{z_j}}$, where $V$ is the size of the vocabulary and $z$ is the vector of vocabulary scores (logits) computed from $h$.

Output probabilities: $P(y \mid x)$, where $y$ represents the next token in the vocabulary.

• Sampling:

Various sampling strategies can be employed, such as greedy decoding, nucleus sampling, or temperature-adjusted sampling, to select the next token based on the output probabilities.
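The three sampling strategies named above can be sketched over a small vector of logits. The logits, the `top_p` threshold, and the function names are illustrative assumptions; a real model would produce one logit per vocabulary entry.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; temperatures below 1 sharpen the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(probs):
    """Greedy decoding: pick the most probable token index."""
    return max(range(len(probs)), key=probs.__getitem__)

def nucleus_sample(probs, top_p=0.9, rng=random):
    """Sample from the smallest set of tokens whose total probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

probs = softmax([2.0, 1.0, 0.1])
sharper = softmax([2.0, 1.0, 0.1], temperature=0.5)
next_token = nucleus_sample(probs, top_p=0.9)
```

Greedy decoding is deterministic, while nucleus and temperature sampling trade some likelihood for variety in the generated text.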
Chain-of-Thought Prompting

Chain-of-Thought Prompting is a technique that enables LLMs to perform complex reasoning by generating a chain of thought, a series of intermediate reasoning steps.
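A few-shot chain-of-thought prompt can be assembled as plain text: each worked example spells out its intermediate steps before the answer, and the new question is left open for the model to continue. The helper and the arithmetic examples below are hypothetical and only illustrate the prompt layout; no model is called.

```python
def build_cot_prompt(question, worked_examples):
    """Assemble a few-shot prompt whose examples spell out their reasoning steps."""
    parts = []
    for ex in worked_examples:
        parts.append(
            f"Q: {ex['question']}\nA: {ex['reasoning']} The answer is {ex['answer']}."
        )
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

examples = [{
    "question": "A shelf holds 3 boxes with 4 books each. How many books are there?",
    "reasoning": "There are 3 boxes and each holds 4 books, so 3 * 4 = 12.",
    "answer": "12",
}]
prompt = build_cot_prompt(
    "A crate holds 5 bags with 6 apples each. How many apples are there?", examples
)
```

Because the example answer shows its reasoning, the model is nudged to produce its own intermediate steps before stating a final answer.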

Figure panels: Prompting Example; Datasets and Problems; Performance
Generative AI Vs LLM
Generative AI | LLM
A wide array of AI systems dedicated to producing fresh and innovative content, spanning text, images, etc. | A specific category of generative AI models with a specialized focus on text-based data.
These models are designed to create text or other forms of media based on patterns. | LLMs specifically focus on language modelling. These models are trained on vast amounts of text data and learn the statistical properties of language.
Encompasses various types of models that can generate content beyond just textual data, including images or even music. | LLMs are specialized in understanding and generating human-like text based on patterns they have learned from training data.
Developing and fine-tuning generative AI models can be challenging and require expertise in machine learning and domain-specific knowledge. | LLMs, especially pre-trained models, are more accessible and user-friendly for text-based tasks, requiring less specialized expertise.
Use Cases

• Healthcare: Privacy-preserving medical image analysis
• Finance: Fraud detection without compromising sensitive data
• IoT: Decentralized learning for smart devices
• Customized examples based on your audience or industry
Applications
Challenges in Generative AI

Ethical Considerations
i) Generative models can inadvertently produce biased or inappropriate content, raising concerns about responsible use.
ii) Ensuring that AI-generated outputs align with ethical guidelines is crucial.

Overfitting and Lack of Control
i) Generative models may overfit to training data, leading to poor generalization.
ii) Lack of fine-grained control over generated content can be problematic, especially when precision is required.

Computational Resources
i) Training large-scale generative models demands substantial computational resources.
ii) Efficiently scaling up and deploying these models remains a challenge.

Explainability and Interpretability
i) Understanding how generative models arrive at specific outputs is essential.
ii) Lack of transparency hinders trust and interpretability.

Security Concerns
i) Adversarial attacks can exploit vulnerabilities in generative models.
ii) Ensuring robustness against attacks is critical for real-world deployment.

Legal and Regulatory Issues
i) As generative AI becomes more prevalent, legal frameworks need to adapt.
ii) Intellectual property rights, privacy, and liability are areas of concern.
Case Study

• Privacy-Preserving Healthcare Analysis
• Fraud Detection in Finance
• IoT and Decentralized Learning
Generative Music
AI Music Generation refers to using artificial intelligence (AI)
techniques to generate music. It involves training algorithms
on large datasets of existing music to analyze
A. Patterns
B. Structures
C. Styles
Once trained, these algorithms can generate authentic
musical compositions, ranging from melodies and harmonies
to complete tracks, often without direct human intervention.
The AI music generation can enhance musical creativity, help
What is the Future of AI in Music Generation?

AI is expected to have a significant impact.


• Personalized Music Creation: AI music generators are becoming increasingly sophisticated, capable of creating
personalized music based on individual preferences and moods. This could revolutionize the way we listen to music, with
AI creating unique soundtracks for our daily lives.
• Collaboration Between AI and Artists: AI is not here to replace musicians but to collaborate with them. Artists can use
AI to explore new musical styles and ideas, pushing the boundaries of creativity. AI can handle the technical aspects of
composition, allowing artists to focus on the emotional and creative side of music.
• Democratization of Music Production: AI music generators can make music creation accessible to everyone,
regardless of their musical background or skills. This democratization of music production could lead to an explosion of
creativity and diversity in the music industry.
• Live Performances: AI could also play a role in live performances, with AI DJs or virtual artists performing in concerts or
music festivals. These performances could be interactive, with the AI responding to the crowd’s reactions in real-time.
• Music Therapy: AI-generated music could be used in therapy, helping to reduce stress, improve mood, and promote
mental well-being. By analyzing a person’s biometric data, AI could create personalized music to support their emotional
health.
Music generation
GenAI IoT
AI can enhance IoT:

1. Predictive Maintenance: AI algorithms can analyze data collected from IoT sensors to predict when equipment is likely to fail, allowing for proactive maintenance to be performed, reducing downtime and costs.

2. Anomaly Detection: AI can identify abnormal patterns or behavior in IoT data, helping to detect potential security breaches, equipment malfunctions, or other issues.

3. Optimization: AI algorithms can optimize energy usage, resource allocation, and other processes based on real-time data collected from IoT devices.

4. Personalization: AI can analyze data from IoT devices to personalize experiences for users, such as adjusting settings based on individual preferences or behaviors.

5. Autonomous Decision Making: AI can enable IoT devices to make decisions autonomously based on the data they collect and analyze, reducing the need for human intervention.

6. Natural Language Processing (NLP): Integrating NLP with IoT devices allows users to interact with them using voice commands or natural language, enhancing usability and accessibility.

7. Image and Video Recognition: AI-powered image and video recognition can be used in IoT applications for tasks such as security surveillance, quality control, or identifying objects in smart home environments.

Combining AI with IoT has the potential to revolutionize various industries by enabling smarter, more efficient, and more autonomous systems.
Generative Internet of Drone Things
Generative AI with IoT and drone technology opens up several possibilities:
• Autonomous Exploration and Mapping: Drones equipped with generative AI can autonomously explore unknown
environments and create detailed maps or 3D reconstructions in real-time.
• Dynamic Mission Planning: Generative AI algorithms can analyze real-time data from IoT sensors and other sources
to dynamically adjust drone flight paths and mission objectives based on changing environmental conditions or user
requirements.
• Adaptive Response to Emergencies: In emergency situations such as natural disasters or search-and-rescue
missions, drones equipped with generative AI can quickly assess the situation, generate adaptive response strategies,
and coordinate with other IoT devices or rescue teams to provide assistance.
• Environmental Monitoring and Conservation: Drones equipped with sensors can collect data on environmental
factors such as air quality, water pollution, or wildlife habitats. Generative AI can analyze this data to identify patterns,
predict future trends, and generate recommendations for conservation efforts or environmental management strategies.

The integration of generative AI with the Internet of Drone Things has the potential to revolutionize various industries,
including agriculture, infrastructure monitoring, public safety, and environmental conservation. By enabling drones to
collect, analyze, and respond to data in real-time, this technology can lead to more efficient, intelligent, and sustainable
solutions.
The ethical considerations of GenAI
The AI control problem highlights the increasingly complex challenge of ensuring artificial intelligence continues to align with humanity's values and work for us, not against us. The human biases hidden in the algorithms which power these pervasive technologies are concerning now. What will happen when the tech advances, further muddying the already murky waters of technology compliance?

Another challenge to consider is the black box problem, which refers to the difficulty of understanding how some generative models arrive at their outputs. How do we know if the information we receive is accurate and devoid of bias or discrimination?

This leads to legal concerns about the potential for GenAI models to generate harmful or misleading content, which raises questions about responsible AI usage and accountability. There are serious concerns surrounding the use of this technology in the political space, where viewers can be duped by deepfake technology showing political figures expressing political views that are not theirs. Such fake content could have immense implications for the political sentiments of voters.

GenAI poses intellectual property and copyright issues. Currently, the legal implications of GenAI content generated from the existing work of original creatives are unclear.
Conclusion
• Synergistic Impact: The combination of Generative AI and Federated Learning presents a synergistic impact on the field of machine learning, addressing challenges related to privacy, decentralized processing, and model generalization, making it a potent solution for various applications.
• Privacy-Driven Innovation: The amalgamation of Generative AI and Federated Learning prioritizes privacy in data-driven applications. This approach ensures that advancements in AI come with ethical considerations, providing a framework for responsible and secure AI development.
• Versatile Applications and Future Potential: It's evident that the union of Generative AI and Federated Learning has far-reaching implications. Its versatile applications, from healthcare to finance and IoT, coupled with ongoing research, position it as a driving force for the future of collaborative, privacy-preserving, and efficient machine learning.
CSI Kolkata GenAI hands on
training March – April – May 2024
* 3 days a week
* Wed – Sat – Sunday
* LLM
* GenAI