Generative AI with Python and PyTorch: Navigating the AI frontier with LLMs, Stable Diffusion, and next-gen AI applications, Second Edition

Joseph Babcock, Raghav Bali
Apr 2025, 450 pages, 2nd Edition

Introduction to Generative AI: Drawing Data from Models

At the Colorado State Fair in 2022, the winning entry in the digital arts category was a fantastical sci-fi landscape titled Théâtre D’opéra Spatial (Figure 1.1), created by video game designer Jason Allen. The first-prize work was remarkable both for its dramatic subject matter and for its unusual origin. Unlike most other artworks entered into the competition, Théâtre D’opéra Spatial was not painted with oil or watercolors, nor was its “creator” even human; rather, it is an entirely digital image produced by a sophisticated machine learning algorithm called Midjourney. Instead of a brush and canvas, Jason used Midjourney, a model trained on a diverse collection of images, together with natural language instructions to create the image.

Figure 1.1: Théâtre D’opéra Spatial1

Visual art is far from the only area in which machine learning has demonstrated astonishing results. If you have followed the news in the last few years, you have likely seen many stories about the groundbreaking results of modern AI systems applied to diverse problems, from the hard sciences to online avatars and interactive chat. Deep neural network models, such as the one powering Midjourney, have shown a striking ability to generate realistic human language2, author computer code3, and solve school exams with human-level ability2. Such models can also classify X-ray images of human anatomy at the level of trained physicians4, beat human masters at both classic board games such as Go (an ancient strategy board game that originated in China) and multiplayer computer games5, 6, and translate French into English with remarkable sensitivity to grammatical nuances7.

Discriminative versus generative models

However, these latter examples of AI differ in an important way from the model that generated Théâtre D’opéra Spatial. In all of these other applications, the model is presented with a set of inputs (data such as English text or X-ray images) paired with a target output, such as the next word in a translated sentence or the diagnostic classification of an X-ray. This is probably the kind of AI model you are most familiar with from prior experience with predictive modeling. Such models are broadly known as discriminative models, and their purpose is to create a mapping between a set of input variables and a target output. The target output could be a set of discrete classes (such as which English word appears next in a translation) or a continuous outcome (such as the expected amount of money a customer will spend in an online store over the next 12 months).

However, this kind of model, in which data is “labeled” or “scored,” represents only half of the capabilities of modern machine learning. Another class of algorithms, such as the one that generated the winning entry at the Colorado State Fair, doesn’t compute a score or label from input variables but rather generates new data. Unlike in discriminative models, the input variables are often vectors of numbers that aren’t related to real-world values at all and are frequently generated at random. This kind of model, known as a generative model, can produce complex outputs such as text, music, or images from random noise, and it is the topic of this book.

Even if you did not know it at the time, you have probably seen other instances of generative models mentioned in the news alongside the discriminative examples given previously. A prominent example is deepfakes—videos in which one person’s face has been systematically replaced with another’s by using a neural network to remap the pixels8 (Figure 1.2).

Figure 1.2: A deepfake image9

Maybe you have also seen stories about AI models that generate “fake news,” which researchers at OpenAI initially declined to release in full to the public due to concerns that they could be used to create propaganda and misinformation online (Figure 1.3)11.

Figure 1.3: A chatbot dialogue created using GPT-210

In these and other applications, such as Google’s voice assistant Duplex, which can make a restaurant reservation by holding a conversation with a human in real time12, or software that can generate original musical compositions13, we are surrounded by the outputs of generative AI algorithms. These models are able to handle complex information in a variety of domains: creating photorealistic images or stylistic “filters” on pictures, synthetic sound, conversational text, and even rules for optimally playing video games. You might ask: Where did these models come from? How can I implement them myself?

Implementing generative models

While generative models could theoretically be implemented using a wide variety of machine learning algorithms, in practice, they are usually built with deep neural networks, which are well suited to capture the complex variation in data such as images or language. In this book, we will focus on implementing these deep-learning-based generative models for many different applications using PyTorch. PyTorch is a Python library for developing deep learning models and deploying them in production. It was open-sourced by Meta (formerly Facebook) in 2016 and has become one of the most popular libraries for the research and deployment of neural network models. We’ll execute PyTorch code on the cloud using Google’s Colab notebook environment, which gives you on-demand access to world-class computing infrastructure, including graphics processing units (GPUs) and tensor processing units (TPUs), without the need for onerous environment setup. We’ll also leverage the pipelines API from Hugging Face’s Transformers library, which provides an easy interface for running experiments with a catalog of some of the most sophisticated models available.

In the following chapters, you will learn not only the underlying theory behind these models but also the practical skills to implement them in popular programming frameworks. In Chapter 2, we’ll review how, since 2006, an explosion of research in “deep learning” using large neural network models has produced a wide variety of generative modeling applications. Innovations arising from this research included variational autoencoders (VAEs), which can efficiently generate complex data samples from random numbers that are “decoded” into realistic images, using techniques we will describe in Chapter 11. We will also describe a related image generation algorithm, the generative adversarial network (GAN), in more detail in Chapters 12-14 of this book through applications for image generation, style transfer, and deepfakes. Conceptually, the GAN model creates a competition between two neural networks.

One network (termed the generator) produces realistic (or, in the case of the experiments by the French art collective Obvious, artistic) images, starting from a set of random numbers that are “decoded” into images by applying a mathematical transformation. In a sense, the generator is like an art student, producing new paintings from brushstrokes and creative inspiration. The second network, known as the discriminator, attempts to classify whether a picture comes from a set of real-world images or was created by the generator. Thus, the discriminator acts like a teacher, grading whether the student has produced work comparable to the paintings they are attempting to mimic. As the generator becomes better at fooling the discriminator, its output becomes closer and closer to the historical examples it is designed to copy. In Chapter 11, we’ll also describe the algorithm used in Théâtre D’opéra Spatial, the latent diffusion model, which builds on VAEs to provide scalable image synthesis based on natural language prompts from a human user.
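The student-and-teacher competition is usually expressed as a pair of binary cross-entropy losses. The sketch below, written in plain Python rather than PyTorch, only evaluates those losses for a few discriminator outputs; the probabilities are made up for illustration, and the generator loss shown is the commonly used "non-saturating" variant (the generator is rewarded when the discriminator labels its samples as real) rather than the original minimax form.

```python
import math

def bce(p, target):
    """Binary cross-entropy for one predicted probability p and a 0/1 target."""
    eps = 1e-12  # guard against log(0)
    return -(target * math.log(p + eps) + (1 - target) * math.log(1 - p + eps))

def discriminator_loss(d_real, d_fake):
    """The discriminator (the 'teacher') should output 1 for real images, 0 for generated ones."""
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: the generator (the 'student') wants its
    samples to be scored as real, i.e. D(G(z)) close to 1."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs D(x) on real images and D(G(z)) on generated ones
d_real = [0.9, 0.8, 0.95]
early_fakes = [0.1, 0.2, 0.05]   # early in training, fakes are easy to spot
later_fakes = [0.6, 0.7, 0.55]   # later, the generator fools the discriminator more often

print("D loss:", round(discriminator_loss(d_real, early_fakes), 3),
      "->", round(discriminator_loss(d_real, later_fakes), 3))
print("G loss:", round(generator_loss(early_fakes), 3),
      "->", round(generator_loss(later_fakes), 3))
```

As the generated samples are scored closer to "real," the generator's loss falls while the discriminator's loss rises; in actual GAN training, each network is updated by gradient descent on its own loss in alternation.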

Another key innovation in generative models is in the domain of natural language data. By representing the complex interrelationships between words in a sentence in a computationally scalable way, the Transformer network and the Bidirectional Encoder Representations from Transformers (BERT) model built on top of it provide powerful building blocks for generating textual data in applications such as chatbots and large language models (LLMs), which we’ll cover in Chapters 4 and 5. In Chapter 6, we will dive deeper into the most famous open-source models in the current LLM landscape, including Llama. In Chapters 7 and 8.

Before diving into further details on the various applications of generative models and how to implement them in PyTorch, we will take a step back and examine how exactly generative models are different from other kinds of machine learning. This difference lies in the basic units of any machine learning algorithm: probability and the various ways we use mathematics to quantify the shape and distribution of data we encounter in the world. In the rest of this chapter, we will cover the following:

  • How we can use the statistical rules of probability to describe how machine learning models represent the shapes of the datasets we study
  • The difference between discriminative and generative models, based on the kinds of probability rules they embody
  • Examples of areas where generative modeling has been applied: image generation, style transfer, chatbots and text synthesis, and reinforcement learning

The rules of probability

At the simplest level, a model, be it machine learning or a more classical method such as linear regression, is a mathematical description of how a target variable changes in response to variation in a predictive variable; that relationship could be a linear slope or any of a number of more complex mathematical transformations. In the task of modeling, we usually think of separating the variables in our dataset into two broad classes:

  • Independent variables, by which we primarily mean the inputs to a model, are often denoted by X. For example, if we are trying to predict students’ grades on an end-of-year exam based on their characteristics, we could think of several kinds of features:
    • Categorical: If there are six schools in a district, the school that a student attends could be represented by a six-element vector for each student. The elements are all 0, except for one that is 1, indicating which of the six schools they are enrolled in.
    • Continuous: The student heights or average prior test scores can be represented as continuous real numbers.
    • Ordinal: The rank of the student in their class is not meant to be an absolute quantity (like their height) but rather a measure of relative difference.
  • Dependent variables, conversely, are the outputs of our models and are denoted by the letter Y. Note that, in some cases, Y is a “label” that can be used to condition a generative output, such as in a conditional GAN. It can be categorical, continuous, or ordinal, and could be an individual element or multidimensional matrix (tensor) for each element of the dataset.
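The categorical encoding described for the six schools (often called one-hot encoding) takes only a few lines. In this sketch, the `one_hot` helper and the student's feature values are hypothetical, chosen only to show the three feature types side by side.

```python
def one_hot(index, num_classes):
    """Return a vector of zeros with a single 1 at position `index` (0-based)."""
    if not 0 <= index < num_classes:
        raise ValueError("index out of range")
    vector = [0] * num_classes
    vector[index] = 1
    return vector

# A hypothetical student enrolled in the third of six schools (index 2)
student = {
    "school": one_hot(2, 6),   # categorical: one-hot vector
    "height_cm": 162.5,        # continuous: any real number
    "class_rank": 4,           # ordinal: only the ordering is meaningful
}
print(student["school"])  # [0, 0, 1, 0, 0, 0]
```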

How can we describe the data in our model using statistics? In other words, how can we quantitatively describe what values we are likely to see, how frequently, and which values are more likely than others to appear together? One way is by asking how likely it is to observe a particular value in the data, or the probability of that value. For example, if we were to ask what the probability of observing a roll of four on a six-sided die is, the answer is that, on average, we would observe a four once every six rolls. We write this as follows:

P(X=4) = 1/6 ≈ 16.67%

Here, P denotes “probability of.” What defines the allowed probability values for a particular dataset? If we imagine the set of all possible values of a dataset, such as all faces of a die, then a probability maps each value to a number between 0 and 1. The minimum is 0 because we cannot have a negative chance of seeing a result; the most unlikely result is that we would never see a particular value, or 0% probability, such as rolling a seven on a six-sided die. Similarly, we cannot have a greater than 100% probability of observing a result, represented by the value 1; an outcome with probability 1 is absolutely certain. The possible values to which these probabilities are assigned may form a set of discrete classes (such as the faces of a die) or an infinite set of potential values (such as variations in height or weight). In either case, however, the probabilities have to follow certain rules, the probability axioms described by the mathematician Andrey Kolmogorov in 193314:

  1. The probability of an observation (a die roll, a particular height) is a non-negative, finite number between 0 and 1.
  2. The probability of at least one of the observations in the space of all possible observations occurring is 1.
  3. The probability that any one of several mutually exclusive events occurs (such as rolling a 1 or a 2 on a die) is the sum of the probabilities of the individual events.
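The three axioms can be checked directly for a fair die. This is a minimal sketch using exact fractions; the `prob` helper is a name introduced here for illustration, and it represents an event as a set of outcomes.

```python
from fractions import Fraction

# A fair six-sided die: each face has probability 1/6
die = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event, dist):
    """Probability of an event (a set of outcomes). The faces are mutually
    exclusive, so axiom 3 lets us add their individual probabilities."""
    return sum((dist.get(outcome, Fraction(0)) for outcome in event), Fraction(0))

# Axiom 1: every probability lies between 0 and 1
assert all(0 <= p <= 1 for p in die.values())
# Axiom 2: some outcome in the sample space is certain to occur
assert prob(set(die), die) == 1

print(prob({4}, die))     # 1/6
print(prob({7}, die))     # 0 (impossible on a six-sided die)
print(prob({1, 2}, die))  # 1/3
```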

While these rules might seem abstract, we will see in Chapter 3 that they have direct relevance to developing neural network models. For example, rule 1 is reflected in the softmax function used for predicting target classes, which outputs a probability between 0 and 1 for each outcome. If our model is asked to classify whether an image contains a cat, dog, or horse, each potential class receives a probability between 0 and 1 as the output of a softmax function applied to the scores of a deep neural network, which applies nonlinear, multi-layer transformations to the input pixels of the image we are trying to classify. Rule 3 applies because these classes are mutually exclusive predictions (a real-world image logically cannot be classified as both a dog and a cat, but rather as a dog or a cat), so the probabilities of these outcomes are additive; together with rule 2, this is why the softmax normalizes its outputs to sum to 1. Finally, the second rule provides the theoretical guarantee that we can generate data at all using these models.
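A minimal softmax, written in plain Python for illustration (PyTorch provides its own implementation), shows how raw network scores become class probabilities that obey the rules above. The three scores for cat, dog, and horse are made-up values standing in for a network's final layer.

```python
import math

def softmax(scores):
    """Map raw scores (logits) to probabilities in [0, 1] that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["cat", "dog", "horse"]
probs = softmax([2.0, 1.0, 0.1])   # hypothetical final-layer scores
for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")
# The classes are mutually exclusive, so "cat or dog" is the sum of the two
print("P(cat or dog) =", round(probs[0] + probs[1], 3))
```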

However, in the context of machine learning and modeling, we are not usually interested in just the probability of observing a piece of input data, X; we instead want to know the conditional probability of an outcome Y given the data X. Said another way, we want to know how likely a label for a set of data is, based on that data. We write this as the probability of Y given X, or the probability of Y conditional on X:

P(Y|X)

Another question we could ask about Y and X is how likely they are to occur together—their joint probability—which can be expressed using the preceding conditional probability expression as:

P(X, Y) = P(Y|X)P(X) = P(X|Y)P(Y)

This formula expresses the probability of observing both X and Y. When X and Y are completely independent of one another, this is simply the product of their individual probabilities:

P(X|Y)P(Y) = P(Y|X)P(X) = P(X)P(Y)
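Two dice rolled independently give a concrete check of these identities. This sketch builds the joint table P(X, Y) with exact fractions, recovers P(X|Y) by dividing by the marginal P(Y), and confirms that conditioning on Y leaves the distribution of X unchanged; the helper name is illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p_x = {x: Fraction(1, 6) for x in faces}   # first die
p_y = {y: Fraction(1, 6) for y in faces}   # second die, rolled independently

# Joint distribution: for independent dice, P(X, Y) = P(X)P(Y)
joint = {(x, y): p_x[x] * p_y[y] for x, y in product(faces, faces)}

def conditional_x_given_y(x, y):
    """P(X=x | Y=y) = P(X=x, Y=y) / P(Y=y), where P(Y=y) sums X out of the joint."""
    p_y_marginal = sum(joint[(xi, y)] for xi in faces)
    return joint[(x, y)] / p_y_marginal

print(joint[(4, 2)])                # 1/36
print(conditional_x_given_y(4, 2))  # 1/6: knowing Y tells us nothing about X
# The chain rule P(X, Y) = P(X|Y)P(Y) holds for every pair
assert all(joint[(x, y)] == conditional_x_given_y(x, y) * p_y[y] for x, y in joint)
```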

You will see that these expressions become important in our discussion of complementary priors in Chapter 4, and the ability of restricted Boltzmann machines to simulate independent data samples. They are also important as building blocks of Bayes’ theorem, which we describe next.

Discriminative and generative modeling, and Bayes’ theorem

Now, let us consider how these rules of conditional and joint probability relate to the kinds of predictive models that we build for various machine learning applications. In most cases, such as predicting whether an email is fraudulent or the dollar amount of the future lifetime value of a customer, we are interested in the conditional probability, P(Y|X=x), where Y is the set of outcomes we are trying to model, X is the set of input features, and x is a particular value of the input features. For example, we are trying to calculate the probability that an email is fraudulent based on the knowledge of the set of words (the x) in the message. This approach is known as discriminative modeling15, 16, 17. Discriminative modeling attempts to learn a direct mapping between the data, X, and the outcomes, Y.

Another way to understand discriminative modeling is in the context of Bayes’ theorem18, which relates the conditional and joint probabilities of a dataset, as follows:

P(Y|X) = P(X|Y)P(Y)/P(X) = P(X, Y)/P(X)

As a side note, the theorem was published two years after the death of its author, Thomas Bayes, and in a foreword, Richard Price described it as a mathematical argument for the existence of God, perhaps appropriate given that Bayes served as a minister during his life. In the formula for Bayes’ theorem, P(X|Y) is known as the likelihood, or the supporting evidence that the observation X gives to the outcome Y; P(Y) is the prior, or the plausibility of the outcome before we observe the data; P(X) is the evidence (or marginal likelihood), which normalizes the result; and P(Y|X) is the posterior, or the probability of the outcome given all the independent data we have observed related to the outcome thus far. Conceptually, Bayes’ theorem states that the probability of an outcome given the data is proportional to the product of its baseline probability and the probability of the input data conditional on this outcome.
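A worked example makes each term concrete. The numbers below are hypothetical, chosen only for illustration: suppose 2% of emails are fraudulent (the prior), and a phrase such as "wire transfer" appears in 40% of fraudulent emails but only 1% of legitimate ones (the likelihoods). The evidence P(X) is obtained by summing over both outcomes.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """Bayes' theorem for a binary outcome Y:
    P(Y|X) = P(X|Y)P(Y) / P(X), where P(X) = P(X|Y)P(Y) + P(X|not Y)P(not Y)."""
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical rates for a fraud filter
p = posterior(prior=0.02, likelihood=0.40, likelihood_if_not=0.01)
print(f"P(fraud | 'wire transfer') = {p:.3f}")  # 0.449
```

Observing the phrase raises the probability of fraud from the 2% prior to roughly 45%: a large update, but far from certainty, because legitimate emails are so much more common.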

In the context of discriminative learning, we can thus see that a discriminative model directly computes the posterior; we could have a model of the likelihood or prior, but it is not required in this approach. Even though you may not have realized it, most of the models you have probably used in the machine learning toolkit are discriminative, such as:

  • Linear regression
  • Logistic regression
  • Random forests19, 20
  • Gradient-boosted decision trees (GBDTs)21
  • Support vector machines (SVMs)22

The first two (linear and logistic regression) model the outcome Y conditional on the data X using a Normal or Gaussian (linear regression) or sigmoidal (logistic regression) probability function. In contrast, the last three have no formal probability model; they compute a function (an ensemble of trees for random forests or GBDTs, or a maximum-margin separating hyperplane defined through inner products for SVMs) that maps X to Y, using a loss or error function to tune those estimates. Given this nonparametric nature, some authors have argued that these constitute a separate class of “non-model” or “non-parametric” discriminative algorithms15.

In contrast, a generative model attempts to learn the joint distribution P(X, Y) of the labels and the input data. Recall the definition of joint probability:

P(X, Y) = P(X|Y)P(Y)

We can rewrite Bayes’ theorem as:

P(Y|X) = P(X, Y)/P(X)

Instead of learning a direct mapping of X to Y using P(Y|X), as in the discriminative case, our goal is to model the joint probabilities of X and Y using P(X, Y). While we can use the resulting joint distribution of X and Y to compute the posterior P(Y|X) and learn a “targeted” model, we can also use this distribution to sample new instances of the data, either by jointly sampling new tuples (x, y) or by sampling new data inputs for a target label y with the expression:

P(X|Y=y) = P(X, Y=y)/P(Y=y)
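A toy example shows all three uses of a learned joint distribution. In this sketch, the labelled pairs are made up, and the "model" is simply a table of pair frequencies, the simplest possible estimate of P(X, Y). From that one table we can compute a posterior (discriminative use), sample X given a label, or sample whole (x, y) tuples.

```python
import random
from collections import Counter

# Made-up labelled data: X is a coarse image feature, Y is the class label
data = [("stripes", "zebra"), ("stripes", "zebra"), ("solid", "zebra"),
        ("solid", "horse"), ("solid", "horse"), ("solid", "horse"), ("stripes", "horse")]

# Estimate the joint distribution P(X, Y) from pair frequencies
n = len(data)
joint = {pair: count / n for pair, count in Counter(data).items()}

def marginal_x(x):
    return sum(p for (xi, yi), p in joint.items() if xi == x)

def marginal_y(y):
    return sum(p for (xi, yi), p in joint.items() if yi == y)

def posterior(y, x):
    """Discriminative use: P(Y=y | X=x) = P(X=x, Y=y) / P(X=x)."""
    return joint.get((x, y), 0.0) / marginal_x(x)

def sample_x_given_y(y, rng):
    """Generative use: draw X from P(X | Y=y) = P(X, Y=y) / P(Y=y)."""
    options = [(xi, p / marginal_y(y)) for (xi, yi), p in joint.items() if yi == y]
    values, weights = zip(*options)
    return rng.choices(values, weights=weights, k=1)[0]

rng = random.Random(0)
print(round(posterior("zebra", "stripes"), 3))   # share of striped examples that are zebras
print([sample_x_given_y("horse", rng) for _ in range(5)])
pairs, weights = zip(*joint.items())
print(rng.choices(pairs, weights=weights, k=3))  # jointly sampled (x, y) tuples
```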

Examples of generative models include:

  • Naive Bayes classifiers
  • Gaussian mixture models
  • Latent Dirichlet allocation (LDA)
  • Hidden Markov models
  • Deep Boltzmann machines
  • VAEs
  • GANs

Naive Bayes classifiers, though typically used for discriminative tasks such as classification, utilize Bayes’ theorem to learn the joint distribution of X and Y under the assumption that the X variables are independent of one another given Y. Similarly, Gaussian mixture models describe the likelihood of a data point belonging to one of a group of normal distributions using the joint probability of the label and these distributions. LDA represents a document as the joint probability of a word and a set of underlying keyword lists (topics) that are used in a document. Hidden Markov models express the joint probability of a state and the next state of a piece of data, such as the weather on successive days of the week. The VAE and GAN models we cover in Chapters 3–6 also utilize joint distributions to map between complex data types; this mapping allows us to generate data from random vectors or transform one kind of data into another.
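The naive Bayes case can be sketched in a few dozen lines. This is a minimal Bernoulli naive Bayes over word sets, with Laplace smoothing and a tiny made-up corpus (all names here are illustrative). The same learned joint, P(Y) multiplied by a product of per-word probabilities given Y, serves both for classifying a message and for generating a new word set for a chosen label.

```python
import math
import random
from collections import defaultdict

def train_naive_bayes(docs, labels, alpha=1.0):
    """Estimate P(Y) and P(word present | Y) from word sets, with Laplace
    smoothing `alpha` so that no probability is exactly 0 or 1."""
    vocab = set().union(*docs)
    class_counts = defaultdict(int)
    word_counts = defaultdict(lambda: defaultdict(int))
    for words, y in zip(docs, labels):
        class_counts[y] += 1
        for w in words:
            word_counts[y][w] += 1
    prior = {y: c / len(labels) for y, c in class_counts.items()}
    cond = {y: {w: (word_counts[y][w] + alpha) / (class_counts[y] + 2 * alpha) for w in vocab}
            for y in class_counts}
    return prior, cond, vocab

def log_joint(words, y, prior, cond, vocab):
    """log P(X, Y=y) = log P(Y=y) + sum over words of log P(x_w | Y=y),
    using the naive assumption that words are independent given the class."""
    total = math.log(prior[y])
    for w in vocab:
        p = cond[y][w]
        total += math.log(p if w in words else 1 - p)
    return total

def predict(words, prior, cond, vocab):
    # The argmax of the posterior only needs the joint: P(X) is the same for every class
    return max(prior, key=lambda y: log_joint(words, y, prior, cond, vocab))

def sample_document(y, cond, vocab, rng):
    """Generative use: draw a new word set from P(X | Y=y), word by word."""
    return {w for w in sorted(vocab) if rng.random() < cond[y][w]}

docs = [{"win", "cash", "now"}, {"cash", "prize"}, {"meeting", "agenda"}, {"lunch", "meeting"}]
labels = ["spam", "spam", "ham", "ham"]
prior, cond, vocab = train_naive_bayes(docs, labels)
print(predict({"cash", "now"}, prior, cond, vocab))
print(sample_document("spam", cond, vocab, random.Random(0)))
```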

As mentioned previously, another view of generative models is that they allow us to generate samples of X if we know an outcome Y. In the first four models listed previously, this conditional probability is just a component of the model formula, with the posterior estimates still being the ultimate objective. However, in the last three examples, which are all deep neural network models, learning the conditional probability of X dependent upon a hidden or “latent” variable Z is actually the main objective, in order to generate new data samples. Using the rich structure allowed by multi-layered neural networks, these models can approximate the distribution of complex data types such as images, natural language, and sound. Also, instead of being a target value, Z is often a random number in these applications, serving merely as an input from which to generate a large space of hypothetical data points. To the extent we have a label (such as whether a generated image should be of a dog or dolphin, or the genre of a generated song), the model is P(X|Y=y, Z=z), where the label Y “controls” the generation of data that is otherwise unrestricted by the random nature of Z.
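The roles of Y and Z in P(X|Y=y, Z=z) can be illustrated without any training. In the sketch below, `decode` and `TEMPLATES` are hypothetical stand-ins for a trained network: the label selects what kind of data to produce, while the random latent vector supplies variation within that kind.

```python
import random

# Hypothetical stand-in for a trained decoder: each label has a class "template"
TEMPLATES = {"dog": [0.8, 0.2, 0.5], "dolphin": [0.1, 0.6, 0.9]}

def decode(y, z, scale=0.1):
    """Return a generated data point x = g(y, z): the label y picks the template,
    and the latent vector z perturbs it."""
    return [t + scale * zi for t, zi in zip(TEMPLATES[y], z)]

rng = random.Random(42)
for label in ("dog", "dolphin"):
    for _ in range(2):
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]  # random latent vector
        print(label, [round(v, 2) for v in decode(label, z)])
```

Each label yields a distinct cluster of outputs, and repeated draws of z yield different samples within it; in a VAE or conditional GAN, the fixed templates are replaced by a deep network that learns this mapping from data.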

Why generative models?

Now that we have reviewed what generative models are and defined them more formally in the language of probability, why would we have a need for such models in the first place? What value do they provide in practical applications? To answer this question, let us take a brief tour of the topics that we will cover in more detail in the rest of this book.

The promise of deep learning

As noted previously, many of the models we will survey in the book are deep, multi-level neural networks. The last 15 years have seen a renaissance in the development of deep learning models for image classification, natural language processing (NLP) and understanding, and reinforcement learning. These advances were enabled by breakthroughs on long-standing challenges in tuning and optimizing very complex models, combined with access to larger datasets, distributed computational power in the cloud, and frameworks such as PyTorch, which make it easier to prototype and reproduce research. We will also lay the theoretical groundwork for the components used in models in the rest of the book, by providing an overview of neural network architectures, optimizers, and regularization in Chapter 2.

Generating images

A challenge in generating images such as Théâtre D’opéra Spatial is that images frequently have no labels (such as a digit); rather, we want to map the space of random numbers into a set of artificial images using a latent vector Z, as we described earlier in the chapter. A further constraint is that we want to promote the diversity of these images: if we input numbers within a certain range, we would like to know that they generate different outputs, and to be able to tune the resulting image features. For this purpose, VAEs23, a kind of deep neural network model that learns to encode images as a latent variable Z and then decode Z back into the input image, were developed to generate diverse and photorealistic images (Figure 1.4), which we will cover in Chapter 3.
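The encode, sample, and decode path of a VAE can be sketched with toy functions. Here `encode` and `decode` are hypothetical hand-written stand-ins for trained networks, and the fixed log-variance is an assumption for illustration. The sampling step uses the reparameterization trick commonly used in VAEs, z = mu + sigma * eps with eps drawn from a standard normal, which isolates the randomness so that gradients can flow through mu and sigma during training.

```python
import math
import random

def encode(x):
    """Hypothetical encoder: returns the mean and log-variance of a 2-D Gaussian
    over the latent variable Z (a trained network in a real VAE)."""
    mu = [sum(x) / len(x), x[-1] - x[0]]  # toy features: mean brightness, left-to-right change
    log_var = [-2.0, -2.0]                # fixed uncertainty, for illustration only
    return mu, log_var

def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def decode(z, size=4):
    """Hypothetical decoder: turns a latent vector into a 1-D 'image' of `size` pixels."""
    return [z[0] + z[1] * (i / (size - 1) - 0.5) for i in range(size)]

rng = random.Random(0)
x = [0.2, 0.4, 0.6, 0.8]                                  # a tiny "image"
mu, log_var = encode(x)
reconstruction = decode(sample_latent(mu, log_var, rng))  # near x, but not identical
# Generating new images: decode latent vectors drawn from the standard normal prior
samples = [decode([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]) for _ in range(3)]
print([round(v, 2) for v in reconstruction])
```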

Figure 1.4: Sample images from a VAE24, 25

In the context of image classification tasks, being able to generate new images can help us increase the number of examples in an existing dataset, or reduce the bias if our existing dataset is heavily skewed toward a particular kind of photograph. Applications could include generating alternative poses (angles, shades, and perspective shots) for product photographs on a fashion e-commerce website (Figure 1.5).

Figure 1.5: Simulating alternative poses with deep generative models26

In a similar application, 2D images of automotive designs can be translated into 3D models using generative AI methods39.

Data augmentation

Another powerful use case for generative models is to overcome the limitations of small existing datasets by augmenting them with additional examples. These additional examples can improve the generalization ability, and thus the quality, of discriminative models trained on the expanded dataset. This augmented data can be used for semi-supervised learning: an initial discriminative model is trained using the limited real data. That model is then used to generate labels for the synthetic data, augmenting the dataset. Finally, a second discriminative model is trained using the combined real and synthetic datasets. Examples of these kinds of applications include increasing the number of diagnostic examples in medical image datasets for cancer and bone lesions37, 38.
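The three-step procedure above can be traced end to end on a toy problem. In this sketch, a nearest-centroid classifier stands in for the discriminative model, the values and labels are made up, and the "generative model" is a hypothetical stand-in that merely jitters real inputs; in practice it would be a trained VAE or GAN.

```python
import random

def fit_centroids(xs, ys):
    """A deliberately simple discriminative model: one mean per class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

# A small, real labelled dataset (made-up values)
real_x = [1.0, 1.3, 0.8, 3.9, 4.2]
real_y = ["benign", "benign", "benign", "lesion", "lesion"]

# Step 1: train an initial discriminative model on the real data only
model_1 = fit_centroids(real_x, real_y)

# Hypothetical stand-in for a generative model: jitter randomly chosen real inputs
rng = random.Random(0)
synthetic_x = [rng.choice(real_x) + rng.gauss(0.0, 0.3) for _ in range(20)]

# Step 2: label the synthetic examples with the initial model
synthetic_y = [predict(model_1, x) for x in synthetic_x]

# Step 3: train a second model on the combined real and synthetic data
model_2 = fit_centroids(real_x + synthetic_x, real_y + synthetic_y)
print(model_1)
print(model_2)
```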

Style transfer and image transformation

In addition to mapping a space of random numbers to artificial images, we could also use generative models to learn a mapping between one kind of image and another. This kind of model can, for example, be used to convert an image of a horse into that of a zebra (Figure 1.6)27, transform a photo into a painting, or create “deepfake videos,” in which one actor’s face has been replaced with another’s (Figure 1.2).

Figure 1.6: CycleGANs apply stripes to horses to generate zebras27

Another fascinating example of applying generative modeling is a study in which a lost masterpiece of the artist Pablo Picasso was discovered to have been painted over with another image. After X-ray imaging of The Old Guitarist and The Crouching Beggar indicated that earlier images of a woman and a landscape lay underneath (Figure 1.7), researchers used the other paintings from Picasso’s “blue period” or other color photographs to train a “neural style transfer” model that transforms black and white images (the X-ray radiographs of the overlying paintings) to the coloration of the original artwork. Then, applying this transfer model to the “hidden” images allowed them to reconstruct “colored-in” versions of the lost paintings.

Figure 1.7: The Picasso paintings The Old Guitarist (top) and The Crouching Beggar (bottom) hid older paintings that were recovered using deep learning to color in the X-ray image of the painted-over scenes (middle) with color patterns learned from examples (column d), generating colorized versions of the lost art (far right)28

All of these models use the previously mentioned GANs, a type of deep learning model proposed in 201429. In addition to changing the contents of an image (as in the preceding zebra example), these models can also be used to map one image into another, such as paired images (dogs and humans with similar facial features, shown in Figure 1.8), or generate textual descriptions from images (Figure 1.9).

Figure 1.8: Sim-GAN for mapping human to animal or anime faces30

Figure 1.9: GAN for generating descriptions from images31

We could also condition the properties of the generated images on some auxiliary information such as labels, an approach used in the GANGogh algorithm, which synthesizes images in the style of different artists by supplying the desired artist as input to the generative model. We will describe these applications in Chapters 4 and 6. Generative AI is also enabling programmers to become artists through models such as Stable Diffusion, which translates natural language descriptions of an image into a visual rendering (Figure 1.10)—we’ll cover how it does this in Chapter 7 and try to reproduce Théâtre D’opéra Spatial.

Figure 1.10: Stable Diffusion examples32

Fake news and chatbots

Humans have always wanted to talk to machines; the first chatbot, ELIZA33, was written at MIT in the 1960s and used a simple program to transform a user’s input and generate a response, in the mode of a “therapist” who frequently responds in the form of a question. More sophisticated models, such as Google’s BERT34 and OpenAI’s GPT-211, use a unit called a “transformer” to represent words in the context of the surrounding text, which allows models such as GPT-2 to generate entirely novel text based on the preceding words. A transformer module in a neural network allows a network to propose a new word in the context of preceding words in a piece of text, emphasizing those that are more relevant in order to generate plausible stretches of language. The BERT model then combines transformer units into a powerful multi-dimensional encoding of natural language patterns and contextual significance. This approach can be used in document creation for NLP tasks, or for chatbot dialogue systems (Figure 1.3), which we will cover in Chapters 8 and 9.
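The "emphasizing those that are more relevant" step is usually implemented as scaled dot-product attention. The sketch below is a single-query, single-head version in plain Python, with no learned projection matrices, batching, or causal masking, and the 2-D word vectors are made up: each preceding word's value vector is weighted by how well its key matches the query.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query: score each key against the
    query, turn the scores into weights with softmax, then average the values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
    return output, weights

# Made-up vectors for three preceding words; the query best matches the second key
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [0.0, 2.0]
out, w = attention(query, keys, values)
print("weights:", [round(x, 3) for x in w])
print("output:", [round(x, 3) for x in out])
```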

Increasingly powerful LLMs have demonstrated remarkable performance in language generation, creative writing, and authoring novel code. In Chapters 10 and 11, we’ll cover some of the most important general, or “foundational,” models that can be tuned for specific tasks after being trained on large sets of diverse language data. These include both closed-source (ChatGPT) and openly available (Llama) models (Figure 1.11).

To adapt these models to specific problems, we will apply methods such as prompt engineering (Chapter 12), fine-tuning, and retrieval-augmented generation, or RAG (Chapter 14). We’ll do so using common tools in this ecosystem, such as LangChain and the pipelines API of Hugging Face’s Transformers library, which are the topic of Chapter 13.

Figure 1.11: LLM examples—GPT-4 (top) and Llama2 (bottom)35, 36

Unique challenges of generative models

Given the powerful applications of generative models, what are the major challenges in implementing them? As described, most of these models use complex data, requiring us to fit large models with sufficiently diverse inputs to capture all the nuances of their features and distribution. That complexity arises from sources including:

  • Range of variation: The number of possible images that can be formed from pixels with three color channels is immense, as is the vocabulary of many languages
  • Heterogeneity of sources: Language models, in particular, are often trained on a mixture of data drawn from many websites
  • Size: As datasets grow larger, it becomes more difficult to catch duplicates, factual errors (such as mistranslations), noise (such as scrambled images), and systematic biases
  • Rate of change: Many developers of LLMs struggle to keep model information current with the state of the world, and therefore to provide relevant answers to user prompts
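A quick calculation shows just how immense the space of possible images is. Assuming 8-bit color (256 intensity levels per channel, an assumption rather than something stated above), an image of height H and width W with 3 channels has 256^(3·H·W) possible configurations. The sketch counts them exactly and uses logarithms to report the number of decimal digits for larger images.

```python
import math


def num_possible_images(height: int, width: int, channels: int = 3, levels: int = 256) -> int:
    """Exact count of distinct images: each of height*width*channels values takes `levels` values."""
    return levels ** (height * width * channels)


def decimal_digits_of_count(height: int, width: int, channels: int = 3, levels: int = 256) -> int:
    """Number of decimal digits in num_possible_images(...), computed with logarithms.

    Recent Python versions refuse to str() integers longer than 4300 digits by
    default, so we use digits = floor(exponent * log10(levels)) + 1 instead.
    """
    exponent = height * width * channels
    return math.floor(exponent * math.log10(levels)) + 1


print(num_possible_images(1, 1))        # a single RGB pixel: 16777216
print(decimal_digits_of_count(64, 64))  # a 64x64 thumbnail: 29593
```

Even a 64x64 thumbnail therefore admits a number of possible images with 29,593 decimal digits, of which natural-looking images occupy a vanishingly small fraction; a generative model has to learn where that fraction lies.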

This has implications both for the number of examples that we must collect to adequately represent the kind of data we are trying to generate and for the computational resources needed to build the model. Throughout this book, we will use cloud-based tools to accelerate our experiments with these models. A more subtle problem arises from having complex data, and from the fact that we are trying to generate data rather than a numerical label or value: our notion of model “accuracy” is much more complicated, because we cannot simply calculate the distance to a single label or score. In Chapters 3 and 4, we will discuss how deep generative models such as VAEs and GANs take different approaches to determining whether a generated image is comparable to a real-world image. Finally, our models need to let us generate large and diverse sets of samples, and the various methods we will discuss take different approaches to controlling the diversity of data.
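One common way to control the diversity of generated samples, widely used with text models (though not necessarily the approach every method in this book takes), is temperature scaling: the model’s scores (logits) are divided by a temperature T before the softmax. A low T sharpens the distribution toward the most likely outcome; a high T flattens it, producing more varied samples. The sketch below uses made-up logits for a hypothetical four-word vocabulary and measures diversity with Shannon entropy.

```python
import math
import random
from collections import Counter
from typing import Dict, List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    # Shannon entropy in bits; higher values mean more diverse samples
    return -sum(p * math.log2(p) for p in probs if p > 0)


def sample_counts(
    vocab: List[str], logits: List[float], temperature: float, n: int, seed: int = 0
) -> Dict[str, int]:
    # Draw n samples from the temperature-scaled distribution (seeded for repeatability)
    rng = random.Random(seed)
    probs = softmax_with_temperature(logits, temperature)
    return dict(Counter(rng.choices(vocab, weights=probs, k=n)))


vocab = ["cat", "dog", "bird", "fish"]  # hypothetical vocabulary
logits = [3.0, 2.0, 1.0, 0.5]           # hypothetical model scores
for t in (0.2, 1.0, 5.0):
    probs = softmax_with_temperature(logits, t)
    print(
        f"T={t}: probs={[round(p, 3) for p in probs]}, "
        f"entropy={entropy(probs):.2f} bits, counts={sample_counts(vocab, logits, t, n=1000)}"
    )
```

At T=0.2 nearly every draw is "cat", while at T=5.0 all four words appear often; the trade-off is that very high temperatures also make unlikely, lower-quality outputs more common.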

Summary

In this chapter, we discussed what generative modeling is and how it fits into the landscape of more familiar machine learning methods, using probability theory and Bayes’ theorem to describe how these models approach prediction in the opposite manner from discriminative learning. We reviewed use cases for generative learning, both for specific kinds of data and for general prediction tasks. As we saw, text and images are the two major forms of data that these models are applied to. For images, the major models we discussed were VAEs, GANs, and similar algorithms. For text, the dominant models are transformer architectures such as Llama, GPT, and BERT. Finally, we examined some of the specialized challenges that arise from building these models.

References

  1. Smithsonian Magazine. 2022. “Art Made with Artificial Intelligence Wins at State Fair.” https://www.smithsonianmag.com/smart-news/artificial-intelligence-art-wins-colorado-state-fair-180980703/.
  2. OpenAI. 2024. “GPT-4 Technical Report.” arXiv. https://arxiv.org/abs/2303.08774.
  3. Chen, Mark, Jerry Tworek, Heewoo Jun, et al. 2021. “Evaluating Large Language Models Trained on Code.” arXiv. https://arxiv.org/abs/2107.03374.
  4. Scientific Reports. 2019. “Comparison of Deep Learning Approaches for Multi-Label Chest X-Ray Classification.” https://www.nature.com/articles/s41598-019-42294-8.
  5. Google DeepMind. n.d. “AlphaGo: The Story So Far.” https://deepmind.com/research/case-studies/alphago-the-story-so-far.
  6. Google DeepMind. 2019. “AlphaStar: Grandmaster Level in StarCraft II Using Multi-Agent Reinforcement Learning.” https://deepmind.com/blog/article/AlphaStar-Grandmaster-level-in-StarCraft-II-using-multi-agent-reinforcement-learning.
  7. Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.” arXiv. https://arxiv.org/abs/1810.04805.
  8. Fox News. 2018. “Terrifying High-Tech Porn: Creepy ‘Deepfake’ Videos Are on the Rise.” https://www.foxnews.com/tech/terrifying-high-tech-porn-creepy-deepfake-videos-are-on-the-rise.
  9. Deepfake Image Sample. Wikimedia. https://upload.wikimedia.org/wikipedia/en/thumb/7/71/Deepfake_example.gif/280px-Deepfake_example.gif.
  10. A Chatbot Dialogue Created Using GPT-2. Devopstar. https://devopstar.com/static/2293f764e1538f357dd1c63035ab25b0/d024a/fake-facebook-conversation-example-1.png.
  11. OpenAI. 2019. “Better Language Models and Their Implications.” OpenAI Blog. https://openai.com/blog/better-language-models/.
  12. Google Research. 2018. “Google Duplex: An AI System for Accomplishing Real-World Tasks over the Phone.” Google AI Blog. https://ai.googleblog.com/2018/05/duplex-ai-system-for-natural-conversation.html.
  13. Software That Generates Original Musical Compositions. MuseGAN. https://salu133445.github.io/musegan/.
  14. Kolmogorov, Andrey. 1950 [1933]. Foundations of the Theory of Probability. New York, USA: Chelsea Publishing Company.
  15. Jebara, Tony. 2004. Machine Learning: Discriminative and Generative. Kluwer Academic (Springer).
  16. Ng, Andrew Y., and Michael I. Jordan. 2002. “On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes.” Advances in Neural Information Processing Systems.
  17. Mitchell, Tom M. 2015. “Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression.” Machine Learning.
  18. Bayes, Thomas, and Richard Price. 1763. “An Essay towards Solving a Problem in the Doctrine of Chances.” Philosophical Transactions of the Royal Society of London 53: 370–418.
  19. Ho, Tin Kam. 1995. “Random Decision Forests.” Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, August 14–16, 1995, 278–282.
  20. Breiman, L. 2001. “Random Forests.” Machine Learning 45 (1): 5–32.
  21. Friedman, J. H. 1999. “Greedy Function Approximation: A Gradient Boosting Machine.”
  22. Cortes, Corinna, and Vladimir N. Vapnik. 1995. “Support-Vector Networks.” Machine Learning 20 (3): 273–297.
  23. Kingma, Diederik P., and Max Welling. 2022. “Auto-Encoding Variational Bayes.” arXiv. https://arxiv.org/abs/1312.6114.
  24. Sample Images from a VAE. https://miro.medium.com/max/2880/1*jcCjbdnN4uEowuHfBoqITQ.jpeg.
  25. Chen, Ricky T. Q., Xuechen Li, Roger Grosse, and David Duvenaud. 2019. “Isolating Sources of Disentanglement in VAEs.” arXiv Vanity. https://www.arxiv-vanity.com/papers/1802.04942/.
  26. Esser, Patrick, Johannes Haux, and Björn Ommer. 2019. “Unsupervised Robust Disentangling of Latent Characteristics for Image Synthesis.” arXiv. https://arxiv.org/pdf/1910.10223.pdf.
  27. “CycleGANs Apply Stripes to Horses to Generate Zebras.” GitHub. https://github.com/jzsherlock4869/cyclegan-pytorch?tab=readme-ov-file.
  28. Bourached, Anthony, and George Cann. 2019. “Raiders of the Lost Art.” arXiv. https://arxiv.org/pdf/1909.05677.pdf.
  29. Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. “Generative Adversarial Networks.” Proceedings of the International Conference on Neural Information Processing Systems (NIPS 2014), 2672–2680.
  30. Mathematical Problems in Engineering (Hindawi). 2020. https://www.hindawi.com/journals/mpe/2020/6216048/.
  31. Gorti, Satya, and Jeremy Ma. 2018. “Text-to-Image-to-Text Translation Using Cycle Consistent Adversarial Networks.”
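  32. Rombach, Robin, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2021. “High-Resolution Image Synthesis with Latent Diffusion Models.” arXiv. https://arxiv.org/pdf/2112.10752.pdf.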
  33. Weizenbaum, Joseph. 1976. Computer Power and Human Reason: From Judgment to Calculation. New York: W. H. Freeman and Company.
  34. Schwartz, Barry. 2019. “Welcome BERT: Google’s Latest Search Algorithm to Better Understand Natural Language.” Search Engine Land. https://searchengineland.com/welcome-bert-google-artificial-intelligence-for-understanding-search-queries-323976.
  35. X post. https://x.com/TonyHoWasHere/status/1636347961813655557.
  36. TheSequence. 2023. “Edge 314: A Deep Dive into Llama 2: Meta AI LLM That Has Become a Symbol in Open Source AI.” https://thesequence.substack.com/p/a-deep-dive-into-llama-2-meta-ai.
  37. Gupta, Anant, Srivas Venkatesh, Sumit Chopra, and Christian Ledig. 2019. “Generative Image Translation for Data Augmentation of Bone Lesion Pathology.” Proceedings of Machine Learning Research. https://proceedings.mlr.press/v102/gupta19b.html.
  38. Mulé, Sébastien, Littisha Lawrance, Younes Belkouchi, and Valérie Vilgrain. 2022. “Generative Adversarial Networks (GAN)-Based Data Augmentation of Rare Liver Cancers: The SFR 2021 Artificial Intelligence Data Challenge.” ScienceDirect. https://www.sciencedirect.com/science/article/pii/S2211568422001711.
  39. Shapiro, Danny. 2023. “Generative AI Revs Up New Age in Auto Industry, from Design and Engineering to Production and Sales.” NVIDIA Blog. https://blogs.nvidia.com/blog/generative-ai-auto-industry/.

Key benefits

  • Implement real-world applications of LLMs and generative AI
  • Fine-tune models with PEFT and LoRA to speed up training
  • Expand your LLM toolbox with Retrieval Augmented Generation (RAG) techniques, LangChain, and LlamaIndex
  • Purchase of the print or Kindle book includes a free eBook in PDF format

Description

Become an expert in Generative AI through immersive, hands-on projects that leverage today’s most powerful models for Natural Language Processing (NLP) and computer vision. Generative AI with Python and PyTorch is your end-to-end guide to creating advanced AI applications, made easy by Raghav Bali, a seasoned data scientist with multiple patents in AI, and Joseph Babcock, a PhD and machine learning expert. Through business-tested approaches, this book simplifies complex GenAI concepts, making learning both accessible and immediately applicable. From NLP to image generation, this second edition explores practical applications and the underlying theories that power these technologies. By integrating the latest advancements in LLMs, it prepares you to design and implement powerful AI systems that transform data into actionable intelligence. You’ll build your versatile LLM toolkit by gaining expertise in GPT-4, LangChain, RLHF, LoRA, RAG, and more. You’ll also explore deep learning techniques for image generation and apply style transfer using GANs, before advancing to implement CLIP and diffusion models. Whether you’re generating dynamic content or developing complex AI-driven solutions, this book equips you with everything you need to harness the full transformative power of Python and AI.

Who is this book for?

This book is for data scientists, machine learning engineers, and software developers seeking practical skills in building generative AI systems. A basic understanding of math and statistics and experience with Python coding is required.

What you will learn

  • Grasp the core concepts and capabilities of LLMs
  • Craft effective prompts using chain-of-thought, ReAct, and prompt query language to guide LLMs toward your desired outputs
  • Understand how attention and transformers have changed NLP
  • Optimize your diffusion models by combining them with VAEs
  • Build text generation pipelines based on LSTMs and LLMs
  • Leverage the power of open-source LLMs, such as Llama and Mistral, for diverse applications

Product Details

Last updated date : Apr 14, 2025
Publication date : Mar 28, 2025
Length: 450 pages
Edition : 2nd
Language : English
ISBN-13 : 9781835884454
Vendor : Facebook

Table of Contents

Introduction to Generative AI: Drawing Data from Models
Building Blocks of Deep Neural Networks
The Rise of Methods for Text Generation
NLP 2.0: Using Transformers to Generate Text
LLM Foundations
Open-Source LLMs
Prompt Engineering
LLM Toolbox
LLM Optimization Techniques
Emerging Applications in Generative AI
Neural Networks Using VAEs
Image Generation with GANs
Style Transfer with GANs
Deepfakes with GANs
Diffusion Models and AI Art
Other Books You May Enjoy
Index

Customer reviews

Rating distribution
5 out of 5 (1 rating)
5 star 100%
4 star 0%
3 star 0%
2 star 0%
1 star 0%
Leonard Hall, Jul 23, 2025
Rating: 5 out of 5
This book is a masterpiece! As someone actively working in AI, I can confidently say Generative AI with Python and PyTorch (Second Edition) is one of the most comprehensive and engaging resources on the subject. Whether you're exploring generative models for the first time or looking to deepen your expertise in LLMs, VAEs, GANs, or diffusion models, this book delivers with clarity, hands-on depth, and authority. Joseph Babcock and Raghav Bali do an outstanding job of walking readers through the foundational concepts of generative AI—from the math of probability and Bayes’ theorem to the latest advancements in large language models (LLMs) and Stable Diffusion. Each chapter builds logically on the previous one, with lucid explanations and real-world code examples using PyTorch and Hugging Face that make even complex topics approachable.
Verified review (Feefo)
