Generative AI with Python and PyTorch: Navigating the AI frontier with LLMs, Stable Diffusion, and next-gen AI applications, Second Edition

Joseph Babcock, Raghav Bali
Paperback Apr 2025 450 pages 2nd Edition
Generative AI with Python and PyTorch

Introduction to Generative AI: Drawing Data from Models

At the Colorado State Fair in 2022, the winning entry was a fantastical sci-fi landscape created by video game designer Jason Allen titled Théâtre D’opéra Spatial (Figure 1.1). The first-prize art was remarkable both for its dramatic subject matter and for its unusual origin. Unlike the majority of other artworks entered into the competition, Théâtre D’opéra Spatial was not painted using oil or watercolors, nor was its “creator” even human; rather, it is an entirely digital image produced by a sophisticated machine learning algorithm called Midjourney. Allen used Midjourney, which has been trained on diverse images, along with natural language instructions to create the image, rather than a brush and canvas.

Figure 1.1: Théâtre D’opéra Spatial [1]


Visual art is far from the only area in which machine learning has demonstrated astonishing results. Indeed, if you have paid attention to the news in the last few years, you have likely seen many stories about the groundbreaking results of modern AI systems applied to diverse problems, from the hard sciences to online avatars and interactive chat. Deep neural network models, such as the one powering Midjourney, have shown amazing abilities to generate realistic human language [2], author computer code [3], and solve school exams with human-level ability [2]. Such models can also classify X-ray images of human anatomy on the level of trained physicians [4], beat human masters at both classic board games such as Go (an ancient East Asian strategy game) and multiplayer computer games [5, 6], and translate French into English with amazing sensitivity to grammatical nuances [7].

Discriminative versus generative models

However, these latter examples of AI differ in an important way from the model that generated Théâtre D’opéra Spatial. In all of these other applications, the model is presented with a set of inputs—data such as English text, or X-ray images—that is paired with a target output, such as the next word in a translated sentence or the diagnostic classification of an X-ray. Indeed, this is probably the kind of AI model you are most familiar with from prior experiences in predictive modeling; they are broadly known as discriminative models, whose purpose is to create a mapping between a set of input variables and a target output. The target output could be a set of discrete classes (such as which word in the English language appears next in a translation), or a continuous outcome (such as the expected amount of money a customer will spend in an online store over the next 12 months).

However, this kind of model, in which data is “labeled” or “scored,” represents only half of the capabilities of modern machine learning. Another class of algorithms, such as the one that generated the winning entry at the Colorado State Fair, doesn’t compute a score or label from input variables but rather generates new data. Unlike in discriminative models, the input variables are often vectors of numbers that aren’t related to real-world values at all and are often even randomly generated. This kind of model, known as a generative model, which can produce complex outputs such as text, music, or images from random noise, is the topic of this book.

Even if you did not know it at the time, you have probably seen other instances of generative models mentioned in the news alongside the discriminative examples given previously. A prominent example is deepfakes: videos in which one person’s face has been systematically replaced with another’s by using a neural network to remap the pixels [8] (Figure 1.2).

Figure 1.2: A deepfake image [9]


Maybe you have also seen stories about AI models that generate “fake news,” which scientists at the firm OpenAI were initially reluctant to release to the public due to concerns that they could be used to create propaganda and misinformation online (Figure 1.3) [11].

Figure 1.3: A chatbot dialogue created using GPT-2 [10]


In these and other applications, such as Google’s voice assistant Duplex, which can make a restaurant reservation by dynamically creating conversation with a human in real time [12], or even software that can generate original musical compositions [13], we are surrounded by the outputs of generative AI algorithms. These models are able to handle complex information in a variety of domains: creating photorealistic images or stylistic “filters” on pictures, synthetic sound, conversational text, and even rules for optimally playing video games. You might ask: Where did these models come from? How can I implement them myself?

Implementing generative models

While generative models could theoretically be implemented using a wide variety of machine learning algorithms, in practice, they are usually built with deep neural networks, which are well suited to capture the complex variation in data such as images or language. In this book, we will focus on implementing these deep-learning-based generative models for many different applications using PyTorch. PyTorch is a Python programming library used to develop and deploy deep learning models. It was open-sourced by Meta (formerly Facebook) in 2016 and has become one of the most popular libraries for the research and deployment of neural network models. We’ll execute PyTorch code on the cloud using Google’s Colab notebook environment, which allows you to access world-class computing infrastructure including graphics processing units (GPUs) and tensor processing units (TPUs) on demand and without the need for onerous environment setups. We’ll also leverage the Pipelines library from Hugging Face, which provides an easy interface to run experiments using a catalog of some of the most sophisticated models available.

In the following chapters, you will learn not only the underlying theory behind these models but also the practical skills to implement them in popular programming frameworks. In Chapter 2, we’ll review how, since 2006, an explosion of research in “deep learning” using large neural network models has produced a wide variety of generative modeling applications. Innovations arising from this research included variational autoencoders (VAEs), which can efficiently generate complex data samples from random numbers that are “decoded” into realistic images, using techniques we will describe in Chapter 11. We will also describe a related image generation algorithm, the generative adversarial network (GAN), in more detail in Chapters 12-14 of this book through applications for image generation, style transfer, and deepfakes. Conceptually, the GAN model creates a competition between two neural networks.

One (termed the generator) produces realistic (or, in the case of the experiments by Obvious, artistic) images starting from a set of random numbers that are “decoded” into realistic images by applying a mathematical transformation. In a sense, the generator is like an art student, producing new paintings from brushstrokes and creative inspiration. The second network, known as the discriminator, attempts to classify whether a picture comes from a set of real-world images, or whether it was created by the generator. Thus, the discriminator acts like a teacher, grading whether the student has produced work comparable to the paintings they are attempting to mimic. As the generator becomes better at fooling the discriminator, its output becomes closer and closer to the historical examples it is designed to copy. In Chapter 11, we’ll also describe the algorithm used in Théâtre D’opéra Spatial, the latent diffusion model, which builds on VAEs to provide scalable image synthesis based on natural language prompts from a human user.

Another key innovation in generative models is in the domain of natural language data. By representing the complex interrelationship between words in a sentence in a computationally scalable way, the Transformer network and the Bidirectional Encoder Representations from Transformers (BERT) model built on top of it present powerful building blocks to generate textual data in applications such as chatbots and large language models (LLMs), which we’ll cover in Chapters 4 and 5. In Chapter 6, we will dive deeper into the most famous open-source models in the current LLM landscape, including Llama. In Chapters 7 and 8.

Before diving into further details on the various applications of generative models and how to implement them in PyTorch, we will take a step back and examine how exactly generative models are different from other kinds of machine learning. This difference lies with the basic units of any machine learning algorithm: probability and the various ways we use mathematics to quantify the shape and distribution of data we encounter in the world. In the rest of this chapter, we will cover the following:

  • How we can use the statistical rules of probability to describe how machine learning models represent the shapes of the datasets we study
  • The difference between discriminative and generative models, based on the kinds of probability rules they embody
  • Examples of areas where generative modeling has been applied: image generation, style transfer, chatbots and text synthesis, and reinforcement learning

The rules of probability

At the simplest level, a model, be it machine learning or a more classical method such as linear regression, is a mathematical description of how a target variable changes in response to variation in a predictive variable; that relationship could be a linear slope or any of a number of more complex mathematical transformations. In the task of modeling, we usually think of separating the variables in our dataset into two broad classes:

  • Independent data, by which we primarily mean inputs to a model, is often denoted by X. For example, if we are trying to predict the grades of school students on an end-of-year exam based on their characteristics, we could think of several kinds of features:
    • Categorical: If there are six schools in a district, the school that a student attends could be represented by a six-element vector for each student. The elements are all 0, except for one that is 1, indicating which of the six schools they are enrolled in.
    • Continuous: The student heights or average prior test scores can be represented as continuous real numbers.
    • Ordinal: The rank of the student in their class is not meant to be an absolute quantity (like their height) but rather a measure of relative difference.
  • Dependent variables, conversely, are the outputs of our models and are denoted by the letter Y. Note that, in some cases, Y is a “label” that can be used to condition a generative output, such as in a conditional GAN. It can be categorical, continuous, or ordinal, and could be an individual element or multidimensional matrix (tensor) for each element of the dataset.
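
As a rough illustration of the three feature types above, the following plain-Python sketch builds an input vector for one student. The record, its field names, and the `one_hot` helper are hypothetical and chosen only for this example; in a PyTorch workflow, the resulting list would typically be converted into a tensor.

```python
def one_hot(index, num_classes):
    """Return a list of zeros with a single 1 at position `index`."""
    if not 0 <= index < num_classes:
        raise ValueError("index out of range")
    return [1 if i == index else 0 for i in range(num_classes)]

# Hypothetical student record (all values are made up for illustration).
student = {"school": 2, "height_cm": 152.5, "prior_score": 81.0, "class_rank": 4}

features = (
    one_hot(student["school"], num_classes=6)           # categorical
    + [student["height_cm"], student["prior_score"]]    # continuous
    + [student["class_rank"]]                           # ordinal
)
print(features)
```

The first six entries identify the school, the next two are continuous measurements, and the last is the ordinal rank.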

How can we describe the data in our model using statistics? In other words, how can we quantitatively describe what values we are likely to see, how frequently, and which values are more likely to appear together than others? One way is by asking how likely it is to observe a particular value in the data, or the probability of that value. For example, if we were to ask what the probability of observing a roll of four on a six-sided die is, the answer is that, on average, we would observe a four once every six rolls. We write this as follows:

P(X=4) = 1⁄6 ≈ 16.67%

Here, P denotes “probability of.” What defines the allowed probability values for a particular dataset? If we imagine the set of all possible values of a dataset (such as all values of a die), then a probability maps each value to a number between 0 and 1. The minimum is 0 because we cannot have a negative chance of seeing a result; the most unlikely result is that we would never see a particular value, or 0% probability, such as rolling a seven on a six-sided die. Similarly, we cannot have a greater than 100% probability of observing a result, represented by the value 1; an outcome with probability 1 is absolutely certain. The values to which these probabilities are assigned may form a set of discrete classes (such as the faces of a die) or an infinite set of potential values (such as variations in height or weight). In either case, however, the probabilities have to follow certain rules, the probability axioms described by the mathematician Andrey Kolmogorov in 1933 [14]:

  1. The probability of an observation (a die roll, a particular height) is a non-negative, finite number between 0 and 1.
  2. The probability of at least one of the observations in the space of all possible observations occurring is 1.
  3. The probability that any one of a set of distinct, mutually exclusive events occurs (such as rolling a 1 or a 2 on a die) is the sum of the probabilities of the individual events.
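
The three rules can be checked directly on the die example. The sketch below is a minimal illustration using exact fractions; the variable names are arbitrary and not taken from the book.

```python
from fractions import Fraction

# A fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Rule 1: every probability lies between 0 and 1.
rule_1 = all(0 <= p <= 1 for p in die.values())

# Rule 2: the probabilities over the whole space of outcomes sum to 1.
rule_2 = sum(die.values()) == 1

# Rule 3: for mutually exclusive events, probabilities add.
# P(roll is 1 or 2) = P(1) + P(2)
p_one_or_two = die[1] + die[2]

# An impossible outcome, such as rolling a seven, has probability 0.
p_seven = die.get(7, Fraction(0))

print(rule_1, rule_2, p_one_or_two, p_seven)
```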

While these rules might seem abstract, we will see in Chapter 3 that they have direct relevance to developing neural network models. For example, an application of rule 1 is to generate a probability between 0 and 1 for a particular outcome in a softmax function for predicting target classes. If our model is asked to classify whether an image contains a cat, dog, or horse, each potential class receives a probability between 0 and 1 as the output of a softmax function based on a deep neural network applying nonlinear, multi-layer transformations on the input pixels of the image we are trying to classify. Rule 3 lets us add the probabilities of these outcomes, under the guarantee that they are mutually exclusive predictions of a deep neural network (in other words, a real-world image logically cannot be classified as both a dog and a cat, but rather as a dog or a cat, with the probabilities of these two outcomes additive). Combined with rule 2, this means the class probabilities are normalized so that they sum to 1. Finally, the second rule provides the theoretical guarantee that we can generate data at all using these models.
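
To make the classification example concrete, here is a small softmax sketch in plain Python (PyTorch offers an equivalent built-in, but the standard library keeps the example dependency-free). The raw scores are made-up stand-ins for the outputs of a network's final layer.

```python
import math

def softmax(scores):
    """Convert arbitrary real-valued scores into probabilities."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["cat", "dog", "horse"]
scores = [2.0, 1.0, -0.5]  # hypothetical raw network outputs (logits)
probs = softmax(scores)

for name, p in zip(classes, probs):
    print(f"P({name}) = {p:.3f}")
```

Each output lies between 0 and 1, and the three values sum to 1, so the highest-scoring class receives the largest probability.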

However, in the context of machine learning and modeling, we are not usually interested in just the probability of observing a piece of input data, X; we instead want to know the conditional probability of an outcome Y given the data X. Said another way, we want to know how likely a label for a set of data is, based on that data. We write this as the probability of Y given X, or the probability of Y conditional on X:

P(Y|X)

Another question we could ask about Y and X is how likely they are to occur together—their joint probability—which can be expressed using the preceding conditional probability expression as:

P(X, Y) = P(Y|X)P(X) = P(X|Y)P(Y)

This formula expresses the probability of X and Y occurring together. In the case of X and Y being completely independent of one another, this is simply the product of their individual probabilities:

P(X|Y)P(Y) = P(Y|X)P(X) = P(X)P(Y)
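
A quick numerical check of these identities, assuming a fair die roll X and an independent fair coin flip Y (an illustrative setup, not one from the book):

```python
from fractions import Fraction
from itertools import product

# X: a fair die roll; Y: a fair coin flip made independently of the die.
p_x = {x: Fraction(1, 6) for x in range(1, 7)}
p_y = {y: Fraction(1, 2) for y in ("heads", "tails")}

# Under independence, the joint is the product of the marginals.
joint = {(x, y): p_x[x] * p_y[y] for x, y in product(p_x, p_y)}

def conditional_y_given_x(y, x):
    """P(Y=y | X=x) = P(X=x, Y=y) / P(X=x)."""
    return joint[(x, y)] / p_x[x]

# Chain rule: P(X, Y) = P(Y|X) P(X)
chain = conditional_y_given_x("heads", 4) * p_x[4]

print(joint[(4, "heads")], chain, conditional_y_given_x("heads", 4))
```

Because the coin ignores the die, conditioning on X leaves P(Y) unchanged, which is exactly what independence means.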

You will see that these expressions become important in our discussion of complementary priors in Chapter 4, and the ability of restricted Boltzmann machines to simulate independent data samples. They are also important as building blocks of Bayes’ theorem, which we describe next.

Discriminative and generative modeling, and Bayes’ theorem

Now, let us consider how these rules of conditional and joint probability relate to the kinds of predictive models that we build for various machine learning applications. In most cases (such as predicting whether an email is fraudulent or the dollar amount of the future lifetime value of a customer), we are interested in the conditional probability, P(Y|X=x), where Y is the set of outcomes we are trying to model, X is the set of input features, and x is a particular value of the input features. For example, we are trying to calculate the probability that an email is fraudulent based on the knowledge of the set of words (the x) in the message. This approach is known as discriminative modeling [15, 16, 17]. Discriminative modeling attempts to learn a direct mapping between the data, X, and the outcomes, Y.

Another way to understand discriminative modeling is in the context of Bayes’ theorem [18], which relates the conditional and joint probabilities of a dataset, as follows:

P(Y|X) = P(X|Y)P(Y)/P(X) = P(X, Y)/P(X)

As a side note, the theorem was published two years following the author’s death, and in a foreword, Richard Price described it as a mathematical argument for the existence of God, perhaps appropriate given that Thomas Bayes served as a Reverend during his life. In the formula for Bayes’ theorem, P(X|Y) is known as the likelihood, and the ratio P(X|Y)/P(X) measures the support that the observation X gives to the outcome Y; P(Y) is the prior, or the plausibility of the outcome before seeing the data; and P(Y|X) is the posterior, or the probability of the outcome given all the independent data we have observed related to the outcome thus far. Conceptually, Bayes’ theorem states that the probability of an outcome given the data is the product of its baseline probability and the probability of the input data conditional on this outcome, divided by the overall probability of the input data.
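
As a worked example of the formula, consider the fraudulent-email setting with a single feature: whether a particular word appears in the message. All of the numbers below are invented for illustration.

```python
from fractions import Fraction

# Hypothetical numbers, chosen only for illustration.
p_fraud = Fraction(1, 100)               # prior P(Y=fraud)
p_word_given_fraud = Fraction(60, 100)   # likelihood P(X=word | Y=fraud)
p_word_given_legit = Fraction(5, 100)    # P(X=word | Y=legit)

# P(X=word), obtained by summing over both possible outcomes.
p_word = p_word_given_fraud * p_fraud + p_word_given_legit * (1 - p_fraud)

# Posterior P(Y=fraud | X=word) from Bayes' theorem.
p_fraud_given_word = p_word_given_fraud * p_fraud / p_word

print(p_fraud_given_word, float(p_fraud_given_word))
```

Seeing the word raises the probability of fraud from the 1% prior to 4/37, or roughly 11%, because the word is twelve times more common in fraudulent messages than in legitimate ones.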

In the context of discriminative learning, we can thus see that a discriminative model directly computes the posterior; we could have a model of the likelihood or prior, but it is not required in this approach. Even though you may not have realized it, most of the models you have probably used in the machine learning toolkit are discriminative, such as:

  • Linear regression
  • Logistic regression
  • Random forests [19, 20]
  • Gradient-boosted decision trees (GBDTs) [21]
  • Support vector machines (SVMs) [22]

The first two (linear and logistic regression) model the outcome Y conditional on the data X using a Normal or Gaussian (linear regression) or sigmoidal (logistic regression) probability function. In contrast, the last three have no formal probability model. They compute a function (an ensemble of trees for random forests or GBDTs, or an inner product function for SVMs) that maps X to Y, using a loss or error function to tune those estimates; given this nonparametric nature, some authors have argued that these constitute a separate class of “non-model” or “non-parametric” discriminative algorithms [15].

In contrast, a generative model attempts to learn the joint distribution P(Y, X) of the labels and the input data. Recall that using the definition of joint probability:

P(X, Y) = P(X|Y)P(Y)

We can rewrite Bayes’ theorem as:

P(Y|X) = P(X, Y)/P(X)

Instead of learning a direct mapping of X to Y using P(Y|X), as in the discriminative case, our goal is to model the joint probabilities of X and Y using P(X, Y). While we can use the resulting joint distribution of X and Y to compute the posterior P(Y|X) and learn a “targeted” model, we can also use this distribution to sample new instances of the data by either jointly sampling new tuples (x, y), or sampling new data inputs using a target label Y with the expression:

P(X|Y=y) = P(X, Y)/P(Y)
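
The two uses of a joint distribution can be sketched with a small lookup table. The joint probabilities below are made up, and the helper names are hypothetical; the point is only that the same table supports both the discriminative query P(Y|X) and generative sampling from P(X|Y=y).

```python
import random
from collections import Counter

# Hypothetical joint distribution P(X, Y) over a word X and a label Y.
joint = {
    ("prize", "fraud"): 0.08, ("prize", "legit"): 0.02,
    ("invoice", "fraud"): 0.02, ("invoice", "legit"): 0.38,
    ("meeting", "fraud"): 0.00, ("meeting", "legit"): 0.50,
}

def marginal_y(y):
    return sum(p for (_, label), p in joint.items() if label == y)

def marginal_x(x):
    return sum(p for (word, _), p in joint.items() if word == x)

def posterior(y, x):
    """Discriminative use: P(Y=y | X=x) = P(X, Y) / P(X)."""
    return joint[(x, y)] / marginal_x(x)

def sample_x_given_y(y, rng):
    """Generative use: draw X from P(X | Y=y) = P(X, Y) / P(Y)."""
    words = [w for (w, label) in joint if label == y]
    weights = [joint[(w, y)] / marginal_y(y) for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(0)
samples = Counter(sample_x_given_y("fraud", rng) for _ in range(1000))
print(posterior("fraud", "prize"), samples)
```

A deep generative model plays the same role as this table, but represents the distribution with a neural network instead of enumerating every combination.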

Examples of generative models include:

  • Naive Bayes classifiers
  • Gaussian mixture models
  • Latent Dirichlet allocation (LDA)
  • Hidden Markov models
  • Deep Boltzmann machines
  • VAEs
  • GANs

Naive Bayes classifiers, though commonly used for the discriminative task of classification, utilize Bayes’ theorem to learn the joint distribution of X and Y under the assumption that the X variables are independent of one another given Y. Similarly, Gaussian mixture models describe the likelihood of a data point belonging to one of a group of normal distributions using the joint probability of the label and these distributions. LDA represents a document as the joint probability of a word and a set of underlying keyword lists (topics) that are used in a document. Hidden Markov models express the joint probability of a state and the next state of a piece of data, such as the weather on successive days of the week. The VAE and GAN models we cover in Chapters 3–6 also utilize joint distributions to map between complex data types; this mapping allows us to generate data from random vectors or transform one kind of data into another.

As mentioned previously, another view of generative models is that they allow us to generate samples of X if we know an outcome Y. In the first four models listed previously, this conditional probability is just a component of the model formula, with the posterior estimates still being the ultimate objective. However, in the last three examples, which are all deep neural network models, learning the conditional probability of X dependent upon a hidden or “latent” variable Z is actually the main objective, in order to generate new data samples. Using the rich structure allowed by multi-layered neural networks, these models can approximate the distribution of complex data types such as images, natural language, and sound. Also, instead of being a target value, Z is often a random number in these applications, serving merely as an input from which to generate a large space of hypothetical data points. To the extent we have a label (such as whether a generated image should be of a dog or dolphin, or the genre of a generated song), the model is P(X|Y=y, Z=z), where the label Y “controls” the generation of data that is otherwise unrestricted by the random nature of Z.
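
The role of the label Y and the random latent variable Z can be sketched with a toy one-dimensional "generator". This is a deliberately simple stand-in: the `PARAMS` table and `generate` function are hypothetical, and in a deep generative model the mapping from (y, z) to x would be a trained neural network rather than a shift and scale.

```python
import random

# Hypothetical class-conditional parameters: each label y selects a mean
# and spread for a one-dimensional "data point" x.
PARAMS = {"dog": (30.0, 5.0), "dolphin": (150.0, 20.0)}

def generate(y, z):
    """Deterministically 'decode' latent z into x, controlled by label y."""
    mean, std = PARAMS[y]
    return mean + std * z

rng = random.Random(42)
latents = [rng.gauss(0.0, 1.0) for _ in range(5)]  # z drawn from N(0, 1)

dogs = [generate("dog", z) for z in latents]
dolphins = [generate("dolphin", z) for z in latents]
print(dogs)
print(dolphins)
```

The same random inputs z yield different outputs depending on the label, which is the sense in which Y "controls" generation while Z supplies the variation.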

Why generative models?

Now that we have reviewed what generative models are and defined them more formally in the language of probability, why would we have a need for such models in the first place? What value do they provide in practical applications? To answer this question, let us take a brief tour of the topics that we will cover in more detail in the rest of this book.

The promise of deep learning

As noted previously, many of the models we will survey in the book are deep, multi-layer neural networks. The last 15 years have seen a renaissance in the development of deep learning models for image classification, natural language processing (NLP) and understanding, and reinforcement learning. These advances were enabled by breakthroughs in traditional challenges in tuning and optimizing very complex models, combined with access to larger datasets, distributed computational power in the cloud, and frameworks such as PyTorch, which make it easier to prototype and reproduce research. We will also lay the theoretical groundwork for the components used in models in the rest of the book, by providing an overview of neural network architectures, optimizers, and regularization in Chapter 2.

Generating images

A challenge in generating images such as Théâtre D’opéra Spatial is that, frequently, images have no labels (such as a digit); rather, we want to map the space of random numbers into a set of artificial images using a latent vector Z, as we described earlier in the chapter. A further constraint is that we want to promote the diversity of these images: if we input numbers within a certain range, we would like to know that they generate different outputs, and be able to tune the resulting image features. For this purpose, VAEs [23], a kind of deep neural network model that learns to encode images as a latent variable Z, which it decodes into the input image, were developed to generate diverse and photorealistic images (Figure 1.4), which we will cover in Chapter 3.

Figure 1.4: Sample images from a VAE [24, 25]


In the context of image classification tasks, being able to generate new images can help us increase the number of examples in an existing dataset, or reduce the bias if our existing dataset is heavily skewed toward a particular kind of photograph. Applications could include generating alternative poses (angles, shades, and perspective shots) for product photographs on a fashion e-commerce website (Figure 1.5).

Figure 1.5: Simulating alternative poses with deep generative models [26]


In a similar application, 2D images of automotive designs can be translated into 3D models using generative AI methods [39].

Data augmentation

Another powerful use case for generative models is to overcome the limitations of small existing datasets by augmenting them with additional examples. These additional examples can help improve the quality of discriminative models trained on the expanded dataset by improving their generalization abilities. This augmented data can be used for semi-supervised learning: an initial discriminative model is trained using the limited real data. That model is then used to generate labels for the synthetic data, augmenting the dataset. Finally, a second discriminative model is trained using the combined real and synthetic datasets. Examples of these kinds of applications include increasing the number of diagnostic examples in medical image datasets for cancer and bone lesions [37, 38].
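
The three-step procedure above can be sketched on toy one-dimensional data. Everything here is an assumption made for illustration: the nearest-centroid classifier stands in for the discriminative model, and the random draws stand in for the outputs of a trained generative model.

```python
import random

def fit_centroids(points, labels):
    """Train a nearest-centroid classifier: one mean per class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

rng = random.Random(0)

# Step 1: train an initial discriminative model on the small real dataset.
real_x = [1.0, 1.4, 0.8, 4.9, 5.3, 5.1]
real_y = ["benign", "benign", "benign", "lesion", "lesion", "lesion"]
model_1 = fit_centroids(real_x, real_y)

# Stand-in for a generative model: synthetic points near each cluster.
synthetic_x = [rng.gauss(1.0, 0.3) for _ in range(50)] + \
              [rng.gauss(5.0, 0.3) for _ in range(50)]

# Step 2: label the synthetic data with the initial model.
synthetic_y = [predict(model_1, x) for x in synthetic_x]

# Step 3: train a second model on the combined real and synthetic data.
model_2 = fit_centroids(real_x + synthetic_x, real_y + synthetic_y)
print(model_1, model_2)
```

The second model's centroids are estimated from 106 points rather than 6, which is the intended benefit, although any labeling mistakes made by the first model would be carried into the second.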

Style transfer and image transformation

In addition to mapping a space of random numbers to artificial images, we could also use generative models to learn a mapping between one kind of image and a second. This kind of model can, for example, be used to convert an image of a horse into that of a zebra (Figure 1.6) [27], transform a photo into a painting, or create “deepfake videos,” in which one actor’s face has been replaced with another’s (Figure 1.2).

Figure 1.6: CycleGANs apply stripes to horses to generate zebras [27]


Another fascinating example of applying generative modeling is a study in which a lost masterpiece of the artist Pablo Picasso was discovered to have been painted over with another image. After X-ray imaging of The Old Guitarist and The Crouching Beggar indicated that earlier images of a woman and a landscape lay underneath (Figure 1.7), researchers used the other paintings from Picasso’s “blue period” or other color photographs to train a “neural style transfer” model that transforms black and white images (the X-ray radiographs of the overlying paintings) to the coloration of the original artwork. Then, applying this transfer model to the “hidden” images allowed them to reconstruct “colored-in” versions of the lost paintings.

Figure 1.7: The Picasso paintings The Old Guitarist (top) and The Crouching Beggar (bottom) hid older paintings that were recovered using deep learning to color in the X-ray image of the painted-over scenes (middle) with color patterns learned from examples (column d), generating colorized versions of the lost art (far right) [28]


All of these models use the previously mentioned GANs, a type of deep learning model proposed in 2014 [29]. In addition to changing the contents of an image (as in the preceding zebra example), these models can also be used to map one image into another, such as paired images (dogs and humans with similar facial features, shown in Figure 1.8), or generate textual descriptions from images (Figure 1.9).

Figure 1.8: Sim-GAN for mapping human to animal or anime faces [30]


Figure 1.9: GAN for generating descriptions from images [31]


We could also condition the properties of the generated images on some auxiliary information such as labels, an approach used in the GANGogh algorithm, which synthesizes images in the style of different artists by supplying the desired artist as input to the generative model. We will describe these applications in Chapters 4 and 6. Generative AI is also enabling programmers to become artists through models such as Stable Diffusion, which translates natural language descriptions of an image into a visual rendering (Figure 1.10)—we’ll cover how it does this in Chapter 7 and try to reproduce Théâtre D’opéra Spatial.

Figure 1.10: Stable Diffusion examples [32]


Fake news and chatbots

Humans have always wanted to talk to machines; the first chatbot, ELIZA [33], was written at MIT in the 1960s and used a simple program to transform a user’s input and generate a response, in the mode of a “therapist” who frequently responds in the form of a question. More sophisticated models, such as Google’s BERT [34] and OpenAI’s GPT-2 [11], can generate entirely novel text; they use a unit called a “transformer” to generate new words based on past words in a body of text. A transformer module in a neural network allows a network to propose a new word in the context of preceding words in a piece of text, emphasizing those that are more relevant in order to generate plausible stretches of language. The BERT model then combines transformer units into a powerful multi-dimensional encoding of natural language patterns and contextual significance. This approach can be used in document creation for NLP tasks, or for chatbot dialogue systems (Figure 1.3), which we will cover in Chapters 8 and 9.
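
The idea of emphasizing the more relevant preceding words can be illustrated with scaled dot-product attention, the weighting operation used inside transformer modules. The sketch below is a toy, plain-Python version: the two-dimensional vectors are made-up embeddings, `attend` is a hypothetical helper name, and the learned projection matrices, multiple heads, and masking of a real transformer are omitted.

```python
import math

def softmax(scores):
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Weight each value by how well its key matches the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Hypothetical 2-dimensional embeddings for three preceding words.
keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
values = keys  # the same vectors serve as values in this toy example
query = [1.0, 0.0]  # the position for which we want context

weights, context = attend(query, keys, values)
print(weights, context)
```

The first word, whose embedding is most similar to the query, receives the largest weight, so it contributes most to the context vector from which the next word would be predicted.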

Increasingly powerful LLMs have demonstrated remarkable performance in language generation, creative writing, and authoring novel code. In Chapters 10 and 11, we’ll cover some of the most important general, or “foundational,” models that can be tuned for specific tasks after being trained on large sets of diverse language data. These include both closed-source (ChatGPT) and openly available (Llama) models (Figure 1.11).

To adapt these models to specific problems, we will apply methods such as prompt engineering (Chapter 12), fine-tuning, and RAG (Chapter 14). We’ll do so using common tools in this ecosystem such as LangChain and the Hugging Face Pipelines library, which are the topic of Chapter 13.

Figure 1.11: LLM examples: GPT-4 (top) and Llama2 (bottom) [35, 36]


Unique challenges of generative models

Given the powerful applications of generative models, what are the major challenges in implementing them? As described, most of these models work with complex data, requiring us to fit large models on sufficiently diverse inputs to capture all the nuances of their features and distribution. That complexity arises from sources including:

  • Range of variation: The number of potential images generated from a set of three color channel pixels is immense, as is the vocabulary of many languages
  • Heterogeneity of sources: Language models, in particular, are often developed using a mixture of data from several websites
  • Size: Once data becomes large, it becomes more difficult to catch duplications, factual errors (such as mistranslations), noise (such as scrambled images), and systematic biases
  • Rate of change: Many developers of LLMs struggle to keep model information current with the state of the world and thus provide relevant answers to user prompts
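
To give a sense of scale for the range of variation in images, the following back-of-the-envelope sketch counts how many distinct images exist at a given resolution. It assumes 8-bit color (256 levels per channel) and a 32 x 32 image; both numbers and the function name `log10_image_count` are illustrative assumptions, not values taken from the text.

```python
import math

def log10_image_count(height, width, channels=3, levels=256):
    """Base-10 logarithm of the number of distinct images of this size.

    Each of the height * width * channels values can take `levels` settings,
    so the count is levels ** (height * width * channels). We return its
    logarithm because the number itself is far too large to print usefully.
    """
    return height * width * channels * math.log10(levels)

exponent = log10_image_count(32, 32)  # assumed thumbnail-sized RGB image
print(f"A 32 x 32 RGB image has roughly 10^{exponent:.0f} possible configurations")
```

Even at thumbnail resolution, the exponent runs into the thousands, so no dataset can enumerate more than a vanishing fraction of the space.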

This has implications both for the number of examples that we must collect to adequately represent the kind of data we are trying to generate, and the computational resources needed to build the model. Throughout this book, we will use cloud-based tools to accelerate our experiments with these models. A more subtle problem that comes from having complex data, and the fact that we are trying to generate data rather than a numerical label or value, is that our notion of model “accuracy” is much more complicated: we cannot simply calculate the distance to a single label or score. We will discuss, in Chapter 3 and Chapter 4, how deep generative models such as VAE and GAN algorithms take different approaches to determining whether a generated image is comparable to a real-world image. Finally, our models need to allow us to generate both large and diverse samples, and the various methods we will discuss take different approaches to controlling the diversity of data.
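
The contrast between scoring a label and scoring generated data can be sketched with a toy example. Here `label_accuracy` compares predictions to single correct answers, while `moment_gap` is a deliberately crude, hypothetical stand-in for comparing a set of generated samples to real data through their mean and spread. It is not how VAEs or GANs evaluate samples; it only illustrates that the comparison is between distributions rather than individual targets.

```python
import random
import statistics

def label_accuracy(predicted, actual):
    """Discriminative case: fraction of predictions matching the single true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def moment_gap(generated, real):
    """Generative case (toy): gaps in mean and standard deviation between two sample sets."""
    mean_gap = abs(statistics.mean(generated) - statistics.mean(real))
    spread_gap = abs(statistics.stdev(generated) - statistics.stdev(real))
    return mean_gap, spread_gap

rng = random.Random(0)  # fixed seed so the sketch is repeatable
real = [rng.gauss(0.0, 1.0) for _ in range(5000)]
diverse_samples = [rng.gauss(0.0, 1.0) for _ in range(5000)]
collapsed_samples = [rng.gauss(0.0, 0.1) for _ in range(5000)]  # too little diversity

print(label_accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
print(moment_gap(diverse_samples, real))
print(moment_gap(collapsed_samples, real))
```

The collapsed samples have nearly the right mean but far too little spread, a failure of diversity that a per-example accuracy score would not detect.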

Summary

In this chapter, we discussed what generative modeling is, and how it fits into the landscape of more familiar machine learning methods, using probability theory and Bayes’ theorem to describe how these models approach prediction in an opposite manner to discriminative learning. We reviewed use cases for generative learning, both for specific kinds of data and general prediction tasks. As we saw, text and images are the two major forms of data that these models are applied to. For images, the major models we discussed were VAE, GAN, and similar algorithms. For text, the dominant models are transformer architectures such as Llama, GPT, and BERT. Finally, we examined some of the specialized challenges that arise from building these models.

References

  1. Smithsonian Magazine. 2022. “Art Made with Artificial Intelligence Wins at State Fair.” https://www.smithsonianmag.com/smart-news/artificial-intelligence-art-wins-colorado-state-fair-180980703/.
  2. OpenAI. 2024. “GPT-4 Technical Report.” arXiv. https://arxiv.org/abs/2303.08774.
  3. Chen, Mark, Jerry Tworek, Heewoo Jun, et al. 2021. “Evaluating Large Language Models Trained on Code.” arXiv. https://arxiv.org/abs/2107.03374.
  4. Scientific Reports. 2019. “Comparison of Deep Learning Approaches for Multi-Label Chest X-Ray Classification.” https://www.nature.com/articles/s41598-019-42294-8.
  5. Google DeepMind. n.d. “AlphaGo: The Story So Far.” https://deepmind.com/research/case-studies/alphago-the-story-so-far.
  6. Google DeepMind. 2019. “AlphaStar: Grandmaster Level in StarCraft II Using Multi-Agent Reinforcement Learning.” https://deepmind.com/blog/article/AlphaStar-Grandmaster-level-in-StarCraft-II-using-multi-agent-reinforcement-learning.
  7. Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.” arXiv. https://arxiv.org/abs/1810.04805.
  8. Fox News. 2018. “Terrifying High-Tech Porn: Creepy ‘Deepfake’ Videos Are on the Rise.” https://www.foxnews.com/tech/terrifying-high-tech-porn-creepy-deepfake-videos-are-on-the-rise.
  9. Deepfake Image Sample. Wikimedia. https://upload.wikimedia.org/wikipedia/en/thumb/7/71/Deepfake_example.gif/280px-Deepfake_example.gif.
  10. A Chatbot Dialogue Created Using GPT-2. Devopstar. https://devopstar.com/static/2293f764e1538f357dd1c63035ab25b0/d024a/fake-facebook-conversation-example-1.png.
  11. OpenAI. 2019. “Better Language Models and Their Implications.” OpenAI Blog. https://openai.com/blog/better-language-models/.
  12. Google Research. 2018. “Google Duplex: An AI System for Accomplishing Real-World Tasks over the Phone.” Google AI Blog. https://ai.googleblog.com/2018/05/duplex-ai-system-for-natural-conversation.html.
  13. Software That Generates Original Musical Compositions. MuseGAN. https://salu133445.github.io/musegan/.
  14. Kolmogorov, Andrey. 1950 [1933]. Foundations of the Theory of Probability. New York, USA: Chelsea Publishing Company.
  15. Jebara, Tony. 2004. Machine Learning: Discriminative and Generative. Kluwer Academic (Springer).
  16. Ng, Andrew Y., and Michael I. Jordan. 2002. “On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes.” Advances in Neural Information Processing Systems.
  17. Mitchell, Tom M. 2015. “Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression.” Machine Learning.
  18. Bayes, Thomas, and Richard Price. 1763. “An Essay towards Solving a Problem in the Doctrine of Chances.” Philosophical Transactions of the Royal Society of London 53: 370–418.
  19. Ho, Tin Kam. 1995. “Random Decision Forests.” Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, August 14–16, 1995, 278–282.
  20. Breiman, L. 2001. “Random Forests.” Machine Learning 45 (1): 5–32.
  21. Friedman, J. H. 1999. “Greedy Function Approximation: A Gradient Boosting Machine.”
  22. Cortes, Corinna, and Vladimir N. Vapnik. 1995. “Support-Vector Networks.” Machine Learning 20 (3): 273–297.
  23. Kingma, Diederik P., and Max Welling. 2022. “Auto-Encoding Variational Bayes.” arXiv. https://arxiv.org/abs/1312.6114.
  24. Sample Images from a VAE: https://miro.medium.com/max/2880/1*jcCjbdnN4uEowuHfBoqITQ.jpeg
  25. Chen, Ricky T. Q., Xuechen Li, Roger Grosse, and David Duvenaud. 2019. “Isolating Sources of Disentanglement in VAEs.” arXiv Vanity. https://www.arxiv-vanity.com/papers/1802.04942/.
  26. Esser, Patrick, Johannes Haux, and Björn Ommer. 2019. “Unsupervised Robust Disentangling of Latent Characteristics for Image Synthesis.” arXiv. https://arxiv.org/pdf/1910.10223.pdf.
  27. “CycleGANs Apply Stripes to Horses to Generate Zebras.” GitHub. https://github.com/jzsherlock4869/cyclegan-pytorch?tab=readme-ov-file.
  28. Bourached, Anthony, and George Cann. 2019. “Raiders of the Lost Art.” arXiv. https://arxiv.org/pdf/1909.05677.pdf.
  29. Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. “Generative Adversarial Networks.” Proceedings of the International Conference on Neural Information Processing Systems (NIPS 2014), 2672–2680.
  30. Hindawi Journal of Mathematical Problems in Engineering. 2020. https://www.hindawi.com/journals/mpe/2020/6216048/.
  31. Gorti, Satya, and Jeremy Ma. 2018. “Text-to-Image-to-Text Translation Using Cycle Consistent Adversarial Networks.”
  32. arXiv. 2021. https://arxiv.org/pdf/2112.10752.pdf.
  33. Weizenbaum, Joseph. 1976. Computer Power and Human Reason: From Judgment to Calculation. New York: W. H. Freeman and Company.
  34. Schwartz, Barry. 2019. “Welcome BERT: Google’s Latest Search Algorithm to Better Understand Natural Language.” Search Engine Land. https://searchengineland.com/welcome-bert-google-artificial-intelligence-for-understanding-search-queries-323976.
  35. X post: https://x.com/TonyHoWasHere/status/1636347961813655557.
  36. TheSequence. 2023. “Edge 314: A Deep Dive into Llama 2: Meta AI LLM That Has Become a Symbol in Open Source AI.” https://thesequence.substack.com/p/a-deep-dive-into-llama-2-meta-ai.
  37. Gupta, Anant, Srivas Venkatesh, Sumit Chopra, and Christian Ledig. 2019. “Generative Image Translation for Data Augmentation of Bone Lesion Pathology.” Proceedings of Machine Learning Research. https://proceedings.mlr.press/v102/gupta19b.html.
  38. Mulé, Sébastien, Littisha Lawrance, Younes Belkouchi, and Valérie Vilgrain. 2022. “Generative Adversarial Networks (GAN)-Based Data Augmentation of Rare Liver Cancers: The SFR 2021 Artificial Intelligence Data Challenge.” ScienceDirect. https://www.sciencedirect.com/science/article/pii/S2211568422001711.
  39. Shapiro, Danny. 2023. “Generative AI Revs Up New Age in Auto Industry, from Design and Engineering to Production and Sales.” NVIDIA Blog. https://blogs.nvidia.com/blog/generative-ai-auto-industry/.

Key benefits

  • Implement real-world applications of LLMs and generative AI
  • Fine-tune models with PEFT and LoRA to speed up training
  • Expand your LLM toolbox with Retrieval Augmented Generation (RAG) techniques, LangChain, and LlamaIndex
  • Purchase of the print or Kindle book includes a free eBook in PDF format

Description

Become an expert in Generative AI through immersive, hands-on projects that leverage today’s most powerful models for Natural Language Processing (NLP) and computer vision. Generative AI with Python and PyTorch is your end-to-end guide to creating advanced AI applications, made easy by Raghav Bali, a seasoned data scientist with multiple patents in AI, and Joseph Babcock, a PhD and machine learning expert. Through business-tested approaches, this book simplifies complex GenAI concepts, making learning both accessible and immediately applicable. From NLP to image generation, this second edition explores practical applications and the underlying theories that power these technologies. By integrating the latest advancements in LLMs, it prepares you to design and implement powerful AI systems that transform data into actionable intelligence. You’ll build your versatile LLM toolkit by gaining expertise in GPT-4, LangChain, RLHF, LoRA, RAG, and more. You’ll also explore deep learning techniques for image generation and apply style transfer using GANs, before advancing to implement CLIP and diffusion models. Whether you’re generating dynamic content or developing complex AI-driven solutions, this book equips you with everything you need to harness the full transformative power of Python and AI.

Who is this book for?

This book is for data scientists, machine learning engineers, and software developers seeking practical skills in building generative AI systems. A basic understanding of math and statistics and experience with Python coding is required.

What you will learn

  • Grasp the core concepts and capabilities of LLMs
  • Craft effective prompts using chain-of-thought, ReAct, and prompt query language to guide LLMs toward your desired outputs
  • Understand how attention and transformers have changed NLP
  • Optimize your diffusion models by combining them with VAEs
  • Build text generation pipelines based on LSTMs and LLMs
  • Leverage the power of open-source LLMs, such as Llama and Mistral, for diverse applications

Product Details

Last updated date: Apr 14, 2025
Publication date: Mar 28, 2025
Length: 450 pages
Edition: 2nd
Language: English
ISBN-13: 9781835884447
Vendor: Facebook

Table of Contents

17 Chapters
Introduction to Generative AI: Drawing Data from Models
Building Blocks of Deep Neural Networks
The Rise of Methods for Text Generation
NLP 2.0: Using Transformers to Generate Text
LLM Foundations
Open-Source LLMs
Prompt Engineering
LLM Toolbox
LLM Optimization Techniques
Emerging Applications in Generative AI
Neural Networks Using VAEs
Image Generation with GANs
Style Transfer with GANs
Deepfakes with GANs
Diffusion Models and AI Art
Other Books You May Enjoy
Index

Customer reviews

Rating distribution
5 out of 5 (1 rating)
5 star 100%
4 star 0%
3 star 0%
2 star 0%
1 star 0%
Leonard Hall Jul 23, 2025
Rating: 5 out of 5
This book is a masterpiece! As someone actively working in AI, I can confidently say Generative AI with Python and PyTorch (Second Edition) is one of the most comprehensive and engaging resources on the subject. Whether you're exploring generative models for the first time or looking to deepen your expertise in LLMs, VAEs, GANs, or diffusion models, this book delivers with clarity, hands-on depth, and authority. Joseph Babcock and Raghav Bali do an outstanding job of walking readers through the foundational concepts of generative AI—from the math of probability and Bayes’ theorem to the latest advancements in large language models (LLMs) and Stable Diffusion. Each chapter builds logically on the previous one, with lucid explanations and real-world code examples using PyTorch and Hugging Face that make even complex topics approachable.
