| 1 | +# Stable Diffusion |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +Stable Diffusion was proposed in the [Stable Diffusion Announcement](https://stability.ai/blog/stable-diffusion-announcement) by Patrick Esser, Robin Rombach, and the Stability AI team. |
| 6 | + |
| 7 | +The announcement summarizes the model as follows: |
| 8 | + |
| 9 | +*Stable Diffusion is a text-to-image model that will empower billions of people to create stunning art within seconds. It is a breakthrough in speed and quality meaning that it can run on consumer GPUs. You can see some of the amazing output that has been created by this model without pre or post-processing on this page. The model itself builds upon the work of the team at CompVis and Runway in their widely used latent diffusion model combined with insights from the conditional diffusion models by our lead generative AI developer Katherine Crowson, Dall-E 2 by Open AI, Imagen by Google Brain and many others. We are delighted that AI media generation is a cooperative field and hope it can continue this way to bring the gift of creativity to all.* |
| 10 | + |
| 11 | +## Tips: |
| 12 | + |
| 13 | +- Stable Diffusion has the same architecture as [Latent Diffusion](https://arxiv.org/abs/2112.10752) but uses a frozen CLIP text encoder instead of training the text encoder jointly with the diffusion model. |
| 14 | +- A detailed explanation of the Stable Diffusion model can be found in [Stable Diffusion with 🧨 Diffusers](https://huggingface.co/blog/stable_diffusion). |
| 15 | +- Stable Diffusion works with a variety of samplers (schedulers), as shown in the examples below. |
| 16 | + |
| 17 | +## Available Pipelines: |
| 18 | + |
| 19 | +| Pipeline | Tasks | Colab |
| 20 | +|---|---|:---:| |
| 21 | +| [pipeline_stable_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py) | *Text-to-Image Generation* | [Open in Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) |
| 22 | +| [pipeline_stable_diffusion_img2img](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py) | *Image-to-Image Text-Guided Generation* | [Open in Colab](https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/image_2_image_using_diffusers.ipynb) |
| 23 | +| [pipeline_stable_diffusion_inpaint](https://github.com/huggingface/diffusers/blob/e3238c0e4bd8f8ae23e8ac225b46af148ae11e40/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py) | *Text-Guided Image Inpainting* | [Open in Colab](https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/in_painting_with_stable_diffusion_using_diffusers.ipynb) |
| 24 | + |
| 25 | +## Examples: |
| 26 | + |
| 27 | +### Text-to-Image with default PLMS scheduler |
| 28 | + |
| 29 | +```python |
| 30 | +# make sure you're logged in with `huggingface-cli login` |
| 31 | +from torch import autocast |
| 32 | +from diffusers import StableDiffusionPipeline |
| 33 | + |
| 34 | +pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True) |
| 35 | +pipe = pipe.to("cuda") |
| 36 | + |
| 37 | +prompt = "a photo of an astronaut riding a horse on mars" |
| 38 | +with autocast("cuda"): |
| 39 | +    image = pipe(prompt)["sample"][0] |
| 40 | + |
| 41 | +image.save("astronaut_rides_horse.png") |
| 42 | +``` |
| 43 | + |
| 44 | +### Text-to-Image with DDIM scheduler |
| 45 | + |
| 46 | +```python |
| 47 | +# make sure you're logged in with `huggingface-cli login` |
| 48 | +from torch import autocast |
| 49 | +from diffusers import StableDiffusionPipeline, DDIMScheduler |
| 50 | + |
| 51 | +scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False) |
| 52 | + |
| 53 | +pipe = StableDiffusionPipeline.from_pretrained( |
| 54 | +    "CompVis/stable-diffusion-v1-4", |
| 55 | +    scheduler=scheduler, |
| 56 | +    use_auth_token=True |
| 57 | +).to("cuda") |
| 58 | + |
| 59 | +prompt = "a photo of an astronaut riding a horse on mars" |
| 60 | +with autocast("cuda"): |
| 61 | +    image = pipe(prompt)["sample"][0] |
| 62 | + |
| 63 | +image.save("astronaut_rides_horse.png") |
| 64 | +``` |
| 65 | + |
| 66 | +### Text-to-Image with K-LMS scheduler |
| 67 | + |
| 68 | +```python |
| 69 | +# make sure you're logged in with `huggingface-cli login` |
| 70 | +from torch import autocast |
| 71 | +from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler |
| 72 | + |
| 73 | +lms = LMSDiscreteScheduler( |
| 74 | +    beta_start=0.00085, |
| 75 | +    beta_end=0.012, |
| 76 | +    beta_schedule="scaled_linear" |
| 77 | +) |
| 78 | + |
| 79 | +pipe = StableDiffusionPipeline.from_pretrained( |
| 80 | +    "CompVis/stable-diffusion-v1-4", |
| 81 | +    scheduler=lms, |
| 82 | +    use_auth_token=True |
| 83 | +).to("cuda") |
| 84 | + |
| 85 | +prompt = "a photo of an astronaut riding a horse on mars" |
| 86 | +with autocast("cuda"): |
| 87 | +    image = pipe(prompt)["sample"][0] |
| 88 | + |
| 89 | +image.save("astronaut_rides_horse.png") |
| 90 | +``` |
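
### What the `scaled_linear` beta schedule computes

The DDIM and K-LMS examples both pass `beta_start=0.00085`, `beta_end=0.012`, and `beta_schedule="scaled_linear"`. As a rough illustration of what those values mean, the sketch below builds a schedule that is linear in `sqrt(beta)` and then squared, which is our understanding of how `scaled_linear` is defined. It is plain Python rather than the library's own code: the helper names `scaled_linear_betas` and `alphas_cumprod` are hypothetical, and the 1000 training timesteps are an assumption, not something stated in this document.

```python
import math


def scaled_linear_betas(beta_start=0.00085, beta_end=0.012, num_train_timesteps=1000):
    """Return betas spaced linearly in sqrt(beta), then squared.

    Hypothetical helper for illustration; num_train_timesteps=1000 is an assumption.
    """
    if num_train_timesteps < 2:
        raise ValueError("need at least two timesteps to interpolate")
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    step = (hi - lo) / (num_train_timesteps - 1)
    return [(lo + i * step) ** 2 for i in range(num_train_timesteps)]


def alphas_cumprod(betas):
    """Cumulative product of (1 - beta_t): the share of signal variance kept at each timestep."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


betas = scaled_linear_betas()
acp = alphas_cumprod(betas)
print(f"first beta: {betas[0]:.5f}, last beta: {betas[-1]:.5f}")
print(f"signal kept after the final timestep: {acp[-1]:.4f}")
```

The first and last betas match `beta_start` and `beta_end`, and the cumulative product shrinks toward zero, meaning the final timestep is almost pure noise. Both example schedulers are configured with the same noise schedule and differ only in how they step back through it.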