
Commit dbd6a89

Merge branch 'huggingface:main' into main
2 parents 944d149 + 2ba42aa commit dbd6a89

129 files changed (+8174 additions, -1711 deletions)


.github/workflows/nightly_tests.yml

Lines changed: 0 additions & 2 deletions
@@ -61,7 +61,6 @@ jobs:
       - name: Install dependencies
         run: |
           python -m pip install -e .[quality,test]
-          python -m pip install git+https://github.com/huggingface/accelerate
           python -m pip install -U git+https://github.com/huggingface/transformers

       - name: Environment
@@ -135,7 +134,6 @@ jobs:
           ${CONDA_RUN} python -m pip install --upgrade pip
           ${CONDA_RUN} python -m pip install -e .[quality,test]
           ${CONDA_RUN} python -m pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
-          ${CONDA_RUN} python -m pip install git+https://github.com/huggingface/accelerate

       - name: Environment
         shell: arch -arch arm64 bash {0}

.github/workflows/pr_tests.yml

Lines changed: 0 additions & 2 deletions
@@ -59,7 +59,6 @@ jobs:
         run: |
           apt-get update && apt-get install libsndfile1-dev -y
           python -m pip install -e .[quality,test]
-          python -m pip install git+https://github.com/huggingface/accelerate
           python -m pip install -U git+https://github.com/huggingface/transformers

       - name: Environment
@@ -127,7 +126,6 @@ jobs:
           ${CONDA_RUN} python -m pip install --upgrade pip
           ${CONDA_RUN} python -m pip install -e .[quality,test]
           ${CONDA_RUN} python -m pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
-          ${CONDA_RUN} python -m pip install git+https://github.com/huggingface/accelerate
           ${CONDA_RUN} python -m pip install -U git+https://github.com/huggingface/transformers

       - name: Environment

.github/workflows/push_tests.yml

Lines changed: 0 additions & 2 deletions
@@ -61,7 +61,6 @@ jobs:
       - name: Install dependencies
         run: |
           python -m pip install -e .[quality,test]
-          python -m pip install git+https://github.com/huggingface/accelerate
           python -m pip install -U git+https://github.com/huggingface/transformers

       - name: Environment
@@ -131,7 +130,6 @@ jobs:
       - name: Install dependencies
         run: |
           python -m pip install -e .[quality,test,training]
-          python -m pip install git+https://github.com/huggingface/accelerate
           python -m pip install -U git+https://github.com/huggingface/transformers

       - name: Environment

README.md

Lines changed: 51 additions & 5 deletions
@@ -235,6 +235,55 @@ images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).
 images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:])))
 ```

+Diffusers also has an Image-to-Image generation pipeline with Flax/JAX
+```python
+import jax
+import numpy as np
+import jax.numpy as jnp
+from flax.jax_utils import replicate
+from flax.training.common_utils import shard
+import requests
+from io import BytesIO
+from PIL import Image
+from diffusers import FlaxStableDiffusionImg2ImgPipeline
+
+def create_key(seed=0):
+    return jax.random.PRNGKey(seed)
+rng = create_key(0)
+
+url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
+response = requests.get(url)
+init_img = Image.open(BytesIO(response.content)).convert("RGB")
+init_img = init_img.resize((768, 512))
+
+prompts = "A fantasy landscape, trending on artstation"
+
+pipeline, params = FlaxStableDiffusionImg2ImgPipeline.from_pretrained(
+    "CompVis/stable-diffusion-v1-4", revision="flax",
+    dtype=jnp.bfloat16,
+)
+
+num_samples = jax.device_count()
+rng = jax.random.split(rng, jax.device_count())
+prompt_ids, processed_image = pipeline.prepare_inputs(prompt=[prompts] * num_samples, image=[init_img] * num_samples)
+p_params = replicate(params)
+prompt_ids = shard(prompt_ids)
+processed_image = shard(processed_image)
+
+output = pipeline(
+    prompt_ids=prompt_ids,
+    image=processed_image,
+    params=p_params,
+    prng_seed=rng,
+    strength=0.75,
+    num_inference_steps=50,
+    jit=True,
+    height=512,
+    width=768).images
+
+output_images = pipeline.numpy_to_pil(np.asarray(output.reshape((num_samples,) + output.shape[-3:])))
+```
+
 ### Image-to-Image text-guided generation with Stable Diffusion

 The `StableDiffusionImg2ImgPipeline` lets you pass a text prompt and an initial image to condition the generation of new images.
@@ -302,11 +351,8 @@ image = pipe(prompt=prompt, image=init_image, mask_image=mask_image).images[0]

 ### Tweak prompts reusing seeds and latents

-You can generate your own latents to reproduce results, or tweak your prompt on a specific result you liked. [This notebook](https://github.com/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb) shows how to do it step by step. You can also run it in Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb).
-
-
-For more details, check out [the Stable Diffusion notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb)
-and have a look into the [release notes](https://github.com/huggingface/diffusers/releases/tag/v0.2.0).
+You can generate your own latents to reproduce results, or tweak your prompt on a specific result you liked.
+Please have a look at [Reusing seeds for deterministic generation](https://huggingface.co/docs/diffusers/main/en/using-diffusers/reusing_seeds).

 ## Fine-Tuning Stable Diffusion

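For reference, a minimal sketch of the seed-reuse idea the new link describes (not part of this commit; the checkpoint id and prompt are placeholders, and the snippet assumes the standard `generator` argument of the Stable Diffusion pipelines):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "A fantasy landscape, trending on artstation"

# A seeded torch.Generator fixes the initial latents, so the same seed
# reproduces the same image between runs.
generator = torch.Generator("cuda").manual_seed(0)
image = pipe(prompt, generator=generator).images[0]

# Re-running with the same seed but an edited prompt keeps the overall
# composition while changing only the prompted details.
generator = torch.Generator("cuda").manual_seed(0)
image_variant = pipe(prompt + ", oil painting", generator=generator).images[0]
```
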
docs/source/_toctree.yml

Lines changed: 4 additions & 0 deletions
@@ -28,6 +28,8 @@
     title: "Text-Guided Image-Inpainting"
   - local: using-diffusers/depth2img
     title: "Text-Guided Depth-to-Image"
+  - local: using-diffusers/reusing_seeds
+    title: "Reusing seeds for deterministic generation"
   - local: using-diffusers/custom_pipeline_examples
     title: "Community Pipelines"
   - local: using-diffusers/contribute_pipeline
@@ -120,6 +122,8 @@
     title: "Stochastic Karras VE"
   - local: api/pipelines/dance_diffusion
     title: "Dance Diffusion"
+  - local: api/pipelines/unclip
+    title: "UnCLIP"
   - local: api/pipelines/versatile_diffusion
     title: "Versatile Diffusion"
   - local: api/pipelines/vq_diffusion
docs/source/api/models.mdx

Lines changed: 6 additions & 0 deletions
@@ -58,6 +58,12 @@ The models are built on the base class ['ModelMixin'] that is a `torch.nn.module
 ## Transformer2DModelOutput
 [[autodoc]] models.attention.Transformer2DModelOutput

+## PriorTransformer
+[[autodoc]] models.prior_transformer.PriorTransformer
+
+## PriorTransformerOutput
+[[autodoc]] models.prior_transformer.PriorTransformerOutput
+
 ## FlaxModelMixin
 [[autodoc]] FlaxModelMixin

docs/source/api/pipelines/overview.mdx

Lines changed: 4 additions & 4 deletions
@@ -65,6 +65,7 @@ available a colab notebook to directly try them out.
 | [stable_diffusion_2](./stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Super Resolution Image-to-Image |
 | [stable_diffusion_safe](./stable_diffusion_safe) | [**Safe Stable Diffusion**](https://arxiv.org/abs/2211.05105) | Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/safe-latent-diffusion/blob/main/examples/Safe%20Latent%20Diffusion.ipynb)
 | [stochastic_karras_ve](./stochastic_karras_ve) | [**Elucidating the Design Space of Diffusion-Based Generative Models**](https://arxiv.org/abs/2206.00364) | Unconditional Image Generation |
+| [unclip](./unclip) | [Hierarchical Text-Conditional Image Generation with CLIP Latents](https://arxiv.org/abs/2204.06125) | Text-to-Image Generation |
 | [versatile_diffusion](./versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Text-to-Image Generation |
 | [versatile_diffusion](./versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Image Variations Generation |
 | [versatile_diffusion](./versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Dual Image and Text Guided Generation |
@@ -138,9 +139,9 @@ from diffusers import StableDiffusionImg2ImgPipeline

 # load the pipeline
 device = "cuda"
-pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
-    "runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16
-).to(device)
+pipe = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to(
+    device
+)

 # let's download an initial image
 url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
@@ -188,7 +189,6 @@ mask_image = download_image(mask_url).resize((512, 512))

 pipe = StableDiffusionInpaintPipeline.from_pretrained(
     "runwayml/stable-diffusion-inpainting",
-    revision="fp16",
     torch_dtype=torch.float16,
 )
 pipe = pipe.to("cuda")

docs/source/api/pipelines/stable_diffusion_2.mdx

Lines changed: 1 addition & 1 deletion
@@ -113,7 +113,7 @@ import torch

 # load model and scheduler
 model_id = "stabilityai/stable-diffusion-x4-upscaler"
-pipeline = StableDiffusionUpscalePipeline.from_pretrained(model_id, revision="fp16", torch_dtype=torch.float16)
+pipeline = StableDiffusionUpscalePipeline.from_pretrained(model_id, torch_dtype=torch.float16)
 pipeline = pipeline.to("cuda")

 # let's download an image
docs/source/api/pipelines/unclip.mdx

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# unCLIP
+
+## Overview
+
+[Hierarchical Text-Conditional Image Generation with CLIP Latents](https://arxiv.org/abs/2204.06125) by Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen
+
+The abstract of the paper is the following:
+
+Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. We show that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity. Our decoders conditioned on image representations can also produce variations of an image that preserve both its semantics and style, while varying the non-essential details absent from the image representation. Moreover, the joint embedding space of CLIP enables language-guided image manipulations in a zero-shot fashion. We use diffusion models for the decoder and experiment with both autoregressive and diffusion models for the prior, finding that the latter are computationally more efficient and produce higher-quality samples.
+
+The unCLIP model in diffusers comes from kakaobrain's karlo and the original codebase can be found [here](https://github.com/kakaobrain/karlo). Additionally, lucidrains has a DALL-E 2 recreation [here](https://github.com/lucidrains/DALLE2-pytorch).
+
+## Available Pipelines:
+
+| Pipeline | Tasks | Colab |
+|---|---|:---:|
+| [pipeline_unclip.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/unclip/pipeline_unclip.py) | *Text-to-Image Generation* | - |
+
+
+## UnCLIPPipeline
+[[autodoc]] pipelines.unclip.pipeline_unclip.UnCLIPPipeline
+    - __call__
+[[autodoc]] pipelines.unclip.pipeline_unclip_image_variation.UnCLIPImageVariationPipeline
+    - __call__
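As orientation for the new pipeline docs above, a hedged usage sketch (not part of this commit; the `kakaobrain/karlo-v1-alpha` checkpoint id is an assumption, being the id the karlo port is commonly published under):

```python
from diffusers import UnCLIPPipeline

# unCLIP is two-stage: a prior maps the text prompt to a CLIP image embedding,
# and a decoder (plus super-resolution stages) turns that embedding into pixels.
pipe = UnCLIPPipeline.from_pretrained("kakaobrain/karlo-v1-alpha")
pipe = pipe.to("cuda")

image = pipe("a high-quality photo of a red panda playing in the snow").images[0]
image.save("red_panda.png")
```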

docs/source/index.mdx

Lines changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@ available a colab notebook to directly try them out.
 | [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Super Resolution Image-to-Image |
 | [stable_diffusion_safe](./api/pipelines/stable_diffusion_safe) | [**Safe Stable Diffusion**](https://arxiv.org/abs/2211.05105) | Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/safe-latent-diffusion/blob/main/examples/Safe%20Latent%20Diffusion.ipynb)
 | [stochastic_karras_ve](./api/pipelines/stochastic_karras_ve) | [**Elucidating the Design Space of Diffusion-Based Generative Models**](https://arxiv.org/abs/2206.00364) | Unconditional Image Generation |
+| [unclip](./api/pipelines/unclip) | [Hierarchical Text-Conditional Image Generation with CLIP Latents](https://arxiv.org/abs/2204.06125) | Text-to-Image Generation |
 | [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Text-to-Image Generation |
 | [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Image Variations Generation |
 | [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Dual Image and Text Guided Generation |
