You can generate your own latents to reproduce results, or tweak your prompt on a specific result you liked. [This notebook](https://github.com/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb) shows how to do it step by step. You can also run it [in Google Colab](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb).
For more details, check out [the Stable Diffusion notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb)
and have a look at the [release notes](https://github.com/huggingface/diffusers/releases/tag/v0.2.0).
See also [Reusing seeds for deterministic generation](https://huggingface.co/docs/diffusers/main/en/using-diffusers/reusing_seeds) for a walkthrough of reproducing and tweaking a specific result.
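Below is a minimal sketch of the idea: fix a seed, pre-generate the latents yourself, and pass them back to the pipeline. The checkpoint name, CUDA device, and 512x512 output size are assumptions here, so adapt them to your setup.

```python
# Minimal sketch, assuming the CompVis/stable-diffusion-v1-4 checkpoint, a CUDA GPU,
# and 512x512 output; adjust the model ID, device, and dtype to your environment.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Draw the initial latents from a fixed seed so the run is reproducible.
generator = torch.Generator(device="cuda").manual_seed(42)
latents = torch.randn(
    (1, pipe.unet.config.in_channels, 64, 64),  # 64 = 512 // 8 (VAE downscaling factor)
    generator=generator,
    device="cuda",
    dtype=torch.float16,
)

# Passing the same latents again reproduces the image; changing only the prompt
# lets you tweak a result you liked while keeping its overall composition.
image = pipe("a photograph of an astronaut riding a horse", latents=latents).images[0]
image.save("astronaut_42.png")
```

Reusing the same `latents` while editing only the prompt is the same idea the notebook linked above walks through step by step.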
Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. We show that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity. Our decoders conditioned on image representations can also produce variations of an image that preserve both its semantics and style, while varying the non-essential details absent from the image representation. Moreover, the joint embedding space of CLIP enables language-guided image manipulations in a zero-shot fashion. We use diffusion models for the decoder and experiment with both autoregressive and diffusion models for the prior, finding that the latter are computationally more efficient and produce higher-quality samples.
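As a rough illustration of this two-stage setup (prior, then decoder), the unCLIP pipeline in diffusers can be driven end to end from a caption. In the following sketch, the `kakaobrain/karlo-v1-alpha` checkpoint and the fp16/CUDA settings are assumptions, not something stated in the paper.

```python
# Minimal sketch, assuming the kakaobrain/karlo-v1-alpha checkpoint and a CUDA GPU.
import torch
from diffusers import UnCLIPPipeline

pipe = UnCLIPPipeline.from_pretrained(
    "kakaobrain/karlo-v1-alpha", torch_dtype=torch.float16
).to("cuda")

# Internally, the prior maps the caption to a CLIP image embedding and the
# decoder renders an image conditioned on that embedding, as described above.
image = pipe("a high-quality photo of a corgi wearing a party hat").images[0]
image.save("corgi_unclip.png")
```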
|[stochastic_karras_ve](./api/pipelines/stochastic_karras_ve)|[**Elucidating the Design Space of Diffusion-Based Generative Models**](https://arxiv.org/abs/2206.00364)| Unconditional Image Generation |
|[unclip](./api/pipelines/unclip)|[**Hierarchical Text-Conditional Image Generation with CLIP Latents**](https://arxiv.org/abs/2204.06125)| Text-to-Image Generation |
|[versatile_diffusion](./api/pipelines/versatile_diffusion)|[**Versatile Diffusion: Text, Images and Variations All in One Diffusion Model**](https://arxiv.org/abs/2211.08332)| Text-to-Image Generation |
|[versatile_diffusion](./api/pipelines/versatile_diffusion)|[**Versatile Diffusion: Text, Images and Variations All in One Diffusion Model**](https://arxiv.org/abs/2211.08332)| Image Variations Generation |
|[versatile_diffusion](./api/pipelines/versatile_diffusion)|[**Versatile Diffusion: Text, Images and Variations All in One Diffusion Model**](https://arxiv.org/abs/2211.08332)| Dual Image and Text Guided Generation |
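The three versatile_diffusion entries above correspond to three task-specific pipelines in diffusers. The sketch below shows one possible way to drive them; the `shi-labs/versatile-diffusion` checkpoint and the placeholder image URL are assumptions to be replaced with your own values.

```python
# Minimal sketch, assuming the shi-labs/versatile-diffusion checkpoint and a CUDA GPU.
import requests
import torch
from PIL import Image
from diffusers import (
    VersatileDiffusionTextToImagePipeline,
    VersatileDiffusionImageVariationPipeline,
    VersatileDiffusionDualGuidedPipeline,
)

# Text-to-Image Generation
t2i = VersatileDiffusionTextToImagePipeline.from_pretrained(
    "shi-labs/versatile-diffusion", torch_dtype=torch.float16
).to("cuda")
text_image = t2i("an astronaut riding a horse on mars").images[0]

# Image Variations Generation (placeholder URL: point it at a real image)
url = "https://example.com/photo.png"
init_image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
var = VersatileDiffusionImageVariationPipeline.from_pretrained(
    "shi-labs/versatile-diffusion", torch_dtype=torch.float16
).to("cuda")
variation_image = var(init_image).images[0]

# Dual Image and Text Guided Generation
dual = VersatileDiffusionDualGuidedPipeline.from_pretrained(
    "shi-labs/versatile-diffusion", torch_dtype=torch.float16
).to("cuda")
dual.remove_unused_weights()
dual_image = dual(
    prompt="a red cabin in the snow", image=init_image, text_to_image_strength=0.75
).images[0]
```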