Merge changes #211

Merged
65 commits merged on May 12, 2025

Commits
- `bd96a08` [train_dreambooth_lora.py] Set LANCZOS as default interpolation mode … (merterbak, Apr 26, 2025)
- `aa5f5d4` [tests] add tests to check for graph breaks, recompilation, cuda sync… (sayakpaul, Apr 28, 2025)
- `9ce89e2` enable group_offload cases and quanto cases on XPU (#11405) (yao-matrix, Apr 28, 2025)
- `a7e9f85` enable test_layerwise_casting_memory cases on XPU (#11406) (yao-matrix, Apr 28, 2025)
- `0e3f271` [tests] fix import. (#11434) (sayakpaul, Apr 28, 2025)
- `b3b04fe` [train_text_to_image] Better image interpolation in training scripts … (tongyu0924, Apr 28, 2025)
- `3da98e7` [train_text_to_image_lora] Better image interpolation in training scr… (tongyu0924, Apr 28, 2025)
- `7567adf` enable 28 GGUF test cases on XPU (#11404) (yao-matrix, Apr 28, 2025)
- `0ac1d5b` [Hi-Dream LoRA] fix bug in validation (#11439) (linoytsaban, Apr 28, 2025)
- `4a9ab65` Fixing missing provider options argument (#11397) (urpetkov-amd, Apr 28, 2025)
- `58431f1` Set LANCZOS as the default interpolation for image resizing in Contro… (YoulunPeng, Apr 29, 2025)
- `8fe5a14` Raise warning instead of error for block offloading with streams (#11… (a-r-r-o-w, Apr 30, 2025)
- `60892c5` enable marigold_intrinsics cases on XPU (#11445) (yao-matrix, Apr 30, 2025)
- `c865115` `torch.compile` fullgraph compatibility for Hunyuan Video (#11457) (a-r-r-o-w, Apr 30, 2025)
- `fbe2fe5` enable consistency test cases on XPU, all passed (#11446) (yao-matrix, Apr 30, 2025)
- `35fada4` enable unidiffuser test cases on xpu (#11444) (yao-matrix, Apr 30, 2025)
- `fbce7ae` Add generic support for Intel Gaudi accelerator (hpu device) (#11328) (dsocek, Apr 30, 2025)
- `8cd7426` Add StableDiffusion3InstructPix2PixPipeline (#11378) (xduzhangjiayu, Apr 30, 2025)
- `23c9802` make safe diffusion test cases pass on XPU and A100 (#11458) (yao-matrix, Apr 30, 2025)
- `38ced7e` [test_models_transformer_hunyuan_video] help us test torch.compile() … (tongyu0924, Apr 30, 2025)
- `daf0a23` Add LANCZOS as default interplotation mode. (#11463) (Va16hav07, Apr 30, 2025)
- `06beeca` make autoencoders. controlnet_flux and wan_transformer3d_single_file … (yao-matrix, Apr 30, 2025)
- `d70f8ee` [WAN] fix recompilation issues (#11475) (sayakpaul, May 1, 2025)
- `86294d3` Fix typos in docs and comments (#11416) (co63oc, May 1, 2025)
- `5dcdf4a` [tests] xfail recent pipeline tests for specific methods. (#11469) (sayakpaul, May 1, 2025)
- `d0c0239` cache packages_distributions (#11453) (vladmandic, May 1, 2025)
- `b848d47` [docs] Memory optims (#11385) (stevhliu, May 1, 2025)
- `e23705e` [docs] Adapters (#11331) (stevhliu, May 2, 2025)
- `ed6cf52` [train_dreambooth_lora_sdxl_advanced] Add LANCZOS as the default inte… (yuanjua, May 2, 2025)
- `ec3d582` [train_dreambooth_lora_flux_advanced] Add LANCZOS as the default inte… (ysurs, May 2, 2025)
- `a674914` enable semantic diffusion and stable diffusion panorama cases on XPU … (yao-matrix, May 5, 2025)
- `8520d49` [Feature] Implement tiled VAE encoding/decoding for Wan model. (#11414) (c8ef, May 5, 2025)
- `fc5e906` [train_text_to_image_sdxl]Add LANCZOS as default interpolation mode f… (ParagEkbote, May 5, 2025)
- `ec93239` [train_dreambooth_lora_sdxl] Add --image_interpolation_mode option fo… (MinJu-Ha, May 5, 2025)
- `ee1516e` [train_dreambooth_lora_lumina2] Add LANCZOS as the default interpolat… (cjfghk5697, May 5, 2025)
- `071807c` [training] feat: enable quantization for hidream lora training. (#11494) (sayakpaul, May 5, 2025)
- `9c29e93` Set LANCZOS as the default interpolation method for image resizing. (… (yijun-lee, May 5, 2025)
- `ed4efbd` Update training script for txt to img sdxl with lora supp with new in… (RogerSinghChugh, May 5, 2025)
- `1fa5639` Fix torchao docs typo for fp8 granular quantization (#11473) (a-r-r-o-w, May 6, 2025)
- `53f1043` Update setup.py to pin min version of `peft` (#11502) (sayakpaul, May 6, 2025)
- `d88ae1f` update dep table. (#11504) (sayakpaul, May 6, 2025)
- `10bee52` [LoRA] use `removeprefix` to preserve sanity. (#11493) (sayakpaul, May 6, 2025)
- `d7ffe60` Hunyuan Video Framepack (#11428) (a-r-r-o-w, May 6, 2025)
- `8c661ea` enable lora cases on XPU (#11506) (yao-matrix, May 6, 2025)
- `7937166` [lora_conversion] Enhance key handling for OneTrainer components in L… (iamwavecut, May 6, 2025)
- `fb29132` [docs] minor updates to bitsandbytes docs. (#11509) (sayakpaul, May 6, 2025)
- `7b90494` Cosmos (#10660) (a-r-r-o-w, May 7, 2025)
- `53bd367` clean up the __Init__ for stable_diffusion (#11500) (yiyixuxu, May 7, 2025)
- `87e508f` fix audioldm (sayakpaul, May 8, 2025)
- `c5c34a4` Revert "fix audioldm" (sayakpaul, May 8, 2025)
- `66e50d4` [LoRA] make lora alpha and dropout configurable (#11467) (linoytsaban, May 8, 2025)
- `784db0e` Add cross attention type for Sana-Sprint training in diffusers. (#11514) (scxue, May 8, 2025)
- `6674a51` Conditionally import torchvision in Cosmos transformer (#11524) (a-r-r-o-w, May 8, 2025)
- `393aefc` [tests] fix audioldm2 for transformers main. (#11522) (sayakpaul, May 8, 2025)
- `599c887` feat: pipeline-level quantization config (#11130) (sayakpaul, May 9, 2025)
- `7acf834` [Tests] Enable more general testing for `torch.compile()` with LoRA h… (sayakpaul, May 9, 2025)
- `0c47c95` [LoRA] support non-diffusers hidream loras (#11532) (sayakpaul, May 9, 2025)
- `2d38089` enable 7 cases on XPU (#11503) (yao-matrix, May 9, 2025)
- `3c0a012` [LTXPipeline] Update latents dtype to match VAE dtype (#11533) (james-p-xu, May 9, 2025)
- `d6bf268` enable dit integration cases on xpu (#11523) (yao-matrix, May 9, 2025)
- `0ba1f76` enable print_env on xpu (#11507) (yao-matrix, May 9, 2025)
- `92fe689` Change Framepack transformer layer initialization order (#11535) (a-r-r-o-w, May 9, 2025)
- `01abfc8` [tests] add tests for framepack transformer model. (#11520) (sayakpaul, May 11, 2025)
- `e48f6ae` Hunyuan Video Framepack F1 (#11534) (a-r-r-o-w, May 12, 2025)
- `c372615` enable several pipeline integration tests on XPU (#11526) (yao-matrix, May 12, 2025)
104 changes: 104 additions & 0 deletions .github/workflows/nightly_tests.yml
@@ -142,6 +142,7 @@ jobs:
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
CUBLAS_WORKSPACE_CONFIG: :16:8
RUN_COMPILE: yes
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-s -v -k "not Flax and not Onnx" \
@@ -180,6 +181,55 @@ jobs:
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

run_torch_compile_tests:
name: PyTorch Compile CUDA tests

runs-on:
group: aws-g4dn-2xlarge

container:
image: diffusers/diffusers-pytorch-compile-cuda
options: --gpus 0 --shm-size "16gb" --ipc host

steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2

- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test,training]
- name: Environment
run: |
python utils/print_env.py
- name: Run torch compile tests on GPU
env:
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
RUN_COMPILE: yes
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "compile" --make-reports=tests_torch_compile_cuda tests/
- name: Failure short reports
if: ${{ failure() }}
run: cat reports/tests_torch_compile_cuda_failures_short.txt

- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: torch_compile_test_reports
path: reports

- name: Generate Report and Notify Channel
if: always()
run: |
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
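
The `run_torch_compile_tests` job above exports `RUN_COMPILE: yes` and selects tests with `-k "compile"`. As a rough illustration, a test picked up by that filter could look like the sketch below; the environment-variable gate mirrors the workflow, but the test name and gating are assumptions rather than diffusers' actual test utilities.

```python
# Hypothetical sketch of a test matched by `-k "compile"` and gated on RUN_COMPILE.
import os

import pytest
import torch

RUN_COMPILE = os.getenv("RUN_COMPILE", "no") == "yes"


@pytest.mark.skipif(not RUN_COMPILE, reason="compile tests only run when RUN_COMPILE=yes")
def test_torch_compile_smoke():
    model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.GELU())
    compiled = torch.compile(model, fullgraph=True)  # fullgraph=True surfaces graph breaks
    x = torch.randn(2, 8)
    # Compiled and eager outputs should agree within tolerance.
    assert torch.allclose(compiled(x), model(x), atol=1e-5)
```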

run_big_gpu_torch_tests:
name: Torch tests on big GPU
strategy:
@@ -476,6 +526,60 @@ jobs:
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

run_nightly_pipeline_level_quantization_tests:
name: Torch quantization nightly tests
strategy:
fail-fast: false
max-parallel: 2
runs-on:
group: aws-g6e-xlarge-plus
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "20gb" --ipc host --gpus 0
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: NVIDIA-SMI
run: nvidia-smi
- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install -U bitsandbytes optimum_quanto
python -m uv pip install pytest-reportlog
- name: Environment
run: |
python utils/print_env.py
- name: Pipeline-level quantization tests on GPU
env:
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
CUBLAS_WORKSPACE_CONFIG: :16:8
BIG_GPU_MEMORY: 40
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
--make-reports=tests_pipeline_level_quant_torch_cuda \
--report-log=tests_pipeline_level_quant_torch_cuda.log \
tests/quantization/test_pipeline_level_quantization.py
- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_pipeline_level_quant_torch_cuda_stats.txt
cat reports/tests_pipeline_level_quant_torch_cuda_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: torch_cuda_pipeline_level_quant_reports
path: reports
- name: Generate Report and Notify Channel
if: always()
run: |
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

# M1 runner currently not well supported
# TODO: (Dhruv) add these back when we setup better testing for Apple Silicon
# run_nightly_tests_apple_m1:
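The `run_nightly_pipeline_level_quantization_tests` job added above exercises the pipeline-level quantization config introduced in commit `599c887` (#11130). Below is a minimal sketch of the feature under test, assuming the `PipelineQuantizationConfig` API from that PR; the checkpoint and component names are illustrative only.

```python
# Sketch of pipeline-level quantization; checkpoint and component names are
# illustrative, and the exact API should be checked against the diffusers docs.
import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": torch.bfloat16,
    },
    components_to_quantize=["transformer", "text_encoder_2"],
)

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
```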
1 change: 1 addition & 0 deletions .github/workflows/pr_tests.yml
@@ -11,6 +11,7 @@ on:
- "tests/**.py"
- ".github/**.yml"
- "utils/**.py"
- "setup.py"
push:
branches:
- ci-*
2 changes: 1 addition & 1 deletion .github/workflows/release_tests_fast.yml
@@ -335,7 +335,7 @@ jobs:
- name: Environment
run: |
python utils/print_env.py
- name: Run example tests on GPU
- name: Run torch compile tests on GPU
env:
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
RUN_COMPILE: yes
41 changes: 24 additions & 17 deletions docs/source/en/_toctree.yml
@@ -17,12 +17,8 @@
title: AutoPipeline
- local: tutorials/basic_training
title: Train a diffusion model
- local: tutorials/using_peft_for_inference
title: Load LoRAs for inference
- local: tutorials/fast_diffusion
title: Accelerate inference of text-to-image diffusion models
- local: tutorials/inference_with_big_models
title: Working with big models
title: Tutorials
- sections:
- local: using-diffusers/loading
@@ -33,11 +29,24 @@
title: Load schedulers and models
- local: using-diffusers/other-formats
title: Model files and layouts
- local: using-diffusers/loading_adapters
title: Load adapters
- local: using-diffusers/push_to_hub
title: Push files to the Hub
title: Load pipelines and adapters
- sections:
- local: tutorials/using_peft_for_inference
title: LoRA
- local: using-diffusers/ip_adapter
title: IP-Adapter
- local: using-diffusers/controlnet
title: ControlNet
- local: using-diffusers/t2i_adapter
title: T2I-Adapter
- local: using-diffusers/dreambooth
title: DreamBooth
- local: using-diffusers/textual_inversion_inference
title: Textual inversion
title: Adapters
isExpanded: false
- sections:
- local: using-diffusers/unconditional_image_generation
title: Unconditional image generation
@@ -59,8 +68,6 @@
title: Create a server
- local: training/distributed_inference
title: Distributed inference
- local: using-diffusers/merge_loras
title: Merge LoRAs
- local: using-diffusers/scheduler_features
title: Scheduler features
- local: using-diffusers/callback
@@ -97,20 +104,12 @@
title: SDXL Turbo
- local: using-diffusers/kandinsky
title: Kandinsky
- local: using-diffusers/ip_adapter
title: IP-Adapter
- local: using-diffusers/omnigen
title: OmniGen
- local: using-diffusers/pag
title: PAG
- local: using-diffusers/controlnet
title: ControlNet
- local: using-diffusers/t2i_adapter
title: T2I-Adapter
- local: using-diffusers/inference_with_lcm
title: Latent Consistency Model
- local: using-diffusers/textual_inversion_inference
title: Textual inversion
- local: using-diffusers/shap-e
title: Shap-E
- local: using-diffusers/diffedit
@@ -180,7 +179,7 @@
title: Quantization Methods
- sections:
- local: optimization/fp16
title: Speed up inference
title: Accelerate inference
- local: optimization/memory
title: Reduce memory usage
- local: optimization/torch2.0
@@ -296,6 +295,8 @@
title: CogView4Transformer2DModel
- local: api/models/consisid_transformer3d
title: ConsisIDTransformer3DModel
- local: api/models/cosmos_transformer3d
title: CosmosTransformer3DModel
- local: api/models/dit_transformer2d
title: DiTTransformer2DModel
- local: api/models/easyanimate_transformer3d
@@ -364,6 +365,8 @@
title: AutoencoderKLAllegro
- local: api/models/autoencoderkl_cogvideox
title: AutoencoderKLCogVideoX
- local: api/models/autoencoderkl_cosmos
title: AutoencoderKLCosmos
- local: api/models/autoencoder_kl_hunyuan_video
title: AutoencoderKLHunyuanVideo
- local: api/models/autoencoderkl_ltx_video
@@ -434,6 +437,8 @@
title: ControlNet-XS with Stable Diffusion XL
- local: api/pipelines/controlnet_union
title: ControlNetUnion
- local: api/pipelines/cosmos
title: Cosmos
- local: api/pipelines/dance_diffusion
title: Dance Diffusion
- local: api/pipelines/ddim
@@ -452,6 +457,8 @@
title: Flux
- local: api/pipelines/control_flux_inpaint
title: FluxControlInpaint
- local: api/pipelines/framepack
title: Framepack
- local: api/pipelines/hidream
title: HiDream-I1
- local: api/pipelines/hunyuandit
40 changes: 40 additions & 0 deletions docs/source/en/api/models/autoencoderkl_cosmos.md
@@ -0,0 +1,40 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderKLCosmos

[Cosmos Tokenizers](https://github.com/NVIDIA/Cosmos-Tokenizer).

Supported models:
- [nvidia/Cosmos-1.0-Tokenizer-CV8x8x8](https://huggingface.co/nvidia/Cosmos-1.0-Tokenizer-CV8x8x8)

The model can be loaded with the following code snippet.

```python
from diffusers import AutoencoderKLCosmos

vae = AutoencoderKLCosmos.from_pretrained("nvidia/Cosmos-1.0-Tokenizer-CV8x8x8", subfolder="vae")
```
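
As a hedged illustration of the `encode`/`decode` methods documented below, a round trip might look like this; the interface follows the generic diffusers autoencoder pattern (`AutoencoderKLOutput.latent_dist`, `DecoderOutput.sample`), and the video tensor shape is an illustrative assumption.

```python
# Illustrative round trip; the (B, C, T, H, W) shape is a toy assumption.
import torch

video = torch.randn(1, 3, 9, 64, 64)  # batch, channels, frames, height, width

with torch.no_grad():
    latents = vae.encode(video).latent_dist.sample()
    reconstruction = vae.decode(latents).sample

print(latents.shape, reconstruction.shape)
```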

## AutoencoderKLCosmos

[[autodoc]] AutoencoderKLCosmos
- decode
- encode
- all

## AutoencoderKLOutput

[[autodoc]] models.autoencoders.autoencoder_kl.AutoencoderKLOutput

## DecoderOutput

[[autodoc]] models.autoencoders.vae.DecoderOutput
30 changes: 30 additions & 0 deletions docs/source/en/api/models/cosmos_transformer3d.md
@@ -0,0 +1,30 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# CosmosTransformer3DModel

A Diffusion Transformer model for 3D video-like data was introduced in [Cosmos World Foundation Model Platform for Physical AI](https://huggingface.co/papers/2501.03575) by NVIDIA.

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import CosmosTransformer3DModel

transformer = CosmosTransformer3DModel.from_pretrained("nvidia/Cosmos-1.0-Diffusion-7B-Text2World", subfolder="transformer", torch_dtype=torch.bfloat16)
```

## CosmosTransformer3DModel

[[autodoc]] CosmosTransformer3DModel

## Transformer2DModelOutput

[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
2 changes: 1 addition & 1 deletion docs/source/en/api/pipelines/animatediff.md
@@ -966,7 +966,7 @@ pipe.to("cuda")
prompt = {
0: "A caterpillar on a leaf, high quality, photorealistic",
40: "A caterpillar transforming into a cocoon, on a leaf, near flowers, photorealistic",
80: "A cocoon on a leaf, flowers in the backgrond, photorealistic",
80: "A cocoon on a leaf, flowers in the background, photorealistic",
120: "A cocoon maturing and a butterfly being born, flowers and leaves visible in the background, photorealistic",
160: "A beautiful butterfly, vibrant colors, sitting on a leaf, flowers in the background, photorealistic",
200: "A beautiful butterfly, flying away in a forest, photorealistic",
Expand Down
41 changes: 41 additions & 0 deletions docs/source/en/api/pipelines/cosmos.md
@@ -0,0 +1,41 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->

# Cosmos

[Cosmos World Foundation Model Platform for Physical AI](https://huggingface.co/papers/2501.03575) by NVIDIA.

*Physical AI needs to be trained digitally first. It needs a digital twin of itself, the policy model, and a digital twin of the world, the world model. In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups. We position a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications. Our platform covers a video curation pipeline, pre-trained world foundation models, examples of post-training of pre-trained world foundation models, and video tokenizers. To help Physical AI builders solve the most critical problems of our society, we make our platform open-source and our models open-weight with permissive licenses available via https://github.com/NVIDIA/Cosmos.*

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>
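
As a quick orientation, a text-to-world generation sketch following standard diffusers pipeline conventions is shown below; the checkpoint ID matches the models this release targets, while the exact call arguments and fps are assumptions to verify against the pipeline docstring.

```python
# Hedged text-to-world sketch; call signature follows common diffusers conventions.
import torch
from diffusers import CosmosTextToWorldPipeline
from diffusers.utils import export_to_video

pipe = CosmosTextToWorldPipeline.from_pretrained(
    "nvidia/Cosmos-1.0-Diffusion-7B-Text2World", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

prompt = "A robot arm assembles a wooden toy on a workbench, photorealistic."
video = pipe(prompt=prompt).frames[0]
export_to_video(video, "output.mp4", fps=30)
```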

## CosmosTextToWorldPipeline

[[autodoc]] CosmosTextToWorldPipeline
- all
- __call__

## CosmosVideoToWorldPipeline

[[autodoc]] CosmosVideoToWorldPipeline
- all
- __call__

## CosmosPipelineOutput

[[autodoc]] pipelines.cosmos.pipeline_output.CosmosPipelineOutput