Merge changes #211

Merged May 12, 2025 (65 commits)
Changes from 1 commit
bd96a08
[train_dreambooth_lora.py] Set LANCZOS as default interpolation mode …
merterbak Apr 26, 2025
aa5f5d4
[tests] add tests to check for graph breaks, recompilation, cuda sync…
sayakpaul Apr 28, 2025
9ce89e2
enable group_offload cases and quanto cases on XPU (#11405)
yao-matrix Apr 28, 2025
a7e9f85
enable test_layerwise_casting_memory cases on XPU (#11406)
yao-matrix Apr 28, 2025
0e3f271
[tests] fix import. (#11434)
sayakpaul Apr 28, 2025
b3b04fe
[train_text_to_image] Better image interpolation in training scripts …
tongyu0924 Apr 28, 2025
3da98e7
[train_text_to_image_lora] Better image interpolation in training scr…
tongyu0924 Apr 28, 2025
7567adf
enable 28 GGUF test cases on XPU (#11404)
yao-matrix Apr 28, 2025
0ac1d5b
[Hi-Dream LoRA] fix bug in validation (#11439)
linoytsaban Apr 28, 2025
4a9ab65
Fixing missing provider options argument (#11397)
urpetkov-amd Apr 28, 2025
58431f1
Set LANCZOS as the default interpolation for image resizing in Contro…
YoulunPeng Apr 29, 2025
8fe5a14
Raise warning instead of error for block offloading with streams (#11…
a-r-r-o-w Apr 30, 2025
60892c5
enable marigold_intrinsics cases on XPU (#11445)
yao-matrix Apr 30, 2025
c865115
`torch.compile` fullgraph compatibility for Hunyuan Video (#11457)
a-r-r-o-w Apr 30, 2025
fbe2fe5
enable consistency test cases on XPU, all passed (#11446)
yao-matrix Apr 30, 2025
35fada4
enable unidiffuser test cases on xpu (#11444)
yao-matrix Apr 30, 2025
fbce7ae
Add generic support for Intel Gaudi accelerator (hpu device) (#11328)
dsocek Apr 30, 2025
8cd7426
Add StableDiffusion3InstructPix2PixPipeline (#11378)
xduzhangjiayu Apr 30, 2025
23c9802
make safe diffusion test cases pass on XPU and A100 (#11458)
yao-matrix Apr 30, 2025
38ced7e
[test_models_transformer_hunyuan_video] help us test torch.compile() …
tongyu0924 Apr 30, 2025
daf0a23
Add LANCZOS as default interplotation mode. (#11463)
Va16hav07 Apr 30, 2025
06beeca
make autoencoders. controlnet_flux and wan_transformer3d_single_file …
yao-matrix Apr 30, 2025
d70f8ee
[WAN] fix recompilation issues (#11475)
sayakpaul May 1, 2025
86294d3
Fix typos in docs and comments (#11416)
co63oc May 1, 2025
5dcdf4a
[tests] xfail recent pipeline tests for specific methods. (#11469)
sayakpaul May 1, 2025
d0c0239
cache packages_distributions (#11453)
vladmandic May 1, 2025
b848d47
[docs] Memory optims (#11385)
stevhliu May 1, 2025
e23705e
[docs] Adapters (#11331)
stevhliu May 2, 2025
ed6cf52
[train_dreambooth_lora_sdxl_advanced] Add LANCZOS as the default inte…
yuanjua May 2, 2025
ec3d582
[train_dreambooth_lora_flux_advanced] Add LANCZOS as the default inte…
ysurs May 2, 2025
a674914
enable semantic diffusion and stable diffusion panorama cases on XPU …
yao-matrix May 5, 2025
8520d49
[Feature] Implement tiled VAE encoding/decoding for Wan model. (#11414)
c8ef May 5, 2025
fc5e906
[train_text_to_image_sdxl]Add LANCZOS as default interpolation mode f…
ParagEkbote May 5, 2025
ec93239
[train_dreambooth_lora_sdxl] Add --image_interpolation_mode option fo…
MinJu-Ha May 5, 2025
ee1516e
[train_dreambooth_lora_lumina2] Add LANCZOS as the default interpolat…
cjfghk5697 May 5, 2025
071807c
[training] feat: enable quantization for hidream lora training. (#11494)
sayakpaul May 5, 2025
9c29e93
Set LANCZOS as the default interpolation method for image resizing. (…
yijun-lee May 5, 2025
ed4efbd
Update training script for txt to img sdxl with lora supp with new in…
RogerSinghChugh May 5, 2025
1fa5639
Fix torchao docs typo for fp8 granular quantization (#11473)
a-r-r-o-w May 6, 2025
53f1043
Update setup.py to pin min version of `peft` (#11502)
sayakpaul May 6, 2025
d88ae1f
update dep table. (#11504)
sayakpaul May 6, 2025
10bee52
[LoRA] use `removeprefix` to preserve sanity. (#11493)
sayakpaul May 6, 2025
d7ffe60
Hunyuan Video Framepack (#11428)
a-r-r-o-w May 6, 2025
8c661ea
enable lora cases on XPU (#11506)
yao-matrix May 6, 2025
7937166
[lora_conversion] Enhance key handling for OneTrainer components in L…
iamwavecut May 6, 2025
fb29132
[docs] minor updates to bitsandbytes docs. (#11509)
sayakpaul May 6, 2025
7b90494
Cosmos (#10660)
a-r-r-o-w May 7, 2025
53bd367
clean up the __Init__ for stable_diffusion (#11500)
yiyixuxu May 7, 2025
87e508f
fix audioldm
sayakpaul May 8, 2025
c5c34a4
Revert "fix audioldm"
sayakpaul May 8, 2025
66e50d4
[LoRA] make lora alpha and dropout configurable (#11467)
linoytsaban May 8, 2025
784db0e
Add cross attention type for Sana-Sprint training in diffusers. (#11514)
scxue May 8, 2025
6674a51
Conditionally import torchvision in Cosmos transformer (#11524)
a-r-r-o-w May 8, 2025
393aefc
[tests] fix audioldm2 for transformers main. (#11522)
sayakpaul May 8, 2025
599c887
feat: pipeline-level quantization config (#11130)
sayakpaul May 9, 2025
7acf834
[Tests] Enable more general testing for `torch.compile()` with LoRA h…
sayakpaul May 9, 2025
0c47c95
[LoRA] support non-diffusers hidream loras (#11532)
sayakpaul May 9, 2025
2d38089
enable 7 cases on XPU (#11503)
yao-matrix May 9, 2025
3c0a012
[LTXPipeline] Update latents dtype to match VAE dtype (#11533)
james-p-xu May 9, 2025
d6bf268
enable dit integration cases on xpu (#11523)
yao-matrix May 9, 2025
0ba1f76
enable print_env on xpu (#11507)
yao-matrix May 9, 2025
92fe689
Change Framepack transformer layer initialization order (#11535)
a-r-r-o-w May 9, 2025
01abfc8
[tests] add tests for framepack transformer model. (#11520)
sayakpaul May 11, 2025
e48f6ae
Hunyuan Video Framepack F1 (#11534)
a-r-r-o-w May 12, 2025
c372615
enable several pipeline integration tests on XPU (#11526)
yao-matrix May 12, 2025
[tests] add tests to check for graph breaks, recompilation, cuda syncs in pipelines during torch.compile() (huggingface#11085)

* test for better torch.compile stuff.

* fixes

* recompilation and graph break.

* clear compilation cache.

* change to modeling level test.

* allow running compilation tests during nightlies.
sayakpaul authored Apr 28, 2025
commit aa5f5d41d61c44167c9df1c3383b8f1aeb6ac34e
49 changes: 49 additions & 0 deletions .github/workflows/nightly_tests.yml
@@ -180,6 +180,55 @@ jobs:
          pip install slack_sdk tabulate
          python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

  run_torch_compile_tests:
    name: PyTorch Compile CUDA tests
    runs-on:
      group: aws-g4dn-2xlarge
    container:
      image: diffusers/diffusers-pytorch-compile-cuda
      options: --gpus 0 --shm-size "16gb" --ipc host
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2
      - name: NVIDIA-SMI
        run: |
          nvidia-smi
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test,training]
      - name: Environment
        run: |
          python utils/print_env.py
      - name: Run torch compile tests on GPU
        env:
          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
          RUN_COMPILE: yes
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "compile" --make-reports=tests_torch_compile_cuda tests/
      - name: Failure short reports
        if: ${{ failure() }}
        run: cat reports/tests_torch_compile_cuda_failures_short.txt
      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: torch_compile_test_reports
          path: reports
      - name: Generate Report and Notify Channel
        if: always()
        run: |
          pip install slack_sdk tabulate
          python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

  run_big_gpu_torch_tests:
    name: Torch tests on big GPU
    strategy:
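The new job gates the compile suite behind `RUN_COMPILE: yes` before invoking pytest. A minimal sketch of that gating pattern (the variable name matches the workflow; the rest is illustrative, not the actual CI script):

```shell
# Sketch only: run the expensive compile suite only when RUN_COMPILE=yes,
# mirroring the env: block of the job above.
if [ "${RUN_COMPILE:-no}" = "yes" ]; then
  echo "running compile tests"
  # e.g. python -m pytest -n 1 -k "compile" tests/
else
  echo "skipping compile tests (set RUN_COMPILE=yes to enable)"
fi
```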
2 changes: 1 addition & 1 deletion .github/workflows/release_tests_fast.yml
@@ -335,7 +335,7 @@ jobs:
      - name: Environment
        run: |
          python utils/print_env.py
-      - name: Run example tests on GPU
+      - name: Run torch compile tests on GPU
        env:
          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
          RUN_COMPILE: yes
31 changes: 31 additions & 0 deletions tests/models/test_modeling_common.py
@@ -1714,6 +1714,37 @@ def test_push_to_hub_library_name(self):
        delete_repo(self.repo_id, token=TOKEN)


class TorchCompileTesterMixin:
    def setUp(self):
        # clean up the VRAM before each test
        super().setUp()
        torch._dynamo.reset()
        gc.collect()
        backend_empty_cache(torch_device)

    def tearDown(self):
        # clean up the VRAM after each test in case of CUDA runtime errors
        super().tearDown()
        torch._dynamo.reset()
        gc.collect()
        backend_empty_cache(torch_device)

    @require_torch_gpu
    @require_torch_2
    @is_torch_compile
    @slow
    def test_torch_compile_recompilation_and_graph_break(self):
        torch._dynamo.reset()
        init_dict, inputs_dict = self.prepare_init_args_and_inputs_for_common()

        model = self.model_class(**init_dict).to(torch_device)
        model = torch.compile(model, fullgraph=True)

        with torch._dynamo.config.patch(error_on_recompile=True), torch.no_grad():
            _ = model(**inputs_dict)
            _ = model(**inputs_dict)


@slow
@require_torch_2
@require_torch_accelerator
4 changes: 2 additions & 2 deletions tests/models/transformers/test_models_transformer_flux.py
@@ -22,7 +22,7 @@
from diffusers.models.embeddings import ImageProjection
from diffusers.utils.testing_utils import enable_full_determinism, torch_device

-from ..test_modeling_common import ModelTesterMixin
+from ..test_modeling_common import ModelTesterMixin, TorchCompileTesterMixin


enable_full_determinism()
@@ -78,7 +78,7 @@ def create_flux_ip_adapter_state_dict(model):
    return ip_state_dict


-class FluxTransformerTests(ModelTesterMixin, unittest.TestCase):
+class FluxTransformerTests(ModelTesterMixin, TorchCompileTesterMixin, unittest.TestCase):
    model_class = FluxTransformer2DModel
    main_input_name = "hidden_states"
    # We override the items here because the transformer under consideration is small.
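Adding `TorchCompileTesterMixin` to the base-class list is all it takes for `unittest` to collect the mixin's test method on the subclass. A stdlib-only sketch of that discovery behaviour (toy classes standing in for the real ones):

```python
import unittest

# A mixin that is not itself a TestCase contributes its test_* methods
# only once a subclass also inherits unittest.TestCase — which is exactly
# what the FluxTransformerTests base-class change above does.
class CompileTesterMixin:
    def test_compile_smoke(self):
        self.assertTrue(True)  # placeholder for the real compile check

class FluxLikeTests(CompileTesterMixin, unittest.TestCase):
    pass

names = unittest.defaultTestLoader.getTestCaseNames(FluxLikeTests)
print(names)  # the mixin's test method is discovered on the subclass
```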
2 changes: 2 additions & 0 deletions tests/pipelines/test_pipelines_common.py
@@ -1111,12 +1111,14 @@ def callback_cfg_params(self) -> frozenset:
    def setUp(self):
        # clean up the VRAM before each test
        super().setUp()
+       torch._dynamo.reset()
        gc.collect()
        backend_empty_cache(torch_device)

    def tearDown(self):
        # clean up the VRAM after each test in case of CUDA runtime errors
        super().tearDown()
+       torch._dynamo.reset()
        gc.collect()
        backend_empty_cache(torch_device)

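Resetting in both `setUp` and `tearDown` means a test that crashes mid-run cannot leak compiler state into the next test, and a passing test leaves none behind either. A toy illustration of that double-reset isolation (a plain dict stands in for torch._dynamo's global compile cache; nothing here is the real API):

```python
# Toy global cache standing in for torch._dynamo's compiled-graph state.
cache = {"graphs": 0}

def reset():
    cache["graphs"] = 0

def run_isolated(test_fn):
    reset()            # setUp: clean slate even after an earlier crash
    try:
        test_fn()
    finally:
        reset()        # tearDown: executes even when the test raises

def crashing_test():
    cache["graphs"] += 3   # pretend some graphs got compiled
    raise RuntimeError("boom")

try:
    run_isolated(crashing_test)
except RuntimeError:
    pass
print(cache["graphs"])  # → 0: the failure did not leak state
```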