[benchmarks] overhaul benchmarks #11565

sayakpaul · 2025-05-16T08:28:23Z

What does this PR do?

This PR considerably simplifies how we do benchmarks. Instead of using entire pipeline-level benchmarks across different tasks, we will now ONLY benchmark the diffusion network that is the most compute-intensive part in a standard diffusion workflow.

To make the estimates more realistic, we will make use of pre-trained checkpoints and dummy inputs with reasonable dimensionalities.

I ran benchmarking_flux.py on an 80GB A100 on a batch size of 1 and got the following results:

Analyze the results in this Space: https://huggingface.co/spaces/diffusers/benchmark-analyzer

By default, all benchmarks will use a batch size of 1, eliminating CFG.

How to add your benchmark?

Adding benchmarks for a new model class (SanaTransformer2DModel, for example) boils down to the following:

Define the dummy inputs of the model.
Define the benchmarking scenarios we should run the benchmark on.

This is what benchmarking_flux.py does. More modularization can be shipped afterward.

Idea would be to merge this PR with pre-configured benchmarks for a few popular models and open others to the community.

TODOs

Utilities:

To fire the execution of the individual model-level benchmarks sequentially.
To combine CSVs from multiple different model classes.
Central dataset update and Slack notification.

@DN6 could you give the approach a quick look? I can then work on resolving the TODOs.

sayakpaul · 2025-05-16T08:34:30Z

benchmarks/benchmarking_flux.py

+
+
+if __name__ == "__main__":
+    scenarios = [


Covered the following scenarios:

Regular BF16 with compilation

NF4

Layerwise upcasting

Group offloading

sayakpaul · 2025-05-20T07:08:38Z

Added SDXL, Wan (14B), and LTX (13B) on top of Flux:

Results

	scenario	model_cls	num_params_M	flops_M	time_plain_s	mem_plain_GB	time_compile_s	mem_compile_GB	fullgraph	mode
0	Wan-AI/Wan2.1-T2V-14B-Diffusers-bf16	WanTransformer3DModel	14288.5	7.85612e+08	10.797	31.17	8.974	31.77	1	default
1	Wan-AI/Wan2.1-T2V-14B-Diffusers-layerwise-upcasting	WanTransformer3DModel	14288.5	7.85612e+08	10.702	26.78	nan	nan	nan	nan
2	Wan-AI/Wan2.1-T2V-14B-Diffusers-group-offload-leaf	WanTransformer3DModel	14288.5	7.85612e+08	10.83	4.48	nan	nan	nan	nan
3	stabilityai/stable-diffusion-xl-base-1.0-bf16	UNet2DConditionModel	2567.46	5.9791e+06	0.085	5.05	0.058	5.39	1	default
4	stabilityai/stable-diffusion-xl-base-1.0-layerwise-upcasting	UNet2DConditionModel	2567.46	5.9791e+06	0.175	4.89	nan	nan	nan	nan
5	stabilityai/stable-diffusion-xl-base-1.0-group-offload-leaf	UNet2DConditionModel	2567.46	5.9791e+06	0.383	0.2	nan	nan	nan	nan
6	black-forest-labs/FLUX.1-dev-bf16	FluxTransformer2DModel	11901.4	5.95295e+07	0.535	22.61	0.388	22.85	1	default
7	black-forest-labs/FLUX.1-dev-bnb-nf4	FluxTransformer2DModel	5952.25	17263.8	0.574	6.7	nan	nan	nan	nan
8	black-forest-labs/FLUX.1-dev-layerwise-upcasting	FluxTransformer2DModel	11901.4	5.95295e+07	0.621	22.18	nan	nan	nan	nan
9	black-forest-labs/FLUX.1-dev-group-offload-leaf	FluxTransformer2DModel	11901.4	5.95295e+07	1.536	0.53	nan	nan	nan	nan
10	Lightricks/LTX-Video-0.9.7-dev-bf16	LTXVideoTransformer3DModel	13042.6	1.67583e+08	1.446	25.21	1.137	25.63	1	default
11	Lightricks/LTX-Video-0.9.7-dev-layerwise-upcasting	LTXVideoTransformer3DModel	13042.6	1.67583e+08	1.529	24.38	nan	nan	nan	nan
12	Lightricks/LTX-Video-0.9.7-dev-group-offload-leaf	LTXVideoTransformer3DModel	13042.6	1.67583e+08	1.917	1.04	nan	nan	nan	nan

sayakpaul · 2025-05-20T11:09:24Z

Cc: @a-r-r-o-w if you want to add some caching benchmarks (in a later PR), I think that would be really great!

sayakpaul · 2025-05-20T12:10:38Z

@DN6 this is ready for a review.

This is how the final CSV for this stage looks like:
https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/collated_results.csv

I have confirmed in this run that it works as expected:
https://github.com/huggingface/diffusers/actions/runs/15138495257/job/42570011907

sayakpaul · 2025-06-23T08:05:25Z

@DN6 a gentle ping.

DN6 · 2025-07-02T10:20:16Z

.github/workflows/benchmark.yml

@@ -3,25 +3,26 @@ name: Benchmarking tests
 on:
  workflow_dispatch:
  schedule:
-    - cron: "30 1 1,15 * *" # every 2 weeks on the 1st and the 15th of every month at 1:30 AM
+    - cron: "0 17 * * 1" # every monday at 5 PM.


Not a blocker. But why run every week? Is a monthly benchmark not sufficient?

True. Changing to bi-weekly.

sayakpaul · 2025-07-02T13:44:41Z

@anijain2305 just a ping to let you know that we're merging this PR which will run the benchmarking suite bi-weekly and report the results here: https://huggingface.co/datasets/diffusers/benchmarks/blob/main/collated_results.csv

anijain2305 · 2025-07-02T14:32:36Z

Thanks for setting this up. This will be really helpful for tracking progress and identifying regression

benchmarks/populate_into_db.py

sayakpaul · 2025-07-04T05:19:23Z

Since everything is passing now, will merge this PR :)
https://github.com/huggingface/diffusers/actions/runs/16065987231/job/45340516693

sayakpaul · 2025-07-04T05:34:53Z

Also cc @a-r-r-o-w for #11565 (comment) (not urgent, when you get time).

* start overhauling the benchmarking suite. * fixes * fixes * checking. * checking * fixes. * error handling and logging. * add flops and params. * add more models. * utility to fire execution of all benchmarking scripts. * utility to push to the hub. * push utility improvement * seems to be working. * okay * add torchprofile dep. * remove total gpu memory * fixes * fix * need a big gpu * better * what's happening. * okay * separate requirements and make it nightly. * add db population script. * update secret name * update secret. * population db update * disable db population for now. * change to every monday * Update .github/workflows/benchmark.yml Co-authored-by: Dhruv Nair <[email protected]> * quality improvements. * reparate hub upload step. * repository * remove csv * check * update * update * threading. * update * update * updaye * update * update * update * remove peft dep * upgrade runner. * fix * fixes * fix merging csvs. * push dataset to the Space repo for analysis. * warm up. * add a readme * Apply suggestions from code review Co-authored-by: Luc Georges <[email protected]> * address feedback * Apply suggestions from code review * disable db workflow. * update to bi weekly. * enable population * enable * updaye * update * metadata * fix --------- Co-authored-by: Dhruv Nair <[email protected]> Co-authored-by: Luc Georges <[email protected]>

sayakpaul added 8 commits May 15, 2025 18:05

start overhauling the benchmarking suite.

24a46cc

fixes

ab7f381

fixes

cc0a38a

checking.

169f831

checking

ad18983

fixes.

31e34d5

error handling and logging.

36afdea

Merge branch 'main' into benchmarking-overhaul

0d3af90

sayakpaul commented May 16, 2025

View reviewed changes

sayakpaul added 4 commits May 19, 2025 13:17

Merge branch 'main' into benchmarking-overhaul

fd85fbc

Merge branch 'main' into benchmarking-overhaul

a2c03a4

add flops and params.

4d83a47

add more models.

6815cae

sayakpaul added 5 commits May 20, 2025 15:21

utility to fire execution of all benchmarking scripts.

5635bf8

utility to push to the hub.

cfbd21e

push utility improvement

4ccfad0

seems to be working.

dff3144

okay

accd598

sayakpaul marked this pull request as ready for review May 20, 2025 11:08

sayakpaul changed the title ~~[WIP][benchmarks] overhaul benchmarks~~ [benchmarks] overhaul benchmarks May 20, 2025

sayakpaul added 4 commits May 20, 2025 16:41

add torchprofile dep.

41f79a0

remove total gpu memory

befdd9e

fixes

4784b8b

fix

c19dc5b

sayakpaul requested a review from DN6 May 20, 2025 12:09

sayakpaul added 2 commits May 20, 2025 17:44

need a big gpu

2da4fac

better

7367bb1

sayakpaul added 6 commits June 10, 2025 17:03

disable db workflow.

f9285fd

Merge branch 'main' into benchmarking-overhaul

9017a2c

Merge branch 'main' into benchmarking-overhaul

f9d4345

Merge branch 'main' into benchmarking-overhaul

5792608

Merge branch 'main' into benchmarking-overhaul

3ae040c

Merge branch 'main' into benchmarking-overhaul

d9950cd

sayakpaul added 2 commits June 28, 2025 10:56

Merge branch 'main' into benchmarking-overhaul

9e235e8

Merge branch 'main' into benchmarking-overhaul

aac27f0

DN6 approved these changes Jul 2, 2025

View reviewed changes

DN6 reviewed Jul 2, 2025

View reviewed changes

sayakpaul added 2 commits July 2, 2025 19:13

update to bi weekly.

6bf5b36

Merge branch 'main' into benchmarking-overhaul

736f22e

Merge branch 'main' into benchmarking-overhaul

18c4361

sayakpaul added 6 commits July 3, 2025 09:23

enable population

4bdb865

enable

ca81c6e

Merge branch 'main' into benchmarking-overhaul

1745e45

updaye

26775e5

update

64331b2

metadata

01bd03e

sayakpaul commented Jul 3, 2025

View reviewed changes

benchmarks/populate_into_db.py Outdated Show resolved Hide resolved

McPatate reviewed Jul 3, 2025

View reviewed changes

benchmarks/populate_into_db.py Outdated Show resolved Hide resolved

fix

a76a236

sayakpaul merged commit e6639fe into main Jul 4, 2025
12 checks passed

sayakpaul deleted the benchmarking-overhaul branch July 4, 2025 05:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[benchmarks] overhaul benchmarks #11565

[benchmarks] overhaul benchmarks #11565

Uh oh!

sayakpaul commented May 16, 2025 •

edited

Loading

Uh oh!

sayakpaul May 16, 2025

Uh oh!

sayakpaul commented May 20, 2025 •

edited

Loading

Uh oh!

sayakpaul commented May 20, 2025

Uh oh!

sayakpaul commented May 20, 2025 •

edited

Loading

Uh oh!

sayakpaul commented Jun 23, 2025

Uh oh!

DN6 Jul 2, 2025

Uh oh!

sayakpaul Jul 2, 2025

Uh oh!

sayakpaul commented Jul 2, 2025

Uh oh!

anijain2305 commented Jul 2, 2025

Uh oh!

Uh oh!

Uh oh!

sayakpaul commented Jul 4, 2025

Uh oh!

Uh oh!

sayakpaul commented Jul 4, 2025

Uh oh!

Uh oh!

[benchmarks] overhaul benchmarks #11565

[benchmarks] overhaul benchmarks #11565

Uh oh!

Conversation

sayakpaul commented May 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

How to add your benchmark?

TODOs

Uh oh!

sayakpaul May 16, 2025

Choose a reason for hiding this comment

Uh oh!

sayakpaul commented May 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sayakpaul commented May 20, 2025

Uh oh!

sayakpaul commented May 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sayakpaul commented Jun 23, 2025

Uh oh!

DN6 Jul 2, 2025

Choose a reason for hiding this comment

Uh oh!

sayakpaul Jul 2, 2025

Choose a reason for hiding this comment

Uh oh!

sayakpaul commented Jul 2, 2025

Uh oh!

anijain2305 commented Jul 2, 2025

Uh oh!

Uh oh!

Uh oh!

sayakpaul commented Jul 4, 2025

Uh oh!

Uh oh!

sayakpaul commented Jul 4, 2025

Uh oh!

Uh oh!

sayakpaul commented May 16, 2025 •

edited

Loading

sayakpaul commented May 20, 2025 •

edited

Loading

sayakpaul commented May 20, 2025 •

edited

Loading