@@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License.
 
 Starting from version `0.13.0`, Diffusers supports the latest optimization from the upcoming [PyTorch 2.0](https://pytorch.org/get-started/pytorch-2.0/) release. These include:
 1. Support for native flash and memory-efficient attention without any extra dependencies.
-2. [` torch.compile` ](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) support for compiling individual models for extra performance boost.
+2. [torch.compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) support for compiling individual models for extra performance boost.
 
 
 ## Installation
@@ -107,6 +107,13 @@ ___The time reported is in seconds.___
 | A100 | 32 (1) | OOM | 26.56 | 26.68 | 24.08 | 9.34 |
 | A100 | 64(2) | | 52.51 | 53.03 | 47.81 | 8.95 |
 | | | | | | | |
+| A10 | 4 | 13.94 | 9.81 | 10.01 | 9.35 | 4.69 |
+| A10 | 8 | 27.09 | 19 | 19.53 | 18.33 | 3.53 |
+| A10 | 10 | 33.69 | 23.53 | 24.19 | 22.52 | 4.29 |
+| A10 | 16 | OOM | 37.55 | 38.31 | 36.81 | 1.97 |
+| A10 | 32 (1) | | 77.19 | 78.43 | 76.64 | 0.71 |
+| A10 | 64 (1) | | 173.59 | 158.99 | 155.14 | 10.63 |
+| | | | | | | |
 | T4 | 4 | 38.81 | 30.09 | 29.74 | 27.55 | 8.44 |
 | T4 | 8 | OOM | 55.71 | 55.99 | 53.85 | 3.34 |
 | T4 | 10 | OOM | 68.96 | 69.86 | 65.35 | 5.23 |
@@ -117,13 +124,6 @@ ___The time reported is in seconds.___
 | V100 | 10 | OOM | 19.52 | 19.28 | 18.18 | 6.86 |
 | V100 | 16 | OOM | 30.29 | 29.84 | 28.22 | 6.83 |
 | | | | | | | |
-| A10 | 4 | 13.94 | 9.81 | 10.01 | 9.35 | 4.69 |
-| A10 | 8 | 27.09 | 19 | 19.53 | 18.33 | 3.53 |
-| A10 | 10 | 33.69 | 23.53 | 24.19 | 22.52 | 4.29 |
-| A10 | 16 | OOM | 37.55 | 38.31 | 36.81 | 1.97 |
-| A10 | 32 (1) | | 77.19 | 78.43 | 76.64 | 0.71 |
-| A10 | 64 (1) | | 173.59 | 158.99 | 155.14 | 10.63 |
-| | | | | | | |
 | 3090 | 4 | 10.04 | 7.82 | 7.89 | 7.47 | 4.48 |
 | 3090 | 8 | 19.27 | 14.97 | 15.04 | 14.22 | 5.01 |
 | 3090 | 10| 24.08 | 18.7 | 18.7 | 17.69 | 5.40 |
@@ -153,6 +153,13 @@ Using `torch.compile` with efficient attention gives up to 18% performance impro
 | A100 | 32 | | 92.89 | 91.34 | 88.35 | 4.89 | |
 | A100 | 64 | | 185.3 | 182.71 | 176.48 | 4.76 | |
 | | | | | | | |
+| A10 | 1 | 10.59 | 8.81 | 7.51 | 7.35 | 16.57 | 30.59 |
+| A10 | 4 | 34.77 | 27.63 | 22.77 | 22.07 | 20.12 | 36.53 |
+| A10 | 8 | | 56.19 | 43.53 | 43.86 | 21.94 | |
+| A10 | 16 | | 116.49 | 88.56 | 86.64 | 25.62 | |
+| A10 | 32 | | 221.95 | 175.74 | 168.18 | 24.23 | |
+| A10 | 48 | | 333.23 | 264.84 | | 20.52 | |
+| | | | | | | |
 | T4 | 1 | 28.2 | 24.49 | 23.93 | 23.56 | 3.80 | 16.45 |
 | T4 | 2 | 52.77 | 45.7 | 45.88 | 45.06 | 1.40 | 14.61 |
 | T4 | 4 | OOM | 85.72 | 85.78 | 84.48 | 1.45 | |
@@ -178,20 +185,13 @@ Using `torch.compile` with efficient attention gives up to 18% performance impro
 | 3090 Ti | 32 (1) | | 142.55 | 124.44 | 120.74 | 15.30 | |
 | 3090 Ti | 48 | | 213.19 | 186.55 | | 12.50 | |
 | | | | | | | |
-| 4090 | 1 | 5.54 | 4.99 | | | | |
-| 4090 | 4 | 13.67 | 11.4 | | | | |
-| 4090 | 8 (2) | | 19.79 | | | | |
-| 4090 | 16 | | 38.62 | | | | |
-| 4090 | 32 (1) | | 76.57 | | | | |
-| 4090 | 48 | | 114.44 | | | 13.68 | |
-| | | | | | | |
-| A10 | 1 | 10.59 | 8.81 | 7.51 | 7.35 | 16.57 | 30.59 |
-| A10 | 4 | 34.77 | 27.63 | 22.77 | 22.07 | 20.12 | 36.53 |
-| A10 | 8 | | 56.19 | 43.53 | 43.86 | 21.94 | |
-| A10 | 16 | | 116.49 | 88.56 | 86.64 | 25.62 | |
-| A10 | 32 | | 221.95 | 175.74 | 168.18 | 24.23 | |
-| A10 | 48 | | 333.23 | 264.84 | | 20.52 | |
-| | | | | | | |
+| 4090 | 1 | 5.54 | 4.99 | 4.51 | | | |
+| 4090 | 4 | 13.67 | 11.4 | 10.3 | | | |
+| 4090 | 8 (2) | | 19.79 | 17.13 | | | |
+| 4090 | 16 | | 38.62 | | 33.14 | | |
+| 4090 | 32 (1) | | 76.57 | 65.96 | | | |
+| 4090 | 48 | | 114.44 | | 98.78 | | |
+
 
 
 (1) Batch Size >= 32 requires enable_vae_slicing() because of https://github.com/pytorch/pytorch/issues/81665
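The column headers of these benchmark tables fall outside this diff, but the A10 rows it relocates make it possible to cross-check the percentage columns. Judging from the numbers, the last column of the first table (and the second-to-last column of the second table) looks like the share of time saved by the `torch.compile` run relative to xFormers, with plain SDPA used when the compiled cell is empty; the final column of the second table looks like the same figure against vanilla attention. The Python sketch below recomputes them. The column names (`vanilla`, `xformers`, `sdpa`, `sdpa_compile`) and the fallback rule are assumptions inferred from the data, not definitions stated in the document.

```python
# Sketch: recompute the percentage columns of the A10 benchmark rows in this diff.
# ASSUMPTION: each row is (batch_size, vanilla, xformers, sdpa, sdpa_compile) in
# seconds, inferred from the numbers; None marks an empty or OOM cell.
from typing import List, Optional, Tuple

Row = Tuple[int, Optional[float], float, float, Optional[float]]


def pct_saved(baseline: float, candidate: float) -> float:
    """Percent of wall-clock time saved by `candidate` relative to `baseline`."""
    return round((baseline - candidate) / baseline * 100, 2)


def pytorch2_time(sdpa: float, sdpa_compile: Optional[float]) -> float:
    """ASSUMPTION: use the torch.compile timing, falling back to plain SDPA if missing."""
    return sdpa_compile if sdpa_compile is not None else sdpa


def speedups(rows: List[Row]) -> List[Tuple[int, float, Optional[float]]]:
    """Return (batch_size, % saved vs xFormers, % saved vs vanilla or None) per row."""
    out = []
    for batch, vanilla, xformers, sdpa, sdpa_compile in rows:
        best = pytorch2_time(sdpa, sdpa_compile)
        vs_vanilla = pct_saved(vanilla, best) if vanilla is not None else None
        out.append((batch, pct_saved(xformers, best), vs_vanilla))
    return out


# A10 rows of the first table (its last column reads 4.69, 3.53, 4.29, 1.97, 0.71, 10.63).
# That table has no vs-vanilla column, so the third computed value is extra here.
A10_FIRST: List[Row] = [
    (4, 13.94, 9.81, 10.01, 9.35),
    (8, 27.09, 19.0, 19.53, 18.33),
    (10, 33.69, 23.53, 24.19, 22.52),
    (16, None, 37.55, 38.31, 36.81),
    (32, None, 77.19, 78.43, 76.64),
    (64, None, 173.59, 158.99, 155.14),
]

# A10 rows of the second table (its last two columns read 16.57/30.59, 20.12/36.53,
# then 21.94, 25.62, 24.23 and 20.52 with no vs-vanilla figure).
A10_SECOND: List[Row] = [
    (1, 10.59, 8.81, 7.51, 7.35),
    (4, 34.77, 27.63, 22.77, 22.07),
    (8, None, 56.19, 43.53, 43.86),
    (16, None, 116.49, 88.56, 86.64),
    (32, None, 221.95, 175.74, 168.18),
    (48, None, 333.23, 264.84, None),
]

for label, rows in (("first table", A10_FIRST), ("second table", A10_SECOND)):
    for batch, vs_xformers, vs_vanilla in speedups(rows):
        vanilla_txt = "n/a" if vs_vanilla is None else f"{vs_vanilla}%"
        print(f"A10 {label}, batch {batch}: {vs_xformers}% vs xFormers, {vanilla_txt} vs vanilla")
```

Under this reading every recomputed value matches the published figure to two decimals, which supports the interpretation, though it remains an inference rather than the authors' stated definition.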