Commit 48d0123

add AudioDiffusionPipeline and LatentAudioDiffusionPipeline huggingface#1334 (huggingface#1426)
* add AudioDiffusionPipeline and LatentAudioDiffusionPipeline
* add docs to toc
* fix tests
* fix tests
* fix tests
* fix tests
* fix tests
* Update pr_tests.yml

Fix tests

* parent 499ff34
author teticio <[email protected]> 1668765652 +0000
committer teticio <[email protected]> 1669041721 +0000

parent 499ff34
author teticio <[email protected]> 1668765652 +0000
committer teticio <[email protected]> 1669041704 +0000

add colab notebook

[Flax] Fix loading scheduler from subfolder (huggingface#1319)

[FLAX] Fix loading scheduler from subfolder

Fix/Enable all schedulers for in-painting (huggingface#1331)

* inpaint fix k lms
* onnox as well
* up

Correct path to schedlure (huggingface#1322)

* [Examples] Correct path
* uP

Avoid nested fix-copies (huggingface#1332)

* Avoid nested `# Copied from` statements during `make fix-copies`
* style

Fix img2img speed with LMS-Discrete Scheduler (huggingface#896)

Casting `self.sigmas` into a different dtype (the one of original_samples) is not advisable. In my img2img pipeline this leads to a long running time in the `integrate.quad` call later on- by long I mean more than 10x slower.

Co-authored-by: Anton Lozhkov <[email protected]>

Fix the order of casts for onnx inpainting (huggingface#1338)

Legacy Inpainting Pipeline for Onnx Models (huggingface#1237)

* Add legacy inpainting pipeline compatibility for onnx
* remove commented out line
* Add onnx legacy inpainting test
* Fix slow decorators
* pep8 styling
* isort styling
* dummy object
* ordering consistency
* style
* docstring styles
* Refactor common prompt encoding pattern
* Update tests to permanent repository home
* support all available schedulers until ONNX IO binding is available

Co-authored-by: Anton Lozhkov <[email protected]>

* updated styling from PR suggested feedback

Co-authored-by: Anton Lozhkov <[email protected]>

Jax infer support negative prompt (huggingface#1337)

* support negative prompts in sd jax pipeline
* pass batched neg_prompt
* only encode when negative prompt is None

Co-authored-by: Juan Acevedo <[email protected]>

Update README.md: Minor change to Imagic code snippet, missing dir error (huggingface#1347)

Minor change to Imagic Readme

Missing dir causes an error when running the example code.

make style

change the sample model (huggingface#1352)

* Update alt_diffusion.mdx
* Update alt_diffusion.mdx

Add bit diffusion [WIP] (huggingface#971)

* Create bit_diffusion.py

Bit diffusion based on the paper, arXiv:2208.04202, Chen2022AnalogBG

* adding bit diffusion to new branch ran tests
* tests
* tests
* tests
* tests
* removed test folders + added to README
* Update README.md

Co-authored-by: Patrick von Platen <[email protected]>

* move Mel to module in pipeline construction, make librosa optional
* fix imports
* fix copy & paste error in comment
* fix style
* add missing register_to_config
* fix class docstrings
* fix class docstrings
* tweak docstrings
* tweak docstrings
* update slow test
* put trailing commas back
* respect alphabetical order
* remove LatentAudioDiffusion, make vqvae optional
* move Mel from models back to pipelines :-)
* allow loading of pretrained audiodiffusion models
* fix tests
* fix dummies
* remove reference to latent_audio_diffusion in docs
* unused import
* inherit from SchedulerMixin to make loadable
* Apply suggestions from code review
* Apply suggestions from code review

Co-authored-by: Patrick von Platen <[email protected]>
1 parent 459b8ca commit 48d0123

25 files changed

+781
-5
lines changed

.github/workflows/pr_tests.yml

+1
@@ -57,6 +57,7 @@ jobs:
 
     - name: Install dependencies
       run: |
+        apt-get update && apt-get install libsndfile1-dev -y
         python -m pip install -e .[quality,test]
         python -m pip install git+https://github.com/huggingface/accelerate
         python -m pip install -U git+https://github.com/huggingface/transformers

.gitignore

+1-1
@@ -165,4 +165,4 @@ tags
 # DS_Store (MacOS)
 .DS_Store
 # RL pipelines may produce mp4 outputs
-*.mp4
+*.mp4

docker/diffusers-flax-cpu/Dockerfile

+2
@@ -11,6 +11,7 @@ RUN apt update && \
 git-lfs \
 curl \
 ca-certificates \
+libsndfile1-dev \
 python3.8 \
 python3-pip \
 python3.8-venv && \
@@ -33,6 +34,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
 datasets \
 hf-doc-builder \
 huggingface-hub \
+librosa \
 modelcards \
 numpy \
 scipy \

docker/diffusers-flax-tpu/Dockerfile

+2
@@ -11,6 +11,7 @@ RUN apt update && \
 git-lfs \
 curl \
 ca-certificates \
+libsndfile1-dev \
 python3.8 \
 python3-pip \
 python3.8-venv && \
@@ -35,6 +36,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
 datasets \
 hf-doc-builder \
 huggingface-hub \
+librosa \
 modelcards \
 numpy \
 scipy \

docker/diffusers-onnxruntime-cpu/Dockerfile

+2
@@ -11,6 +11,7 @@ RUN apt update && \
 git-lfs \
 curl \
 ca-certificates \
+libsndfile1-dev \
 python3.8 \
 python3-pip \
 python3.8-venv && \
@@ -33,6 +34,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
 datasets \
 hf-doc-builder \
 huggingface-hub \
+librosa \
 modelcards \
 numpy \
 scipy \

docker/diffusers-onnxruntime-cuda/Dockerfile

+2
@@ -11,6 +11,7 @@ RUN apt update && \
 git-lfs \
 curl \
 ca-certificates \
+libsndfile1-dev \
 python3.8 \
 python3-pip \
 python3.8-venv && \
@@ -33,6 +34,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
 datasets \
 hf-doc-builder \
 huggingface-hub \
+librosa \
 modelcards \
 numpy \
 scipy \

docker/diffusers-pytorch-cpu/Dockerfile

+2
@@ -11,6 +11,7 @@ RUN apt update && \
 git-lfs \
 curl \
 ca-certificates \
+libsndfile1-dev \
 python3.8 \
 python3-pip \
 python3.8-venv && \
@@ -32,6 +33,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
 datasets \
 hf-doc-builder \
 huggingface-hub \
+librosa \
 modelcards \
 numpy \
 scipy \

docker/diffusers-pytorch-cuda/Dockerfile

+2
@@ -11,6 +11,7 @@ RUN apt update && \
 git-lfs \
 curl \
 ca-certificates \
+libsndfile1-dev \
 python3.8 \
 python3-pip \
 python3.8-venv && \
@@ -32,6 +33,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
 datasets \
 hf-doc-builder \
 huggingface-hub \
+librosa \
 modelcards \
 numpy \
 scipy \

docs/source/_toctree.yml

+2
@@ -122,6 +122,8 @@
       title: "VQ Diffusion"
     - local: api/pipelines/repaint
       title: "RePaint"
+    - local: api/pipelines/audio_diffusion
+      title: "Audio Diffusion"
     title: "Pipelines"
   - sections:
     - local: api/experimental/rl
@@ -0,0 +1,102 @@
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Audio Diffusion
+
+## Overview
+
+[Audio Diffusion](https://github.com/teticio/audio-diffusion) by Robert Dargavel Smith.
+
+Audio Diffusion leverages the recent advances in image generation using diffusion models by converting audio samples to
+and from mel spectrogram images.
+
+The original codebase of this implementation can be found [here](https://github.com/teticio/audio-diffusion), including
+training scripts and example notebooks.
+
+## Available Pipelines:
+
+| Pipeline | Tasks | Colab
+|---|---|:---:|
+| [pipeline_audio_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/audio_diffusion/pipeline_audio_diffusion.py) | *Unconditional Audio Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/audio_diffusion_pipeline.ipynb) |
+
+
+## Examples:
+
+### Audio Diffusion
+
+```python
+import torch
+from IPython.display import Audio
+from diffusers import DiffusionPipeline
+
+device = "cuda" if torch.cuda.is_available() else "cpu"
+pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-256").to(device)
+
+output = pipe()
+display(output.images[0])
+display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
+```
+
+### Latent Audio Diffusion
+
+```python
+import torch
+from IPython.display import Audio
+from diffusers import DiffusionPipeline
+
+device = "cuda" if torch.cuda.is_available() else "cpu"
+pipe = DiffusionPipeline.from_pretrained("teticio/latent-audio-diffusion-256").to(device)
+
+output = pipe()
+display(output.images[0])
+display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
+```
+
+### Audio Diffusion with DDIM (faster)
+
+```python
+import torch
+from IPython.display import Audio
+from diffusers import DiffusionPipeline
+
+device = "cuda" if torch.cuda.is_available() else "cpu"
+pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-ddim-256").to(device)
+
+output = pipe()
+display(output.images[0])
+display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
+```
+
+### Variations, in-painting, out-painting etc.
+
+```python
+output = pipe(
+    raw_audio=output.audios[0, 0],
+    start_step=int(pipe.get_default_steps() / 2),
+    mask_start_secs=1,
+    mask_end_secs=1,
+)
+display(output.images[0])
+display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
+```
+
+## AudioDiffusionPipeline
+[[autodoc]] AudioDiffusionPipeline
+    - __call__
+    - encode
+    - slerp
+
+
+## Mel
+[[autodoc]] Mel
+    - audio_slice_to_image
+    - image_to_audio
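
The autodoc entries above expose a `slerp` method on the pipeline. The name conventionally stands for spherical linear interpolation, which blends two vectors along the arc between them rather than along a straight line. The sketch below illustrates the general technique on plain Python lists; the function signature and the linear fallback for near-parallel inputs are assumptions made for illustration, not the pipeline's actual implementation.

```python
import math


def slerp(x0, x1, alpha):
    """Spherically interpolate between equal-length vectors x0 and x1.

    alpha=0 returns x0, alpha=1 returns x1. Illustrative only: the real
    pipeline method works on tensors and may differ in detail.
    """
    dot = sum(a * b for a, b in zip(x0, x1))
    norm0 = math.sqrt(sum(a * a for a in x0))
    norm1 = math.sqrt(sum(b * b for b in x1))
    # Clamp so rounding error cannot push the cosine outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm0 * norm1)))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < 1e-8:
        # Assumed fallback: (anti)parallel vectors use linear interpolation.
        return [(1 - alpha) * a + alpha * b for a, b in zip(x0, x1)]
    w0 = math.sin((1 - alpha) * theta) / sin_theta
    w1 = math.sin(alpha * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(x0, x1)]


# Halfway between two orthogonal unit vectors stays on the unit circle.
midpoint = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

Unlike a straight linear blend, the midpoint here keeps unit length, which is why this kind of interpolation is commonly preferred when mixing Gaussian noise samples.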

docs/source/api/pipelines/overview.mdx

+1
@@ -45,6 +45,7 @@ available a colab notebook to directly try them out.
 | Pipeline | Paper | Tasks | Colab
 |---|---|:---:|:---:|
 | [alt_diffusion](./api/pipelines/alt_diffusion) | [**AltDiffusion**](https://arxiv.org/abs/2211.06679) | Image-to-Image Text-Guided Generation | -
+| [audio_diffusion](./api/pipelines/audio_diffusion) | [**Audio Diffusion**](https://github.com/teticio/audio-diffusion.git) | Unconditional Audio Generation |
 | [cycle_diffusion](./api/pipelines/cycle_diffusion) | [**Cycle Diffusion**](https://arxiv.org/abs/2210.05559) | Image-to-Image Text-Guided Generation |
 | [dance_diffusion](./api/pipelines/dance_diffusion) | [**Dance Diffusion**](https://github.com/williamberman/diffusers.git) | Unconditional Audio Generation |
 | [ddpm](./api/pipelines/ddpm) | [**Denoising Diffusion Probabilistic Models**](https://arxiv.org/abs/2006.11239) | Unconditional Image Generation |

docs/source/index.mdx

+1
@@ -35,6 +35,7 @@ available a colab notebook to directly try them out.
 | Pipeline | Paper | Tasks | Colab
 |---|---|:---:|:---:|
 | [alt_diffusion](./api/pipelines/alt_diffusion) | [**AltDiffusion**](https://arxiv.org/abs/2211.06679) | Image-to-Image Text-Guided Generation |
+| [audio_diffusion](./api/pipelines/audio_diffusion) | [**Audio Diffusion**](https://github.com/teticio/audio-diffusion.git) | Unconditional Audio Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/audio_diffusion_pipeline.ipynb)
 | [cycle_diffusion](./api/pipelines/cycle_diffusion) | [**Cycle Diffusion**](https://arxiv.org/abs/2210.05559) | Image-to-Image Text-Guided Generation |
 | [dance_diffusion](./api/pipelines/dance_diffusion) | [**Dance Diffusion**](https://github.com/williamberman/diffusers.git) | Unconditional Audio Generation |
 | [ddpm](./api/pipelines/ddpm) | [**Denoising Diffusion Probabilistic Models**](https://arxiv.org/abs/2006.11239) | Unconditional Image Generation |

docs/source/using-diffusers/audio.mdx

+2-2
@@ -12,5 +12,5 @@ specific language governing permissions and limitations under the License.
 
 # Using Diffusers for audio
 
-The [`DanceDiffusionPipeline`] can be used to generate audio rapidly!
-More coming soon!
+[`DanceDiffusionPipeline`] and [`AudioDiffusionPipeline`] can be used to generate
+audio rapidly! More coming soon!

setup.py

+2
@@ -91,6 +91,7 @@
     "isort>=5.5.4",
     "jax>=0.2.8,!=0.3.2",
     "jaxlib>=0.1.65",
+    "librosa",
     "modelcards>=0.1.4",
     "numpy",
     "parameterized",
@@ -181,6 +182,7 @@ def run(self):
 extras["training"] = deps_list("accelerate", "datasets", "tensorboard", "modelcards")
 extras["test"] = deps_list(
     "datasets",
+    "librosa",
     "parameterized",
     "pytest",
     "pytest-timeout",
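
The two hunks above add `"librosa"` once to the master requirement list and once to the `test` extra, which is built by looking names up through `deps_list`. A minimal sketch of that lookup pattern follows. The helper `build_table`, the shortened `_deps` list, and the regex used to split a name from its version specifier are assumptions for illustration; the repository's own `setup.py` may derive its table differently.

```python
import re

# Shortened, illustrative requirement list (not the full list from setup.py).
_deps = ["isort>=5.5.4", "jax>=0.2.8,!=0.3.2", "librosa", "numpy"]


def build_table(specs):
    """Map each bare package name to its full requirement string."""
    table = {}
    for spec in specs:
        # Everything before the first version operator is the package name.
        name = re.split(r"[!=<>~]", spec, maxsplit=1)[0].strip()
        table[name] = spec
    return table


deps = build_table(_deps)


def deps_list(*pkgs):
    """Return the pinned requirement strings for the named packages."""
    return [deps[pkg] for pkg in pkgs]


test_extra = deps_list("librosa", "numpy")
```

Keeping a single table means an unpinned entry such as `"librosa"` and a pinned one such as `"jax>=0.2.8,!=0.3.2"` are resolved the same way wherever an extra references them.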

src/diffusers/__init__.py

+2
@@ -30,12 +30,14 @@
     )
     from .pipeline_utils import DiffusionPipeline
     from .pipelines import (
+        AudioDiffusionPipeline,
         DanceDiffusionPipeline,
         DDIMPipeline,
         DDPMPipeline,
         KarrasVePipeline,
         LDMPipeline,
         LDMSuperResolutionPipeline,
+        Mel,
         PNDMPipeline,
         RePaintPipeline,
         ScoreSdeVePipeline,

src/diffusers/dependency_versions_table.py

+1
@@ -15,6 +15,7 @@
     "isort": "isort>=5.5.4",
     "jax": "jax>=0.2.8,!=0.3.2",
     "jaxlib": "jaxlib>=0.1.65",
+    "librosa": "librosa",
     "modelcards": "modelcards>=0.1.4",
     "numpy": "numpy",
     "parameterized": "parameterized",

src/diffusers/pipelines/__init__.py

+12-1
@@ -1,4 +1,10 @@
-from ..utils import is_flax_available, is_onnx_available, is_torch_available, is_transformers_available
+from ..utils import (
+    is_flax_available,
+    is_librosa_available,
+    is_onnx_available,
+    is_torch_available,
+    is_transformers_available,
+)
 
 
 if is_torch_available():
@@ -14,6 +20,11 @@
 else:
     from ..utils.dummy_pt_objects import *  # noqa F403
 
+if is_torch_available() and is_librosa_available():
+    from .audio_diffusion import AudioDiffusionPipeline, Mel
+else:
+    from ..utils.dummy_torch_and_librosa_objects import AudioDiffusionPipeline, Mel  # noqa F403
+
 if is_torch_available() and is_transformers_available():
     from .alt_diffusion import AltDiffusionImg2ImgPipeline, AltDiffusionPipeline
     from .latent_diffusion import LDMTextToImagePipeline
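
The hunk above exports the real `AudioDiffusionPipeline` and `Mel` only when both torch and librosa are installed, and otherwise exports same-named dummy objects. The sketch below shows one way such an optional-dependency guard can work using only the standard library. The names `is_package_available`, `MissingDependency` and `export_or_dummy` are hypothetical, and the real `is_librosa_available` and dummy-object modules in diffusers may be implemented differently.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(name) is not None


class MissingDependency:
    """Hypothetical placeholder that raises a readable error when instantiated."""

    backends = ()

    def __init__(self, *args, **kwargs):
        needed = ", ".join(self.backends)
        raise ImportError(f"{type(self).__name__} requires: {needed}")


def export_or_dummy(name, backends, load):
    """Return load() if every backend is installed, else a dummy class called name."""
    if all(is_package_available(b) for b in backends):
        return load()
    return type(name, (MissingDependency,), {"backends": tuple(backends)})


# "json" ships with Python; the second backend name is deliberately bogus.
Real = export_or_dummy("Real", ["json"], lambda: dict)
Dummy = export_or_dummy("Dummy", ["json", "no_such_backend_xyz"], lambda: dict)
```

With this shape, importing the package never fails because of a missing optional backend; the error surfaces only when someone tries to construct the unavailable class.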
@@ -0,0 +1,3 @@
+# flake8: noqa
+from .mel import Mel
+from .pipeline_audio_diffusion import AudioDiffusionPipeline
