Commit 80e78d2

[docs] Custom community components (huggingface#5732)
* fixes * feedback
1 parent 4d3b4e0 commit 80e78d2

1 file changed

+22
-20
lines changed

docs/source/en/using-diffusers/custom_pipeline_overview.md

Lines changed: 22 additions & 20 deletions
@@ -60,15 +60,13 @@ For more information about community pipelines, take a look at the [Community pi
6060

6161
## Community components
6262

63-
If your pipeline has custom components that Diffusers doesn't support already, you need to accompany the Python modules that implement them. These customized components could be VAE, UNet, scheduler, etc. For the text encoder, we rely on `transformers` anyway. So, that should be handled separately (more info here). The pipeline code itself can be customized as well.
63+
Community components allow users to build pipelines with customized components that are not part of Diffusers. If your pipeline has custom components that Diffusers doesn't already support, you need to provide their implementations as Python modules. These customized components could be a VAE, a UNet, or a scheduler. In most cases, the text encoder is imported from the Transformers library. The pipeline code itself can also be customized.
6464

65-
Community components allow users to build pipelines that may have customized components that are not part of Diffusers. This section shows how users should use community components to build a community pipeline.
65+
This section shows how users should use community components to build a community pipeline.
6666

67-
You'll use the [showlab/show-1-base](https://huggingface.co/showlab/show-1-base) pipeline checkpoint as an example here. Here, you have a custom UNet and a customized pipeline (`TextToVideoIFPipeline`). For convenience, let's call the UNet `ShowOneUNet3DConditionModel`.
67+
You'll use the [showlab/show-1-base](https://huggingface.co/showlab/show-1-base) pipeline checkpoint as an example. So, let's start loading the components:
6868

69-
"showlab/show-1-base" already provides the checkpoints in the Diffusers format, which is a great starting point. So, let's start loading up the components which are already well-supported:
70-
71-
1. **Text encoder**
69+
1. Import and load the text encoder from Transformers:
7270

7371
```python
7472
from transformers import T5Tokenizer, T5EncoderModel
@@ -78,35 +76,41 @@ tokenizer = T5Tokenizer.from_pretrained(pipe_id, subfolder="tokenizer")
7876
text_encoder = T5EncoderModel.from_pretrained(pipe_id, subfolder="text_encoder")
7977
```
8078

81-
2. **Scheduler**
79+
2. Load a scheduler:
8280

8381
```python
8482
from diffusers import DPMSolverMultistepScheduler
8583

8684
scheduler = DPMSolverMultistepScheduler.from_pretrained(pipe_id, subfolder="scheduler")
8785
```
8886

89-
3. **Image processor**
87+
3. Load an image processor:
9088

9189
```python
9290
from transformers import CLIPFeatureExtractor
9391

9492
feature_extractor = CLIPFeatureExtractor.from_pretrained(pipe_id, subfolder="feature_extractor")
9593
```
9694

97-
Now, you need to implement the custom UNet. The implementation is available [here](https://github.com/showlab/Show-1/blob/main/showone/models/unet_3d_condition.py). So, let's create a Python script called `showone_unet_3d_condition.py` and copy over the implementation, changing the `UNet3DConditionModel` classname to `ShowOneUNet3DConditionModel` to avoid any conflicts with Diffusers. This is because Diffusers already has one `UNet3DConditionModel`. We put all the components needed to implement the class in `showone_unet_3d_condition.py` only. You can find the entire file [here](https://huggingface.co/sayakpaul/show-1-base-with-code/blob/main/unet/showone_unet_3d_condition.py).
95+
<Tip warning={true}>
96+
97+
In steps 4 and 5, the custom [UNet](https://github.com/showlab/Show-1/blob/main/showone/models/unet_3d_condition.py) and [pipeline](https://huggingface.co/sayakpaul/show-1-base-with-code/blob/main/pipeline_t2v_base_pixel.py) implementations must match the format shown in their files for this example to work.
98+
99+
</Tip>
100+
101+
4. Now you'll load a [custom UNet](https://github.com/showlab/Show-1/blob/main/showone/models/unet_3d_condition.py), which, in this example, has already been implemented in the `showone_unet_3d_condition.py` [script](https://huggingface.co/sayakpaul/show-1-base-with-code/blob/main/unet/showone_unet_3d_condition.py) for your convenience. You'll notice the `UNet3DConditionModel` class name is changed to `ShowOneUNet3DConditionModel` because [`UNet3DConditionModel`] already exists in Diffusers. Any components needed for the `ShowOneUNet3DConditionModel` class should be placed in the `showone_unet_3d_condition.py` script.
98102

99-
Once this is done, we can initialize the UNet:
103+
Once this is done, you can initialize the UNet:
100104

101105
```python
102106
from showone_unet_3d_condition import ShowOneUNet3DConditionModel
103107

104108
unet = ShowOneUNet3DConditionModel.from_pretrained(pipe_id, subfolder="unet")
105109
```
106110

107-
Then implement the custom `TextToVideoIFPipeline` in another Python script: `pipeline_t2v_base_pixel.py`. This is already available [here](https://github.com/showlab/Show-1/blob/main/showone/pipelines/pipeline_t2v_base_pixel.py).
111+
5. Finally, you'll load the custom pipeline code. For this example, it has already been created for you in the `pipeline_t2v_base_pixel.py` [script](https://huggingface.co/sayakpaul/show-1-base-with-code/blob/main/pipeline_t2v_base_pixel.py). This script contains a custom `TextToVideoIFPipeline` class for generating videos from text. Just like the custom UNet, any code needed for the custom pipeline to work should go in the `pipeline_t2v_base_pixel.py` script.
108112

109-
Now that you have all the components, initialize the `TextToVideoIFPipeline`:
113+
Once everything is in place, you can initialize the `TextToVideoIFPipeline` with the `ShowOneUNet3DConditionModel`:
110114

111115
```python
112116
from pipeline_t2v_base_pixel import TextToVideoIFPipeline
@@ -123,19 +127,19 @@ pipeline = pipeline.to(device="cuda")
123127
pipeline.torch_dtype = torch.float16
124128
```
125129

126-
Push to the pipeline to the Hub to share with the community:
130+
Push the pipeline to the Hub to share with the community!
127131

128132
```python
129133
pipeline.push_to_hub("custom-t2v-pipeline")
130134
```
131135

132136
After the pipeline is successfully pushed, you need a couple of changes:
133137

134-
1. In `model_index.json` file, change the `_class_name` attribute. It should be like [so](https://huggingface.co/sayakpaul/show-1-base-with-code/blob/main/model_index.json#L2).
135-
2. Upload `showone_unet_3d_condition.py` to the `unet` directory ([example](https://huggingface.co/sayakpaul/show-1-base-with-code/blob/main/unet/showone_unet_3d_condition.py)).
136-
3. Upload `pipeline_t2v_base_pixel.py` to the pipeline base directory ([example](https://huggingface.co/sayakpaul/show-1-base-with-code/blob/main/unet/showone_unet_3d_condition.py)).
138+
1. Change the `_class_name` attribute in [`model_index.json`](https://huggingface.co/sayakpaul/show-1-base-with-code/blob/main/model_index.json#L2) to a list containing the module name and the class name: `"pipeline_t2v_base_pixel"` and `"TextToVideoIFPipeline"`.
139+
2. Upload `showone_unet_3d_condition.py` to the `unet` [directory](https://huggingface.co/sayakpaul/show-1-base-with-code/blob/main/unet/showone_unet_3d_condition.py).
140+
3. Upload `pipeline_t2v_base_pixel.py` to the pipeline base [directory](https://huggingface.co/sayakpaul/show-1-base-with-code/blob/main/pipeline_t2v_base_pixel.py).
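
The `_class_name` change in step 1 can be made by hand or scripted. The snippet below is a minimal sketch using only the standard library, assuming you have a local copy of `model_index.json`. The helper name `set_custom_pipeline_class` and the placeholder starting value are illustrative and not taken from the checkpoint. The two-element `[module, class]` form is an assumption based on the linked example file.

```python
import json
import tempfile
from pathlib import Path


def set_custom_pipeline_class(index_path, module_name, class_name):
    """Rewrite `_class_name` in a model_index.json as [module, class].

    Hypothetical helper for illustration; not part of Diffusers.
    """
    path = Path(index_path)
    config = json.loads(path.read_text())
    config["_class_name"] = [module_name, class_name]
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo on a throwaway file; the starting contents are placeholder values.
with tempfile.TemporaryDirectory() as tmp:
    index_path = Path(tmp) / "model_index.json"
    index_path.write_text(json.dumps({"_class_name": "DiffusionPipeline"}))
    updated = set_custom_pipeline_class(
        index_path, "pipeline_t2v_base_pixel", "TextToVideoIFPipeline"
    )
    print(updated["_class_name"])
```

After editing a real `model_index.json` this way, the file still has to be uploaded back to the Hub repository alongside the two Python scripts.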
137141

138-
To run inference, just do:
142+
To run inference, simply add the `trust_remote_code` argument while initializing the pipeline to handle all the "magic" behind the scenes.
139143

140144
```python
141145
from diffusers import DiffusionPipeline
@@ -161,6 +165,4 @@ video_frames = pipeline(
161165
guidance_scale=9.0,
162166
output_type="pt"
163167
).frames
164-
```
165-
166-
Here, notice the use of the `trust_remote_code` argument while initializing the pipeline. It is responsible for handling all the "magic" behind the scenes.
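
To give a rough picture of what loading code from a repository involves, the sketch below imports a class from a standalone Python file by name using `importlib`. It illustrates the general technique only and is not the Diffusers implementation. The helper `load_class_from_file` and the stub class written to the temporary file are hypothetical stand-ins for the real `pipeline_t2v_base_pixel.py`.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_class_from_file(file_path, class_name):
    """Import the Python file at `file_path` and return the named class.

    Hypothetical helper for illustration; executing a file like this runs
    arbitrary code, which is why such loading needs an explicit opt-in.
    """
    path = Path(file_path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, class_name)


with tempfile.TemporaryDirectory() as tmp:
    # Stub standing in for the real pipeline script (assumption).
    module_path = Path(tmp) / "pipeline_t2v_base_pixel.py"
    module_path.write_text(
        "class TextToVideoIFPipeline:\n"
        "    def __call__(self, prompt):\n"
        "        return f'frames for {prompt}'\n"
    )
    pipeline_cls = load_class_from_file(module_path, "TextToVideoIFPipeline")
    result = pipeline_cls()("a panda surfing")
    print(result)
```

Because the imported file is executed as ordinary Python, only enable `trust_remote_code` for repositories whose code you have reviewed.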
168+
```
