
Commit 2c45a53

[docs] Shap-E guide (huggingface#4700)
* first draft
* fixes
* more fixes
* fix toctree
1 parent 22ea35c commit 2c45a53

File tree

6 files changed: +210 -179 lines changed


docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions

@@ -66,6 +66,8 @@
     title: Stable Diffusion XL
   - local: using-diffusers/controlnet
     title: ControlNet
+  - local: using-diffusers/shap-e
+    title: Shap-E
   - local: using-diffusers/diffedit
     title: DiffEdit
   - local: using-diffusers/distilled_sd

docs/source/en/api/pipelines/shap_e.md

Lines changed: 2 additions & 155 deletions

@@ -9,7 +9,7 @@ specific language governing permissions and limitations under the License.

 # Shap-E

-The Shap-E model was proposed in [Shap-E: Generating Conditional 3D Implicit Functions](https://huggingface.co/papers/2305.02463) by Alex Nichol and Heewon Jun from [OpenAI](https://github.com/openai).
+The Shap-E model was proposed in [Shap-E: Generating Conditional 3D Implicit Functions](https://huggingface.co/papers/2305.02463) by Alex Nichol and Heewon Jun from [OpenAI](https://github.com/openai).

 The abstract from the paper is:

@@ -19,163 +19,10 @@ The original codebase can be found at [openai/shap-e](https://github.com/openai/

 <Tip>

-Make sure to check out the Schedulers [guide](/using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](/using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
+See the [reuse components across pipelines](/using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

 </Tip>

-## Usage Examples
-
-In the following, we will walk you through some examples of how to use Shap-E pipelines to create 3D objects in gif format.
-
-### Text-to-3D image generation
-
-We can use [`ShapEPipeline`] to create a 3D object based on a text prompt. In this example, we will make a birthday cupcake for the 🧨 Diffusers library's 1-year birthday. The workflow for the Shap-E text-to-image pipeline is the same as for other text-to-image pipelines in diffusers.
-
-```python
-import torch
-
-from diffusers import DiffusionPipeline
-
-device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
-
-repo = "openai/shap-e"
-pipe = DiffusionPipeline.from_pretrained(repo, torch_dtype=torch.float16)
-pipe = pipe.to(device)
-
-guidance_scale = 15.0
-prompt = ["A firecracker", "A birthday cupcake"]
-
-images = pipe(
-    prompt,
-    guidance_scale=guidance_scale,
-    num_inference_steps=64,
-    frame_size=256,
-).images
-```
-
-The output of [`ShapEPipeline`] is a list of lists of image frames. Each list of frames can be used to create a 3D object. Let's use the `export_to_gif` utility function in diffusers to make a 3D cupcake!
-
-```python
-from diffusers.utils import export_to_gif
-
-export_to_gif(images[0], "firecracker_3d.gif")
-export_to_gif(images[1], "cake_3d.gif")
-```
-
-![img](https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/firecracker_out.gif)
-![img](https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/cake_out.gif)
-
-### Image-to-Image generation
-
-You can use [`ShapEImg2ImgPipeline`] along with other text-to-image pipelines in diffusers to turn your 2D generation into 3D.
-
-In this example, we will first generate a cheeseburger with the simple prompt "A cheeseburger, white background".
-
-```python
-from diffusers import DiffusionPipeline
-import torch
-
-pipe_prior = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16)
-pipe_prior.to("cuda")
-
-t2i_pipe = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
-t2i_pipe.to("cuda")
-
-prompt = "A cheeseburger, white background"
-
-image_embeds, negative_image_embeds = pipe_prior(prompt, guidance_scale=1.0).to_tuple()
-image = t2i_pipe(
-    prompt,
-    image_embeds=image_embeds,
-    negative_image_embeds=negative_image_embeds,
-).images[0]
-
-image.save("burger.png")
-```
-
-![img](https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/burger_in.png)
-
-We will then use the Shap-E image-to-image pipeline to turn it into a 3D cheeseburger :)
-
-```python
-from PIL import Image
-from diffusers.utils import export_to_gif
-
-repo = "openai/shap-e-img2img"
-pipe = DiffusionPipeline.from_pretrained(repo, torch_dtype=torch.float16)
-pipe = pipe.to("cuda")
-
-guidance_scale = 3.0
-image = Image.open("burger.png").resize((256, 256))
-
-images = pipe(
-    image,
-    guidance_scale=guidance_scale,
-    num_inference_steps=64,
-    frame_size=256,
-).images
-
-gif_path = export_to_gif(images[0], "burger_3d.gif")
-```
-
-![img](https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/burger_out.gif)
-
-### Generate mesh
-
-For both [`ShapEPipeline`] and [`ShapEImg2ImgPipeline`], you can generate mesh output by passing `output_type="mesh"` to the pipeline, and then use the [`ShapEPipeline.export_to_ply`] utility function to save the output as a `ply` file. We also provide a [`ShapEPipeline.export_to_obj`] function that you can use to save mesh outputs as `obj` files.
-
-```python
-import torch
-
-from diffusers import DiffusionPipeline
-from diffusers.utils import export_to_ply
-
-device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
-
-repo = "openai/shap-e"
-pipe = DiffusionPipeline.from_pretrained(repo, torch_dtype=torch.float16, variant="fp16")
-pipe = pipe.to(device)
-
-guidance_scale = 15.0
-prompt = "A birthday cupcake"
-
-images = pipe(prompt, guidance_scale=guidance_scale, num_inference_steps=64, frame_size=256, output_type="mesh").images
-
-ply_path = export_to_ply(images[0], "3d_cake.ply")
-print(f"saved to folder: {ply_path}")
-```
-
-Hugging Face Datasets supports mesh visualization for mesh files in `glb` format. Below we will show you how to convert your mesh file into `glb` format so that you can use the Dataset viewer to render 3D objects.
-
-We need to install the `trimesh` library:
-
-```
-pip install trimesh
-```
-
-To convert the mesh file into `glb` format:
-
-```python
-import trimesh
-
-mesh = trimesh.load("3d_cake.ply")
-mesh.export("3d_cake.glb", file_type="glb")
-```
-
-By default, the mesh output of Shap-E is rendered from the bottom viewpoint; you can change the default viewpoint by applying a rotation transformation:
-
-```python
-import trimesh
-import numpy as np
-
-mesh = trimesh.load("3d_cake.ply")
-rot = trimesh.transformations.rotation_matrix(-np.pi / 2, [1, 0, 0])
-mesh = mesh.apply_transform(rot)
-mesh.export("3d_cake.glb", file_type="glb")
-```
-
-Now you can upload your mesh file to your dataset and visualize it! Here is the link to the 3D cake we just generated:
-https://huggingface.co/datasets/hf-internal-testing/diffusers-images/blob/main/shap_e/3d_cake.glb

 ## ShapEPipeline
 [[autodoc]] ShapEPipeline
 - all

docs/source/en/api/utilities.md

Lines changed: 4 additions & 0 deletions

@@ -18,6 +18,10 @@ Utility and helper functions for working with 🤗 Diffusers.

 [[autodoc]] utils.testing_utils.load_image

+## export_to_gif

+[[autodoc]] utils.testing_utils.export_to_gif

 ## export_to_video

 [[autodoc]] utils.testing_utils.export_to_video
docs/source/en/using-diffusers/shap-e.md

Lines changed: 179 additions & 0 deletions

@@ -0,0 +1,179 @@

# Shap-E

[[open-in-colab]]

Shap-E is a conditional model for generating 3D assets which could be used for video game development, interior design, and architecture. It is trained on a large dataset of 3D assets, and post-processed to render more views of each object and produce 16K instead of 4K point clouds. The Shap-E model is trained in two steps:

1. an encoder accepts the point clouds and rendered views of a 3D asset and outputs the parameters of the implicit functions that represent the asset
2. a diffusion model is trained on the latents produced by the encoder to generate either neural radiance fields (NeRFs) or a textured 3D mesh, making it easier to render and use the 3D asset in downstream applications

This guide will show you how to use Shap-E to start generating your own 3D assets!

Before you begin, make sure you have the following libraries installed:

```py
# uncomment to install the necessary libraries in Colab
#!pip install diffusers transformers accelerate safetensors trimesh
```

## Text-to-3D

To generate a gif of a 3D object, pass a text prompt to the [`ShapEPipeline`]. The pipeline generates a list of image frames which are used to create the 3D object.

```py
import torch
from diffusers import ShapEPipeline

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

pipe = ShapEPipeline.from_pretrained("openai/shap-e", torch_dtype=torch.float16, variant="fp16", use_safetensors=True)
pipe = pipe.to(device)

guidance_scale = 15.0
prompt = ["A firecracker", "A birthday cupcake"]

images = pipe(
    prompt,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
).images
```

Now use the [`~utils.export_to_gif`] function to turn the list of image frames into a gif of the 3D object.

```py
from diffusers.utils import export_to_gif

export_to_gif(images[0], "firecracker_3d.gif")
export_to_gif(images[1], "cake_3d.gif")
```

<div class="flex gap-4">
  <div>
    <img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/firecracker_out.gif"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">firecracker</figcaption>
  </div>
  <div>
    <img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/cake_out.gif"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">cupcake</figcaption>
  </div>
</div>

## Image-to-3D

To generate a 3D object from another image, use the [`ShapEImg2ImgPipeline`]. You can use an existing image or generate an entirely new one. Let's use the [Kandinsky 2.1](../api/pipelines/kandinsky) model to generate a new image.

```py
from diffusers import DiffusionPipeline
import torch

prior_pipeline = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
pipeline = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16, use_safetensors=True).to("cuda")

prompt = "A cheeseburger, white background"

image_embeds, negative_image_embeds = prior_pipeline(prompt, guidance_scale=1.0).to_tuple()
image = pipeline(
    prompt,
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
).images[0]

image.save("burger.png")
```

Pass the cheeseburger to the [`ShapEImg2ImgPipeline`] to generate a 3D representation of it.

```py
from PIL import Image
from diffusers import ShapEImg2ImgPipeline
from diffusers.utils import export_to_gif

pipe = ShapEImg2ImgPipeline.from_pretrained("openai/shap-e-img2img", torch_dtype=torch.float16, variant="fp16").to("cuda")

guidance_scale = 3.0
image = Image.open("burger.png").resize((256, 256))

images = pipe(
    image,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
).images

gif_path = export_to_gif(images[0], "burger_3d.gif")
```

<div class="flex gap-4">
  <div>
    <img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/burger_in.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">cheeseburger</figcaption>
  </div>
  <div>
    <img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/burger_out.gif"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">3D cheeseburger</figcaption>
  </div>
</div>

## Generate mesh

Shap-E is a flexible model that can also generate textured mesh outputs to be rendered for downstream applications. In this example, you'll convert the output into a `glb` file because the 🤗 Datasets library supports mesh visualization of `glb` files, which can be rendered by the [Dataset viewer](https://huggingface.co/docs/hub/datasets-viewer#dataset-preview).

You can generate mesh outputs for both the [`ShapEPipeline`] and [`ShapEImg2ImgPipeline`] by specifying the `output_type` parameter as `"mesh"`:

```py
import torch
from diffusers import ShapEPipeline

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

pipe = ShapEPipeline.from_pretrained("openai/shap-e", torch_dtype=torch.float16, variant="fp16", use_safetensors=True)
pipe = pipe.to(device)

guidance_scale = 15.0
prompt = "A birthday cupcake"

images = pipe(prompt, guidance_scale=guidance_scale, num_inference_steps=64, frame_size=256, output_type="mesh").images
```

Use the [`~utils.export_to_ply`] function to save the mesh output as a `ply` file:

<Tip>

You can optionally save the mesh output as an `obj` file with the [`~utils.export_to_obj`] function (a short sketch follows the `ply` example below). The ability to save the mesh output in a variety of formats makes it more flexible for downstream usage!

</Tip>

```py
from diffusers.utils import export_to_ply

ply_path = export_to_ply(images[0], "3d_cake.ply")
print(f"saved to folder: {ply_path}")
```
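
If you'd rather have the `obj` file mentioned in the Tip, a minimal sketch looks like the following; it assumes `export_to_obj` accepts the same arguments as `export_to_ply`, and the file name is only an example:

```py
from diffusers.utils import export_to_obj

# save the same mesh output as an obj file (example file name)
obj_path = export_to_obj(images[0], "3d_cake.obj")
print(f"saved to folder: {obj_path}")
```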

Then you can convert the `ply` file to a `glb` file with the trimesh library:

```py
import trimesh

mesh = trimesh.load("3d_cake.ply")
mesh.export("3d_cake.glb", file_type="glb")
```

By default, the mesh output is rendered from the bottom viewpoint, but you can change the default viewpoint by applying a rotation transform:

```py
import trimesh
import numpy as np

mesh = trimesh.load("3d_cake.ply")
rot = trimesh.transformations.rotation_matrix(-np.pi / 2, [1, 0, 0])
mesh = mesh.apply_transform(rot)
mesh.export("3d_cake.glb", file_type="glb")
```

Upload the mesh file to your dataset repository to visualize it with the Dataset viewer!
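
One way to do the upload is with the `huggingface_hub` client; the snippet below is only a sketch, and `your-username/3d-assets` is a placeholder dataset repository id to replace with your own:

```py
from huggingface_hub import upload_file

# upload the converted glb file to a dataset repository (placeholder repo id);
# requires being logged in, e.g. via `huggingface-cli login` or a token
upload_file(
    path_or_fileobj="3d_cake.glb",
    path_in_repo="3d_cake.glb",
    repo_id="your-username/3d-assets",
    repo_type="dataset",
)
```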

<div class="flex justify-center">
  <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/3D-cake.gif"/>
</div>
