[From Single File] support `from_single_file` method for `WanVACETransformer3DModel` #11807
Conversation
add rename keys for `VACE`
Thank you for the PR. I tested it by uninstalling diffusers and installing from this PR.
There seems to be a typo in src/diffusers/loaders/single_file_model.py:

```python
"WandVACETransformer3DModel": {
    "checkpoint_mapping_fn": convert_wan_transformer_to_diffusers,
    "default_subfolder": "transformer",
},
```

I think it should be `WanVACETransformer3DModel` instead of `WandVACETransformer3DModel`.
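For reference, the corrected entry would simply rename the key (a sketch of the fix described above, keeping the same mapping function and subfolder):

```python
"WanVACETransformer3DModel": {
    "checkpoint_mapping_fn": convert_wan_transformer_to_diffusers,
    "default_subfolder": "transformer",
},
```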
Sincere thanks to @nitinmukesh 🙇♂️
Thank you @J4BEZ. Could you please share the code you are using? I am getting an error with mine.
I'm using the GGUF from https://huggingface.co/samuelchristlie/Wan2.1-VACE-1.3B-GGUF:

```python
import torch
from diffusers import WanVACETransformer3DModel, GGUFQuantizationConfig

model_id = "a-r-r-o-w/Wan-VACE-1.3B-diffusers"
transformer_path = "https://huggingface.co/samuelchristlie/Wan2.1-VACE-1.3B-GGUF/blob/main/Wan2.1-VACE-1.3B-Q8_0.gguf"
transformer_gguf = WanVACETransformer3DModel.from_single_file(
    transformer_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
    config=model_id,
    subfolder="transformer",
)
```
@nitinmukesh

```python
import torch
from diffusers import AutoencoderKLWan, WanVACEPipeline, WanVACETransformer3DModel, GGUFQuantizationConfig, UniPCMultistepScheduler
from huggingface_hub import hf_hub_download

model_id = "Wan-AI/Wan2.1-VACE-14B-diffusers"
gguf_model_id = "QuantStack/Wan2.1_14B_VACE-GGUF"
gguf_model_name = "Wan2.1_14B_VACE-Q3_K_S.gguf"
FLOW_SHIFT = 5.0  # 5.0 for 720P

gguf_path = hf_hub_download(gguf_model_id, gguf_model_name)
transformer = WanVACETransformer3DModel.from_single_file(
    gguf_path,
    quantization_config=GGUFQuantizationConfig(
        compute_dtype=torch.bfloat16,
    ),
)
```

Thanks to your feedback, I found that, unlike the 14B model, the 1.3B model includes additional layers ranging from vace_blocks.8 to vace_blocks.14. I will make the necessary adjustments shortly and follow up with you as soon as possible.
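As an illustration (not from the original thread), the layer-count difference can be confirmed with the same gguf reader API used for the key comparison below; the repo and filename here match the 1.3B checkpoint discussed above:

```python
import re
from gguf.gguf_reader import GGUFReader
from huggingface_hub import hf_hub_download

# Download the 1.3B GGUF checkpoint and list which vace_blocks indices it contains.
path = hf_hub_download("samuelchristlie/Wan2.1-VACE-1.3B-GGUF", "Wan2.1-VACE-1.3B-Q8_0.gguf")
indices = sorted({
    int(m.group(1))
    for t in GGUFReader(path).tensors
    if (m := re.match(r"vace_blocks\.(\d+)\.", t.name))
})
print(indices)  # per the discussion above, the 1.3B model should include indices up to 14
```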
Sincere thanks to @nitinmukesh again 🙇♂️
Upon execution, I did not encounter any warnings (such as missing or unexpected key warnings).
To investigate further, I compared the state dict keys between the original model and the GGUF-converted model. This may possibly be contributing to the issue. Below is the code I used for the key comparison:

```python
from gguf.gguf_reader import GGUFReader
from safetensors import safe_open
from huggingface_hub import hf_hub_download

# Download both checkpoints
gguf_file_path = hf_hub_download("samuelchristlie/Wan2.1-VACE-1.3B-GGUF", "Wan2.1-VACE-1.3B-Q8_0.gguf")
original_file_path = hf_hub_download("Wan-AI/Wan2.1-VACE-1.3B", "diffusion_pytorch_model.safetensors")

# Collect GGUF tensor names
def read_gguf_file(gguf_file_path):
    keys = set()
    reader = GGUFReader(gguf_file_path)
    for tensor in reader.tensors:
        keys.add(tensor.name)
    return keys

gguf_keys = read_gguf_file(gguf_file_path)

# Collect original model keys
with safe_open(original_file_path, framework="pt", device="cpu") as f:
    original_keys = set(f.keys())

# Check key differences
print(original_keys - gguf_keys)  # {'vace_patch_embedding.weight'}
print(gguf_keys - original_keys)  # set()
```

I hope this information helps clarify the root of the issue.

Sincerely yours,
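Building on the `gguf_keys` set computed above, a minimal pre-flight check for the problematic key might look like this (a sketch, not code from the thread):

```python
# Warn up front if the checkpoint lacks the key that the comparison above
# found to be missing from the 1.3B GGUF file.
if "vace_patch_embedding.weight" not in gguf_keys:
    print(
        "Checkpoint is missing 'vace_patch_embedding.weight'; "
        "loading it as WanVACETransformer3DModel will likely fail."
    )
```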
Thanks for the PR!
I guess support is needed from the diffusers team. 14B GGUF is working but 1.3B is not.

```python
import torch
from diffusers import AutoencoderKLWan, WanVACEPipeline, WanVACETransformer3DModel, GGUFQuantizationConfig
from diffusers.schedulers.scheduling_unipc_multistep import UniPCMultistepScheduler
from diffusers.utils import export_to_video

model_id = "a-r-r-o-w/Wan-VACE-1.3B-diffusers"
# transformer_path = "https://huggingface.co/newgenai79/Wan-VACE-1.3B-diffusers-gguf/blob/main/Wan-VACE-1.3B-diffusers-Q8_0.gguf"
transformer_path = "https://huggingface.co/samuelchristlie/Wan2.1-VACE-1.3B-GGUF/blob/main/Wan2.1-VACE-1.3B-Q8_0.gguf"
transformer_gguf = WanVACETransformer3DModel.from_single_file(
    transformer_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
    config=model_id,
    subfolder="transformer",
)
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanVACEPipeline.from_pretrained(
    model_id,
    transformer=transformer_gguf,
    vae=vae,
    torch_dtype=torch.bfloat16,
)
flow_shift = 3.0  # 5.0 for 720P, 3.0 for 480P
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=flow_shift)
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

prompt = "A sleek, humanoid robot stands in a vast warehouse filled with neatly stacked cardboard boxes on industrial shelves. The robot's metallic body gleams under the bright, even lighting, highlighting its futuristic design and intricate joints. A glowing blue light emanates from its chest, adding a touch of advanced technology. The background is dominated by rows of boxes, suggesting a highly organized storage system. The floor is lined with wooden pallets, enhancing the industrial setting. The camera remains static, capturing the robot's poised stance amidst the orderly environment, with a shallow depth of field that keeps the focus on the robot while subtly blurring the background for a cinematic effect."
negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"

output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=832,
    height=480,
    num_frames=81,
    num_inference_steps=30,
    guidance_scale=5.0,
    conditioning_scale=0.0,
    generator=torch.Generator().manual_seed(0),
).frames[0]
export_to_video(output, "output_GGUF1.mp4", fps=16)
```
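Worth noting: since no conditioning video or mask is passed and `conditioning_scale=0.0`, the VACE conditioning branch should contribute nothing here, so this run effectively exercises plain text-to-video generation; the part under test is the GGUF load itself.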
And if I comment out config and subfolder:

```python
transformer_path = "https://huggingface.co/samuelchristlie/Wan2.1-VACE-1.3B-GGUF/blob/main/Wan2.1-VACE-1.3B-Q8_0.gguf"
transformer_gguf = WanVACETransformer3DModel.from_single_file(
    transformer_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
    # config=model_id,
    # subfolder="transformer",
)
```
@J4BEZ will take a look into the conversion issue.
Hmm @J4BEZ, it does look like the issue loading the 1.3B checkpoint you linked is indeed due to the missing key in the file. This version does include the key, and loading works fine.
@bot /style

Style bot fixed some files and pushed the changes.
Thank you so much 🙇♂️ I'm especially thrilled to contribute to diffusers, which I consider a true work of art. I deeply appreciate your continued dedication to supporting the open-source ecosystem. Wishing everyone a peaceful and smooth end to the week.
Great, when will it be merged?
cc @DN6 Looks like it's ready to merge now?
…ansformer` (huggingface#11807)

* add `WandVACETransformer3DModel` in `SINGLE_FILE_LOADABLE_CLASSES`
* add rename keys for `VACE`
* fix typo (sincere thanks to @nitinmukesh 🙇♂️)
* support for `1.3B VACE` model (sincere thanks to @nitinmukesh again 🙇♂️)
* update
* update
* Apply style fixes

---------

Co-authored-by: Dhruv Nair <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Trying the code above with the weight DN6 linked throws an error. The code:

```python
import torch
from diffusers import AutoencoderKLWan, WanVACEPipeline, WanVACETransformer3DModel, GGUFQuantizationConfig
from diffusers.schedulers.scheduling_unipc_multistep import UniPCMultistepScheduler
from diffusers.utils import export_to_video

model_id = "a-r-r-o-w/Wan-VACE-1.3B-diffusers"
# transformer_path = "https://huggingface.co/newgenai79/Wan-VACE-1.3B-diffusers-gguf/blob/main/Wan-VACE-1.3B-diffusers-Q8_0.gguf"
transformer_path = "https://huggingface.co/calcuis/wan-gguf/blob/main/wan2.1-v4-vace-1.3b-q4_0.gguf"
transformer_gguf = WanVACETransformer3DModel.from_single_file(
    transformer_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
    config=model_id,
    subfolder="transformer",
)
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanVACEPipeline.from_pretrained(
    model_id,
    transformer=transformer_gguf,
    vae=vae,
    torch_dtype=torch.bfloat16,
)
flow_shift = 3.0  # 5.0 for 720P, 3.0 for 480P
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=flow_shift)
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

prompt = "A sleek, humanoid robot stands in a vast warehouse filled with neatly stacked cardboard boxes on industrial shelves. The robot's metallic body gleams under the bright, even lighting, highlighting its futuristic design and intricate joints. A glowing blue light emanates from its chest, adding a touch of advanced technology. The background is dominated by rows of boxes, suggesting a highly organized storage system. The floor is lined with wooden pallets, enhancing the industrial setting. The camera remains static, capturing the robot's poised stance amidst the orderly environment, with a shallow depth of field that keeps the focus on the robot while subtly blurring the background for a cinematic effect."
negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"

output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=832,
    height=480,
    num_frames=81,
    num_inference_steps=30,
    guidance_scale=5.0,
    conditioning_scale=0.0,
    generator=torch.Generator().manual_seed(0),
).frames[0]
export_to_video(output, "output_GGUF1.mp4", fps=16)
```

The Error:
1.3B didn't work for me either and I gave up. Not sure what is wrong. wan2.1-v4-vace-1.3b-q8_0.gguf is only ~2GB and would have been very helpful for low VRAM.
@nitinmukesh Can you get 14B working on consumer hardware?
I would not even try it, given I have only 8 GB + 16 GB. It should easily work for you, as you have 24 GB VRAM: https://huggingface.co/calcuis/wan-gguf/resolve/main/wan2.1-v2-vace-14b-q4_0.gguf?download=true
Also logged the issue here.
I face exactly the same problem with 1.3B too.
Can you try removing them and see if it works?

```python
transformer_gguf = WanVACETransformer3DModel.from_single_file(
```
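For clarity, the suggested variant would mirror the snippet shared earlier in the thread, with the two arguments dropped (a sketch; `transformer_path` as defined above):

```python
transformer_gguf = WanVACETransformer3DModel.from_single_file(
    transformer_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
    # config and subfolder omitted, presumably letting from_single_file
    # infer the model config from the checkpoint itself
)
```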
What does this PR do?
This PR solves the problem reported in #11630.
First of all, I would like to sincerely thank the team for your continued hard work in making state-of-the-art generative models accessible to everyone.
After encountering the same issue reported in #11630, I was able to find a solution. I'm submitting this pull request to share that fix with the community in the hope that it may help others facing the same problem.
Best regards,
J4BEZ
Fixes #11630
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.