Support dynamically loading/unloading loras with group offloading #11804

Merged: 11 commits merged into main on Jun 27, 2025

Conversation

@a-r-r-o-w (Member) commented on Jun 25, 2025

Fixes #11791.

Reproducer:
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video
from diffusers.hooks import apply_group_offloading
from diffusers.utils.logging import set_verbosity_debug

set_verbosity_debug()


model_id = "hunyuanvideo-community/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.float16)
pipe.vae.enable_tiling()
onload_device = torch.device("cuda")
offload_device = torch.device("cpu")
pipe.load_lora_weights("a-r-r-o-w/HunyuanVideo-tuxemons")
num_inference_steps = 20

# Apply leaf-level group offloading (with a CUDA stream for prefetching) to the transformer only.
apply_group_offloading(pipe.transformer, onload_device, offload_device, offload_type="leaf_level", use_stream=True)
# Keep the remaining components fully on the GPU.
for module in (pipe.text_encoder, pipe.text_encoder_2, pipe.vae):
    module.to(onload_device)

output = pipe(
    prompt="Style of snomexut, a cat-like Tuxemon creature walks in alien-world grass, and observes its surroundings.",
    height=768,
    width=768,
    num_frames=33,
    num_inference_steps=num_inference_steps,
    generator=torch.Generator().manual_seed(73),
).frames[0]
export_to_video(output, "output.mp4", fps=15)

# Unload the LoRA and run the same prompt again without it.
pipe.unload_lora_weights()

output_1 = pipe(
    prompt="Style of snomexut, a cat-like Tuxemon creature walks in alien-world grass, and observes its surroundings.",
    height=768,
    width=768,
    num_frames=33,
    num_inference_steps=num_inference_steps,
    generator=torch.Generator().manual_seed(73),
).frames[0]
export_to_video(output_1, "output2.mp4", fps=15)

# Reload the LoRA with group offloading already applied; this is the case that previously failed (#11791).
pipe.load_lora_weights("a-r-r-o-w/HunyuanVideo-tuxemons")

output_2 = pipe(
    prompt="Style of snomexut, a cat-like Tuxemon creature walks in alien-world grass, and observes its surroundings.",
    height=768,
    width=768,
    num_frames=33,
    num_inference_steps=num_inference_steps,
    generator=torch.Generator().manual_seed(73),
).frames[0]
export_to_video(output_2, "output3.mp4", fps=15)

@a-r-r-o-w requested a review from DN6 on June 25, 2025 at 08:01
@a-r-r-o-w (Member Author)

cc @zhangvia I think this should fix the issues you were facing. Could you test? Thanks 🤗

@sayakpaul (Member) left a comment

Thanks! Just some minor comments.

And maybe we could include disabling and enabling group offloading in the existing _func_optionally_*() function. But no strong opinions.

@a-r-r-o-w requested a review from sayakpaul on June 25, 2025 at 08:48
@a-r-r-o-w (Member Author)

And maybe we could include disabling and enabling group offloading in the existing _func_optionally_*() function. But no strong opinions.

Yep, done!
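
For reference, a minimal, self-contained sketch of the idea being discussed: temporarily lift group offloading from a module while its weights are modified (for example during a LoRA load/unload), then re-apply it afterwards. The helper names and the attribute used to track state below are illustrative placeholders, not the actual diffusers implementation.

from typing import Callable

def _group_offloading_enabled(module) -> bool:
    # Placeholder check: is a group-offloading hook attached to this module?
    return getattr(module, "_group_offloading_enabled", False)

def _remove_group_offloading(module) -> None:
    # Placeholder: detach group-offloading hooks so the weights can be modified safely.
    module._group_offloading_enabled = False

def run_with_offloading_optionally_disabled(module, fn: Callable[[], None], reapply: Callable[[], None]) -> None:
    """Run `fn` (e.g. a LoRA load/unload) with group offloading disabled, then restore it."""
    was_enabled = _group_offloading_enabled(module)
    if was_enabled:
        _remove_group_offloading(module)
    try:
        fn()
    finally:
        if was_enabled:
            reapply()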

@sayakpaul (Member) left a comment

Fantastic!

@@ -268,9 +288,12 @@ class GroupOffloadingHook(ModelHook):

     _is_stateful = False

-    def __init__(self, group: ModuleGroup, next_group: Optional[ModuleGroup] = None) -> None:
+    def __init__(
+        self, group: ModuleGroup, next_group: Optional[ModuleGroup] = None, *, config: GroupOffloadingConfig
Member

❤️
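
As an aside, the new keyword-only config argument presumably records how offloading was applied so that it can be re-applied later (for example after a LoRA load). A rough sketch of what such a config object could hold; the field names are assumptions, not the actual diffusers GroupOffloadingConfig.

from dataclasses import dataclass
import torch

@dataclass
class GroupOffloadingConfigSketch:
    # Illustrative only; not the diffusers definition.
    onload_device: torch.device
    offload_device: torch.device
    offload_type: str = "block_level"   # or "leaf_level"
    use_stream: bool = False
    non_blocking: bool = False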

     else:
-        raise ValueError(f"Unsupported offload_type: {offload_type}")
+        assert False
Member

Should this be turned into a sensible value error like previous? 👁️

@a-r-r-o-w (Member Author), Jun 25, 2025

The statement cannot be reached because we instantiate GroupOffloadingType with the value passed by the user; the dataclass and enum will raise an error if the value is invalid.

assert False is just a placeholder.

@zhangvia

cc @zhangvia I think this should fix the issues you were facing. Could you test? Thanks 🤗

Thanks for the quick fix! I've confirmed it.

@DN6 (Collaborator) left a comment

Nice 👍🏽

@a-r-r-o-w added the roadmap (Add to current release roadmap) label on Jun 27, 2025
@a-r-r-o-w (Member Author)

A bit of a problem with the tests that wasn't caught until now...

Group offloading with streams is limited in what it can do. If the same layer is invoked twice in the same parent layer's forward, the prefetching logic becomes completely incorrect. This is the case with:

  • CogView4 - invokes the same MLP on both hidden_states and encoder_hidden_states in succession
  • CogVideoX - invokes the same layernorm on both hidden_states and encoder_hidden_states in succession

One option to make it work is to concatenate inputs across sequence dim and then split again. This will, however, incur some extra perf cost because concatenation/split is not free.

Another option is creating a separate layer for each data stream and sharing the weights. I don't think this should incur memory overhead since the data reference will be same for both layers, but need to test to be sure.

Any other ideas are welcome. If these don't sound good, I propose we skip the tests for now and wait for the group offloading logic to become more mature/stable. Other than that, the PR looks good to merge to me.
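
For illustration, a rough standalone sketch of the first option (concatenate along the sequence dimension, call the shared layer once, then split back). This is not the actual diffusers change; it works for token-wise layers such as a LayerNorm or MLP because they treat each position independently.

import torch
import torch.nn as nn

norm = nn.LayerNorm(64)
hidden_states = torch.randn(2, 17, 64)          # e.g. video tokens
encoder_hidden_states = torch.randn(2, 9, 64)   # e.g. text tokens

# Problematic pattern for stream-based prefetching: the same layer invoked twice in one forward.
#   hidden_states = norm(hidden_states)
#   encoder_hidden_states = norm(encoder_hidden_states)

# Workaround: concatenate along the sequence dim, invoke the layer once, then split back.
seq_len = hidden_states.shape[1]
combined = norm(torch.cat([hidden_states, encoder_hidden_states], dim=1))
hidden_states, encoder_hidden_states = combined[:, :seq_len], combined[:, seq_len:]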

@sayakpaul (Member)

Thanks for investigating these and also for proposing the potential solutions.

On the surface, I would say we evaluate both approaches and then decide. However, the two models you mentioned probably have limited usage at least with group offloading for now. So, xfailing them is the more reasonable approach. WDYT?

@a-r-r-o-w (Member Author)

I tested the first approach as it's a super simple change. The performance penalty is not noticeable end-to-end; it only shows up at the microsecond scale. I don't think it really matters because, like you mentioned, these models probably have very limited usage in the context of group offloading.

For now, I'll specialize the tests for CogView4/CogVideoX by parameterizing with only non-stream tests instead of xfailing them, and add a note. Sounds good?

@sayakpaul (Member)

For now, I'll specialize the tests for CogView4/CogVideoX by parameterizing with only non-stream tests instead of xfailing them, and add a note. Sounds good?

I am good!

@a-r-r-o-w (Member Author)

Updated the tests. There are some gymnastics involved in skipping tests marked with parameterized because it seems they can't be overridden or specialized in child classes.

Comment on lines +132 to +137
@parameterized.expand([("block_level", True), ("leaf_level", False)])
@require_torch_accelerator
def test_group_offloading_inference_denoiser(self, offload_type, use_stream):
# TODO: We don't run the (leaf_level, True) test here that is enabled for other models.
# The reason for this can be found here: https://github.com/huggingface/diffusers/pull/11804#issuecomment-3013325338
super()._test_group_offloading_inference_denoiser(offload_type, use_stream)
Member

We can do this, or we could detect if the test class is either CogView4 or CogVideoX and use pytest.skip(). Up to you.

Member Author

I prefer the base test class not having any information about the model test classes that derive from it. The current implementation will work for any model that overrides the test, so it's also much cleaner.

@a-r-r-o-w merged commit 76ec3d1 into main on Jun 27, 2025
32 checks passed
The github-project-automation bot moved this from In Progress to Done in Diffusers Roadmap 0.35 on Jun 27, 2025
@a-r-r-o-w deleted the support-dynamic-lora-with-go branch on June 27, 2025 at 17:50
tolgacangoz pushed a commit to tolgacangoz/diffusers that referenced this pull request Jul 5, 2025
…ggingface#11804)

* update

* add test

* address review comments

* update

* fixes

* change decorator order to fix tests

* try fix

* fight tests
Labels
roadmap Add to current release roadmap
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

can't load lora weights after apply group offloading
5 participants