Add MAGI-1: Autoregressive Video Generation at Scale #11713


Draft · wants to merge 67 commits into base: main from add-magi-1

Changes from all commits (67 commits)
dabb12f
first draft
tolgacangoz Jun 13, 2025
89806ea
style
tolgacangoz Jun 13, 2025
f4b5748
upp
tolgacangoz Jun 13, 2025
9b45317
style
tolgacangoz Jun 13, 2025
8e5881b
Merge branch 'main' into add-magi-1
tolgacangoz Jun 14, 2025
03d50e2
2nd draft
tolgacangoz Jun 14, 2025
08287a9
2nd draft
tolgacangoz Jun 14, 2025
8784881
up
tolgacangoz Jun 14, 2025
ae03b7d
Refactor Magi1AttentionBlock to support rotary embeddings and integra…
tolgacangoz Jun 15, 2025
2a2df39
Enhance rotary positional embeddings with new parameters for grid cen…
tolgacangoz Jun 24, 2025
61e7cb0
Merge branch 'main' into add-magi-1
tolgacangoz Jun 24, 2025
9f63582
Refactor Magi1 VAE to align with DiT architecture
tolgacangoz Jun 24, 2025
743bd44
Refactor: Remove custom caching mechanism from Magi1 VAE
tolgacangoz Jun 25, 2025
0f09f74
Refactor Magi1 VAE decoder logic
tolgacangoz Jun 25, 2025
d3df80a
Refactor Magi1 VAE decoder to a patch-based architecture
tolgacangoz Jun 26, 2025
3fcd4c3
Refactor Magi1 VAE block implementation
tolgacangoz Jun 26, 2025
1301c9e
Refactor Magi1 VAE blocks to use standard attention processor
tolgacangoz Jun 26, 2025
16218e8
Refactor: Simplify Magi1 VAE decoder architecture
tolgacangoz Jun 26, 2025
1537b5b
Refactor: Convert Magi1 encoder to a Vision Transformer architecture
tolgacangoz Jun 26, 2025
7603067
Refactor: Simplify and streamline Magi1 VAE architecture
tolgacangoz Jun 27, 2025
1898e19
style
tolgacangoz Jun 27, 2025
6e6ba3e
Refactor Magi1 VAE configuration and parameters
tolgacangoz Jun 27, 2025
9a4b252
Refactor Magi1 VAE to remove quantization steps
tolgacangoz Jun 27, 2025
499111d
Refactor MAGI1 VAE conversion for ViT architecture
tolgacangoz Jun 27, 2025
d5f5594
Rename `AutoencoderKLMagi` to `AutoencoderKLMagi1`
tolgacangoz Jun 27, 2025
0cb50c9
Refactor: Rename Magi to Magi1
tolgacangoz Jun 27, 2025
af5b575
style
tolgacangoz Jun 27, 2025
7a4af97
Merge branch 'main' into add-magi-1
tolgacangoz Jun 27, 2025
b5e140b
Refactor: Update references from `MagiPipeline` to `Magi1Pipeline` ac…
tolgacangoz Jun 27, 2025
eead329
Enhance Magi-1 checkpoint loading robustness
tolgacangoz Jun 28, 2025
1643342
Fixes tensor shape in MAGI-1 attention processor
tolgacangoz Jun 28, 2025
14dff1f
Refactor: Simplify VAE checkpoint conversion and integrate hf_hub_dow…
tolgacangoz Jun 28, 2025
069f510
Refactor: Remove convert_magi_checkpoint function and streamline VAE …
tolgacangoz Jun 28, 2025
85729e0
Add Magi1 models and pipelines to the module initialization
tolgacangoz Jun 28, 2025
9389d0b
Fix: Update Magi pipeline names to include versioning
tolgacangoz Jun 28, 2025
657f569
Refactor: Rename references to autoencoder_kl_magi to autoencoder_kl_…
tolgacangoz Jun 28, 2025
b12796b
renaming
tolgacangoz Jun 28, 2025
dedea6f
Refactor: Update references to Magi pipelines and classes to include …
tolgacangoz Jun 28, 2025
fc99d53
Refactor: Comment out unused imports related to Magi1LoraLoaderMixin …
tolgacangoz Jun 28, 2025
5616238
Refactor Magi1 encoder to support variational encoding
tolgacangoz Jun 28, 2025
6d17954
Refactor: Update text encoder and tokenizer initialization in the mai…
tolgacangoz Jun 28, 2025
f856187
Refactor: Update text encoder and tokenizer to use DeepFloyd model
tolgacangoz Jun 28, 2025
dc9bb61
Refactor: Update MAGI-1 transformer conversion script and related com…
tolgacangoz Jun 28, 2025
6027704
Refactor MAGI-1 conversion script for accurate loading
tolgacangoz Jun 28, 2025
58dc666
style
tolgacangoz Jun 28, 2025
87299a4
fix-copies
tolgacangoz Jun 28, 2025
017cfc3
Refactor: Remove einops dependency in Magi1 VAE
tolgacangoz Jun 28, 2025
ecece86
style
tolgacangoz Jun 28, 2025
33b6a65
style
tolgacangoz Jun 28, 2025
e725461
Rename autoencoder_kl_magi.md to autoencoder_kl_magi1.md
tolgacangoz Jul 4, 2025
7415473
Refactor: Rename MagiTransformer classes to Magi1Transformer for cons…
tolgacangoz Jul 4, 2025
2d29f94
Refactor: Rename Magi1AttnProcessor2_0 and Magi1TransformerBlock clas…
tolgacangoz Jul 5, 2025
b5f58aa
Merge branch 'main' into add-magi-1
tolgacangoz Jul 5, 2025
85b2b74
Improve Magi1 VAE to handle variable input resolutions
tolgacangoz Jul 6, 2025
d43d6dd
Refactor: Comment out _keep_in_fp32_modules and remove clamping in Au…
tolgacangoz Jul 6, 2025
e1c548b
style
tolgacangoz Jul 6, 2025
03a4b4c
Refactor: Replace `FP32LayerNorm` with a manual implementation
tolgacangoz Jul 6, 2025
1bfb06e
Removes residual connection in VAE attention processor
tolgacangoz Jul 6, 2025
a6e18e6
up
tolgacangoz Jul 6, 2025
c04df0f
tolgacangoz Jul 6, 2025
04c0b09
up
tolgacangoz Jul 6, 2025
a75997c
style
tolgacangoz Jul 6, 2025
d48c5f6
Refactor attention processing to improve tensor shape handling in Mag…
tolgacangoz Jul 7, 2025
72f97be
Add Magi1VAELayerNorm class for improved integration in Magi1VAEAttnP…
tolgacangoz Jul 7, 2025
ba9d3ff
Refactor Magi1VAEAttnProcessor2_0 to improve query, key, and value ha…
tolgacangoz Jul 7, 2025
535c0dc
Refactor: Remove `timm` dependency in Magi1 VAE
tolgacangoz Jul 7, 2025
0b0f1c5
style
tolgacangoz Jul 7, 2025
6 changes: 6 additions & 0 deletions docs/source/en/_toctree.yml
Original file line number Diff line number Diff line change
Expand Up @@ -321,6 +321,8 @@
title: Lumina2Transformer2DModel
- local: api/models/lumina_nextdit2d
title: LuminaNextDiT2DModel
- local: api/models/magi1_transformer_3d
  title: Magi1Transformer3DModel
- local: api/models/mochi_transformer3d
title: MochiTransformer3DModel
- local: api/models/omnigen_transformer
Expand Down Expand Up @@ -375,6 +377,8 @@
title: AutoencoderKLHunyuanVideo
- local: api/models/autoencoderkl_ltx_video
title: AutoencoderKLLTXVideo
- local: api/models/autoencoder_kl_magi1
title: AutoencoderKLMagi1
- local: api/models/autoencoderkl_magvit
title: AutoencoderKLMagvit
- local: api/models/autoencoderkl_mochi
Expand Down Expand Up @@ -497,6 +501,8 @@
title: Lumina 2.0
- local: api/pipelines/lumina
title: Lumina-T2X
- local: api/pipelines/magi1
title: MAGI-1
- local: api/pipelines/marigold
title: Marigold
- local: api/pipelines/mochi
Expand Down
34 changes: 34 additions & 0 deletions docs/source/en/api/models/autoencoder_kl_magi1.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderKLMagi1

The 3D variational autoencoder (VAE) model with KL loss used in [MAGI-1: Autoregressive Video Generation at Scale](https://arxiv.org/abs/2505.13211) by Sand.ai.

MAGI-1 uses a transformer-based VAE with 8x spatial and 4x temporal compression, providing fast average decoding time and highly competitive reconstruction quality.
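As a hypothetical illustration (this helper is not part of `diffusers`), the stated compression factors imply the following latent grid arithmetic:

```python
def magi1_latent_shape(num_frames: int, height: int, width: int,
                       spatial_factor: int = 8, temporal_factor: int = 4):
    """Latent grid size implied by 8x spatial / 4x temporal compression."""
    return (num_frames // temporal_factor,
            height // spatial_factor,
            width // spatial_factor)

# A 96-frame 480x832 clip maps to a 24x60x104 latent grid.
print(magi1_latent_shape(96, 480, 832))  # -> (24, 60, 104)
```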

The model can be loaded with the following code snippet.

```python
import torch

from diffusers import AutoencoderKLMagi1

vae = AutoencoderKLMagi1.from_pretrained("sand-ai/MAGI-1", subfolder="vae", torch_dtype=torch.float32)
```

## AutoencoderKLMagi1

[[autodoc]] AutoencoderKLMagi1
- decode
- all

## DecoderOutput

[[autodoc]] models.autoencoders.vae.DecoderOutput
32 changes: 32 additions & 0 deletions docs/source/en/api/models/magi1_transformer_3d.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# Magi1Transformer3DModel

A Diffusion Transformer model for 3D video-like data, introduced in [MAGI-1: Autoregressive Video Generation at Scale](https://arxiv.org/abs/2505.13211) by Sand.ai.

MAGI-1 is an autoregressive denoising video generation model that generates videos chunk by chunk rather than all at once. Each 24-frame chunk is denoised holistically, and generation of the next chunk begins as soon as the current one reaches a sufficient level of denoising, so several chunks are in flight at different noise levels.
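The pipelined chunk schedule can be sketched as follows. This is an illustrative assumption of how such a schedule interleaves, not the actual pipeline implementation; `start_offset` stands in for the "certain level of denoising" after which the next chunk may begin:

```python
def chunk_schedule(num_chunks: int, steps_per_chunk: int, start_offset: int):
    """Yield (step, active_chunks). Chunk i starts `start_offset` steps after
    chunk i-1, so multiple chunks are denoised concurrently at different
    noise levels."""
    total_steps = start_offset * (num_chunks - 1) + steps_per_chunk
    for step in range(total_steps):
        active = [i for i in range(num_chunks)
                  if i * start_offset <= step < i * start_offset + steps_per_chunk]
        yield step, active

# With 4 chunks, 8 denoising steps each, and a 4-step offset, chunks overlap:
for step, active in chunk_schedule(num_chunks=4, steps_per_chunk=8, start_offset=4):
    print(step, active)
```

For example, at step 4 chunks 0 and 1 are both active, and by step 8 chunk 0 has finished while chunks 1 and 2 are denoising concurrently.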

The model can be loaded with the following code snippet.

```python
import torch

from diffusers import Magi1Transformer3DModel

transformer = Magi1Transformer3DModel.from_pretrained("sand-ai/MAGI-1", subfolder="transformer", torch_dtype=torch.bfloat16)
```

## Magi1Transformer3DModel

[[autodoc]] Magi1Transformer3DModel

## Transformer2DModelOutput

[[autodoc]] models.modeling_outputs.Transformer2DModelOutput