Describe the bug
I would like to run the Cosmos-Predict2-14B-Text2Image model, but it is too large to fit in 24 GB of VRAM at full precision, so I tried loading a Q8_0 GGUF quantization instead. I adapted some code from the HiDreamImageTransformer2DModel docs page, but I get the following error:
AttributeError: type object 'CosmosTransformer3DModel' has no attribute 'from_single_file'
Is there another supported way to load an 8-bit quantization? From what I have seen, Q8_0 typically produces results much closer to full precision than FP8.
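To illustrate the Q8_0-vs-FP8 point, here is a small self-contained sketch (my own illustration, not diffusers code) comparing round-trip error of Q8_0-style blockwise int8 quantization against a simplified e4m3 FP8 emulation (3 explicit mantissa bits; exponent range clamping is ignored for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4096).astype(np.float32)

def q8_0_roundtrip(x, block=32):
    # Q8_0-style: blocks of 32 values share one float scale,
    # values are stored as int8 in [-127, 127]
    xb = x.reshape(-1, block)
    scale = np.abs(xb).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(xb / scale), -127, 127)
    return (q * scale).reshape(-1)

def fp8_e4m3_roundtrip(x):
    # Simplified e4m3 FP8: keep 1 implicit + 3 mantissa bits per value
    m, e = np.frexp(x)          # x = m * 2**e, |m| in [0.5, 1)
    m = np.round(m * 16) / 16   # quantize mantissa to 4 significant bits
    return np.ldexp(m, e)

err_q8 = np.sqrt(np.mean((x - q8_0_roundtrip(x)) ** 2))
err_fp8 = np.sqrt(np.mean((x - fp8_e4m3_roundtrip(x)) ** 2))
print(err_q8 < err_fp8)  # Q8_0 round-trip error is smaller
```

On Gaussian-like weight distributions the blockwise int8 scheme has a finer effective step size than FP8's 3-bit mantissa, which matches the quality difference reported for Q8_0.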
Reproduction
import torch
from diffusers import (
    Cosmos2TextToImagePipeline,
    CosmosTransformer3DModel,
    GGUFQuantizationConfig,
)

transformer = CosmosTransformer3DModel.from_single_file(
    rf"{model_14b_id}\cosmos-predict2-14b-text2image-Q8_0.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
pipe_14b = Cosmos2TextToImagePipeline.from_pretrained(
    model_14b_id,
    torch_dtype=torch.bfloat16,
    transformer=transformer,
)
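In case it helps triage: as a stopgap I considered the bitsandbytes int8 path instead of GGUF, along these lines. This is only a sketch; I have not verified that CosmosTransformer3DModel accepts quantization_config, and the "transformer" subfolder name is an assumption about the repo layout:

```python
def load_transformer_int8(model_14b_id: str):
    # Stopgap sketch: bitsandbytes int8 via from_pretrained instead of
    # loading a Q8_0 GGUF file. Assumes the model repo has a "transformer"
    # subfolder and that this class supports quantization_config.
    import torch
    from diffusers import BitsAndBytesConfig, CosmosTransformer3DModel

    quant_config = BitsAndBytesConfig(load_in_8bit=True)
    return CosmosTransformer3DModel.from_pretrained(
        model_14b_id,
        subfolder="transformer",
        quantization_config=quant_config,
        torch_dtype=torch.bfloat16,
    )
```

Note that bitsandbytes int8 is a different scheme from Q8_0, so the quality may not match what I was hoping for from the GGUF file.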
Logs
transformer = CosmosTransformer3DModel.from_single_file(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: type object 'CosmosTransformer3DModel' has no attribute 'from_single_file'
System Info
- 🤗 Diffusers version: 0.35.0.dev0
- Platform: Windows-10-10.0.26100-SP0
- Running on Google Colab?: No
- Python version: 3.11.9
- PyTorch version (GPU?): 2.7.1+cu128 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.33.1
- Transformers version: 4.53.0
- Accelerate version: 1.8.1
- PEFT version: 0.15.2
- Bitsandbytes version: 0.46.1
- Safetensors version: 0.5.3
- xFormers version: not installed
- Accelerator: NVIDIA GeForce RTX 4090, 24564 MiB
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No