CogVideo
text and image to video generation: CogVideoX (2024) and CogVideo
CogVideo is an open source text-/image-/video-to-video generation project that hosts the CogVideoX family of diffusion-transformer models and end-to-end tooling. The repo includes SAT and Diffusers implementations, turnkey demos, and fine-tuning pipelines (including LoRA) designed to run across a wide range of NVIDIA GPUs, from desktop cards (e.g., RTX 3060) to data-center hardware (A100/H100). Current releases cover CogVideoX-2B, CogVideoX-5B, and the upgraded CogVideoX1.5-5B variants, plus image-to-video (I2V) models, with options for BF16/FP16/FP32—and INT8 quantized inference via TorchAO for memory-constrained setups. The codebase emphasizes practical deployment: prompt-optimization utilities (LLM-assisted long-prompt expansion), Colab notebooks, a Gradio web app, and multiple performance knobs (tiling/slicing, CPU offload, torch.compile, multi-GPU, and FA3 backends via partner projects).