Stars
Reference PyTorch implementation and models for DINOv3
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"
Official code for ICLR 2024 paper, "A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation"
[CVPR2025 Highlight] Video Generation Foundation Models: https://saiyan-world.github.io/goku/
[ECCV2024] IDM-VTON: Improving Diffusion Models for Authentic Virtual Try-on in the Wild
Emotional FusionBrain Challenge 4.0 - dev
Robust Speech Recognition via Large-Scale Weak Supervision
High-resolution models for human tasks.
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Official repository for "AM-RADIO: Reduce All Domains Into One"
Collection of awesome parameter-efficient fine-tuning resources.
Collection of AWESOME vision-language models for vision tasks
EVA Series: Visual Representation Fantasies from BAAI
Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
AI Journey 2023: Russian Sign Language Recognition (Equal AI Track)
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
1st place solution to the Google - American Sign Language Fingerspelling Recognition competition
①[ICLR2024 Spotlight] (GPT-4V/Gemini-Pro/Qwen-VL-Plus+16 OS MLLMs) A benchmark for multi-modality LLMs (MLLMs) on low-level vision and visual quality assessment.
[CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
The definitive Web UI for local AI, with powerful features and easy setup.