Stars
[NeurIPS'23 Spotlight] Segment Any Point Cloud Sequences by Distilling Vision Foundation Models
Explore the Multimodal “Aha Moment” on 2B Model
Collect every awesome work about r1!
A library for advanced large language model reasoning
Build multimodal language agents for fast prototyping and production
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
Everything you need to know to build your own RAG application
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
One for All Modalities Evaluation Toolkit - including text, image, video, audio tasks.
Deep Learning tools and applications for NVIDIA AGX platforms.
Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
Awesome Data-Driven Autonomous Driving Solutions. Also the official repository of our survey paper: Data-Centric Evolution in Autonomous Driving: A Comprehensive Survey of Big Data System, Data Min…
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
A unified framework for 3D content generation.
TripoSR: Fast 3D Object Reconstruction from a Single Image
Generative Models by Stability AI
RobustSAM: Segment Anything Robustly on Degraded Images (CVPR 2024 Highlight)
An open source implementation of CLIP.
A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment…
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
👁️ 🖼️ 🔥 PyTorch Toolbox for Image Quality Assessment, including PSNR, SSIM, LPIPS, FID, NIQE, NRQM (Ma), MUSIQ, TOPIQ, NIMA, DBCNN, BRISQUE, PI and more...
AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.
The official repo for [CVPR'23] "DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting" & [ArXiv'23] "DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multi…
Code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural Images", Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, CVPR 2016.
Python class for calculating a confusion matrix for object detection tasks