Starred repositories
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
Real-time face swap and one-click video deepfake with only a single image
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
openpilot is an operating system for robotics. Currently, it upgrades the driver assistance system on 300+ supported cars.
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
A community-maintained Python framework for creating mathematical animations.
🚀🚀 "Large model": train a small 26M-parameter GPT completely from scratch in just 2 hours! 🌏
Real-time face swap for PC streaming or video calls
Open-Sora: Democratizing Efficient Video Production for All
State-of-the-art 2D and 3D Face Analysis Project
DeepSeek Coder: Let the Code Write Itself
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Build Real-Time Knowledge Graphs for AI Agents
[NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer
Janus-Series: Unified Multimodal Understanding and Generation Models
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).
Wan: Open and Advanced Large-Scale Video Generative Models
A Conversational Speech Generation Model
[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
Blind & Invisible Watermark: blind image watermarking; extracting the watermark does not require the original image!
Expose your FastAPI endpoints as Model Context Protocol (MCP) tools, with Auth!

