Stars
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
Official Code of Memento: Fine-tuning LLM Agents without Fine-tuning LLMs
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
FULL Augment Code, Claude Code, Cluely, CodeBuddy, Comet, Cursor, Devin AI, Junie, Kiro, Leap.new, Lovable, Manus Agent Tools, NotionAI, Orchids.app, Perplexity, Poke, Qoder, Replit, Same.dev, Trae…
Multi-Language Backend Framework that unifies APIs, background jobs, queues, workflows, streams, and AI agents with a single core primitive with built-in observability and state management.
Zero-shot voice conversion & singing voice conversion, with real-time support
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
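A web demo of a customer-service assistant with memory that recommends mobile data packages, built with LangGraph + DeepSeek-R1 + FastAPI + Gradio; it also supports GPT models, Chinese domestic LLMs (via OneApi), local open-source models through Ollama, and Alibaba's Tongyi Qianwen models
AutoGen's new v0.4 architecture has shipped its first stable release. v0.4 is a from-scratch rewrite of AutoGen, intended to provide a more robust, scalable, and easier-to-use cross-language library for building agents. Its application interface uses a layered architecture, with several sets of software interfaces to suit different scenarios.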
Document intelligence framework for Python - Extract text, metadata, and structured data from PDFs, images, Office documents, and more. Built on Pandoc, PDFium, and Tesseract.
A free and open-source, self-hosted, AI-based live meeting note taker and minutes summary generator that can run entirely on your local device (macOS and Windows support added. Working on addi…
A Next-Generation Training Engine Built for Ultra-Large MoE Models
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
Convert PDF to markdown + JSON quickly with high accuracy
Generation of diagrams like flowcharts or sequence diagrams from text in a similar manner as markdown
An open-source RAG-based tool for chatting with your documents.
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
TrustRAG: The RAG Framework within Reliable input, Trusted output
A generative speech model for daily dialogue.
Production-ready platform for agentic workflow development.
[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
[ECCV 2024] MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model.
MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes; NeurIPS 2024; Official code