Stars
- All languages
- Assembly
- Batchfile
- Bicep
- Bikeshed
- C
- C#
- C++
- CMake
- CSS
- Clojure
- Cuda
- Cython
- Dart
- Dockerfile
- Elixir
- Emacs Lisp
- Go
- HLSL
- HTML
- Haxe
- Java
- JavaScript
- Jsonnet
- Julia
- Jupyter Notebook
- Kotlin
- LLVM
- Lua
- MATLAB
- MDX
- MLIR
- Macaulay2
- Makefile
- Markdown
- Max
- Mojo
- Nim
- Nix
- OCaml
- Objective-C
- PHP
- Perl
- PostScript
- PowerShell
- Processing
- Pure Data
- Python
- QML
- R
- Rich Text Format
- Roff
- Ruby
- Rust
- SCSS
- SQLPL
- Scala
- Shell
- Solidity
- Svelte
- Swift
- TeX
- TypeScript
- TypeSpec
- V
- Vim Script
- Vue
- Zig
🎥 Python and OpenCV-based scene cut/transition detection program & library.
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high …
使用 NextJS + Notion API 实现的,支持多种部署方案的静态博客,无需服务器、零门槛搭建网站,为Notion和所有创作者设计。 (A static blog built with NextJS and Notion API, supporting multiple deployment options. No server required, zero threshold t…
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos"
Declarative framework for building Agentic AI Services. Build powerful AI apps allowing Agents to navigate, interact and transact with your service.
Frontier CoreML audio models in your apps — text-to-speech, speech-to-text, voice activity detection, and speaker diarization. In Swift, powered by SOTA open source.
Fine-tune Stable Audio Open with DiT ControlNet.
Official Implementations for Paper - HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives
TikTok Content Scraper -- No API-Key needed, minimal dependencies, citable | Download videos (MP4), slides (JPEG) and metadata of author, music, file, hashtags, content, interactions etc.
AI-powered Auto-Clipper: automatically detects highlight segments from YouTube channels, Twitch/Kick live streams (via audience-retention data, chat spikes and timestamped comments), clips them wit…
A lightweight yet powerful audio-to-MIDI converter with pitch bend detection
Truly universal encoding detector in pure Python.
Open-Source Chrome extension for AI-powered web automation. Run multi-agent workflows using your own LLM API key. Alternative to OpenAI Operator.
Noise supression using deep filtering
Easy to use stem (e.g. instrumental/vocals) separation from CLI or as a python package, using a variety of amazing pre-trained models (primarily from UVR)
A novel media player that allows you to navigate by speaker
⚡ Accelerate speaker diarization with Senko, processing 1 hour of audio in just 5 seconds on powerful hardware—boost your audio analysis efficiency.
LLM story writer with a focus on high-quality long output based on a user provided prompt.
A ComfyUI custom node integration for multi-engine multi-language Text-to-Speech and Voice Conversion. Supports: RVC, IndexTTS-2, Chatterbox (classic and multilingual 23-lang), F5-TTS, Higgs Audio …
Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), predefined voices, voice cloning, and large audiobook-scale…
Modified version of Chatterbox that accepts text files as input and no character restrictions. I use it to make audiobooks, especially for my kids.
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal is…
Video translation and dubbing tool powered by LLMs. The video translator offers 100 language translations and one-click full-process deployment. The video translation output is optimized for platfo…
智能视频多语言AI配音/翻译工具 - Linly-Dubbing — “AI赋能,语言无界”
A flask built web app that leverages the power of OpenAI's whisper model to transcribe audio and video files. Has support for various file formats. Generates timestamped .srt files.


