- Seoul, Korea
Stars
[ICML 2025 Spotlight] Direct Discriminative Optimization: Supercharging Diffusion/Autoregressive with GAN-type Discrimination
Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with synthetic captions.
Tencent Hunyuan A13B (Hunyuan-A13B for short) is an innovative, open-source LLM built on a fine-grained MoE architecture.
Foundation Models and Data for Human-Human and Human-AI interactions.
PodAgent: A Comprehensive Framework for Podcast Generation
MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting zero-shot multi-speaker voice cloning, and long-form speech…
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
[arXiv] Discrete Diffusion in Large Language and Multimodal Models: A Survey
Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation
VoiceHub: A Unified Inference Interface for TTS Models
Stream-Omni is a GPT-4o-like language-vision-speech chatbot that simultaneously supports interaction across various modality combinations.
Delayed Streams Modeling (DSM) is a flexible formulation for streaming, multimodal sequence-to-sequence learning.
Unofficial fork of taku910/mecab (Yet another Japanese morphological analyzer)
fyabc / vllm
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
A PyTorch native platform for training generative AI models