HAESUNG JEON seastar105

54 followers · 89 following

Seoul, Korea

Stars

LAION-AI / emotion-annotations

Python 38 4 Updated Jun 28, 2025

NVlabs / DDO

[ICML 2025 Spotlight] Direct Discriminative Optimization: Supercharging Diffusion/Autoregressive with GAN-type Discrimination

Python 69 Updated Jun 22, 2025

NVIDIA / elucidated-text-to-audio

Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with synthetic captions.

Python 15 2 Updated Jun 30, 2025

Tencent-Hunyuan / Hunyuan-A13B

Tencent Hunyuan A13B (short as Hunyuan-A13B), an innovative and open-source LLM built on a fine-grained MoE architecture.

Python 512 51 Updated Jul 1, 2025

facebookresearch / seamless_interaction

Foundation Models and Data for Human-Human and Human-AI interactions.

Python 127 3 Updated Jul 2, 2025

lifeiteng / DiscoSeqSampler

Distributed Coordinated Sequence Sampler

1 Updated Jun 26, 2025

yujxx / PodAgent

PodAgent: A Comprehensive Framework for Podcast Generation

Python 103 11 Updated May 16, 2025

OpenMOSS / MOSS-TTSD

MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting zero-shot multi-speaker voice cloning, and long-form speech…

Python 132 2 Updated Jun 26, 2025

csuhan / Tar

Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

JavaScript 65 1 Updated Jun 26, 2025

kehanlu / DeSTA2

Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"

HTML 94 5 Updated Feb 20, 2025

Mddct / usm-tokenizer

semantic tokenizer for speech and music

Python 20 3 Updated Jun 27, 2025

k2-fsa / ZipVoice

Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

Python 214 22 Updated Jul 2, 2025

rimads / avey-dpa

Code for the paper Don't Pay Attention

Python 47 3 Updated Jun 16, 2025

LiQiiiii / DLLM-Survey

[Arxiv] Discrete Diffusion in Large Language and Multimodal Models: A Survey

Python 155 2 Updated Jun 19, 2025

AI4Bharat / Nirantar

Python 5 Updated Jun 19, 2024

Cypress-Yang / SongBloom

Python 169 29 Updated Jun 30, 2025

MeiGen-AI / MultiTalk

Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Python 1,024 109 Updated Jul 1, 2025

yxduir / LLM-SRT

Python 14 1 Updated Jun 26, 2025

kadirnar / VoiceHub

VoiceHub: A Unified Inference Interface for TTS Models

Python 38 1 Updated Jun 30, 2025

samsja / muon_fsdp_2

Muon fsdp 2

Python 12 1 Updated Jun 8, 2025

step-law / steplaw

Python 193 8 Updated Apr 14, 2025

ictnlp / Stream-Omni

Stream-Omni is a GPT-4o-like language-vision-speech chatbot that simultaneously supports interaction across various modality combinations.

Python 222 12 Updated Jun 17, 2025

kyutai-labs / delayed-streams-modeling

Delayed Streams Modeling (DSM) is a flexible formulation for streaming, multimodal sequence-to-sequence learning.

Jupyter Notebook 376 29 Updated Jul 2, 2025

IDRnD / VoxTube

The VoxTube dataset official repository

HTML 69 1 Updated Feb 14, 2024

shogo82148 / mecab

Unofficial fork of taku910/mecab (Yet another Japanese morphological analyzer)

C++ 36 2 Updated Jul 1, 2025

FunAudioLLM / CV3-Eval

Python 54 1 Updated Jun 26, 2025

MoonshotAI / Kimi-Audio-Evalkit

Python 120 5 Updated Apr 29, 2025

fyabc / vllm

Forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 40 13 Updated Jun 6, 2025

pytorch / torchtitan

A PyTorch native platform for training generative AI models

Python 3,995 416 Updated Jul 2, 2025

tencent-ailab / SongGeneration

Python 473 41 Updated Jun 27, 2025

Lists (5)

diffusion

drive-model

generation

tts-dataset

vocoder