Python 38 4 Updated Jun 28, 2025

[ICML 2025 Spotlight] Direct Discriminative Optimization: Supercharging Diffusion/Autoregressive with GAN-type Discrimination

Python 69 Updated Jun 22, 2025

Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with synthetic captions.

Python 15 2 Updated Jun 30, 2025

Tencent Hunyuan A13B (Hunyuan-A13B for short), an innovative, open-source LLM built on a fine-grained MoE architecture.

Python 512 51 Updated Jul 1, 2025

Foundation Models and Data for Human-Human and Human-AI interactions.

Python 127 3 Updated Jul 2, 2025

Distributed Coordinated Sequence Sampler

1 Updated Jun 26, 2025

PodAgent: A Comprehensive Framework for Podcast Generation

Python 103 11 Updated May 16, 2025

MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting zero-shot multi-speaker voice cloning, and long-form speech…

Python 132 2 Updated Jun 26, 2025

Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

JavaScript 65 1 Updated Jun 26, 2025

Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"

HTML 94 5 Updated Feb 20, 2025

Semantic tokenizer for speech and music

Python 20 3 Updated Jun 27, 2025

Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

Python 214 22 Updated Jul 2, 2025

Code for the paper "Don't Pay Attention"

Python 47 3 Updated Jun 16, 2025

[Arxiv] Discrete Diffusion in Large Language and Multimodal Models: A Survey

Python 155 2 Updated Jun 19, 2025
Python 5 Updated Jun 19, 2024
Python 169 29 Updated Jun 30, 2025

Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Python 1,024 109 Updated Jul 1, 2025
Python 14 1 Updated Jun 26, 2025

VoiceHub: A Unified Inference Interface for TTS Models

Python 38 1 Updated Jun 30, 2025

Muon fsdp 2

Python 12 1 Updated Jun 8, 2025
Python 193 8 Updated Apr 14, 2025

Stream-Omni is a GPT-4o-like language-vision-speech chatbot that simultaneously supports interaction across various modality combinations.

Python 222 12 Updated Jun 17, 2025

Delayed Streams Modeling (DSM) is a flexible formulation for streaming, multimodal sequence-to-sequence learning.

Jupyter Notebook 376 29 Updated Jul 2, 2025

The VoxTube dataset official repository

HTML 69 1 Updated Feb 14, 2024

Unofficial fork of taku910/mecab (Yet another Japanese morphological analyzer)

C++ 36 2 Updated Jul 1, 2025
Python 54 1 Updated Jun 26, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 40 13 Updated Jun 6, 2025

A PyTorch native platform for training generative AI models

Python 3,995 416 Updated Jul 2, 2025