i-MaTh

🎯

Focusing

Shanghai, China.

27 followers · 315 following

East China Normal University
Shanghai

Achievements

Highlights

Pro

Starred repositories

lattifai / lattifai-python

Precision Alignment, Infinite Possibilities

Python 93 6 Updated Nov 10, 2025

stepfun-ai / Step-Audio-EditX

Python 396 19 Updated Nov 11, 2025

shaochenze / calm

Official implementation of "Continuous Autoregressive Language Models"

Python 487 60 Updated Nov 10, 2025

sh-lee-prml / PeriodWave

The official implementation of PeriodWave and PeriodWave-Turbo

Python 210 16 Updated Apr 14, 2025

NVIDIA / audio-intelligence

Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with synthetic captions.

Python 85 5 Updated Oct 15, 2025

vibevoice-community / VibeVoice

VibeVoice: Expressive, longform conversational speech synthesis. (Community fork)

Python 694 275 Updated Oct 27, 2025

Soul-AILab / SoulX-Podcast

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python 1,934 217 Updated Nov 6, 2025

Soul-AILab / SAC

Training, inference, and testing of the SAC speech codec model.

Python 84 6 Updated Nov 1, 2025

NVlabs / OmniVinci

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

Python 506 43 Updated Oct 29, 2025

kyutai-labs / nanoGPTaudio

Forked from karpathy/nanoGPT

Code for the blog "Neural audio codecs: how to get audio into LLMs"

Python 131 3 Updated Oct 20, 2025

luotianze666 / WaveFM

[NAACL 2025] WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching

Python 115 10 Updated Mar 27, 2025

NVIDIA / audio-flamingo

PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models

813 64 Updated Oct 28, 2025

meituan-longcat / LongCat-Audio-Codec

LongCat Audio Tokenizer and Detokenizer

Python 208 15 Updated Nov 11, 2025

ddlBoJack / Omni-Captioner

Data Pipeline, Models, and Benchmark for Omni-Captioner.

Python 85 Updated Oct 17, 2025

ludlows / PESQ

PESQ (Perceptual Evaluation of Speech Quality) Wrapper for Python Users (narrow band and wide band)

C 610 103 Updated Sep 5, 2024

cofe-ai / flm-audio

FLM-Audio is an audio-language subversion of RoboEgo/FLM-Ego -- an omnimodal model with native full duplexity.

Python 46 6 Updated Sep 30, 2025

huggingface / speech-to-speech

Speech To Speech: an effort for an open-sourced and modular GPT4-o

Python 4,232 484 Updated Apr 15, 2025

nazdridoy / kokoro-tts

A CLI text-to-speech tool using the Kokoro model, supporting multiple languages, voices (with blending), and various input formats including EPUB books and PDF documents.

Python 902 106 Updated Sep 13, 2025

fcumlin / DNSMOSPro

Official implementation of DNSMOS Pro (accepted at INTERSPEECH 2024).

Python 65 7 Updated Jun 8, 2025

ArchiMickey / rvqllm

Language modelling on RVQ tokens with minimal codes

Python 10 Updated Oct 10, 2025

wshobson / agents

Intelligent automation and multi-agent orchestration for Claude Code

Python 20,493 2,287 Updated Nov 8, 2025

anthropics / claude-code

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…

TypeScript 42,057 2,777 Updated Nov 12, 2025

yujxx / PodEval

A comprehensive toolkit for podcast evaluation. https://arxiv.org/abs/2510.00485

JavaScript 15 Updated Nov 2, 2025

ZhikangNiu / Semantic-VAE

Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"

Python 91 4 Updated Oct 26, 2025

wyhzhen6 / FairDialogue

Evaluating Bias in Spoken Dialogue LLMs for Real-World Decisions and Recommendations

Python 10 Updated Nov 4, 2025

ace-step / ACE-Step

ACE-Step: A Step Towards Music Generation Foundation Model

Python 3,259 380 Updated Jun 27, 2025

qiuqiangkong / audio_vae

Python 3 Updated Oct 6, 2025

neuphonic / neutts-air

On-device TTS model by Neuphonic

Python 3,938 389 Updated Nov 4, 2025

inclusionAI / Ming-UniAudio

Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation

Python 334 26 Updated Oct 28, 2025

OFA-Sys / AIR-Bench

AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension

Python 124 6 Updated Dec 9, 2024

Starred topics

text-to-speech