Stars
ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
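Uses FreeSWITCH to accept calls from users' mobile phones, runs speech recognition (ASR) on the caller's speech through an iFLYTEK Open Platform (xfyun) plugin integrated via UniMRCP Server, and invokes speech synthesis (TTS) according to custom business logic, building a simple end-to-end voice call center.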
[Findings of NAACL 2024] Source code of paper CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
In this repository, you will learn, through Jupyter Notebooks, how the code of VITS (Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech) works, including normalizing da…
A beginner-friendly introductory tutorial on model compression. PDF download: https://github.com/datawhalechina/awesome-compression/releases
PyTorch implementation of EfficientSpeech, to be presented at ICASSP 2023.