Kokoro-82M
Lightweight, fast, and high-quality open TTS model with 82M params
Kokoro-82M is an open-weight text-to-speech (TTS) model with 82 million parameters, designed to deliver high-quality voice synthesis at low computational cost. Despite its compact size, Kokoro rivals the output quality of much larger models while being significantly faster and cheaper to run. Its architecture builds on StyleTTS2 and ISTFTNet and uses a decoder-only setup without diffusion, which keeps audio generation fast and inexpensive. Kokoro supports multiple voices and languages and can run in notebook environments such as Google Colab or behind production APIs. It was trained on a few hundred hours of permissively licensed and synthetic audio paired with IPA phoneme labels, which supports broad legal usability. Kokoro is licensed under Apache 2.0, so it can be deployed in commercial, research, and personal projects, including those with monetized outputs; the Apache 2.0 license itself sets no revenue threshold.
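
To make the "compact size" claim concrete, the following is a rough back-of-the-envelope sketch of how much space 82 million parameters occupy at a few common numeric precisions. The precisions listed are assumptions chosen for illustration, not a statement about how the released weights are stored, and the figures cover raw weights only (no activations, voice embeddings, or runtime overhead).

```python
# Parameter count taken from the description above.
PARAMS = 82_000_000

# Bytes per parameter for common numeric formats.
# These precisions are illustrative assumptions, not the published checkpoint format.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_size_mb(params: int, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in megabytes (1 MB = 10^6 bytes)."""
    return params * bytes_per_param / 1_000_000


if __name__ == "__main__":
    for dtype, nbytes in BYTES_PER_PARAM.items():
        print(f"{dtype}: ~{weight_size_mb(PARAMS, nbytes):.0f} MB")
```

Under these assumptions the weights come to roughly 328 MB in 32-bit floats and 164 MB in 16-bit floats, which is small enough to fit comfortably on a free Colab instance or a modest CPU server.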