Seed-VC Realtime

The real-time voice conversion component of Seed-VC, separated into its own repository for easier deployment and use.

This project provides a standalone implementation of the real-time voice conversion functionality from Seed-VC, supporting zero-shot voice conversion with low latency.

Features

Real-time Voice Conversion: Capable of cloning voices with 1~30 seconds of reference speech.
Low Latency: Algorithm delay of ~300 ms plus device-side delay of ~100 ms.
Zero-shot: No training required for new voices.

Requirements

OS: Windows or Linux.
GPU: NVIDIA GPU with CUDA support is strongly recommended for real-time performance.
Python: 3.10
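
A quick pre-flight check for the requirements above might look like the sketch below. It uses only the standard library. The function name `check_environment` is hypothetical and not part of this repository, and looking for `nvidia-smi` on the PATH is only a heuristic for an installed NVIDIA driver, not proof that CUDA works with PyTorch.

```python
import platform
import shutil
import sys


def check_environment(version_info=None, system=None, has_nvidia_smi=None):
    """Return a list of warnings; an empty list means the basic checks pass.

    All arguments are optional overrides (handy for testing). By default the
    values are read from the running interpreter and machine.
    """
    version_info = sys.version_info if version_info is None else version_info
    system = platform.system() if system is None else system
    if has_nvidia_smi is None:
        # Heuristic only: the driver tool being on PATH suggests an NVIDIA GPU.
        has_nvidia_smi = shutil.which("nvidia-smi") is not None

    warnings = []
    major_minor = tuple(version_info[:2])
    if major_minor != (3, 10):
        warnings.append("Python 3.10 is expected, found %d.%d" % major_minor)
    if system not in ("Windows", "Linux"):
        warnings.append("Unsupported OS: %s" % system)
    if not has_nvidia_smi:
        warnings.append("nvidia-smi not found; real-time performance may suffer")
    return warnings


for message in check_environment():
    print("WARNING:", message)
```

Running this before installation prints one warning per unmet requirement and nothing if all three checks pass.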

Installation & Usage

Create and activate a virtual environment:

# Windows
python -m venv venv
venv\Scripts\activate

# Linux
python3 -m venv venv
source venv/bin/activate

Install dependencies:

Note: The default requirements.txt targets CUDA 12.1. If you have a different CUDA version, edit requirements.txt accordingly before installing.
```
pip install -r requirements.txt
```
Run the GUI:

python real-time-gui.py --checkpoint-path <path-to-checkpoint> --config-path <path-to-config>

--checkpoint-path: Path to the model checkpoint, if you have trained or fine-tuned your own model. Leave it blank to auto-download the default model (seed-uvit-tat-xlsr-tiny) from Hugging Face.
--config-path: Path to the model config, if you have trained or fine-tuned your own model. Leave it blank to auto-download the default config from Hugging Face.
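
For scripted launches, the command above can be assembled programmatically. The sketch below is an illustration only: `build_gui_command` is a hypothetical helper, not part of this repository, and it assumes that leaving an option blank means omitting the flag entirely.

```python
import sys


def build_gui_command(checkpoint_path=None, config_path=None):
    """Build the argv list for real-time-gui.py.

    A flag is added only when a path is given. Omitting both is assumed to
    trigger the auto-download of the default model and config.
    """
    command = [sys.executable, "real-time-gui.py"]
    if checkpoint_path:
        command += ["--checkpoint-path", str(checkpoint_path)]
    if config_path:
        command += ["--config-path", str(config_path)]
    return command


# Example (run from the repository root with the virtual environment active):
#   import subprocess
#   subprocess.run(build_gui_command(), check=True)
print(" ".join(build_gui_command()))
```

Using `sys.executable` keeps the launch inside the currently active virtual environment rather than whichever `python` happens to be first on the PATH.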

Configuration & Performance

It is strongly recommended to use a GPU for real-time voice conversion. Below are some benchmark results on an NVIDIA RTX 3060 Laptop GPU:

| Model Configuration | Diffusion Steps | Inference CFG Rate | Max Prompt Length | Block Time (s) | Latency (ms) | Inference Time (ms) |
|---|---|---|---|---|---|---|
| seed-uvit-xlsr-tiny | 10 | 0.7 | 3.0 | 0.18 | 430 | 150 |

Key Parameters

Diffusion Steps: 4~10 recommended for fastest real-time inference.
Block Time: The length of each audio chunk. Must be greater than inference time per block.
Extra context: Increasing this improves stability but adds latency.
Virtual Cable: Use VB-CABLE to route GUI output to a virtual microphone for use in other applications (Discord, Zoom, etc.).
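
The Block Time constraint can be checked with simple arithmetic. The sketch below uses the benchmark figures from the table above (0.18 s block time, 150 ms inference time). The helper name `realtime_headroom_ms` is hypothetical, and the calculation ignores any per-block overhead other than inference.

```python
def realtime_headroom_ms(block_time_s, inference_time_ms):
    """Return the spare time per block in milliseconds.

    A positive value means each block finishes processing before the next
    one arrives; zero or negative means inference cannot keep up.
    """
    return block_time_s * 1000.0 - inference_time_ms


headroom = realtime_headroom_ms(block_time_s=0.18, inference_time_ms=150)
print(f"Headroom per block: {headroom:.0f} ms")
```

With the benchmark values this leaves roughly 30 ms of headroom per block. If your own measurements give a value at or below zero, increase Block Time or reduce Diffusion Steps.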

Acknowledgements 🙏

This project is a separated version of Seed-VC. All credit goes to the original authors.

Seed-VC - Original Project
Amphion for providing computational resources and inspiration!
Vevo for the theoretical foundation of the V2 model
MegaTTS3 for the multi-condition CFG inference implemented in the V2 model
ASTRAL-quantization for the amazing speaker-disentangled speech tokenizer used by the V2 model
RVC for laying the foundation of real-time voice conversion
SEED-TTS for the initial idea
