Zero-shot voice conversion, trained following the scheme described in SEED-TTS.
The VC quality is surprisingly good in terms of both audio quality and timbre similarity, so we have decided to continue along this path to see how far it can go.
TODO:
Release code
Release v0.1 pretrained model:
Huggingface space demo:
HTML demo page (maybe with comparisons to other VC models)
Code for training on custom data
Streaming inference
Potential architecture improvements
More to be added
About
fork: zero-shot voice conversion with in-context learning