Commit 5904256

Updated README
1 parent 8768714 commit 5904256

2 files changed: 83 additions, 8 deletions

README-CN.md

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
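+# Seed-VC
+*[English](README.md) | 简体中文*
+
+A novel zero-shot voice conversion scheme inspired by SEED-TTS.
+
+The currently released models support zero-shot voice conversion and zero-shot singing voice conversion. No training is required: a reference speech clip of 1~30 seconds is enough to clone a voice.
+
+For a list of demos and comparisons with previous voice conversion models, please visit our [demo page](https://plachtaa.github.io/seed-vc/)🌐
+
+We will keep improving model quality and adding more features.
+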
+## Installation 📥
+Python 3.10 on Windows or Linux is recommended:
+```bash
+pip install -r requirements.txt
+```
+
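+## Usage🛠️
+Checkpoints of the latest model release are downloaded automatically the first time inference is run.
+
+Command line inference:
+```bash
+# --diffusion-steps: 50~100 is recommended for singing voice conversion
+# --f0-condition: set to True for singing voice conversion
+# --auto-f0-condition: set to True to auto-adjust the source pitch to the target pitch level; normally not used in singing voice conversion
+# --semi-tone-shift: pitch shift in semitones for singing voice conversion
+python inference.py --source <source-audio-path> \
+--target <reference-audio-path> \
+--output <output-directory> \
+--diffusion-steps 25 \
+--length-adjust 1.0 \
+--inference-cfg-rate 0.7 \
+--n-quantizers 3 \
+--f0-condition False \
+--auto-f0-condition False \
+--semi-tone-shift 0
+```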
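+where:
+- `source` is the path to the source speech file to be converted to the reference voice
+- `target` is the path to the speech file used as the voice reference
+- `output` is the path to the output directory
+- `diffusion-steps` is the number of diffusion steps to use, default is 25; use 50-100 for best quality, 4-10 for fastest inference
+- `length-adjust` is the length adjustment factor, default is 1.0; set <1.0 to speed up speech, >1.0 to slow it down
+- `inference-cfg-rate` has a subtle effect on the output, default is 0.7
+- `n-quantizers` is the number of FAcodec codebooks (quantizers) to use, default is 3; the fewer used, the less of the source audio's prosody is preserved
+- `f0-condition` is the flag to condition the pitch of the output on the pitch of the source audio, default is False, set to True for singing voice conversion
+- `auto-f0-condition` is the flag to automatically adjust the source pitch to the target pitch level, default is False, normally not used in singing voice conversion
+- `semi-tone-shift` is the pitch shift in semitones for singing voice conversion, default is 0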
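
As a rough illustration of how the flags listed above fit together, the command line can be assembled from Python. This is only a sketch under stated assumptions: `build_inference_command` and `semitone_ratio` are hypothetical helper names that are not part of the repository, the file paths are placeholders, and the only flags used are the ones documented in this README. The semitone helper shows the standard equal-temperament frequency ratio; how `inference.py` applies the shift internally is not stated here.

```python
import shlex
import sys


def build_inference_command(source, target, output, *, diffusion_steps=25,
                            length_adjust=1.0, inference_cfg_rate=0.7,
                            n_quantizers=3, f0_condition=False,
                            auto_f0_condition=False, semi_tone_shift=0,
                            script="inference.py"):
    """Return an argv list for one run of the inference script.

    Hypothetical helper: defaults mirror the values documented in the README,
    and booleans are passed as the literal strings "True"/"False" as shown there.
    """
    return [
        sys.executable, script,  # current interpreter instead of a bare "python"
        "--source", str(source),
        "--target", str(target),
        "--output", str(output),
        "--diffusion-steps", str(diffusion_steps),
        "--length-adjust", str(length_adjust),
        "--inference-cfg-rate", str(inference_cfg_rate),
        "--n-quantizers", str(n_quantizers),
        "--f0-condition", str(f0_condition),
        "--auto-f0-condition", str(auto_f0_condition),
        "--semi-tone-shift", str(semi_tone_shift),
    ]


def semitone_ratio(semitones):
    """Frequency ratio of a pitch shift in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    # Placeholder paths; settings follow the README's advice for singing voice conversion.
    cmd = build_inference_command("song.wav", "reference.wav", "out",
                                  diffusion_steps=50, f0_condition=True,
                                  semi_tone_shift=12)
    print(shlex.join(cmd))  # the command can then be run with subprocess.run(cmd, check=True)
```

Building the arguments as a list rather than a single string avoids shell quoting problems with paths that contain spaces.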
+
+Gradio web interface:
+```bash
+python app.py
+```
+Then open `http://localhost:7860/` in a browser to use the web interface.
+## TODO📝
+- [x] Release code
+- [x] Release v0.1 pretrained model: [![Hugging Face](https://img.shields.io/badge/🤗%20Hugging%20Face-SeedVC-blue)](https://huggingface.co/Plachta/Seed-VC)
+- [x] Hugging Face Space demo: [![Hugging Face](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-blue)](https://huggingface.co/spaces/Plachta/Seed-VC)
+- [x] HTML demo page (may include comparisons with other VC models): [Demo](https://plachtaa.github.io/seed-vc/)
+- [ ] Streaming inference
+- [x] Singing voice conversion
+- [ ] Improve noise robustness for source and reference audio
+    - [x] This is enabled in the f0-conditioned model, but its effectiveness is uncertain...
+- [ ] Potential architecture improvements
+    - [x] U-ViT style skip connections
+    - [x] Change input to [FAcodec](https://github.com/Plachtaa/FAcodec) tokens
+- [ ] Code for training on custom data
+- [ ] More to be added
+
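+## Changelog 🗒️
+- 2024-09-18:
+    - Updated the model for singing voice conversion
+- 2024-09-14:
+    - Updated the v0.2 pretrained model, which is smaller, needs fewer diffusion steps to reach the same quality, and adds the ability to control prosody preservation
+    - Added a command line inference script
+    - Added installation and usage instructions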

README.md

Lines changed: 11 additions & 8 deletions
@@ -1,19 +1,21 @@
-# Seed-VC
+# Seed-VC
+*English | [简体中文](README-CN.md)*
+
 A new zero-shot voice conversion scheme inspired by SEED-TTS.
 
-Currently released model supports *zero-shot voice conversion* and *zero-shot singing voice conversion*. Without any training, it is able to clone a voice given a reference speech of 1~30 seconds.
+Currently released model supports *zero-shot voice conversion* 🔊 and *zero-shot singing voice conversion* 🎙. Without any training, it is able to clone a voice given a reference speech of 1~30 seconds.
 
-To find a list of demos and comparisons with previous voice conversion models, please visit our [demo page](https://plachtaa.github.io/seed-vc/)
+To find a list of demos and comparisons with previous voice conversion models, please visit our [demo page](https://plachtaa.github.io/seed-vc/)🌐
 
 We are keeping on improving the model quality and adding more features.
 
-## Installation
+## Installation📥
 Suggested python 3.10 on Windows or Linux.
 ```bash
 pip install -r requirements.txt
 ```
 
-## Usage
+## Usage🛠️
 Checkpoints of the latest model release will be downloaded automatically when first run inference.
 
 Command line inference:
@@ -39,13 +41,14 @@ where:
 - `n-quantizers` is the number of quantizers from FAcodec to use, default is 3, the less quantizer used, the less prosody of source audio is preserved
 - `f0-condition` is the flag to condition the pitch of the output to the pitch of the source audio, default is False, set to True for singing voice conversion
 - `auto-f0-condition` is the flag to auto adjust source pitch to target pitch level, default is False, normally not used in singing voice conversion
-- `semi-tone-shift` is the pitch shift in semitones for singing voice conversion, default is 0
+- `semi-tone-shift` is the pitch shift in semitones for singing voice conversion, default is 0
+
 Gradio web interface:
 ```bash
 python app.py
 ```
 Then open the browser and go to `http://localhost:7860/` to use the web interface.
-## TODO
+## TODO📝
 - [x] Release code
 - [x] Release v0.1 pretrained model: [![Hugging Face](https://img.shields.io/badge/🤗%20Hugging%20Face-SeedVC-blue)](https://huggingface.co/Plachta/Seed-VC)
 - [x] Huggingface space demo: [![Hugging Face](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-blue)](https://huggingface.co/spaces/Plachta/Seed-VC)
@@ -60,7 +63,7 @@ Then open the browser and go to `http://localhost:7860/` to use the web interfac
 - [ ] Code for training on custom data
 - [ ] More to be added
 
-## CHANGELOGS
+## CHANGELOGS🗒️
 - 2024-09-18:
     - Updated f0 conditioned model for singing voice conversion
 - 2024-09-14:
