Commit 1399efc

additional notes on training
1 parent a516247 commit 1399efc

File tree

2 files changed (+8 -2 lines)


README-ZH.md

Lines changed: 4 additions & 1 deletion
@@ -103,6 +103,7 @@ python real-time-gui.py --checkpoint <path-to-checkpoint> --config <path-to-conf
 Here is a simple Colab example for reference: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1R1BJTqMsTXZzYAVx3j1BiemFXog9pbQG?usp=sharing)
 1. Prepare your dataset. It must satisfy the following requirements:
 - File structure does not matter
+- Each audio clip must be between 1 and 30 seconds long, otherwise it will be ignored automatically
 - All audio files must be in one of the following formats: `.wav` `.flac` `.mp3` `.m4a` `.opus` `.ogg`
 - Speaker labels are not required, but make sure each speaker has at least 1 utterance
 - Of course, the more data you have, the better the model will perform
@@ -134,7 +135,9 @@ where:
 - `save-every` is the number of steps between model checkpoint saves
 - `num-workers` is the number of worker threads for data loading; setting it to 0 is recommended on Windows
 
-4. After training, you can run inference by specifying the paths to the checkpoint and the config file.
+4. To resume training from where it left off, simply run the same command again. As long as the same `run-name` and `config` arguments are passed, the program will find the checkpoint and logs from the previous run.
+
+5. After training, you can run inference by specifying the paths to the checkpoint and the config file.
 - They should be under `./runs/<run-name>/`, with the checkpoint named `ft_model.pth` and a config file with the same name as the training config file.
 - At inference time, you still need to specify a reference audio file for the speaker you want to use, similar to zero-shot inference.

README.md

Lines changed: 4 additions & 1 deletion
@@ -112,6 +112,7 @@ Fine-tuning on custom data allow the model to clone someone's voice more accurat
 A Colab Tutorial is here for you to follow: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1R1BJTqMsTXZzYAVx3j1BiemFXog9pbQG?usp=sharing)
 1. Prepare your own dataset. It has to satisfy the following:
 - File structure does not matter
+- Each audio file should be between 1 and 30 seconds long, otherwise it will be ignored
 - All audio files should be in one of the following formats: `.wav` `.flac` `.mp3` `.m4a` `.opus` `.ogg`
 - Speaker labels are not required, but make sure that each speaker has at least 1 utterance
 - Of course, the more data you have, the better the model will perform
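
For convenience, here is a minimal sketch of how one might pre-check a dataset against the constraints listed in the hunk above (the allowed extensions and the 1-30 second duration window). The script, the `soundfile` dependency, and the directory name are illustrative assumptions and are not part of this commit or the repository:

```python
# Illustrative dataset pre-check; not part of this commit or the codebase.
# Assumes `soundfile` is installed; some containers (e.g. .m4a) may need another decoder.
from pathlib import Path

import soundfile as sf

ALLOWED_EXTS = {".wav", ".flac", ".mp3", ".m4a", ".opus", ".ogg"}  # formats listed above
MIN_SEC, MAX_SEC = 1.0, 30.0  # duration window described above

def report_skipped_files(dataset_dir: str) -> None:
    """Print every audio file whose duration falls outside the 1-30 s window."""
    for path in sorted(Path(dataset_dir).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in ALLOWED_EXTS:
            continue
        try:
            info = sf.info(str(path))
        except RuntimeError as err:  # unreadable or unsupported by libsndfile
            print(f"could not read {path}: {err}")
            continue
        duration = info.frames / info.samplerate
        if not (MIN_SEC <= duration <= MAX_SEC):
            print(f"would be ignored ({duration:.1f}s): {path}")

if __name__ == "__main__":
    report_skipped_files("./my_dataset")  # hypothetical dataset directory
```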
@@ -143,7 +144,9 @@ where:
 - `save-every` is the number of steps between model checkpoint saves
 - `num-workers` is the number of workers for data loading; set it to 0 on Windows
 
-4. After training, you can use the trained model for inference by specifying the path to the checkpoint and config file.
+4. If training stops unexpectedly, you can resume it by running the same command again and training will continue from the last checkpoint. (Make sure the `run-name` and `config` arguments are the same so that the latest checkpoint can be found.)
+
+5. After training, you can use the trained model for inference by specifying the path to the checkpoint and config file.
 - They should be under `./runs/<run-name>/`, with the checkpoint named `ft_model.pth` and a config file with the same name as the training config file.
 - You still have to specify a reference audio file of the speaker you'd like to use during inference, similar to zero-shot usage.
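
To make the resume and inference notes above concrete, below is a small illustrative helper that resolves the artifacts of a run, assuming only the layout described in this diff (`./runs/<run-name>/` containing `ft_model.pth` and a copy of the training config). The function and its argument names are hypothetical and not part of the repository:

```python
# Hypothetical helper; only the ./runs/<run-name>/ layout and the ft_model.pth /
# config-file naming come from the README text above.
from pathlib import Path

def resolve_run_artifacts(run_name: str, config_name: str, runs_dir: str = "./runs"):
    """Return (checkpoint, config) paths for a finished fine-tuning run."""
    run_dir = Path(runs_dir) / run_name
    checkpoint = run_dir / "ft_model.pth"        # checkpoint name stated in the README
    config = run_dir / Path(config_name).name    # same file name as the training config
    missing = [str(p) for p in (checkpoint, config) if not p.exists()]
    if missing:
        # Nothing usable yet; per the README, rerunning the training command with the
        # same `run-name` and `config` arguments resumes from the last checkpoint.
        raise FileNotFoundError(f"missing run artifacts: {missing}")
    return checkpoint, config

if __name__ == "__main__":
    ckpt, cfg = resolve_run_artifacts("my_run", "my_training_config.yml")  # hypothetical names
    print("pass these to inference:", ckpt, cfg)
```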
