
Commit 2830a42

improve real time vc stability by expanding left context window for content encoder

1 parent 6901140 commit 2830a42

File tree

3 files changed: +17 −7 lines changed

README-ZH.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -87,7 +87,7 @@ python real-time-gui.py --checkpoint <path-to-checkpoint> --config <path-to-conf
 
 | Model Configuration | Diffusion Steps | Inference CFG Rate | Max Prompt Length | Block Time (s) | Crossfade Length (s) | Extra context (left) (s) | Extra context (right) (s) | Latency (ms) | Inference Time per Chunk (ms) |
 |---------------------|-----------------|--------------------|-------------------|----------------|----------------------|--------------------------|---------------------------|--------------|-------------------------------|
-| seed-uvit-xlsr-tiny | 10              | 0.7                | 3.0               | 0.18s          | 0.04s                | 0.5s                     | 0.02s                     | 430ms        | 150ms                         |
+| seed-uvit-xlsr-tiny | 10              | 0.7                | 3.0               | 0.18s          | 0.04s                | 2.5s                     | 0.02s                     | 430ms        | 150ms                         |
 
 You can adjust the parameters in the GUI according to your device's performance; the voice conversion stream will work as long as the inference time is less than the block time. Note that inference speed may drop if you are running other GPU-intensive tasks (e.g. gaming, watching videos).
 
```

README.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -94,7 +94,7 @@ Some performance testing has been done on a NVIDIA RTX 3060 Laptop GPU, results
 
 | Model Configuration | Diffusion Steps | Inference CFG Rate | Max Prompt Length | Block Time (s) | Crossfade Length (s) | Extra context (left) (s) | Extra context (right) (s) | Latency (ms) | Inference Time per Chunk (ms) |
 |---------------------------------|-----------------|--------------------|-------------------|----------------|----------------------|--------------------------|---------------------------|--------------|-------------------------------|
-| seed-uvit-xlsr-tiny | 10 | 0.7 | 3.0 | 0.18s | 0.04s | 0.5s | 0.02s | 430ms | 150ms |
+| seed-uvit-xlsr-tiny | 10 | 0.7 | 3.0 | 0.18s | 0.04s | 2.5s | 0.02s | 430ms | 150ms |
 
 You can adjust the parameters in the GUI according to your own device performance, the voice conversion stream should work well as long as Inference Time is less than Block Time.
 Note that inference speed may drop if you are running other GPU intensive tasks (e.g. gaming, watching videos)
```
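
The stability condition stated in the README (conversion keeps up with real time only while per-chunk inference time stays below the block time) can be sketched as a quick check. The figures below are the seed-uvit-xlsr-tiny row from the table; the helper name is illustrative, not part of real-time-gui.py:

```python
# Quick feasibility check for the streaming condition described above:
# the stream keeps up only if each chunk is processed faster than it arrives.

def keeps_up(block_time_s: float, inference_time_ms: float) -> bool:
    """True if per-chunk inference time is below the block time."""
    return inference_time_ms / 1000.0 < block_time_s

# seed-uvit-xlsr-tiny figures from the table above
print(keeps_up(block_time_s=0.18, inference_time_ms=150.0))  # True: 150 ms < 180 ms
```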

real-time-gui.py

Lines changed: 15 additions & 5 deletions

```diff
@@ -45,6 +45,7 @@
 reference_wav_name = ""
 
 prompt_len = 3  # in seconds
+ce_dit_difference = 2  # 2 seconds
 @torch.no_grad()
 def custom_infer(model_set,
                  reference_wav,
@@ -61,6 +62,7 @@ def custom_infer(model_set,
     global prompt_condition, mel2, style2
     global reference_wav_name
     global prompt_len
+    global ce_dit_difference
     (
         model,
         semantic_fn,
@@ -94,12 +96,20 @@ def custom_infer(model_set,
         reference_wav_name = new_reference_wav_name
 
     converted_waves_16k = input_wav_res
+    start_event = torch.cuda.Event(enable_timing=True)
+    end_event = torch.cuda.Event(enable_timing=True)
+    torch.cuda.synchronize()
+    start_event.record()
     S_alt = semantic_fn(converted_waves_16k.unsqueeze(0))
+    end_event.record()
+    torch.cuda.synchronize()  # Wait for the events to be recorded!
+    elapsed_time_ms = start_event.elapsed_time(end_event)
+    print(f"Time taken for semantic_fn: {elapsed_time_ms}ms")
 
-    target_lengths = torch.LongTensor([(skip_head + return_length + skip_tail) / 50 * sr // hop_length]).to(S_alt.device)
-
+    S_alt = S_alt[:, ce_dit_difference * 50:]
+    target_lengths = torch.LongTensor([(skip_head + return_length + skip_tail - ce_dit_difference * 50) / 50 * sr // hop_length]).to(S_alt.device)
     cond = model.length_regulator(
         S_alt, ylens=target_lengths, n_quantizers=3, f0=None
     )[0]
     cat_condition = torch.cat([prompt_condition, cond], dim=1)
     vc_target = model.cfm.inference(
@@ -420,7 +430,7 @@ def load(self):
             "sr_type": "sr_model",
             "block_time": 0.5,
             "crossfade_length": 0.04,
-            "extra_time": 0.5,
+            "extra_time": 2.5,
             "extra_time_right": 0.02,
             "diffusion_steps": 10,
             "inference_cfg_rate": 0.7,
@@ -594,7 +604,7 @@ def launcher(self):
             [
                 sg.Text("Extra context time (left)"),
                 sg.Slider(
-                    range=(0.5, 10.0),
+                    range=(2.5, 10.0),
                     key="extra_time",
                     resolution=0.1,
                     orientation="h",
```
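
The core of this change is that the content encoder now sees 2.5 s of left context, but the first `ce_dit_difference = 2` seconds of its output frames are dropped again before the length regulator and DiT stage. A minimal sketch of that arithmetic, assuming the 50 Hz semantic frame rate implied by the `* 50` factors in the diff (the function name is illustrative, not part of the repository):

```python
FRAME_RATE = 50  # semantic frames per second (assumed from the `* 50` factors)

def frames_after_trim(total_frames: int, ce_dit_difference: int) -> int:
    """Frames remaining after dropping ce_dit_difference seconds of left
    context, mirroring `S_alt = S_alt[:, ce_dit_difference * 50:]`."""
    return total_frames - ce_dit_difference * FRAME_RATE

# With extra_time = 2.5 s of left context, the encoder output gains
# 2.5 * 50 = 125 extra frames; 2 * 50 = 100 of them are trimmed again,
# leaving 0.5 s of effective left context for the downstream stages.
extra_frames = int(2.5 * FRAME_RATE)        # 125
print(frames_after_trim(extra_frames, 2))   # 25 frames, i.e. 0.5 s
```

The encoder thus gets a much longer receptive field for stability, while the diffusion model still conditions on the same effective window as before.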
