SyncTalk_2D

SyncTalk_2D is a 2D lip-sync video generation model based on SyncTalk and Ultralight-Digital-Human. It generates high-quality lip-sync videos with low latency and can also be used for real-time lip-sync video generation.

Compared to Ultralight-Digital-Human, we have improved the audio feature encoder and increased the resolution to 328 to accommodate higher-resolution input video. This version can produce high-definition, commercial-grade digital humans.

Setting up

Set up the environment

conda create -n synctalk_2d python=3.10
conda activate synctalk_2d
# install dependencies
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -c conda-forge ffmpeg  # very important
pip install opencv-python transformers soundfile librosa onnxruntime-gpu configargparse
pip install numpy==1.23.5
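
To confirm the environment is working before moving on, a quick sanity check like the following can help (a minimal sketch; it only uses packages installed above):

import shutil
import torch

# Training and onnxruntime-gpu both need a working CUDA setup.
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

# The conda-forge ffmpeg must be on PATH; it is used for video/audio processing.
print("ffmpeg found:", shutil.which("ffmpeg") is not None)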

Prepare your data

  1. Record a 5-minute video with your head facing the camera and without significant movement. Keep the camera still and the background lighting unchanged throughout the recording.
  2. Don't worry about the frame rate; the code automatically converts the video to 25 fps (you can verify your recording with the quick check after this list).
  3. Only one person's voice may appear in the recording, and leave a 5-second silent clip at both the beginning and the end of the video.
  4. Avoid clothing with strong, obvious texture; solid-color clothing works best.
  5. The video should be recorded in a well-lit environment.
  6. The audio should be clear and free of background noise.
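
Before training, it can be worth confirming that the recording matches these expectations. Here is a minimal sketch using OpenCV (opencv-python is installed above; dataset/May/May.mp4 is just the example path from the Train section below):

import cv2

cap = cv2.VideoCapture("dataset/May/May.mp4")  # example path; point this at your video
assert cap.isOpened(), "could not open video"
fps = cap.get(cv2.CAP_PROP_FPS)
frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
cap.release()

# Roughly 5 minutes of footage is recommended; any frame rate is fine,
# since the pipeline converts the video to 25 fps.
print(f"{width}x{height} @ {fps:.2f} fps, {frames / fps / 60:.1f} minutes")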

Train

  1. Put your video at 'dataset/name/name.mp4'
  • example: dataset/May/May.mp4
  2. Run the processing and training script:
bash training_328.sh name gpu_id
  • example: bash training_328.sh May 0

  • Wait for training to complete; it takes approximately 5 hours.

  • If an out-of-memory (OOM) error occurs, try reducing the batch_size.
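
If you prefer to script the data check and training launch together, a small wrapper along these lines works (a sketch; May and 0 are just the example name and GPU id from above):

import subprocess
from pathlib import Path

name, gpu_id = "May", "0"  # example values; replace with your own

# The script expects the video at dataset/<name>/<name>.mp4.
video = Path("dataset") / name / f"{name}.mp4"
assert video.exists(), f"expected video at {video}"

# Runs the repo's processing + training script, same as the manual command above.
subprocess.run(["bash", "training_328.sh", name, gpu_id], check=True)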

Inference

python inference_328.py --name data_name --audio_path path_to_audio.wav
  • example: python inference_328.py --name May --audio_path demo/talk_hb.wav

  • The result will be saved in the 'result' folder.
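
To drive the same trained model with several audio clips, the inference script can simply be called in a loop (a sketch; demo_audio/ is a hypothetical folder of .wav files, while --name and --audio_path are the flags documented above):

import subprocess
from pathlib import Path

name = "May"  # trained model name from the Train step
for wav in sorted(Path("demo_audio").glob("*.wav")):  # hypothetical input folder
    # Each run writes its output video into the 'result' folder.
    subprocess.run(
        ["python", "inference_328.py", "--name", name, "--audio_path", str(wav)],
        check=True,
    )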

Acknowledgements

This code is based on Ultralight-Digital-Human and SyncTalk. We thank the authors for their excellent work.
