Commit 8e5e124

Update README and minor fixes

1 parent db52e7d

File tree: 3 files changed (+23 lines, -11 lines)

README.md (20 additions, 8 deletions)
````diff
@@ -8,19 +8,17 @@
 [Jiachun Pan](https://scholar.google.com/citations?user=nrOvfb4AAAAJ),
 [Zhenxiong Tan](https://scholar.google.com/citations?user=HP9Be6UAAAAJ),
 [Jiahao Lu](https://scholar.google.com/citations?user=h7rbA-sAAAAJ),
-[Jiahao Lu](https://scholar.google.com/citations?user=h7rbA-sAAAAJ),
 [Chuanxin Tang](https://scholar.google.com/citations?user=3ZC8B7MAAAAJ),
 [Bo An](https://personal.ntu.edu.sg/boan/index.html),
 [Shuicheng Yan](https://scholar.google.com/citations?user=DNuiPHwAAAAJ)
 <br>
-_[Project Page](https://memoavatar.github.io) | [arXiv]() |
-[Model](https://huggingface.co/memoavatar/memo) |
-[Data](https://huggingface.co/memoavatar/memo-data)_
+_[Project Page](https://memoavatar.github.io) | [arXiv](https://arxiv.org/abs/2412.04448) | [Model](https://huggingface.co/memoavatar/memo)_

 This repository contains the example inference script for the MEMO-preview model. The gif demo below is compressed. See our [project page](https://memoavatar.github.io) for full videos.

-![](assets/demo.gif)
-
+<div style="width: 100%; text-align: center;">
+  <img src="assets/demo.gif" alt="Demo GIF" style="width: 100%; height: auto;">
+</div>

 ## Installation
````

````diff
@@ -45,7 +43,21 @@ For example:
 python inference.py --config configs/inference.yaml --input_image assets/examples/dicaprio.jpg --input_audio assets/examples/speech.wav --output_dir outputs
 ```

-> We tested the code on both H100 and RTX 4090 GPUs with CUDA 12. The inference time is around 1s per frame on H100 and 2s per frame on RTX 4090.
+> We tested the code on H100 and RTX 4090 GPUs using CUDA 12. Under the default settings (fps=30, inference_steps=20), the inference time is around 1 second per frame on H100 and 2 seconds per frame on RTX 4090. We welcome community contributions to improve the inference speed or interfaces like ComfyUI.
+
+## Acknowledgement
+
+Our work is made possible thanks to high-quality open-source talking video datasets (including [HDTF](https://github.com/MRzzm/HDTF), [VFHQ](https://liangbinxie.github.io/projects/vfhq), [CelebV-HQ](https://celebv-hq.github.io), [MultiTalk](https://multi-talk.github.io), and [MEAD](https://wywu.github.io/projects/MEAD/MEAD.html)) and some pioneering works (such as [EMO](https://humanaigc.github.io/emote-portrait-alive) and [Hallo](https://github.com/fudan-generative-vision/hallo)).
+
+## Ethics Statement
+
+We acknowledge the potential of AI in generating talking videos, with applications spanning education, virtual assistants, and entertainment. However, we are equally aware of the ethical, legal, and societal challenges that misuse of this technology could pose.
+
+To reduce potential risks, we have only open-sourced a preview model for research purposes. Demos on our website use publicly available materials. We welcome copyright concerns—please contact us if needed, and we will address issues promptly. Users are required to ensure that their actions align with legal regulations, cultural norms, and ethical standards.
+
+It is strictly prohibited to use the model for creating malicious, misleading, defamatory, or privacy-infringing content, such as deepfake videos for political misinformation, impersonation, harassment, or fraud. We strongly encourage users to review generated content carefully, ensuring it meets ethical guidelines and respects the rights of all parties involved. Users must also ensure that their inputs (e.g., audio and reference images) and outputs are used with proper authorization. Unauthorized use of third-party intellectual property is strictly forbidden.
+
+While users may claim ownership of content generated by the model, they must ensure compliance with copyright laws, particularly when involving public figures' likeness, voice, or other aspects protected under personality rights.

 ## Citation
````
````diff
@@ -55,7 +67,7 @@ If you find our work useful, please use the following citation:
 @article{zheng2024memo,
   title={MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation},
   author={Longtao Zheng and Yifan Zhang and Hanzhong Guo and Jiachun Pan and Zhenxiong Tan and Jiahao Lu and Chuanxin Tang and Bo An and Shuicheng Yan},
-  journal={arXiv preprint arXiv:2411.xxxxx},
+  journal={arXiv preprint arXiv:2412.04448},
   year={2024}
 }
 ```
````
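The updated performance note implies a simple back-of-envelope cost model: total frames = clip duration × fps, and each frame costs roughly 1 s of GPU time on H100 (about 2 s on RTX 4090). A hypothetical helper (not part of the repo) that sketches this estimate:

```python
def estimated_runtime_s(audio_seconds: float, fps: int = 30, sec_per_frame: float = 1.0) -> float:
    # Frames to generate = clip duration * frame rate;
    # each frame costs roughly `sec_per_frame` seconds of GPU time.
    return audio_seconds * fps * sec_per_frame

print(estimated_runtime_s(10))                     # H100: ~300 s for a 10 s clip
print(estimated_runtime_s(10, sec_per_frame=2.0))  # RTX 4090: ~600 s
```

So a one-minute speech clip at the default fps=30 takes on the order of half an hour on H100, which is why the note invites contributions on inference speed.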

configs/inference.yaml (2 additions, 2 deletions)
```diff
@@ -8,8 +8,8 @@ cfg_scale: 3.5
 weight_dtype: bf16
 enable_xformers_memory_efficient_attention: true

-# model_name_or_path: memoavatar/memo
-model_name_or_path: checkpoints
+model_name_or_path: memoavatar/memo
+# model_name_or_path: checkpoints
 vae: stabilityai/sd-vae-ft-mse
 wav2vec: facebook/wav2vec2-base-960h
 emotion2vec: iic/emotion2vec_plus_large
```
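The flipped comment makes the Hugging Face Hub repo id (`memoavatar/memo`) the default instead of a local `checkpoints` directory. A minimal, hypothetical sketch (not the repo's actual loader) of how a `model_name_or_path` value can mean either thing, following the usual `from_pretrained`-style convention of trying a local directory first:

```python
import os

def resolve_model_source(name_or_path: str) -> str:
    # A path that exists on disk is treated as a local checkpoint
    # directory; anything else is assumed to be a Hub repo id.
    return "local checkpoint" if os.path.isdir(name_or_path) else "hub repo id"

print(resolve_model_source("checkpoints"))      # local, if you downloaded weights there
print(resolve_model_source("memoavatar/memo"))  # otherwise resolved as a Hub repo id
```

With the new default, users who have not pre-downloaded weights get them fetched from the Hub automatically, while the commented line documents the local-checkpoint alternative.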

memo/utils/vision_utils.py (1 addition, 1 deletion)
```diff
@@ -57,7 +57,7 @@ def preprocess_image(face_analysis_model: str, image_path: str, image_size: int
     # Define the image transformation
     transform = transforms.Compose(
         [
-            transforms.Resize(image_size),
+            transforms.Resize((image_size, image_size)),
             transforms.ToTensor(),
             transforms.Normalize([0.5], [0.5]),
         ]
```
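This one-line change matters because torchvision's `transforms.Resize(size)` behaves differently depending on the type of `size`: an int scales only the shorter edge (preserving aspect ratio, so a non-square input stays non-square), while a `(h, w)` tuple forces an exact output size. A standalone PIL sketch of the two behaviors (hypothetical helper name; PIL used directly instead of torchvision):

```python
from PIL import Image

def resize_short_side(img: Image.Image, size: int) -> Image.Image:
    # Mimics torchvision's transforms.Resize(int): scale so the
    # *shorter* edge equals `size`, keeping the aspect ratio.
    w, h = img.size
    if w < h:
        return img.resize((size, int(size * h / w)))
    return img.resize((int(size * w / h), size))

img = Image.new("RGB", (640, 480))  # non-square input
print(resize_short_side(img, 256).size)  # aspect-preserving: (341, 256)
print(img.resize((256, 256)).size)       # fixed square, like Resize((256, 256))
```

With the old `Resize(image_size)`, a non-square face crop would come out non-square and could break downstream layers expecting a square tensor; `Resize((image_size, image_size))` guarantees the shape.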
