Support for Qwen2-Audio and Qwen2.5-Omni #4088

zcs-hlt · 2025-05-06T03:21:46Z

When we tried to reproduce the Qwen2.5-Omni results on MMAU, we ran into a problem. Our environment is as follows:

1. ms_swift == 3.4.0.dev0
2. transformers == 4.52.0.dev0

The inference script is the same as the official example (examples/train/multimodal/omni/infer.sh), but the accuracy on MMAU-test-mini is much lower than the result reported in the paper, so we tried to verify our evaluation. We ran Qwen2-Audio through the same evaluation script and found that its score showed no gap relative to Omni's, so it was also far below the result reported in the Omni paper. However, when we previously evaluated Qwen2-Audio with the same evaluation script, its accuracy was close to the result reported in the Omni paper. The only difference in that earlier environment was the ms_swift version:

1. ms_swift == 3.3.0.dev0
2. transformers == 4.52.0.dev0

Could you share any details on how to reproduce the Qwen2.5-Omni and Qwen2-Audio results on MMAU-test-mini reported in the Omni paper with ms_swift == 3.4.0.dev0 or later? (We also tested both ms_swift versions above with different transformers versions, which rules out the transformers version as the cause.)
We also raised a similar issue with the Omni team. They only emphasized thinker_do_sample=False. We set temperature=0 in the inference script to achieve this, but the result is unchanged.
Looking forward to your reply. Thank you very much.
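
One way to narrow this down is to diff the per-sample predictions from the two ms_swift versions and score each run against the same reference answers. The following is only a minimal sketch: the JSONL layout, the `id` and `response` field names, and the exact-match scoring are all assumptions made for illustration, not the actual ms_swift output format or the official MMAU scorer.

```python
import json
from pathlib import Path


def load_predictions(path, id_key="id", pred_key="response"):
    """Read a JSONL file into {sample_id: prediction}.

    The field names are hypothetical; adjust them to the real output files.
    """
    preds = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        preds[record[id_key]] = str(record[pred_key]).strip()
    return preds


def diff_predictions(old, new):
    """Return the sorted ids present in both runs whose predictions differ."""
    shared = old.keys() & new.keys()
    return sorted(k for k in shared if old[k] != new[k])


def accuracy(preds, answers):
    """Exact-match accuracy over the ids that have both a prediction and an answer."""
    scored = [k for k in answers if k in preds]
    if not scored:
        return 0.0
    return sum(preds[k] == answers[k] for k in scored) / len(scored)
```

If the two versions disagree on many samples even with sampling disabled, the difference is more likely in preprocessing or the prompt template than in decoding randomness.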

Jintao-Huang · 2025-05-06T05:37:44Z

https://github.com/modelscope/ms-swift/blob/main/tests/test_align/test_template/test_video.py#L133

No error was found.

zcs-hlt closed this as completed May 8, 2025
