LLMs have recently shown a remarkable ability to process not only text but also multimodal inputs such as speech and audio. However, most existing models focus on analyzing input signals guided by text instructions, overlooking scenarios in which a spoken instruction and the audio to be analyzed arrive together as the model's input.
To evaluate models under this scenario, we propose SA-Eval, which includes three tasks: audio event classification, audio captioning, and audio question answering. SA-Eval provides diverse speech instructions with various speaking styles and two difficulty levels, easy and hard, to capture a range of real-world acoustic conditions. A hypothetical sketch of how such an evaluation item could be represented is shown below.
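The snippet below is a minimal, illustrative sketch of how one SA-Eval item might be represented and routed to a speech-and-audio model. The field names (`task`, `difficulty`, `instruction_wav`, etc.) are assumptions for illustration only and do not reflect the released data format.

```python
# Hypothetical representation of a single SA-Eval item.
# Field names are illustrative assumptions, not the released format.
from dataclasses import dataclass


@dataclass
class SAEvalItem:
    task: str             # "audio_event_classification" | "audio_captioning" | "audio_question_answering"
    difficulty: str       # "easy" | "hard" (reflecting real-world acoustic conditions)
    instruction_wav: str  # path to the spoken instruction
    audio_wav: str        # path to the acoustic context to be analyzed
    reference: str        # ground-truth answer used for scoring


def build_model_input(item: SAEvalItem) -> dict:
    """Pair the speech instruction with the target audio as the model input,
    instead of a text instruction."""
    return {
        "speech_instruction": item.instruction_wav,
        "audio_context": item.audio_wav,
    }
```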
More updates will be released soon.

If you find our work useful, please cite:
```bibtex
@article{ao2025solla,
  title         = {Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context},
  author        = {Junyi Ao and Dekun Chen and Xiaohai Tian and Wenjie Feng and Jun Zhang and Lu Lu and Yuxuan Wang and Haizhou Li and Zhizheng Wu},
  eprint        = {2503.15338},
  archivePrefix = {arXiv},
  primaryClass  = {eess.AS},
  year          = {2025}
}
```