amphionspace/SA-Eval

SA-Eval

Large language models (LLMs) have recently shown a remarkable ability to process not only text but also multimodal inputs such as speech and audio. However, most existing models focus on analyzing input signals under text instructions, and overlook scenarios in which a speech instruction and audio are mixed together and serve as the model's input.

To evaluate models in this scenario, we propose SA-Eval, which comprises three tasks: audio event classification, audio captioning, and audio question answering. SA-Eval contains diverse speech instructions in various speaking styles and covers two difficulty levels, easy and hard, to capture the range of real-world acoustic conditions.
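As a rough illustration of the evaluation grid described above (three tasks, each at two difficulty levels), the sketch below enumerates the six task/difficulty combinations a full run would cover. The string identifiers and the `evaluation_grid` helper are hypothetical names chosen for this example; they are not the official SA-Eval split names or API.

```python
from itertools import product

# Task and difficulty names follow the description above. The exact string
# identifiers are hypothetical, not the official SA-Eval split names.
TASKS = (
    "audio_event_classification",
    "audio_captioning",
    "audio_question_answering",
)
DIFFICULTIES = ("easy", "hard")


def evaluation_grid():
    """Return every (task, difficulty) pair a full evaluation run would cover."""
    return list(product(TASKS, DIFFICULTIES))


if __name__ == "__main__":
    for task, difficulty in evaluation_grid():
        print(f"{task:<28} {difficulty}")
```

Reporting one score per cell of this grid, rather than a single aggregate, would keep the easy and hard conditions separately visible for each task.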

We are preparing more updates to be released soon.

Citation

@article{ao2025solla,
  title   = {Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context},
  author  = {Junyi Ao and Dekun Chen and Xiaohai Tian and Wenjie Feng and Jun Zhang and Lu Lu and Yuxuan Wang and Haizhou Li and Zhizheng Wu},
  eprint  = {2503.15338},
  archivePrefix = {arXiv},
  primaryClass  = {eess.AS},
  year    = {2025}
}
