OpenCompass v0.5.1 Release Notes
π Highlights
β¨ A New Method to quickly Integrate and Evaluate Your Datasets: Added a fast dataset integration and evaluation method based on ChatML format, simplifying the previously complex dataset integration process.
β¨ New Datasets: Integrated new benchmarks including SeedBench and BeyondAIME.
β¨ Infrastructure & Enhancements: Fixed several bugs and updated CI.
π New Features
π§ Introduced a new approach for dataset integration and evaluation based on ChatML Template with evaluation examples (#2277).
π§ Added SeedBench dataset (#2020).
π§ Added BeyondAIME dataset (#2192).
π Bug Fixes
π§ Fixed Module Registers(#2262, #2266)
π§ Fixed duplicate engine config update in TurboMindModelwithChatTemplate (#2276)
π§ Fixed torchrun to avoid unexpected PATH environment (#2269)
π§ Fixed the None return value case in cascade evaluator (#2211)
β Enhancements and Refactors
β Infrastructure Refactors:
- Update rjob.py and subjective_eval.py (#2263)
β CI/CD Improvements:
π Welcome New Contributors
A warm welcome and special thanks to our newest contributors who made this release possible:
- @ChenZiHong-Gavin for their first contribution in (#2020)
- @jibuzixin for their first contribution in (#2211)
- @JYLiAo32 for their first contribution in (#2276)
- @ShoupingShan for their first contribution in (#2262)
Full Changelog: 0.5.0...0.5.1.post1
Thank you for using OpenCompass! These updates empower deeper insights and more reliable evaluations. Keep exploring, and stay tuned for future innovations! π