OpenDataArena-Tool



Overview

OpenDataArena (ODA) is an open, transparent, and extensible platform for evaluating the value of post-training datasets, aiming to make every dataset measurable, comparable, and verifiable.

This repository contains the tools for the ODA platform:

  • Data Scoring: Assess datasets with diverse metrics and methods, including model-based methods, LLM-as-judge, and heuristic methods.
  • Model Training: Use LLaMA-Factory to run supervised fine-tuning (SFT) of models on the datasets. We provide SFT scripts for reproducible experiments on mainstream models and benchmarks.
  • Benchmark Evaluation: Use OpenCompass to evaluate model performance on popular benchmarks across multiple domains (math, code, science, and general instruction). We also provide the evaluation scripts for the datasets in ODA.
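
To make the idea of a heuristic scoring method concrete, here is a minimal sketch of one possible metric: the token length of each response. It is an illustration only and is not taken from the OpenDataArena-Tool code. The function names (`response_length_score`, `score_dataset`) and the Alpaca-style `instruction`/`output` field names are assumptions made for this example.

```python
# Hypothetical heuristic scorer for illustration; NOT the OpenDataArena-Tool API.
# Assumes each sample is a dict with Alpaca-style "instruction" and "output" keys.
from statistics import mean


def response_length_score(sample: dict) -> int:
    """Return the number of whitespace-separated tokens in the response."""
    return len(sample.get("output", "").split())


def score_dataset(samples: list[dict]) -> dict:
    """Score every sample and report the per-sample scores and their mean."""
    scores = [response_length_score(s) for s in samples]
    return {"scores": scores, "mean": mean(scores) if scores else 0.0}


samples = [
    {"instruction": "Add 2 and 3.", "output": "2 + 3 = 5"},
    {"instruction": "Name a prime.", "output": "7"},
]
result = score_dataset(samples)
print(result)
```

A heuristic like this is cheap to compute and needs no model, which is why such metrics are commonly used alongside model-based and LLM-as-judge scores. Refer to Data Scoring for the metrics the tool actually implements.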

Quick Start

First, clone the repository and its submodules:

git clone https://github.com/OpenDataArena/OpenDataArena-Tool.git --recursive
cd OpenDataArena-Tool

Then you can start using the tools in ODA:

  • To score your own dataset, refer to Data Scoring.
  • To train models on the datasets in ODA, refer to Model Training.
  • To evaluate models on the benchmarks in ODA, refer to Benchmark Evaluation.

Contributors

We thank these outstanding researchers and developers for their contributions to the OpenDataArena project. Collaboration and contributions are welcome!

Xiaoyang Wang Qizhi Pei Mengzhang Cai Zinan Tang Yu Li Mengyuan Sun Honglin Lin Xin Gao

Lijun Wu Zhuoshi Pan Chenlin Ming Zhanping Zhong Conghui He

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you find this project useful, please consider citing:

@misc{opendataarena_tool_2025,
  author       = {OpenDataArena},
  title        = {{OpenDataArena-Tool}},
  year         = {2025},
  url          = {https://github.com/OpenDataArena/OpenDataArena-Tool},
  note         = {GitHub repository},
  howpublished = {\url{https://github.com/OpenDataArena/OpenDataArena-Tool}},
}
