- 2025-07-26: We release the OpenDataArena platform and the OpenDataArena-Tool repository.
OpenDataArena (ODA) is an open, transparent, and extensible platform for evaluating the value of post-training datasets, aiming to make every dataset measurable, comparable, and verifiable.
This repository provides the tools for the ODA platform:
- Data Scoring: Assess datasets with diverse metrics and methods, including model-based methods, LLM-as-a-judge, and heuristic methods.
- Model Training: Use LLaMA-Factory to perform supervised fine-tuning (SFT) of models on the datasets. We provide SFT scripts for reproducible experiments on mainstream models and benchmarks.
- Benchmark Evaluation: Use OpenCompass to evaluate model performance on popular benchmarks across multiple domains (math, code, science, and general instruction). We also provide the evaluation scripts for the datasets in ODA.
First, clone the repository and its submodules:

```bash
git clone https://github.com/OpenDataArena/OpenDataArena-Tool.git --recursive
cd OpenDataArena-Tool
```

Then, you can start using the tools in ODA:
- To score your own dataset, please refer to Data Scoring for more details.
- To train the models on the datasets in ODA, please refer to Model Training for more details.
- To evaluate the models on the benchmarks in ODA, please refer to Benchmark Evaluation for more details.
We thank all the outstanding researchers and developers who have contributed to the OpenDataArena project. Collaborations and contributions are welcome!
This project is licensed under the MIT License - see the LICENSE file for details.
If you find this project useful, please consider citing:
```bibtex
@misc{opendataarena_tool_2025,
  author = {OpenDataArena},
  title = {{OpenDataArena-Tool}},
  year = {2025},
  url = {https://github.com/OpenDataArena/OpenDataArena-Tool},
  note = {GitHub repository},
  howpublished = {\url{https://github.com/OpenDataArena/OpenDataArena-Tool}},
}
```