MoE-CAP is a benchmarking method for sparse Mixture-of-Experts (MoE) systems that evaluates them jointly across three dimensions: Cost, Accuracy, and Performance.
- MoE-CAP has been accepted to the NeurIPS 2025 Datasets and Benchmarks Track 🎉 See you in San Diego, USA.
Python: >= 3.9
```bash
git clone https://github.com/sparse-generative-ai/MoE-CAP.git
cd MoE-CAP
pip install -e .
```
Then you can import moe_cap directly.
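A quick way to confirm the editable install worked is to import the package, as mentioned above. This is a minimal sketch; it only assumes that `moe_cap` is importable after installation:

```python
# Sanity check: the editable install should make moe_cap importable.
import moe_cap

# For an editable install, __file__ usually points back into the cloned repo
# (it may be absent if moe_cap is packaged as a namespace package).
print(getattr(moe_cap, "__file__", "namespace package"))
```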
- Launch our custom SGLang server (e.g., on H100 GPUs)
```bash
python -m moe_cap.systems.sglang \
    --model-path Qwen/Qwen3-235B-A22B-Thinking-2507 \
    --port 30000 \
    --expert-distribution-recorder-mode stat \
    --tp-size 8
```
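Before running the benchmark, you can optionally check that the server is up. The sketch below assumes the custom server keeps SGLang's usual OpenAI-compatible `/v1/chat/completions` endpoint on the port chosen above; that is an assumption about the server, not part of MoE-CAP itself.

```python
# Minimal smoke test against the server launched above.
# Assumes an OpenAI-compatible /v1/chat/completions endpoint on port 30000.
import requests

resp = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "Qwen/Qwen3-235B-A22B-Thinking-2507",
        "messages": [{"role": "user", "content": "What is 2 + 2?"}],
        "max_tokens": 32,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```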
- Run our benchmark
```bash
python -m moe_cap.runner.sglang_profile \
    --config-file configs/gsm8k_qwen3_235b_a22b.yaml \
    --output_dir outputs/
```
The results will be stored under outputs/.
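To take a quick look at the results, something like the following works. It makes no assumption about the exact file names or formats the profiler writes; it simply lists whatever appears under outputs/ and pretty-prints any JSON files:

```python
# Inspect whatever the benchmark run wrote under outputs/.
import json
from pathlib import Path

for path in sorted(Path("outputs").rglob("*")):
    if path.is_file():
        print(path)
        if path.suffix == ".json":
            print(json.dumps(json.loads(path.read_text()), indent=2))
```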
Thank you for your interest in contributing to the MoE-CAP project! We welcome contributions from everyone. Below you'll find guidance on how to set up your development environment, understand our architecture, and contribute effectively. If you have any questions or wish to discuss your contributions, please reach out to Yinsicheng Jiang, Yao Fu or Yeqi Huang via email at [email protected], [email protected] or [email protected].
We are looking for contributions in several key areas to enhance the MoE-CAP project:
- General Bug Fixes/Reports: We welcome reports of any bugs found in the frontend interface or backend, as well as fixes for these issues.
- Adding New Tasks (Benchmark Datasets): If you have ideas for new benchmark datasets that could be added, your contributions would be greatly appreciated.
- Supporting New Inference Frameworks: Expanding our project to support new inference frameworks is crucial for our growth. If you can contribute in this area, please reach out.
- Testing More Models: To make our leaderboard as comprehensive as possible, we need to test a wide range of models. Contributions in this area are highly valuable.
Documentation is currently of lower priority, but if you have thoughts or suggestions, please feel free to raise them.
Your contributions are crucial to the success and improvement of the MoE-CAP project. We look forward to collaborating with you.
- Implement auto-submission CI pipeline.
- Implement CAP HTTP Server.
@misc{jiang2025moecapbenchmarkingcostaccuracy,
title={MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems},
author={Yinsicheng Jiang and Yao Fu and Yeqi Huang and Ping Nie and Zhan Lu and Leyang Xue and Congjie He and Man-Kit Sit and Jilong Xue and Li Dong and Ziming Miao and Dayou Du and Tairan Xu and Kai Zou and Edoardo Ponti and Luo Mai},
year={2025},
eprint={2412.07067},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2412.07067},
}

