mohammad-abdollahi/survey_llm_benchmark_dataset

Surveying the Benchmarking Landscape of Large Language Models in Code Intelligence

Welcome to the official repository for our survey paper:

“Surveying the Benchmarking Landscape of Large Language Models in Code Intelligence”

📌 Overview

The rapid evolution of Large Language Models (LLMs) such as GPT-2, GPT-3, and their successors has transformed the field of code intelligence, enabling significant advances in tasks such as code generation, program repair, software testing, and debugging.

Benchmarking plays a crucial role in ensuring that these models are evaluated rigorously and meaningfully.

In this work, we systematically review:

  • 142 research papers
  • 156 unique benchmark datasets
  • 32 different code-related tasks

We analyze each dataset across four key dimensions:

  1. General landscape and coverage
  2. Dataset construction and quality assurance
  3. Evaluation protocols
  4. Limitations and gaps

🔍 Key Findings

  • Python is the dominant language (used in 77% of datasets)
  • GitHub is the primary data source (46% usage)
  • Most benchmarks focus on code generation (86 datasets)
  • Benchmark creation has accelerated notably in the past three years
  • Gaps remain around bias, dataset evolution, and standardized evaluation

📄 Paper Access

You can read the full survey here: 📖 https://hal.science/view/index/docid/5183398

📚 Citation

If you find this work useful in your research, please consider citing it:

@article{abdollahi:hal-05183398,
  TITLE = {{Surveying the Benchmarking Landscape of Large Language Models in Code Intelligence}},
  AUTHOR = {Abdollahi, Mohammad and Zhang, Ruixin and Shiri Harzevili, Nima and Shin, Jiho and Wang, Song and Hemmati, Hadi},
  URL = {https://hal.science/hal-05183398},
  NOTE = {37 pages + references},
  YEAR = {2025},
  MONTH = Jul,
  KEYWORDS = {Large language Models LLMs ; Benchmark ; Code Intelligence ; Software Engineering},
  PDF = {https://hal.science/hal-05183398v1/file/main.pdf},
  HAL_ID = {hal-05183398},
  HAL_VERSION = {v1},
}
