    Repositories list

    • "Large Language Models" Course (COMP4901B) offered in HKUST
      Python
      9901 Updated Nov 23, 2025
    • The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
      Python
      914320 Updated Nov 22, 2025
    • Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification
      Python
      12010 Updated Oct 8, 2025
    • From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning.
      Python
      12300 Updated Oct 7, 2025
    • The official repo of "WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents"
      Python
      18700 Updated Sep 29, 2025
    • The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".
      Python
      01500 Updated Sep 3, 2025
    • Simple RL training for reasoning
      Python
      2813.8k301 Updated Aug 3, 2025
    • ceval

      Public
      Official github repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023]
      Python
      831.8k60 Updated Jul 27, 2025
    • mstar

      Public
      [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
      36910 Updated Jul 13, 2025
    • Laser

      Public
      Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
      Python
      46030 Updated May 22, 2025
    • B-STaR

      Public
      B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
      Python
      118600 Updated May 21, 2025
    • CodeIO

      Public
      [ICML 2025 Oral] CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction
      Python
      3256101 Updated May 6, 2025
    • GUIMid

      Public
      02110 Updated May 3, 2025
    • The official repo of "On the Perception Bottleneck of VLMs for Chart Understanding"
      Jupyter Notebook
      0800 Updated Apr 12, 2025
    • PreSelect

      Public
      [ICML 2025] Predictive Data Selection: The Data That Predicts Is the Data That Teaches
      Python
      85700 Updated Mar 4, 2025
    • dart-math

      Public
      [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
      Jupyter Notebook
      711630 Updated Dec 10, 2024
    • deita

      Public
      Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
      Python
      3257670 Updated Dec 9, 2024
    • On the Universal Truthfulness Hyperplane Inside LLMs (EMNLP 2024)
      Python
      2600 Updated Oct 3, 2024
    • Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]
      Python
      614300 Updated Sep 20, 2024
    • An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral]
      SAS
      37366115 Updated May 20, 2024
    • In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)
      Python
      96231 Updated Mar 30, 2024
    • JavaScript
      0000 Updated Jan 25, 2024
    • felm

      Public
      Github repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)
      Python
      16130 Updated Dec 25, 2023
    • [NeurIPS 2023] Github repository for "Composing Parameter-Efficient Modules with Arithmetic Operations"
      Python
      106141 Updated Nov 26, 2023
    • Python
      1700 Updated Oct 3, 2023
    • SynCSE

      Public
      This is the official implementation of the paper: "Contrastive Learning of Sentence Embeddings from Scratch"
      Python
      73910 Updated Jun 9, 2023