Stars
SGLang is a fast serving framework for large language models and vision language models.
Reverse Instructions: generating instruction-tuning data from corpus examples
A collection of awesome prompt and instruction datasets for training ChatLLMs such as ChatGPT; gathers a wide variety of instruction datasets for ChatLLM training.
Entropy Based Sampling and Parallel CoT Decoding
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025]
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Paper list for Efficient Reasoning.
Curated list of datasets and tools for post-training.
Awesome Reasoning LLM Tutorial/Survey/Guide
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
An evaluation benchmark for undergraduate competition math in Lean4, Isabelle, Coq, and natural language.
A lightweight library for generating synthetic instruction-tuning datasets from your data without GPT.
Code for "SemDeDup", a simple method for identifying and removing semantic duplicates from a dataset (data pairs that are semantically similar but not exactly identical); a conceptual sketch of the idea follows this list.
A reading list on LLM based Synthetic Data Generation 🔥
Code for the paper "Evaluating Large Language Models Trained on Code"
Awesome RL Reasoning Recipes ("Triple R")
Synthetic data curation for post-training and structured data extraction
[NeurIPS 2023 D&B] Code repository for the InterCode benchmark https://arxiv.org/abs/2306.14898
The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024]
A streamlined and customizable framework for efficient large model evaluation and performance benchmarking
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
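As a rough illustration of the semantic-deduplication idea behind the SemDeDup entry above, here is a minimal sketch. It assumes precomputed embeddings and uses a greedy cosine-similarity threshold, omitting the clustering step the paper uses to keep comparisons tractable; the `semantic_dedup` function name and the 0.95 threshold are illustrative choices, not the repository's API.

```python
import numpy as np

def semantic_dedup(embeddings: np.ndarray, threshold: float = 0.95) -> np.ndarray:
    """Greedy semantic deduplication sketch: keep an item only if its cosine
    similarity to every previously kept item is below `threshold`.

    Illustrative only; SemDeDup itself first clusters the embeddings and
    compares items within clusters to avoid all-pairs computation.
    """
    # Normalize rows so dot products equal cosine similarities.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept: list[int] = []
    for i, vec in enumerate(normed):
        # Keep the item if nothing already kept is too similar to it.
        if not kept or np.max(normed[kept] @ vec) < threshold:
            kept.append(i)
    return np.array(kept)

if __name__ == "__main__":
    # Toy usage: random 128-d embeddings stand in for real text embeddings.
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(1000, 128))
    keep_idx = semantic_dedup(emb, threshold=0.95)
    print(f"kept {len(keep_idx)} of {len(emb)} items")
```

In practice the embeddings would come from a pretrained encoder, and the threshold trades off how aggressively near-duplicates are pruned.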