Stars
Tools for merging pretrained large language models.
《代码随想录》: a LeetCode problem-solving guide covering 200 classic problems in a recommended order, with 600k words of detailed illustrated explanations, video breakdowns of difficult points, 50+ mind maps, and solutions in C++, Java, Python, Go, JavaScript, and other languages — so algorithm study is no longer a maze! 🔥🔥 Take a look, and you'll wish you had found it sooner! 🚀
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
This repo aims to record resources on role-playing abilities in LLMs, including datasets, papers, applications, etc.
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
A bibliography and survey of the papers surrounding o1
[ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation
A framework for detecting, highlighting, and correcting grammatical errors in natural language text. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.
A quick guide (especially) for trending instruction finetuning datasets
Tools to download and clean up Common Crawl data
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach.
Awesome things about LLM-powered agents. Papers / Repos / Blogs / ...
RAGChecker: A Fine-grained Framework For Diagnosing RAG
List of papers on hallucination detection in LLMs.
This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
[EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs
Implementation of the paper LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models
Implementation of the paper Data Engineering for Scaling Language Models to 128K Context
Code for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
A repository sharing the literature on long-context large language models, including methodologies and evaluation benchmarks

