- Chinese Academy of Sciences
- Beijing, China
- https://dongyuxu77.github.io/
Stars
An open collection of methodologies to help with successful training of large language models.
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
verl: Volcano Engine Reinforcement Learning for LLMs
Efficient, Low-Resource, Distributed transformer implementation based on BMTrain
The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"
A flexible framework powered by ComfyUI for generating personalized Nobel Prize images.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
Implementation of paper Data Engineering for Scaling Language Models to 128K Context
An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
Benchmarking Deep Learning operations on different hardware
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
A repository sharing the literature on long-context large language models, including methodologies and evaluation benchmarks
[ACL'24 Outstanding] Data and code for L-Eval, a comprehensive long context language models evaluation benchmark
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
LlamaIndex is the leading framework for building LLM-powered agents over your data.
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
A machine learning compiler for GPUs, CPUs, and ML accelerators
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step