Stars
Domain-specific language designed to streamline the development of high-performance GPU, CPU, and accelerator kernels
Quantized Attention achieves speedups of 2-5x and 3-11x compared to FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.
My learning notes and code for ML systems.
MetaX-MACA / FlashMLA
Forked from deepseek-ai/FlashMLA
Fast and efficient attention method exploration and implementation.
The LLM's practical guide: From the fundamentals to deploying advanced LLM and RAG apps to AWS using LLMOps best practices
DeepEP: an efficient expert-parallel communication library
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
FlashMLA: Efficient MLA decoding kernels
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
FlashInfer: Kernel Library for LLM Serving
FlagGems is an operator library for large language models implemented in the Triton Language.
FlagScale is a large model toolkit based on open-source projects.
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.
SGLang is a fast serving framework for large language models and vision language models.
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
NVIDIA Linux open GPU kernel module source
PyTorch introductory tutorial; read online at: https://datawhalechina.github.io/thorough-pytorch/
Vulkan-based implementation of D3D8, 9, 10 and 11 for Linux / Wine
We want to create a repo to illustrate the usage of transformers in Chinese.
Makes HLSL assembly shader code easy to read: parses DXBC text and exports HLSL-like code for reading.
DXIL conversion to SPIR-V for D3D12 translation libraries
Utility libraries for Vulkan developers
Tensors and dynamic neural networks in Python with strong GPU acceleration
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
Cross-platform, graphics API agnostic, "Bring Your Own Engine/Framework" style rendering library.