Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ 1,387 111 Updated Jul 10, 2025

Quantized Attention achieves speedups of 2-5x and 3-11x compared to FlashAttention and xformers, without losing end-to-end metrics across language, image, and video models.

Cuda 1,963 150 Updated Jul 8, 2025

My learning notes/codes for ML SYS.

Python 2,841 177 Updated Jul 9, 2025

Fast and efficient attention method exploration and implementation.

C++ 21 4 Updated Mar 25, 2025

The LLM's practical guide: From the fundamentals to deploying advanced LLM and RAG apps to AWS using LLMOps best practices

Python 3,651 805 Updated Mar 8, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,265 843 Updated Jul 10, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,858 280 Updated May 15, 2025

FlashMLA: Efficient MLA decoding kernels

Cuda 11,641 876 Updated Apr 29, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 14,531 1,035 Updated Jul 1, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 3,332 372 Updated Jul 9, 2025

FlagGems is an operator library for large language models implemented in the Triton Language.

Python 620 109 Updated Jul 10, 2025

FlagScale is a large model toolkit based on open-sourced projects.

Python 322 85 Updated Jul 10, 2025

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.

Python 41,771 3,330 Updated Jul 9, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 15,869 2,306 Updated Jul 10, 2025

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Jupyter Notebook 14,385 3,353 Updated Aug 12, 2024

NVIDIA Linux open GPU kernel module source

C 15,967 1,430 Updated Jul 7, 2025

Introductory PyTorch tutorial; read online at: https://datawhalechina.github.io/thorough-pytorch/

Jupyter Notebook 3,120 476 Updated Jul 7, 2025

Vulkan-based implementation of D3D8, 9, 10 and 11 for Linux / Wine

C++ 15,090 974 Updated Jul 5, 2025

A repo illustrating the usage of transformers in Chinese

Shell 2,926 486 Updated Aug 18, 2024

Vulkan Profiles Tools

C++ 137 49 Updated Jul 9, 2025

Tools to aid in Vulkan development

C++ 717 188 Updated Jul 9, 2025

Makes HLSL assembly shader code easy to read: parses DXBC text and exports HLSL-like code for reading.

Lua 315 71 Updated Jul 27, 2024

DXIL conversion to SPIR-V for D3D12 translation libraries

C++ 202 39 Updated Jul 9, 2025

Utility libraries for Vulkan developers

C++ 75 33 Updated Jul 8, 2025

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 91,415 24,637 Updated Jul 10, 2025

🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming

Python 57,072 6,857 Updated Jun 30, 2025

AMD Open Source Driver For Vulkan

1,907 166 Updated Apr 30, 2025

Cross-platform, graphics API agnostic, "Bring Your Own Engine/Framework" style rendering library.

C++ 15,885 1,999 Updated Jun 28, 2025

Neural Network in Dx12/HLSL Shaders

C++ 100 6 Updated May 13, 2025