- qwen3.cu
  Single-file, pure CUDA C implementation for running inference on Qwen3 0.6B GGUF. No dependencies.

- vllm (fork of vllm-project/vllm)
  A high-throughput and memory-efficient inference and serving engine for LLMs.
  Python, Apache License 2.0. Updated Nov 3, 2025.
- batch_invariant_ops (fork of thinking-machines-lab/batch_invariant_ops)
  Python, MIT License. Updated Sep 10, 2025.

- qwen3.c
  Lightweight C inference for Qwen3 GGUF. Multiturn prefix caching & batch processing.
- ao (fork of pytorch/ao)
  PyTorch native quantization and sparsity for training and inference.
  Python, Other. Updated Aug 25, 2025.

- luminal (fork of luminal-ai/luminal)
  Deep learning at the speed of light.
  Rust, Apache License 2.0. Updated Aug 24, 2025.

- memray (fork of bloomberg/memray)
  Memray is a memory profiler for Python.
  Python, Apache License 2.0. Updated Aug 22, 2025.

- compiler-explorer (fork of compiler-explorer/compiler-explorer)
  Run compilers interactively from your web browser and interact with the assembly.
  TypeScript, BSD 2-Clause "Simplified" License. Updated Aug 18, 2025.
- yams (fork of trvon/yams)
  Content-addressable storage with excellent search.
  C++, Apache License 2.0. Updated Aug 16, 2025.

- Flash-RL (fork of yaof20/Flash-RL)
  Implementation of FP8/INT8 rollout for RL training without performance drop.
  Python, MIT License. Updated Aug 14, 2025.

- speculators (fork of vllm-project/speculators)
  Python, Apache License 2.0. Updated Aug 14, 2025.

- scalene (fork of plasma-umass/scalene)
  Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals.
  Python, Apache License 2.0. Updated Aug 9, 2025.
- flashinfer (fork of flashinfer-ai/flashinfer)
  FlashInfer: Kernel Library for LLM Serving.
  C++, Apache License 2.0. Updated Aug 9, 2025.

- libzmq (fork of zeromq/libzmq)
  ZeroMQ core engine in C++; implements ZMTP/3.1.
  C++, Mozilla Public License 2.0. Updated Aug 8, 2025.

- LazyLLM (fork of LazyAGI/LazyLLM)
  The easiest and laziest way to build multi-agent LLM applications.
  Python, Apache License 2.0. Updated Aug 6, 2025.

- KittenTTS (fork of KittenML/KittenTTS)
  State-of-the-art TTS model under 25 MB 😻
  Python, Apache License 2.0. Updated Aug 5, 2025.
- ZLUDA (fork of vosen/ZLUDA)
  CUDA on non-NVIDIA GPUs.
  Rust, Apache License 2.0. Updated Jul 29, 2025.

- llm-d (fork of llm-d/llm-d)
  A Kubernetes-native, high-performance distributed LLM inference framework.
  Makefile, Apache License 2.0. Updated Jul 29, 2025.

- AITemplate (fork of facebookincubator/AITemplate)
  A Python framework that renders neural networks into high-performance CUDA/HIP C++ code, specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
  Python, Apache License 2.0. Updated Jul 24, 2025.

- modular (fork of modular/modular)
  The Modular Platform (includes MAX & Mojo).
  Mojo, Other. Updated Jul 21, 2025.
- dia (fork of nari-labs/dia)
  A TTS model capable of generating ultra-realistic dialogue in one pass.

- llamafile (fork of mozilla-ai/llamafile)
  Distribute and run LLMs with a single file.
  C++, Other. Updated Jun 30, 2025.

- fast3r (fork of facebookresearch/fast3r)
  [CVPR 2025] Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass.
  Python, Other. Updated May 29, 2025.

- forked-pdb (fork of Lightning-AI/forked-pdb)
  Python pdb for multiple processes.
  Python, Apache License 2.0. Updated May 24, 2025.

- AutoAWQ (fork of casper-hansen/AutoAWQ)
  AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
  Python, MIT License. Updated May 3, 2025.
- vllm-triton-backend (fork of foundation-model-stack/vllm-triton-backend)
  A Triton-only attention backend for vLLM.
  Python, Apache License 2.0. Updated Apr 15, 2025.

- guidance (fork of guidance-ai/guidance)
  A guidance language for controlling large language models.
  Jupyter Notebook, MIT License. Updated Apr 10, 2025.

- NVTX (fork of NVIDIA/NVTX)
  The NVIDIA® Tools Extension SDK (NVTX) is a C-based API for annotating events, code ranges, and resources in your applications.
  C++, Other. Updated Apr 3, 2025.


