- qwen3.cu
  Single-file, pure CUDA C implementation for running inference on Qwen3 0.6B GGUF. No dependencies.

- vllm (fork of vllm-project/vllm)
  A high-throughput and memory-efficient inference and serving engine for LLMs.
  Python, Apache License 2.0. Updated Nov 3, 2025.
- batch_invariant_ops (fork of thinking-machines-lab/batch_invariant_ops)
  Python, MIT License. Updated Sep 10, 2025.

- qwen3.c
  Lightweight C inference for Qwen3 GGUF. Multiturn prefix caching & batch processing.
- ao (fork of pytorch/ao)
  PyTorch native quantization and sparsity for training and inference.
  Python, Other. Updated Aug 25, 2025.

- luminal (fork of luminal-ai/luminal)
  Deep learning at the speed of light.
  Rust, Apache License 2.0. Updated Aug 24, 2025.

- memray (fork of bloomberg/memray)
  Memray is a memory profiler for Python.
  Python, Apache License 2.0. Updated Aug 22, 2025.

- compiler-explorer (fork of compiler-explorer/compiler-explorer)
  Run compilers interactively from your web browser and interact with the assembly.
  TypeScript, BSD 2-Clause "Simplified" License. Updated Aug 18, 2025.
- yams (fork of trvon/yams)
  Content-addressable storage with excellent search.
  C++, Apache License 2.0. Updated Aug 16, 2025.

- Flash-RL (fork of yaof20/Flash-RL)
  Implementation of FP8/INT8 rollout for RL training without performance drop.
  Python, MIT License. Updated Aug 14, 2025.

- speculators (fork of vllm-project/speculators)
  Python, Apache License 2.0. Updated Aug 14, 2025.

- scalene (fork of plasma-umass/scalene)
  Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals.
  Python, Apache License 2.0. Updated Aug 9, 2025.
- flashinfer (fork of flashinfer-ai/flashinfer)
  FlashInfer: Kernel Library for LLM Serving.
  C++, Apache License 2.0. Updated Aug 9, 2025.

- libzmq (fork of zeromq/libzmq)
  ZeroMQ core engine in C++; implements ZMTP/3.1.
  C++, Mozilla Public License 2.0. Updated Aug 8, 2025.

- LazyLLM (fork of LazyAGI/LazyLLM)
  The easiest and laziest way to build multi-agent LLM applications.
  Python, Apache License 2.0. Updated Aug 6, 2025.

- KittenTTS (fork of KittenML/KittenTTS)
  State-of-the-art TTS model under 25 MB 😻
  Python, Apache License 2.0. Updated Aug 5, 2025.
- ZLUDA (fork of vosen/ZLUDA)
  CUDA on non-NVIDIA GPUs.
  Rust, Apache License 2.0. Updated Jul 29, 2025.

- llm-d (fork of llm-d/llm-d)
  A Kubernetes-native, high-performance distributed LLM inference framework.
  Makefile, Apache License 2.0. Updated Jul 29, 2025.

- AITemplate (fork of facebookincubator/AITemplate)
  A Python framework that renders neural networks into high-performance CUDA/HIP C++ code, specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
  Python, Apache License 2.0. Updated Jul 24, 2025.

- modular (fork of modular/modular)
  The Modular Platform (includes MAX & Mojo).
  Mojo, Other. Updated Jul 21, 2025.
- dia (fork of nari-labs/dia)
  A TTS model capable of generating ultra-realistic dialogue in one pass.

- llamafile (fork of mozilla-ai/llamafile)
  Distribute and run LLMs with a single file.
  C++, Other. Updated Jun 30, 2025.

- fast3r (fork of facebookresearch/fast3r)
  [CVPR 2025] Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass.
  Python, Other. Updated May 29, 2025.

- forked-pdb (fork of Lightning-AI/forked-pdb)
  Python pdb for multiple processes.
  Python, Apache License 2.0. Updated May 24, 2025.

- AutoAWQ (fork of casper-hansen/AutoAWQ)
  AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
  Python, MIT License. Updated May 3, 2025.
- vllm-triton-backend (fork of foundation-model-stack/vllm-triton-backend)
  A Triton-only attention backend for vLLM.
  Python, Apache License 2.0. Updated Apr 15, 2025.

- guidance (fork of guidance-ai/guidance)
  A guidance language for controlling large language models.
  Jupyter Notebook, MIT License. Updated Apr 10, 2025.

- NVTX (fork of NVIDIA/NVTX)
  The NVIDIA® Tools Extension SDK (NVTX) is a C-based API for annotating events, code ranges, and resources in your applications.
  C++, Other. Updated Apr 3, 2025.


