gigit0000
  • Kim Baksa's Lab, South Korea


📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 4,762 324 Updated Nov 28, 2025

Easy, Fast, and Scalable Multimodal AI

Python 75 5 Updated Nov 28, 2025

Nvidia Instruction Set Specification Generator

Python 299 16 Updated Jul 9, 2024

Cataloging released Triton kernels.

274 14 Updated Sep 9, 2025

Learning Deep Representations of Data Distributions

TeX 650 51 Updated Nov 29, 2025

Small-scale distributed training of sequential deep learning models, built on NumPy and MPI.

Python 151 7 Updated Oct 19, 2023

Python pdb for multiple processes

Python 66 8 Updated May 24, 2025

Large-scale LLM inference engine

C++ 1,599 176 Updated Nov 24, 2025

Memray is a memory profiler for Python

Python 14,620 432 Updated Nov 22, 2025
Python 7 Updated Jul 26, 2025

Triton Support in Compiler Explorer

TypeScript 5 Updated Aug 5, 2025

Run compilers interactively from your web browser and interact with the assembly

TypeScript 18,259 1,964 Updated Nov 27, 2025

This repo provides several classic attention variant implementations based on the FlexAttention API.

Python 2 1 Updated May 18, 2025

A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM

Python 134 16 Updated Nov 21, 2025

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.

C 7,137 1,617 Updated Nov 23, 2025

Hacker News

HTML 13 5 Updated Nov 30, 2025

Distribute and run LLMs with a single file.

C++ 1 Updated Jul 23, 2024

Distribute and run LLMs with a single file.

C 23,436 1,242 Updated Nov 24, 2025

CUDA on non-NVIDIA GPUs

Rust 13,522 860 Updated Nov 29, 2025

GPU & Accelerator process monitoring for AMD, Apple, Huawei, Intel, NVIDIA and Qualcomm

C 9,822 346 Updated Oct 25, 2025

A .NET MAUI app for displaying the top posts on Hacker News that demonstrates text sentiment analysis gathered using artificial intelligence

C# 281 40 Updated Nov 24, 2025

A curated list of awesome C frameworks, libraries, resources and other shiny things. Inspired by all the other awesome-... projects out there.

10,840 902 Updated Nov 7, 2025

Local AI voice assistant stack for Home Assistant (GPU-accelerated) with persistent memory, follow-up conversation, and Ollama model recommendations - settings designed for low VRAM systems.

216 18 Updated Jul 27, 2025

Debug Module for Embedded Systems

C 1 Updated May 3, 2025

A TTS model capable of generating ultra-realistic dialogue in one pass.

Python 1 Updated Jul 6, 2025

📝 A curated list of awesome Raspberry Pi tools, projects, images and resources

Shell 15,496 1,067 Updated Nov 10, 2025

Inference Llama 2 in one file of pure C & one file with CUDA

C 31 1 Updated Oct 14, 2023

A TTS model capable of generating ultra-realistic dialogue in one pass.

Python 18,893 1,642 Updated Nov 19, 2025

V-lang API wrapper for llm-inference chatllm.cpp

C 6 Updated Nov 20, 2024

The Modular Platform (includes MAX & Mojo)

Mojo 25,271 2,738 Updated Nov 29, 2025