Stars
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
Nvidia Instruction Set Specification Generator
Learning Deep Representations of Data Distributions
Small-scale distributed training of sequential deep learning models, built on NumPy and MPI.
Large-scale LLM inference engine
Triton Support in Compiler Explorer
Run compilers interactively from your web browser and interact with the assembly
This repo provides implementations of several classic attention variants based on the FlexAttention API.
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
nasa03 / llamafile
Forked from mozilla-ai/llamafile: Distribute and run LLMs with a single file.
Distribute and run LLMs with a single file.
GPU & Accelerator process monitoring for AMD, Apple, Huawei, Intel, NVIDIA and Qualcomm
A .NET MAUI app for displaying the top posts on Hacker News that demonstrates text sentiment analysis gathered using artificial intelligence
A curated list of awesome C frameworks, libraries, resources and other shiny things. Inspired by all the other awesome-... projects out there.
Local AI voice assistant stack for Home Assistant (GPU-accelerated) with persistent memory, follow-up conversation, and Ollama model recommendations - settings designed for low VRAM systems.
gigit0000 / dia
Forked from nari-labs/dia: A TTS model capable of generating ultra-realistic dialogue in one pass.
📝 A curated list of awesome Raspberry Pi tools, projects, images and resources
rogerallen / llama2.cu
Forked from karpathy/llama2.c: Inference Llama 2 in one file of pure C & one file with CUDA
A TTS model capable of generating ultra-realistic dialogue in one pass.


