- San Francisco, CA
- https://www.kevinkiningham.com/about
Stars
LLM Evaluation Framework for Hardware Design Using Python-Embedded DSLs
Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs
Tensor Compute Primitives: Mid-level Intermediate Representation for Machine Learning Programs
A high-performance, zero-overhead, extensible Python compiler with built-in NumPy support
An experiment in using Tangent to autodiff Triton
GPU programming related news and material links
PJRT plugin for interfacing IREE with JAX and TensorFlow.
MatMul performance benchmarks for a single CPU core, comparing hand-engineered and codegen kernels.
https://wavespeed.ai/ Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator
commaVQ is a dataset of compressed driving video
FauxPilot - an open-source alternative to GitHub Copilot server
GPUnet is a native GPU networking layer that provides a socket abstraction over InfiniBand to programs running on NVIDIA GPUs.
Concurrent Deferred Reference Counting
LlamaIndex is the leading framework for building LLM-powered agents over your data.
Authenticated multi-version database: sparse binary merkle tree with compact partial-tree proofs
Diffusion Bee is the easiest way to run Stable Diffusion locally on your M1 Mac. Comes with a one-click installer. No dependencies or technical knowledge needed.
PDK for GlobalFoundries' 180nm MCU bulk process technology (GF180MCU).
OpenMMLab's next-generation platform for general 3D object detection.