LLM Evaluation Framework for Hardware Design Using Python-Embedded DSLs

Verilog 17 Updated Aug 26, 2024

Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs

HTML 688 101 Updated Nov 12, 2025

Veryl: A Modern Hardware Description Language

Rust 826 49 Updated Nov 16, 2025

Tensor Compute Primitives: Mid-level Intermediate Representation for Machine Learning Programs

MLIR 35 7 Updated Jan 30, 2025

Large Context Attention

Python 750 52 Updated Oct 13, 2025

A high-performance, zero-overhead, extensible Python compiler with built-in NumPy support

Python 16,135 568 Updated Nov 12, 2025

An experiment in using Tangent to autodiff Triton

Python 79 2 Updated Jan 22, 2024

GPU programming related news and material links

1,789 105 Updated Sep 17, 2025

PJRT plugin for interfacing IREE with JAX and TensorFlow.

C++ 5 Updated Feb 26, 2024

MLIR For Beginners tutorial

C++ 1,136 103 Updated Jul 18, 2025

MatMul performance benchmarks for a single CPU core, comparing hand-engineered and codegen kernels.

C++ 135 32 Updated Sep 25, 2023

https://wavespeed.ai/ Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.

Python 1,288 87 Updated Mar 27, 2025

Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator

Python 214 14 Updated Dec 10, 2023

commaVQ is a dataset of compressed driving video

Jupyter Notebook 336 60 Updated Oct 31, 2025

FauxPilot - an open-source alternative to GitHub Copilot server

Python 14,757 635 Updated Apr 9, 2024

Unified Executors

C++ 1,653 203 Updated Nov 13, 2025

GPUnet is a native GPU networking layer that provides a socket abstraction over Infiniband to GPU programs for NVIDIA GPUs.

C 117 21 Updated Jul 6, 2015

Concurrent Deferred Reference Counting

C++ 169 12 Updated Feb 18, 2024

LlamaIndex is the leading framework for building LLM-powered agents over your data.

Python 45,249 6,515 Updated Nov 14, 2025

Authenticated multi-version database: sparse binary merkle tree with compact partial-tree proofs

C++ 319 16 Updated Feb 9, 2023

A Compiler for the Popr Language

C 254 12 Updated Jan 7, 2021

Diffusion Bee is the easiest way to run Stable Diffusion locally on your M1 Mac. Comes with a one-click installer. No dependencies or technical knowledge needed.

JavaScript 13,444 710 Updated Oct 30, 2024

Bazel build rules for GNU M4

Starlark 19 13 Updated Apr 23, 2025

Apple AMX Instruction Set

C 1,168 55 Updated Dec 26, 2024

100 Days of RTL

SystemVerilog 402 112 Updated Aug 15, 2024

Torch Frontend for IREE

Python 25 10 Updated Dec 21, 2023

A framework for Engineering Managers

8,451 590 Updated Nov 1, 2022

A translation validation framework for MLIR

C++ 89 13 Updated Mar 19, 2025

PDK for GlobalFoundries' 180nm MCU bulk process technology (GF180MCU).

Makefile 435 62 Updated May 31, 2023

OpenMMLab's next-generation platform for general 3D object detection.

Python 6,126 1,697 Updated Jul 10, 2024