Run Local LLMs on Any Device. Open-source and available for commercial use
Self-hosted, community-driven local OpenAI-compatible API (see the usage sketch after this list)
Port of Facebook's LLaMA model in C/C++
INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
An RWKV management and startup tool; fully automated and only 8 MB
Replace OpenAI GPT with another LLM in your app
Phi-3.5 for Mac: Locally-run Vision and Language Models
A high-throughput and memory-efficient inference and serving engine for LLMs
PyTorch library of curated Transformer models and their components
Tensor search for humans
Operating LLMs in production
Database system for building simpler and faster AI-powered applications
Private OpenAI on Kubernetes
Run 100B+ language models at home, BitTorrent-style
Visual Instruction Tuning: Large Language-and-Vision Assistant
State-of-the-art Parameter-Efficient Fine-Tuning (PEFT; see the LoRA sketch after this list)
Run any Llama 2 model locally with a Gradio UI, on GPU or CPU, from anywhere
A high-performance ML model serving framework that offers dynamic batching
LLM training code for MosaicML foundation models
Implementation of "Tree of Thoughts: Deliberate Problem Solving with Large Language Models"
An unofficial Python package that returns responses from Google Bard
A framework dedicated to neural data processing
Low-latency REST API for serving text embeddings
Implementation of model-parallel autoregressive transformers on GPUs
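
Several servers above (LocalAI, vLLM, OpenLLM) expose an OpenAI-compatible endpoint, so the standard OpenAI client can talk to a locally hosted model unchanged. A minimal sketch, assuming a server is already running locally; the base URL, port, and model id are placeholders to be replaced with your deployment's values:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local server instead of api.openai.com.
# Base URL, port, and model id below are assumptions; match them to your server.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama-3-8b-instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Summarize what an OpenAI-compatible API is."}],
)
print(response.choices[0].message.content)
```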
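
For the parameter-efficient fine-tuning entry, the usual pattern is to wrap a frozen base model with a small set of trainable adapter weights. A minimal LoRA sketch using Hugging Face's peft and transformers libraries; the base model and target modules are illustrative choices, not prescriptions:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a small base model (illustrative choice).
model = AutoModelForCausalLM.from_pretrained("gpt2")

# LoRA injects low-rank adapter matrices into selected layers; only the
# adapters are trained while the base weights stay frozen.
config = LoraConfig(
    r=8,                        # rank of the low-rank update
    lora_alpha=16,              # scaling factor for the update
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```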