Ready-to-use OCR with 80+ supported languages
A high-throughput and memory-efficient inference and serving engine
Optimizing inference proxy for LLMs
Library for OCR-related tasks powered by Deep Learning
Bring the notion of Model-as-a-Service to life
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
Visual Instruction Tuning: Large Language-and-Vision Assistant
FlashInfer: Kernel Library for LLM Serving
Database system for building simpler and faster AI-powered applications
LLM training code for MosaicML foundation models
A high-performance ML model serving framework offering dynamic batching
Framework dedicated to neural data processing
20+ high-performance LLMs with recipes to pretrain and finetune at scale
A general-purpose probabilistic programming system
Large Language Model Text Generation Inference
Operating LLMs in production
OpenAI-style API for open large language models
Libraries for applying sparsification recipes to neural networks
State-of-the-art Parameter-Efficient Fine-Tuning
Replace OpenAI GPT with another LLM in your app
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Neural Network Compression Framework for enhanced OpenVINO inference
Efficient few-shot learning with Sentence Transformers
A Unified Library for Parameter-Efficient Learning
Open platform for training, serving, and evaluating language models