OpenVINO™ GenAI is a library of the most popular Generative AI model pipelines, optimized execution methods, and samples that run on top of the highly performant OpenVINO Runtime.
The library is optimized for resource consumption and friendly to PC and laptop execution. It requires no external dependencies to run generative models, as it already includes all the core functionality (e.g. tokenization via openvino-tokenizers).
- Introduction to OpenVINO™ GenAI
- Install OpenVINO™ GenAI
- Build OpenVINO™ GenAI
- Supported Models
- Model Preparation Guide
Explore blogs for your first hands-on experience with OpenVINO GenAI, or follow the quick start steps below:
- Install OpenVINO GenAI from PyPI:
```sh
pip install openvino-genai
```
- Obtain a model, e.g. export a model from Hugging Face to OpenVINO IR format (see the Model Preparation Guide for more details):
```sh
optimum-cli export openvino --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --trust-remote-code TinyLlama_1_1b_v1_ov
```
- Run inference:
```python
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("TinyLlama_1_1b_v1_ov", "CPU")  # Use CPU or GPU as the device without any other code change
print(pipe.generate("What is OpenVINO?", max_new_tokens=100))
```
The OpenVINO™ GenAI library provides lightweight C++ and Python APIs to run the following generative AI scenarios:
- Text generation using Large Language Models (LLMs) - Chat with local Llama, Phi, Qwen and other models
- Image processing using Visual Language Models (VLMs) - Analyze images/videos with LLaVa, MiniCPM-V and other models
- Image generation using Diffusers - Generate images with Stable Diffusion & Flux models (see the sketch after this list)
- Speech recognition using Whisper - Convert speech to text using Whisper models
- Speech generation using SpeechT5 - Convert text to speech using SpeechT5 TTS models
- Semantic search using Text Embedding - Compute embeddings for documents and queries to enable efficient retrieval in RAG workflows
- Text Rerank for Retrieval-Augmented Generation (RAG) - Score documents by their relevance to a query to improve retrieval quality in RAG workflows
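Each scenario above is exposed through a dedicated pipeline class with an interface similar to the `LLMPipeline` shown earlier. As an example, here is a minimal image generation sketch, assuming a Stable Diffusion model has already been exported to OpenVINO IR in a local `stable_diffusion_ov` directory (the directory name, prompt, and generation parameters are illustrative placeholders):
```python
import openvino_genai as ov_genai
from PIL import Image  # pip install pillow

# Load a text-to-image pipeline from an exported OpenVINO IR model directory
# ("stable_diffusion_ov" is a placeholder path).
pipe = ov_genai.Text2ImagePipeline("stable_diffusion_ov", "CPU")

# Generate one image; the result is an OpenVINO tensor holding RGB pixel data.
image_tensor = pipe.generate(
    "A sunset over a mountain lake, oil painting",
    width=512,
    height=512,
    num_inference_steps=20,
)

# Convert the first (and only) image in the batch to a PIL image and save it.
Image.fromarray(image_tensor.data[0]).save("image.png")
```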
The library efficiently supports LoRA adapters for the text and image generation scenarios (a usage sketch follows this list):
- Load multiple adapters per model
- Select active adapters for every generation
- Mix multiple adapters with coefficients via alpha blending
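A minimal sketch of adapter usage with the text generation pipeline, following the pattern of the library's LoRA samples (the model directory, adapter file, and the 0.75 alpha value are illustrative placeholders):
```python
import openvino_genai as ov_genai

# Load a LoRA adapter from a .safetensors file (placeholder path).
adapter = ov_genai.Adapter("adapter.safetensors")

# Register the adapter when creating the pipeline ("model_dir" is a placeholder).
pipe = ov_genai.LLMPipeline("model_dir", "CPU", adapters=ov_genai.AdapterConfig(adapter))

# Generate with the adapter active, blended in with alpha = 0.75.
print(pipe.generate("What is OpenVINO?", max_new_tokens=100,
                    adapters=ov_genai.AdapterConfig(adapter, 0.75)))

# Generate with no active adapters by passing an empty AdapterConfig.
print(pipe.generate("What is OpenVINO?", max_new_tokens=100,
                    adapters=ov_genai.AdapterConfig()))
```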
All scenarios run on top of OpenVINO Runtime, which supports inference on CPU, GPU and NPU. See here for the platform support matrix.
The OpenVINO™ GenAI library provides a transparent way to use state-of-the-art generation optimizations:
- Speculative decoding, which employs two models of different sizes and uses the large model to periodically verify and correct the output of the small draft model (a sketch follows this list). See here for a more detailed overview
- KVCache token eviction algorithm, which reduces the memory footprint of the KVCache by pruning less impactful tokens
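A minimal sketch of speculative decoding with the text generation pipeline, following the library's speculative decoding sample (the two model directory names are placeholders, and the choice of 5 assistant tokens is illustrative):
```python
import openvino_genai as ov_genai

# num_assistant_tokens controls how many tokens the small draft model proposes
# before the large main model verifies them.
config = ov_genai.GenerationConfig()
config.max_new_tokens = 100
config.num_assistant_tokens = 5

# Wrap the small model as a draft model and attach it to the main pipeline
# ("draft_model_dir" and "main_model_dir" are placeholder paths).
draft = ov_genai.draft_model("draft_model_dir", "CPU")
pipe = ov_genai.LLMPipeline("main_model_dir", "CPU", draft_model=draft)

print(pipe.generate("What is OpenVINO?", config))
```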
Additionally, the OpenVINO™ GenAI library implements a continuous batching approach for using OpenVINO within LLM serving. The continuous batching functionality can be used in LLM serving frameworks and supports the following features (a configuration sketch follows the list):
- Prefix caching, which internally caches prompt fragments of previous generation requests together with the corresponding KVCache entries and reuses them when a new query repeats the same prefix
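A minimal sketch of enabling prefix caching in the continuous batching pipeline (the model directory, cache size, and prompts are illustrative placeholders):
```python
import openvino_genai as ov_genai

# Configure the continuous batching scheduler; enable_prefix_caching lets the
# pipeline reuse KVCache entries when a new request repeats a known prefix.
scheduler_config = ov_genai.SchedulerConfig()
scheduler_config.cache_size = 2  # total KVCache size in GB (illustrative value)
scheduler_config.enable_prefix_caching = True

# "model_dir" is a placeholder path to an exported OpenVINO IR model.
pipe = ov_genai.ContinuousBatchingPipeline("model_dir", scheduler_config, "CPU")

config = ov_genai.GenerationConfig()
config.max_new_tokens = 100

# Batched generation: one GenerationConfig per prompt.
prompts = ["What is OpenVINO?", "What is OpenVINO Runtime?"]
for result in pipe.generate(prompts, [config] * len(prompts)):
    print(result.m_generation_ids[0])  # first generated sequence per prompt
```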
Continuous batching functionality is used within OpenVINO Model Server (OVMS) to serve LLMs; see here for more details.
The OpenVINO™ GenAI repository is licensed under Apache License Version 2.0. By contributing to the project, you agree to the license and copyright terms therein and release your contribution under these terms.
