Compare the Top On-Premises Large Language Models as of May 2025

What are On-Premises Large Language Models?

Large language models are artificial neural networks used to process and understand natural language. Commonly trained on large datasets, they can be used for a variety of tasks such as text generation, text classification, question answering, and machine translation. Over time, these models have continued to improve, allowing for better accuracy and greater performance on a variety of tasks. Compare and read user reviews of the best On-Premises Large Language Models currently available using the table below. This list is updated regularly.

  • 1
    LM-Kit.NET
    LM-Kit.NET lets C# and VB.NET developers integrate large and small language models for natural language understanding, text generation, multi-turn dialogue, and low-latency on-device inference, while its vision language models add image analysis and captioning, its embedding models turn text into vectors for fast semantic search, and its LM-Lit catalog lists every state-of-the-art model with continuous updates, all in one efficient toolkit that stays inside your codebase without revealing any AI origin to the user.
    Leader badge
    Starting Price: Free (Community) or $1000/year
    Partner badge
    View Software
    Visit Website
  • 2
    DeepSeek

    DeepSeek

    DeepSeek

    DeepSeek is a cutting-edge AI assistant powered by the advanced DeepSeek-V3 model, featuring over 600 billion parameters for exceptional performance. Designed to compete with top global AI systems, it offers fast responses and a wide range of features to make everyday tasks easier and more efficient. Available across multiple platforms, including iOS, Android, and the web, DeepSeek ensures accessibility for users everywhere. The app supports multiple languages and has been continually updated to improve functionality, add new language options, and resolve issues. With its seamless performance and versatility, DeepSeek has garnered positive feedback from users worldwide.
    Starting Price: Free
  • 3
    Mistral AI

    Mistral AI

    Mistral AI

    Mistral AI is a pioneering artificial intelligence startup specializing in open-source generative AI. The company offers a range of customizable, enterprise-grade AI solutions deployable across various platforms, including on-premises, cloud, edge, and devices. Flagship products include "Le Chat," a multilingual AI assistant designed to enhance productivity in both personal and professional contexts, and "La Plateforme," a developer platform that enables the creation and deployment of AI-powered applications. Committed to transparency and innovation, Mistral AI positions itself as a leading independent AI lab, contributing significantly to open-source AI and policy development.
    Starting Price: Free
  • 4
    Cohere

    Cohere

    Cohere AI

    Cohere is an enterprise AI platform that enables developers and businesses to build powerful language-based applications. Specializing in large language models (LLMs), Cohere provides solutions for text generation, summarization, and semantic search. Their model offerings include the Command family for high-performance language tasks and Aya Expanse for multilingual applications across 23 languages. Focused on security and customization, Cohere allows flexible deployment across major cloud providers, private cloud environments, or on-premises setups to meet diverse enterprise needs. The company collaborates with industry leaders like Oracle and Salesforce to integrate generative AI into business applications, improving automation and customer engagement. Additionally, Cohere For AI, their research lab, advances machine learning through open-source projects and a global research community.
    Starting Price: Free
  • 5
    DeepSeek Coder
    DeepSeek Coder is a cutting-edge software tool designed to revolutionize the landscape of data analysis and coding. By leveraging advanced machine learning algorithms and natural language processing capabilities, it empowers users to seamlessly integrate data querying, analysis, and visualization into their workflow. The intuitive interface of DeepSeek Coder enables both novice and experienced programmers to efficiently write, test, and optimize code. Its robust set of features includes real-time syntax checking, intelligent code completion, and comprehensive debugging tools, all designed to streamline the coding process. Additionally, DeepSeek Coder's ability to understand and interpret complex data sets ensures that users can derive meaningful insights and create sophisticated data-driven applications with ease.
    Starting Price: Free
  • 6
    DeepSeek-V3

    DeepSeek-V3

    DeepSeek

    DeepSeek-V3 is a state-of-the-art AI model designed to deliver unparalleled performance in natural language understanding, advanced reasoning, and decision-making tasks. Leveraging next-generation neural architectures, it integrates extensive datasets and fine-tuned algorithms to tackle complex challenges across diverse domains such as research, development, business intelligence, and automation. With a focus on scalability and efficiency, DeepSeek-V3 provides developers and enterprises with cutting-edge tools to accelerate innovation and achieve transformative outcomes.
    Starting Price: Free
  • 7
    DeepSeek R1

    DeepSeek R1

    DeepSeek

    DeepSeek-R1 is an advanced open-source reasoning model developed by DeepSeek, designed to rival OpenAI's Model o1. Accessible via web, app, and API, it excels in complex tasks such as mathematics and coding, demonstrating superior performance on benchmarks like the American Invitational Mathematics Examination (AIME) and MATH. DeepSeek-R1 employs a mixture of experts (MoE) architecture with 671 billion total parameters, activating 37 billion parameters per token, enabling efficient and accurate reasoning capabilities. This model is part of DeepSeek's commitment to advancing artificial general intelligence (AGI) through open-source innovation.
    Starting Price: Free
  • 8
    Qwen-7B

    Qwen-7B

    Alibaba

    Qwen-7B is the 7B-parameter version of the large language model series, Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model, which is pretrained on a large volume of data, including web texts, books, codes, etc. Additionally, based on the pretrained Qwen-7B, we release Qwen-7B-Chat, a large-model-based AI assistant, which is trained with alignment techniques. The features of the Qwen-7B series include: Trained with high-quality pretraining data. We have pretrained Qwen-7B on a self-constructed large-scale high-quality dataset of over 2.2 trillion tokens. The dataset includes plain texts and codes, and it covers a wide range of domains, including general domain data and professional domain data. Strong performance. In comparison with the models of the similar model size, we outperform the competitors on a series of benchmark datasets, which evaluates natural language understanding, mathematics, coding, etc. And more.
    Starting Price: Free
  • 9
    Mistral 7B

    Mistral 7B

    Mistral AI

    Mistral 7B is a 7.3-billion-parameter language model that outperforms larger models like Llama 2 13B across various benchmarks. It employs Grouped-Query Attention (GQA) for faster inference and Sliding Window Attention (SWA) to efficiently handle longer sequences. Released under the Apache 2.0 license, Mistral 7B is accessible for deployment across diverse platforms, including local environments and major cloud services. Additionally, a fine-tuned version, Mistral 7B Instruct, demonstrates enhanced performance in instruction-following tasks, surpassing models like Llama 2 13B Chat.
    Starting Price: Free
  • 10
    Ministral 3B

    Ministral 3B

    Mistral AI

    Mistral AI introduced two state-of-the-art models for on-device computing and edge use cases, named "les Ministraux": Ministral 3B and Ministral 8B. These models set a new frontier in knowledge, commonsense reasoning, function-calling, and efficiency in the sub-10B category. They can be used or tuned for various applications, from orchestrating agentic workflows to creating specialist task workers. Both models support up to 128k context length (currently 32k on vLLM), and Ministral 8B features a special interleaved sliding-window attention pattern for faster and memory-efficient inference. These models were built to provide a compute-efficient and low-latency solution for scenarios such as on-device translation, internet-less smart assistants, local analytics, and autonomous robotics. Used in conjunction with larger language models like Mistral Large, les Ministraux also serve as efficient intermediaries for function-calling in multi-step agentic workflows.
    Starting Price: Free
  • 11
    Ministral 8B

    Ministral 8B

    Mistral AI

    Mistral AI has introduced two advanced models for on-device computing and edge applications, named "les Ministraux": Ministral 3B and Ministral 8B. These models excel in knowledge, commonsense reasoning, function-calling, and efficiency within the sub-10B parameter range. They support up to 128k context length and are designed for various applications, including on-device translation, offline smart assistants, local analytics, and autonomous robotics. Ministral 8B features an interleaved sliding-window attention pattern for faster and more memory-efficient inference. Both models can function as intermediaries in multi-step agentic workflows, handling tasks like input parsing, task routing, and API calls based on user intent with low latency and cost. Benchmark evaluations indicate that les Ministraux consistently outperforms comparable models across multiple tasks. As of October 16, 2024, both models are available, with Ministral 8B priced at $0.1 per million tokens.
    Starting Price: Free
  • 12
    Mistral Small

    Mistral Small

    Mistral AI

    On September 17, 2024, Mistral AI announced several key updates to enhance the accessibility and performance of their AI offerings. They introduced a free tier on "La Plateforme," their serverless platform for tuning and deploying Mistral models as API endpoints, enabling developers to experiment and prototype at no cost. Additionally, Mistral AI reduced prices across their entire model lineup, with significant cuts such as a 50% reduction for Mistral Nemo and an 80% decrease for Mistral Small and Codestral, making advanced AI more cost-effective for users. The company also unveiled Mistral Small v24.09, a 22-billion-parameter model offering a balance between performance and efficiency, suitable for tasks like translation, summarization, and sentiment analysis. Furthermore, they made Pixtral 12B, a vision-capable model with image understanding capabilities, freely available on "Le Chat," allowing users to analyze and caption images without compromising text-based performance.
    Starting Price: Free
  • 13
    Jurassic-2
    Announcing the launch of Jurassic-2, the latest generation of AI21 Studio’s foundation models, a game-changer in the field of AI, with top-tier quality and new capabilities. And that's not all, we're also releasing our task-specific APIs, with plug-and-play reading and writing capabilities that outperform competitors. Our focus at AI21 Studio is to help developers and businesses leverage reading and writing AI to build real-world products with tangible value. Today marks two important milestones with the release of Jurassic-2 and Task-Specific APIs, empowering you to bring generative AI to production. Jurassic-2 (or J2, as we like to call it) is the next generation of our foundation models with significant improvements in quality and new capabilities including zero-shot instruction-following, reduced latency, and multi-language support. Task-specific APIs provide developers with industry-leading APIs that perform specialized reading and writing tasks out-of-the-box.
    Starting Price: $29 per month
  • 14
    Jan

    Jan

    Jan

    10x productivity with customizable AI assistants, global hotkeys, and in-line AI. Seamless integration into your mobile workflows with elegant features. Conversations, preferences, and model usage stay on your computer—secure, exportable, and can be deleted at any time.
    Starting Price: Free
  • 15
    Mixtral 8x7B

    Mixtral 8x7B

    Mistral AI

    Mixtral 8x7B is a high-quality sparse mixture of experts model (SMoE) with open weights. Licensed under Apache 2.0. Mixtral outperforms Llama 2 70B on most benchmarks with 6x faster inference. It is the strongest open-weight model with a permissive license and the best model overall regarding cost/performance trade-offs. In particular, it matches or outperforms GPT-3.5 on most standard benchmarks.
    Starting Price: Free
  • 16
    Codestral

    Codestral

    Mistral AI

    We introduce Codestral, our first-ever code model. Codestral is an open-weight generative AI model explicitly designed for code generation tasks. It helps developers write and interact with code through a shared instruction and completion API endpoint. As it masters code and English, it can be used to design advanced AI applications for software developers. Codestral is trained on a diverse dataset of 80+ programming languages, including the most popular ones, such as Python, Java, C, C++, JavaScript, and Bash. It also performs well on more specific ones like Swift and Fortran. This broad language base ensures Codestral can assist developers in various coding environments and projects.
    Starting Price: Free
  • 17
    Mistral Large

    Mistral Large

    Mistral AI

    Mistral Large is Mistral AI's flagship language model, designed for advanced text generation and complex multilingual reasoning tasks, including text comprehension, transformation, and code generation. It supports English, French, Spanish, German, and Italian, offering a nuanced understanding of grammar and cultural contexts. With a 32,000-token context window, it can accurately recall information from extensive documents. The model's precise instruction-following and native function-calling capabilities facilitate application development and tech stack modernization. Mistral Large is accessible through Mistral's platform, Azure AI Studio, and Azure Machine Learning, and can be self-deployed for sensitive use cases. Benchmark evaluations indicate that Mistral Large achieves strong results, making it the world's second-ranked model generally available through an API, next to GPT-4.
    Starting Price: Free
  • 18
    DeepSeek-V2

    DeepSeek-V2

    DeepSeek

    DeepSeek-V2 is a state-of-the-art Mixture-of-Experts (MoE) language model introduced by DeepSeek-AI, characterized by its economical training and efficient inference capabilities. With a total of 236 billion parameters, of which only 21 billion are active per token, it supports a context length of up to 128K tokens. DeepSeek-V2 employs innovative architectures like Multi-head Latent Attention (MLA) for efficient inference by compressing the Key-Value (KV) cache and DeepSeekMoE for cost-effective training through sparse computation. This model significantly outperforms its predecessor, DeepSeek 67B, by saving 42.5% in training costs, reducing the KV cache by 93.3%, and enhancing generation throughput by 5.76 times. Pretrained on an 8.1 trillion token corpus, DeepSeek-V2 excels in language understanding, coding, and reasoning tasks, making it a top-tier performer among open-source models.
    Starting Price: Free
  • 19
    Qwen2.5-VL

    Qwen2.5-VL

    Alibaba

    Qwen2.5-VL is the latest vision-language model from the Qwen series, representing a significant advancement over its predecessor, Qwen2-VL. This model excels in visual understanding, capable of recognizing a wide array of objects, including text, charts, icons, graphics, and layouts within images. It functions as a visual agent, capable of reasoning and dynamically directing tools, enabling applications such as computer and phone usage. Qwen2.5-VL can comprehend videos exceeding one hour in length and can pinpoint relevant segments within them. Additionally, it accurately localizes objects in images by generating bounding boxes or points and provides stable JSON outputs for coordinates and attributes. The model also supports structured outputs for data like scanned invoices, forms, and tables, benefiting sectors such as finance and commerce. Available in base and instruct versions across 3B, 7B, and 72B sizes, Qwen2.5-VL is accessible through platforms like Hugging Face and ModelScope.
    Starting Price: Free
  • 20
    Mistral Large 2
    Mistral AI has launched the Mistral Large 2, an advanced AI model designed to excel in code generation, multilingual capabilities, and complex reasoning tasks. The model features a 128k context window, supporting dozens of languages including English, French, Spanish, and Arabic, as well as over 80 programming languages. Mistral Large 2 is tailored for high-throughput single-node inference, making it ideal for large-context applications. Its improved performance on benchmarks like MMLU and its enhanced code generation and reasoning abilities ensure accuracy and efficiency. The model also incorporates better function calling and retrieval, supporting complex business applications.
    Starting Price: Free
  • 21
    Mistral Medium 3
    Mistral Medium 3 is a powerful AI model designed to deliver state-of-the-art performance at a fraction of the cost compared to other models. It offers simpler deployment options, allowing for hybrid or on-premises configurations. Mistral Medium 3 excels in professional applications like coding and multimodal understanding, making it ideal for enterprise use. Its low-cost structure makes it highly accessible while maintaining top-tier performance, outperforming many larger models in specific domains.
    Starting Price: Free
  • 22
    Pixtral Large

    Pixtral Large

    Mistral AI

    Pixtral Large is a 124-billion-parameter open-weight multimodal model developed by Mistral AI, building upon their Mistral Large 2 architecture. It integrates a 123-billion-parameter multimodal decoder with a 1-billion-parameter vision encoder, enabling advanced understanding of documents, charts, and natural images while maintaining leading text comprehension capabilities. With a context window of 128,000 tokens, Pixtral Large can process at least 30 high-resolution images simultaneously. The model has demonstrated state-of-the-art performance on benchmarks such as MathVista, DocVQA, and VQAv2, surpassing models like GPT-4o and Gemini-1.5 Pro. Pixtral Large is available under the Mistral Research License for research and educational use, and under the Mistral Commercial License for commercial applications.
    Starting Price: Free
  • 23
    PaLM

    PaLM

    Google

    PaLM API is an easy and safe way to build on top of our best language models. Today, we’re making an efficient model available, in terms of size and capabilities, and we’ll add other sizes soon. The API also comes with an intuitive tool called MakerSuite, which lets you quickly prototype ideas and, over time, will have features for prompt engineering, synthetic data generation and custom-model tuning — all supported by robust safety tools. Select developers can access the PaLM API and MakerSuite in Private Preview today, and stay tuned for our waitlist soon.
  • Previous
  • You're on page 1
  • Next