• Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • Get Avast Free Antivirus with 24/7 AI-powered online scam detection Icon
    Get Avast Free Antivirus with 24/7 AI-powered online scam detection

    Get protection for today’s online threats. Free.

    Award-winning antivirus protection, as well as protection against online scams, dangerous Wi-Fi connections, hacked accounts, and ransomware. It includes Avast Assistant, your built-in AI partner, which gives you help with suspicious online messages, offers, and more.
    Free Download
  • 1
    llama.cpp

    llama.cpp

    Port of Facebook's LLaMA model in C/C++

    The llama.cpp project enables the inference of Meta's LLaMA model (and other models) in pure C/C++ without requiring a Python runtime. It is designed for efficient and fast model execution, offering easy integration for applications needing LLM-based capabilities. The repository focuses on providing a highly optimized and portable implementation for running large language models directly within C/C++ environments.
    Downloads: 80 This Week
    Last Update:
    See Project
  • 2
    GPT4All

    GPT4All

    Run Local LLMs on Any Device. Open-source

    .... This project also supports Python integrations for easy automation and customization. GPT4All is ideal for individuals and businesses seeking private, offline access to powerful LLMs.
    Downloads: 74 This Week
    Last Update:
    See Project
  • 3
    LocalAI

    LocalAI

    Self-hosted, community-driven, local OpenAI compatible API

    Self-hosted, community-driven, local OpenAI compatible API. Drop-in replacement for OpenAI running LLMs on consumer-grade hardware. Free Open Source OpenAI alternative. No GPU is required. Runs ggml, GPTQ, onnx, TF compatible models: llama, gpt4all, rwkv, whisper, vicuna, koala, gpt4all-j, cerebras, falcon, dolly, starcoder, and many others. LocalAI is a drop-in replacement REST API that’s compatible with OpenAI API specifications for local inferencing. It allows you to run LLMs (and not only...
    Downloads: 24 This Week
    Last Update:
    See Project
  • 4
    ChatGLM.cpp

    ChatGLM.cpp

    C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)

    ChatGLM.cpp is a C++ implementation of the ChatGLM-6B model, enabling efficient local inference without requiring a Python environment. It is optimized for running on consumer hardware.
    Downloads: 41 This Week
    Last Update:
    See Project
  • MongoDB Atlas | Run databases anywhere Icon
    MongoDB Atlas | Run databases anywhere

    Ensure the availability of your data with coverage across AWS, Azure, and GCP on MongoDB Atlas—the multi-cloud database for every enterprise.

    MongoDB Atlas allows you to build and run modern applications across 125+ cloud regions, spanning AWS, Azure, and Google Cloud. Its multi-cloud clusters enable seamless data distribution and automated failover between cloud providers, ensuring high availability and flexibility without added complexity.
    Learn More
  • 5
    rwkv.cpp

    rwkv.cpp

    INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model

    Besides the usual FP32, it supports FP16, quantized INT4, INT5 and INT8 inference. This project is focused on CPU, but cuBLAS is also supported. RWKV is a novel large language model architecture, with the largest model in the family having 14B parameters. In contrast to Transformer with O(n^2) attention, RWKV requires only state from the previous step to calculate logits. This makes RWKV very CPU-friendly on large context lengths.
    Downloads: 39 This Week
    Last Update:
    See Project
  • 6
    RWKV Runner

    RWKV Runner

    A RWKV management and startup tool, full automation, only 8MB

    RWKV (pronounced as RwaKuv) is an RNN with GPT-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, fast training, saves VRAM, "infinite" ctxlen, and free text embedding. Moreover it's 100% attention-free. Default configs has enabled custom CUDA kernel acceleration, which is much faster and consumes much less VRAM. If you encounter possible compatibility...
    Downloads: 25 This Week
    Last Update:
    See Project
  • 7
    vLLM

    vLLM

    A high-throughput and memory-efficient inference and serving engine

    vLLM is a fast and easy-to-use library for LLM inference and serving. High-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more.
    Downloads: 22 This Week
    Last Update:
    See Project
  • 8
    Xorbits Inference

    Xorbits Inference

    Replace OpenAI GPT with another LLM in your app

    Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop. Xorbits Inference(Xinference) is a powerful and versatile library designed to serve language, speech recognition, and multimodal models. With Xorbits Inference...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 9
    Phi-3-MLX

    Phi-3-MLX

    Phi-3.5 for Mac: Locally-run Vision and Language Models

    Phi-3-Vision-MLX is an Apple MLX (machine learning on Apple silicon) implementation of Phi-3 Vision, a lightweight multi-modal model designed for vision and language tasks. It focuses on running vision-language AI efficiently on Apple hardware like M1 and M2 chips.
    Downloads: 19 This Week
    Last Update:
    See Project
  • Build Securely on AWS with Proven Frameworks Icon
    Build Securely on AWS with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 10
    OpenLLM

    OpenLLM

    Operating LLMs in production

    ..., CLI, our Python/Javascript client, or any HTTP client.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 11
    Curated Transformers

    Curated Transformers

    PyTorch library of curated Transformer models and their components

    State-of-the-art transformers, brick by brick. Curated Transformers is a transformer library for PyTorch. It provides state-of-the-art models that are composed of a set of reusable components. Supports state-of-the-art transformer models, including LLMs such as Falcon, Llama, and Dolly v2. Implementing a feature or bugfix benefits all models. For example, all models support 4/8-bit inference through the bitsandbytes library and each model can use the PyTorch meta device to avoid unnecessary...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 12
    marqo

    marqo

    Tensor search for humans

    A tensor-based search and analytics engine that seamlessly integrates with your applications, websites, and workflows. Marqo is a versatile and robust search and analytics engine that can be integrated into any website or application. Due to horizontal scalability, Marqo provides lightning-fast query times, even with millions of documents. Marqo helps you configure deep-learning models like CLIP to pull semantic meaning from images. It can seamlessly handle image-to-image, image-to-text and...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    EvaDB

    EvaDB

    Database system for building simpler and faster AI-powered application

    Over the last decade, AI models have radically changed the world of natural language processing and computer vision. They are accurate on various tasks ranging from question answering to object tracking in videos. To use an AI model, the user needs to program against multiple low-level libraries, like PyTorch, Hugging Face, Open AI, etc. This tedious process often leads to a complex AI app that glues together these libraries to accomplish the given task. This programming complexity prevents...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 14
    Petals

    Petals

    Run 100B+ language models at home, BitTorrent-style

    Run 100B+ language models at home, BitTorrent‑style. Run large language models like BLOOM-176B collaboratively — you load a small part of the model, then team up with people serving the other parts to run inference or fine-tuning. Single-batch inference runs at ≈ 1 sec per step (token) — up to 10x faster than offloading, enough for chatbots and other interactive apps. Parallel inference reaches hundreds of tokens/sec. Beyond classic language model APIs — you can employ any fine-tuning and...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 15
    KubeAI

    KubeAI

    Private Open AI on Kubernetes

    Get inferencing running on Kubernetes: LLMs, Embeddings, Speech-to-Text. KubeAI serves an OpenAI compatible HTTP API. Admins can configure ML models by using the Model Kubernetes Custom Resources. KubeAI can be thought of as a Model Operator (See Operator Pattern) that manages vLLM and Ollama servers.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    PEFT

    PEFT

    State-of-the-art Parameter-Efficient Fine-Tuning

    Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of pre-trained language models (PLMs) to various downstream applications without fine-tuning all the model's parameters. Fine-tuning large-scale PLMs is often prohibitively costly. In this regard, PEFT methods only fine-tune a small number of (extra) model parameters, thereby greatly decreasing the computational and storage costs. Recent State-of-the-Art PEFT techniques achieve performance comparable to that of full...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 17
    Mosec

    Mosec

    A high-performance ML model serving framework, offers dynamic batching

    Mosec is a high-performance and flexible model-serving framework for building ML model-enabled backend and microservices. It bridges the gap between any machine learning models you just trained and the efficient online service API.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 18
    llama2-webui

    llama2-webui

    Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere

    Running Llama 2 with gradio web UI on GPU or CPU from anywhere (Linux/Windows/Mac).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    LLM Foundry

    LLM Foundry

    LLM training code for MosaicML foundation models

    Introducing MPT-7B, the first entry in our MosaicML Foundation Series. MPT-7B is a transformer trained from scratch on 1T tokens of text and code. It is open source, available for commercial use, and matches the quality of LLaMA-7B. MPT-7B was trained on the MosaicML platform in 9.5 days with zero human intervention at a cost of ~$200k. Large language models (LLMs) are changing the world, but for those outside well-resourced industry labs, it can be extremely difficult to train and deploy...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 20
    Bard API

    Bard API

    The unofficial python package that returns response of Google Bard

    The Python package returns a response of Google Bard through the value of the cookie. This package is designed for application to the Python package ExceptNotifier and Co-Coder. Please note that the bardapi is not a free service, but rather a tool provided to assist developers with testing certain functionalities due to the delayed development and release of Google Bard's API. It has been designed with a lightweight structure that can easily adapt to the emergence of an official API. Therefore...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    LLaVA

    LLaVA

    Visual Instruction Tuning: Large Language-and-Vision Assistant

    Visual instruction tuning towards large language and vision models with GPT-4 level capabilities.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    Repo of Tree of Thoughts (ToT)

    Repo of Tree of Thoughts (ToT)

    Implementation of "Tree of Thoughts

    Language models are increasingly being deployed for general problem-solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    towhee

    towhee

    Framework that is dedicated to making neural data processing

    Towhee is an open-source machine-learning pipeline that helps you encode your unstructured data into embeddings. You can use our Python API to build a prototype of your pipeline and use Towhee to automatically optimize it for production-ready environments. From images to text to 3D molecular structures, Towhee supports data transformation for nearly 20 different unstructured data modalities. We provide end-to-end pipeline optimizations, covering everything from data decoding/encoding, to model...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Infinity

    Infinity

    Low-latency REST API for serving text-embeddings

    Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting all sentence-transformer models and frameworks. Infinity is developed under MIT License. Infinity powers inference behind Gradient.ai and other Embedding API providers.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    GPT-NeoX

    GPT-NeoX

    Implementation of model parallel autoregressive transformers on GPUs

    This repository records EleutherAI's library for training large-scale language models on GPUs. Our current framework is based on NVIDIA's Megatron Language Model and has been augmented with techniques from DeepSpeed as well as some novel optimizations. We aim to make this repo a centralized and accessible place to gather techniques for training large-scale autoregressive language models, and accelerate research into large-scale training. For those looking for a TPU-centric codebase, we...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.