FastChat is an open platform for training, serving, and evaluating large language model-based chatbots.

If you do not have enough memory, you can enable 8-bit compression by adding --load-8bit to the serving commands. This can reduce memory usage by around half with slightly degraded model quality, and it is compatible with the CPU, GPU, and Metal backends. With 8-bit compression, Vicuna-13B can run on a single NVIDIA 3090/4080/T4/V100 (16 GB) GPU. In addition, you can add --cpu-offloading to offload weights that do not fit on your GPU into CPU memory. This requires 8-bit compression to be enabled and the bitsandbytes package to be installed, which is only available on Linux.
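
For example, a minimal sketch of the corresponding command-line invocations (assuming FastChat is installed; lmsys/vicuna-13b-v1.5 is used here only as an illustrative model path):

    # Enable 8-bit compression to roughly halve GPU memory usage.
    python3 -m fastchat.serve.cli --model-path lmsys/vicuna-13b-v1.5 --load-8bit

    # Additionally offload weights that do not fit on the GPU into CPU memory
    # (requires --load-8bit and the bitsandbytes package; Linux only).
    python3 -m fastchat.serve.cli --model-path lmsys/vicuna-13b-v1.5 --load-8bit --cpu-offloading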

Features

  • The weights, training code, and evaluation code for state-of-the-art models
  • A distributed multi-model serving system with a web UI and OpenAI-compatible RESTful APIs (see the sketch after this list)
  • Tooling for training, serving, and evaluating large language models
  • Reduced CPU RAM requirements for weight conversion
  • Inference through a command-line interface
  • Support for a wide range of models
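
Because the serving system exposes OpenAI-compatible RESTful APIs, existing OpenAI client code can talk to a locally served model. A minimal sketch, assuming the FastChat OpenAI-compatible API server is already running on its default local port and serving a model named vicuna-13b-v1.5 (the port and model name here are assumptions):

    import openai

    # Point the OpenAI client at the local FastChat server instead of api.openai.com.
    openai.api_base = "http://localhost:8000/v1"
    openai.api_key = "EMPTY"  # FastChat does not require a real API key

    # Send a chat request to the locally served model.
    completion = openai.ChatCompletion.create(
        model="vicuna-13b-v1.5",
        messages=[{"role": "user", "content": "Hello! Can you introduce yourself?"}],
    )
    print(completion.choices[0].message.content)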

License

Apache License 2.0

Additional Project Details

Operating Systems: Linux, Mac
Programming Language: Python
Related Categories: Python Artificial Intelligence Software, Python LLM Inference Tool
Registered: 2023-06-01