Cactus is a lightweight, high-performance framework for running AI models on mobile devices, with simple and consistent APIs across Flutter and React-Native. Cactus currently leverages GGML backends to support any GGUF model already compatible with Llama.cpp.
- Text completion and chat completion
- Vision Language Models
- Streaming token generation
- Embedding generation
- Text-to-speech model support (early stages)
- JSON mode with schema validation
- Chat templates with Jinja2 support
- Low memory footprint
- Battery-efficient inference
- Background processing
- Agentic workflows (cross-app interactions etc.)
- Phone tool use (gallery search, read email, DM...)
- Thinking mode (planning, evals...)
- Higher-level APIs (sentiments, OCR, TTS...)
    ┌─────────────────────────────────────────┐
    │                  Apps                   │
    └─────────────┬─────────────┬─────────────┘
                  │             │
    ┌─────────────▼─────────────▼─────────────┐
    │   ┌───────┐   ┌─────────┐   ┌───────┐   │
    │   │ React │   │ Flutter │   │ Native│   │
    │   └───────┘   └─────────┘   └───────┘   │
    │                Bindings                 │
    └─────────────┬─────────────┬─────────────┘
                  │             │
    ┌─────────────▼─────────────▼─────────────┐
    │             Cactus Core C++             │
    └─────────────┬─────────────┬─────────────┘
                  │             │
            ┌─────▼─────┐ ┌─────▼─────┐
            │ Llama.cpp │ │ GGML/GGUF │
            └───────────┘ └───────────┘
- Update `pubspec.yaml`: Add `cactus` to your project's dependencies. Ensure you have `flutter: sdk: flutter` (usually present by default).

      dependencies:
        flutter:
          sdk: flutter
        cactus: ^0.0.3

- Install dependencies: Execute the following command in your project terminal: `flutter pub get`
- Install the `cactus-react-native` package:
  - Using npm: `npm install cactus-react-native`
  - Or using yarn: `yarn add cactus-react-native`
- Install iOS Pods (if not using Expo): For native iOS projects, ensure you link the native dependencies. Navigate to your `ios` directory and run: `npx pod-install`
The Cactus backend is written in C/C++, layered on top of GGML/GGUF to support models in the GGUF format. Developers and contributors working at this level can get started with the examples below.
Note: CMake must be installed. On macOS, install it with `brew install cmake`; on Linux, use your standard package manager.
- Language Models:
  - Navigate to the example directory: `cd example/cpp-llm`
  - Make the build script executable (only needs to be done once): `chmod +x build.sh`
  - Run the example: `./build.sh` (This will download the Qwen 3 model)
  - Play with models and prompts in `example/cpp-llm/main.cpp`.
- Vision-Language Models:
  - Navigate to the example directory: `cd example/cpp-vlm`
  - Make the build script executable (only needs to be done once): `chmod +x build.sh`
  - Run the example: `./build.sh` (This will download the SmolVLM model)
  - Play with models and prompts in `example/cpp-vlm/main.cpp`.
- Text-to-Speech:
  - Navigate to the example directory: `cd example/cpp-tts`
  - Make the build script executable (only needs to be done once): `chmod +x build.sh`
  - Run the example: `./build.sh` (This will download the OuteTTS model)
  - Play with models and prompts in `example/cpp-tts/main.cpp`.
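
Because the backend only loads models in the GGUF format, it can be handy to sanity-check a downloaded file before pointing an example at it. The sketch below checks for the 4-byte `GGUF` magic at the start of the file. The helper name `looks_like_gguf` is hypothetical and is not part of the Cactus API; it confirms only the magic bytes, not that the model is actually supported.

```cpp
#include <array>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper for illustration; not part of the Cactus API.
// Returns true if the file at `path` starts with the 4-byte magic "GGUF".
bool looks_like_gguf(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        return false;  // Missing or unreadable file.
    }

    std::array<char, 4> magic{};
    file.read(magic.data(), static_cast<std::streamsize>(magic.size()));
    if (file.gcount() != static_cast<std::streamsize>(magic.size())) {
        return false;  // File is shorter than the magic itself.
    }

    return std::memcmp(magic.data(), "GGUF", magic.size()) == 0;
}
```

A caller might run `looks_like_gguf(model_path)` right after the download step and print a clear error if it returns `false`, rather than letting a truncated or mislabeled file fail later during model loading.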
We host our docs on Deep Wiki, so you can also ask Devin any question about Cactus! It is not re-indexed often enough to keep up with our update speed, though, so we also maintain manually written docs for the APIs.
We have ready-to-run-and-deploy example apps:
- Flutter Chat
- Flutter Notes
- React Chat
- React Productivity
- React Diary
- C++ Language Model (LLM)
- C++ Vision-Language Model (VLM)
- C++ Text-to-Speech (TTS)
We welcome contributions! Here's how you can help:
- Clone the Repository: For simplicity at this stage, clone the repository to your local machine.
- Create a Branch: Create a new branch for your contribution.
- Implement Changes: Make your desired changes or additions.
- Run Tests (for C/C++ contributors):
  - Ensure all tests pass by running the script: `scripts/test-cactus.sh`
- Flutter & React-Native Testing: (Testing procedures for these platforms will be updated soon.)
- Submit a Pull Request (PR): Once you're ready, submit a PR with your changes!
- Contribution Ideas: Example apps, polishing the examples, features, submitting benchmarks, etc.
Device | Gemma-3 1B Q8 (toks/sec) | Qwen-2.5 1.5B Q8 (toks/sec) | SmolLM2 360M Q8 (toks/sec) |
---|---|---|---|
iPhone 16 Pro Max | 43 | 29 | 103 |
iPhone 16 Pro | - | 28 | 103 |
iPhone 16 | - | 29 | - |
OnePlus 13 5G | 37 | - | - |
Samsung Galaxy S24 Ultra | 36 | - | - |
OnePlus Open | 33 | - | - |
Samsung Galaxy S23 5G | 32 | - | - |
Samsung Galaxy S24 | 31 | - | - |
iPhone 15 Pro Max | - | 23 | - |
iPhone 15 Pro | - | 25 | 81 |
iPhone 15 | - | 25 | - |
iPhone 14 Pro Max | - | 25 | - |
iPhone 13 Pro | 30 | - | - |
OnePlus 12 | 30 | - | - |
Samsung Galaxy S25 Ultra | 25 | - | - |
OnePlus 11 | 23 | - | 64 |
iPhone 12 mini | 22 | - | - |
Redmi K70 Ultra | 21 | - | - |
Xiaomi 13 | 21 | - | - |
Samsung Galaxy S24+ | 19 | - | - |
Samsung Galaxy Z Fold 4 | 19 | - | - |
Xiaomi Poco F6 5G | 19 | - | - |
iPhone 13 mini | - | - | 42 |
iPhone 12 Pro Max | - | 17 | - |
Google Pixel 8 | 16 | - | - |
Realme GT2 | 16 | - | - |
Google Pixel 6a | 14 | - | - |
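
For contributors submitting benchmarks, the following is a minimal sketch of how a toks/sec figure like those in the table could be computed. It is not necessarily how the numbers above were measured. The names `tokens_per_second` and `measure_throughput` are hypothetical, and the generation step is passed in as a generic callback that returns the number of tokens produced, so no Cactus API is assumed.

```cpp
#include <chrono>
#include <cstddef>
#include <utility>

// Converts a token count and elapsed wall-clock time into tokens per second.
// Returns 0.0 when no time has elapsed, to avoid dividing by zero.
double tokens_per_second(std::size_t tokens,
                         std::chrono::duration<double> elapsed) {
    const double seconds = elapsed.count();
    return seconds > 0.0 ? static_cast<double>(tokens) / seconds : 0.0;
}

// Times an arbitrary callback that runs generation and returns the number
// of tokens it produced. The callback is an assumption of this sketch.
template <typename GenerateFn>
double measure_throughput(GenerateFn&& generate) {
    const auto start = std::chrono::steady_clock::now();
    const std::size_t tokens = std::forward<GenerateFn>(generate)();
    const auto end = std::chrono::steady_clock::now();
    return tokens_per_second(tokens, end - start);
}
```

`std::chrono::steady_clock` is used because it is monotonic, so the measurement is unaffected by system clock adjustments during a run. When comparing devices, it helps to keep the model, quantization, prompt, and number of generated tokens fixed.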
We created a demo chat app we use for benchmarking: