#llama-cpp #bindings #low-level #cuda #moe #offloading #safe-api #llama-cpp-2

sys shimmy-llama-cpp-sys-2

Low Level Bindings to llama.cpp with MoE CPU offloading support

1 unstable release

0.1.123 Oct 9, 2025

#1798 in Machine learning

Download history (downloads per week)
2025-10-05: 325
2025-10-12: 435
2025-10-19: 260
2025-10-26: 288
2025-11-02: 155
2025-11-09: 44
2025-11-16: 44
2025-11-23: 44
2025-11-30: 52
2025-12-07: 273

417 downloads per month
Used in 2 crates (via shimmy-llama-cpp-2)

MIT/Apache

9.5MB
174K SLoC

C++ 107K SLoC // 0.1% comments
C 28K SLoC // 0.0% comments
CUDA 13K SLoC // 0.0% comments
GLSL 11K SLoC // 0.0% comments
Python 7K SLoC // 0.1% comments
Metal Shading Language 6.5K SLoC // 0.1% comments
Objective-C 1.5K SLoC // 0.1% comments
Rust 634 SLoC // 0.1% comments

llama-cpp-sys

Raw bindings to llama.cpp with CUDA support.

See llama-cpp-2 for a safe API.

Dependencies