Releases: ggml-org/llama.cpp
b5410
b5409
server : do not return error out of context (with ctx shift disabled)…
b5406
releases : use arm version of curl for arm releases (#13592)
b5405
metal : add FA-vec kernel for head size 64 (#13583) ggml-ci
b5404
llama : print hint when loading a model when no backends are loaded (…
b5402
sycl : fixed compilation warnings (#13582)
b5401
minja: sync (qwen3) (#13573)
* minja: sync https://github.com/google/minja/commit/f06140fa52fd140fe38e531ec373d8dc9c86aa06
  - https://github.com/google/minja/pull/67 (@grf53)
  - https://github.com/google/minja/pull/66 (@taha-yassine)
  - https://github.com/google/minja/pull/63 (@grf53)
  - https://github.com/google/minja/pull/58
Co-authored-by: ochafik <[email protected]>
b5400
gguf : use ggml log system (#13571)
* gguf : use ggml log system
* llama : remove unnecessary new lines in exception messages
b5395
sycl: use oneDNN for matrix multiplication (#12972)
b5394
llama-bench : fix -ot with dl backends (#13563)