Releases: ggml-org/llama.cpp
b5302
b5301
cuda : remove nrows_x in mul_mat_q_process_tile (#13325) Signed-off-by: Xiaodong Ye <[email protected]>
b5300
examples : remove infill (#13283) ggml-ci
b5299
llama : support tie embedding for chatglm models (#13328)
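"Tie embedding" in the entry above refers to weight tying: the output (lm_head) projection reuses the token embedding matrix instead of storing a separate weight. A minimal pure-Python sketch of the idea (illustrative only, not the ChatGLM or llama.cpp implementation; class and method names are hypothetical):

```python
class TinyTiedLM:
    """Toy model with tied input/output embeddings (illustrative sketch)."""

    def __init__(self, vocab_size, dim):
        # One shared matrix W: row i is both the embedding of token i
        # and the i-th output-projection vector. Tying halves the
        # parameter count of these two layers.
        self.W = [[0.01 * (i + 1) * (j + 1) for j in range(dim)]
                  for i in range(vocab_size)]

    def embed(self, token_id):
        # Input side: look up the token's embedding row.
        return self.W[token_id]

    def logits(self, hidden):
        # Output side: project the hidden state with the SAME matrix,
        # i.e. logits = hidden @ W^T.
        return [sum(h * w for h, w in zip(hidden, row)) for row in self.W]
```

With tying, loading a checkpoint only needs the embedding tensor; the output head comes along for free.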
b5298
CUDA: mix virt/real CUDA archs for GGML_NATIVE=OFF (#13135)
b5297
clip : refactor graph builder (#13321)
* mtmd : refactor graph builder
* fix qwen2vl
* clean up siglip cgraph
* pixtral migrated
* move minicpmv to a dedicated build function
* move max_feature_layer to build_llava
* use build_attn for minicpm resampler
* fix windows build
* add comment for batch_size
* also support tinygemma3 test model
* qwen2vl does not use RMS norm
* fix qwen2vl norm (2)
b5296
sampling : make top_n_sigma no-op at <=0 or a single candidate (#13345)
b5295
sampling : don't consider -infinity values in top_n_sigma (#13344)
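The two sampling entries above (b5296 and b5295) both harden top-n-sigma filtering, which keeps only tokens whose logit lies within n standard deviations of the maximum. A hedged sketch of the technique incorporating both fixes (not llama.cpp's actual code; the function name and list-based API are assumptions for illustration):

```python
import math


def top_n_sigma(logits, n):
    """Sketch of top-n-sigma logit filtering.

    Mirrors the two fixes above:
      - no-op when n <= 0 or there is at most one candidate (b5296),
      - -inf logits (already-masked tokens) are excluded from the
        mean/stddev statistics (b5295).
    Masked-out tokens are returned as -inf.
    """
    if n <= 0.0 or len(logits) <= 1:
        return logits  # nothing to filter

    finite = [x for x in logits if x != float("-inf")]
    if len(finite) <= 1:
        return logits

    mean = sum(finite) / len(finite)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in finite) / len(finite))
    cutoff = max(finite) - n * sigma
    return [x if x >= cutoff else float("-inf") for x in logits]
```

Without the b5295 fix, a single -inf from an earlier sampler stage would drag the mean to -inf and corrupt the cutoff; excluding non-finite values keeps the statistics meaningful.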
b5293
SYCL: Disable reorder optimize by default and stop setting tensor ext…
b5292
llama : fix build_ffn without gate (#13336)
* llama : fix build_ffn without gate
* fix build on windows
* Revert "fix build on windows" (reverts commit fc420d3c7eef3481d3d2f313fef2757cb33a7c56)