Releases · ishandutta2007/llama.cpp

18 Oct 17:42

ee09828

b6795 Latest

HIP: fix GPU_TARGETS (#16642)

Assets 15

cudart-llama-bin-win-cuda-12.4-x64.zip
sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6
373 MB 2025-10-18T17:42:57Z

llama-b6795-bin-macos-arm64.zip
sha256:b89a004ee0c06c037bddb5014bdce64d6ccd568f8f5b250ca6936bf7d73fd2e2
10.4 MB 2025-10-18T17:43:07Z

llama-b6795-bin-macos-x64.zip
sha256:1b4216279494e7a97431fe0995b9650955719df8d3f9e5c1b529661b65400955
27 MB 2025-10-18T17:43:08Z

llama-b6795-bin-ubuntu-vulkan-x64.zip
sha256:50029787e01285e57e7dcc190a41b5bb767d488c9eeb254f11bceb4a517b5488
25.9 MB 2025-10-18T17:43:10Z

llama-b6795-bin-ubuntu-x64.zip
sha256:de46754892901870e13be5f7d7b7bebd8e174f824e87370352eaf23aa43971d3
12.5 MB 2025-10-18T17:43:12Z

llama-b6795-bin-win-cpu-arm64.zip
sha256:f8942ec0daa054c66126e0f04ca629c81f79cd17fc73bda45634053bf312c616
10.6 MB 2025-10-18T17:43:13Z

llama-b6795-bin-win-cpu-x64.zip
sha256:13ab59f4f5d4a481dce9c72f5f6a4bf9f4365c9831f88861073263a80663d5ba
13.7 MB 2025-10-18T17:43:13Z

llama-b6795-bin-win-cuda-12.4-x64.zip
sha256:dd6fbc4cee92a7c96801ba4d25b8fb21dabdcf183e0b6413387c3ba6d3bd75f9
169 MB 2025-10-18T17:43:15Z

llama-b6795-bin-win-hip-radeon-x64.zip
sha256:6a59437467c5db432d77692871f20082e25df148c673d32dd8e186c3b9514a2a
321 MB 2025-10-18T17:43:18Z

llama-b6795-bin-win-opencl-adreno-arm64.zip
sha256:2f5b6ca1eaad61fcc3b7e425f1cd9e1c7325174fddf628be63c5ab3b3ef44063
11 MB 2025-10-18T17:43:29Z

Source code (zip)
2025-10-18T12:47:32Z

Source code (tar.gz)
2025-10-18T12:47:32Z
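
Each binary asset above is listed with a `sha256:` digest. A minimal sketch of checking a downloaded archive against that digest, using only Python's standard library, might look like the following. The helper names `sha256_of` and `matches_listed_digest` are illustrative, and the local file path is whatever the archive was saved as.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives are not loaded at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_digest(path, listed):
    """Compare a file against a digest written as 'sha256:<hex>' or '<hex>'."""
    expected = listed.strip().lower()
    if expected.startswith("sha256:"):
        expected = expected[len("sha256:"):]
    return sha256_of(path) == expected
```

For example, `matches_listed_digest("llama-b6795-bin-ubuntu-x64.zip", "sha256:de46754892901870e13be5f7d7b7bebd8e174f824e87370352eaf23aa43971d3")` should return `True` only if the local file is byte-identical to the published asset.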

18 Oct 12:20

github-actions

b6794

e56abd2

vulkan: Implement topk_moe fused shader, ported from CUDA (#16641)

This is similar to the CUDA shader from #16130, but doesn't use shared memory
and handles different subgroup sizes.
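
For orientation, top-k MoE routing picks the k highest-scoring experts per token from the router's output. The following is a plain-Python reference sketch of that general idea (softmax over expert logits, then top-k selection). It is not the shader from this release; the function name `topk_moe_reference` and the optional `renormalize` flag are assumptions made for illustration.

```python
import math


def topk_moe_reference(logits, k, renormalize=False):
    """Return (expert_ids, weights) for the k largest softmax probabilities."""
    # Numerically stable softmax over the expert logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Indices of the k largest probabilities, highest first.
    ids = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in ids]

    # Optionally rescale the selected weights so they sum to 1.
    if renormalize:
        s = sum(weights)
        weights = [w / s for w in weights]
    return ids, weights
```

A fused GPU kernel would aim to do these steps in one pass per token rather than as separate operations; the sketch only shows what result such a kernel is expected to produce.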

Assets 15

18 Oct 05:32

github-actions

b6792

8138785

opencl: transposed gemm/gemv moe kernel with mxfp4,f32 (#16602)

* opencl: transposed gemm/gemv moe kernel with mxfp4,f32

* add restore kernel for moe transpose

* fix trailing whitespaces

* resolve compilation warnings

Assets 15

17 Oct 18:04

github-actions

b6791

66b0dbc

llama-model: fix inconsistent ctxs <-> bufs order (#16581)

Assets 15

17 Oct 12:45

github-actions

b6788

342c728

ggml : fix SpaceMit IME array out-of-bounds in task assignment (#16629)

Fix incorrect task-to-batch index calculation in the quantization phase.

The bug caused out-of-bounds access to qnbitgemm_args array when
compute_idx exceeded per_gemm_block_count_m, leading to invalid
pointer dereferences and SIGBUS errors.

Correctly map tasks to batches by dividing compute_idx by
per_gemm_block_count_m instead of block_size_m.

Example:
  batch_feature=1, gemm_m=30, block_size_m=4
  per_gemm_block_count_m = 8, task_count = 8

  Old: gemm_idx = 4/4 = 1 (out of bounds)
  New: gemm_idx = 4/8 = 0 (correct)
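
The example above can be checked with a few lines of arithmetic. This sketch only mirrors the index calculation described in the commit message and is not the ggml source. The function names are hypothetical, and two details are assumptions chosen to match the numbers given: `per_gemm_block_count_m` is computed as `gemm_m / block_size_m` rounded up (30 / 4 gives 8), and `task_count` is `batch_feature` times that count.

```python
def per_gemm_block_count(gemm_m, block_size_m):
    # Assumed: number of row blocks per GEMM, rounded up.
    return (gemm_m + block_size_m - 1) // block_size_m


def gemm_index_old(compute_idx, block_size_m):
    # Buggy mapping: divides the task index by the block size.
    return compute_idx // block_size_m


def gemm_index_new(compute_idx, per_gemm_block_count_m):
    # Fixed mapping: divides the task index by the blocks per GEMM.
    return compute_idx // per_gemm_block_count_m


batch_feature, gemm_m, block_size_m = 1, 30, 4
blocks = per_gemm_block_count(gemm_m, block_size_m)
task_count = batch_feature * blocks  # assumed relation, see lead-in

for compute_idx in range(task_count):
    old = gemm_index_old(compute_idx, block_size_m)
    new = gemm_index_new(compute_idx, blocks)
    # With a single batch, any index >= 1 is outside the args array.
    status = "out of bounds" if old >= batch_feature else "ok"
    print(compute_idx, "old:", old, status, "new:", new)
```

With one batch, every task index from 4 upward maps to a nonexistent batch under the old formula, while the new formula keeps all eight tasks on batch 0.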

Tested on SpaceMit K1 RISC-V64 with qwen2.5:0.5b model.

Co-authored-by: muggle

Assets 15

17 Oct 05:36

github-actions

b6783

ceff6bb

SYCL SET operator optimized for F32 tensors (#16350)

* SYCL/SET: implement operator + wire-up; docs/ops updates; element_wise & ggml-sycl changes

* sycl(SET): re-apply post-rebase; revert manual docs/ops.md; style cleanups

* move SET op to standalone file, GPU-only implementation

* Update SYCL SET operator for F32

* ci: fix editorconfig issues (LF endings, trailing spaces, final newline)

* fixed ggml-sycl.cpp

---------

Co-authored-by: Gitty Burstein

Assets 15

17 Oct 00:46

github-actions

b6782

1bb4f43

mtmd : support home-cooked Mistral Small Omni (#14928)

Assets 15

16 Oct 19:16

github-actions

b6781

683fa6b

fix: added a normalization step for MathJax-style \[\] and \(\) delim…
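
The title above is truncated, so the exact behavior of the fix is not shown here. As a hedged illustration of what normalizing MathJax-style delimiters can look like, the sketch below rewrites `\[...\]` to `$$...$$` and `\(...\)` to `$...$`. The target delimiters and the function name `normalize_mathjax_delimiters` are assumptions, not taken from the release.

```python
import re

# Match \[ ... \] and \( ... \) lazily, allowing the body to span lines.
_DISPLAY = re.compile(r"\\\[(.+?)\\\]", re.DOTALL)
_INLINE = re.compile(r"\\\((.+?)\\\)", re.DOTALL)


def normalize_mathjax_delimiters(text):
    """Rewrite \\[...\\] as $$...$$ and \\(...\\) as $...$ (assumed targets)."""
    # A function replacement keeps backslashes in the body (e.g. \frac) intact.
    text = _DISPLAY.sub(lambda m: "$$" + m.group(1) + "$$", text)
    text = _INLINE.sub(lambda m: "$" + m.group(1) + "$", text)
    return text
```

A real implementation would likely also need to skip code spans and code blocks so that literal `\[` sequences in source listings are left alone; this sketch does not attempt that.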

Assets 9

16 Oct 11:54

github-actions

b6779

7a50cf3

CANN: format code using .clang-format (#15863)

This commit applies .clang-format rules to all source files under the
ggml-cann directory to ensure consistent coding style and readability.
The .clang-format option `SortIncludes: false` has been set to disable
automatic reordering of include directives.
No functional changes are introduced.

Co-authored-by: hipudding

Assets 15

16 Oct 08:06

github-actions

b6776

ee50ee1

SYCL: Add GGML_OP_MEAN operator support (#16009)

* SYCL: Add GGML_OP_MEAN operator support

* SYCL: Fix formatting for GGML_OP_MEAN case

* Update ggml/src/ggml-sycl/ggml-sycl.cpp

Co-authored-by: Sigbjørn Skjæret

---------

Co-authored-by: Sigbjørn Skjæret

Assets 15
