Releases · ggml-org/llama.cpp
b5276
llama : build windows releases with dl backends (#13220)
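As a rough sketch of what the dynamically loaded ("dl") backends look like from application code, the snippet below loads whatever backend libraries are found at runtime and lists them. It assumes the ggml_backend_load_all / ggml_backend_reg_* entry points declared in ggml-backend.h; it is an illustration, not part of this release's changes.

```c
// Hedged sketch: enumerate dynamically loaded ggml backends.
// Assumes the dynamic-backend API from ggml-backend.h (ggml_backend_load_all,
// ggml_backend_reg_count, ggml_backend_reg_get, ggml_backend_reg_name).
#include <stdio.h>
#include "ggml-backend.h"

int main(void) {
    // With dl backends, per-backend libraries (CUDA, Vulkan, ...) are discovered
    // and loaded at runtime instead of being linked into the main binary.
    ggml_backend_load_all();

    for (size_t i = 0; i < ggml_backend_reg_count(); ++i) {
        ggml_backend_reg_t reg = ggml_backend_reg_get(i);
        printf("loaded backend: %s\n", ggml_backend_reg_name(reg));
    }
    return 0;
}
```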
b5275
CUDA: fix race condition in MMQ stream-k fixup (#13299)
b5274
CUDA: fix race condition in MMQ ids_dst (#13294)
b5273
vulkan: Additional type support for unary, binary, and copy (#13266) Support f16->f32 copy. Support f16->f16 and f32->f32 unary ops. Support all combinations of f16/f32 for src0/src1/dst for add/sub/mul/div.
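To make the type combinations concrete, here is a minimal sketch that builds a graph containing an f16 -> f32 cast/copy, one of the cases the Vulkan backend now handles natively. It assumes the public ggml.h API (ggml_init, ggml_cast, ggml_new_graph, ggml_build_forward_expand) and omits backend scheduling and execution.

```c
// Minimal sketch, assuming the ggml.h API as of this release; no Vulkan
// dispatch shown. Builds a tiny graph with an f16 -> f32 copy/cast.
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        .mem_size   = 16 * 1024 * 1024,  // scratch for tensor metadata + data
        .mem_buffer = NULL,
        .no_alloc   = false,
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * src = ggml_new_tensor_1d(ctx, GGML_TYPE_F16, 1024);
    struct ggml_tensor * dst = ggml_cast(ctx, src, GGML_TYPE_F32); // f16 -> f32 copy

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, dst);

    // ... dispatch gf with the backend of choice (e.g. Vulkan) ...

    ggml_free(ctx);
    return 0;
}
```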
b5272
imatrix: fix oob writes if src1 is not contiguous (#13286)
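For context on this class of bug, below is an illustrative sketch (not the actual imatrix code) of stride-aware access to a possibly non-contiguous 2D F32 tensor: when src1 is not contiguous, row pointers must be computed from the byte strides nb[] rather than assuming rows are packed back to back.

```c
// Illustrative only; assumes src1 is a 2D GGML_TYPE_F32 tensor and that `out`
// has room for ne[0] * ne[1] floats.
#include <string.h>
#include "ggml.h"

static void copy_rows(const struct ggml_tensor * src1, float * out) {
    const int64_t ne0 = src1->ne[0];   // row length
    const int64_t ne1 = src1->ne[1];   // number of rows
    for (int64_t i1 = 0; i1 < ne1; ++i1) {
        // nb[1] is the byte stride between rows; for non-contiguous tensors it
        // can exceed ne0 * sizeof(float), so pointer math must use it.
        const float * row = (const float *)((const char *)src1->data + i1*src1->nb[1]);
        memcpy(out + i1*ne0, row, ne0*sizeof(float));
    }
}
```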
b5271
clip : revert the change of BOI/EOI token for GLM-edge (⚠️ breaking c…
b5270
llama : Llama-3_1-Nemotron-Ultra-253B-v1 support (#12843)
b5269
llama : move end-user examples to tools directory (#13249) Co-authored-by: Xuan Son Nguyen <[email protected]>
b5267
context : fix reorder logic (#13267) ggml-ci
b5266
ggml : Enable MMA for BF16 in llamafile_sgemm (#13148) This patch upstreams llamafile's CPU matrix-multiplication kernels for ppc64le, using MMA builtins for the BF16 data type. The change yields 9x to 40x gains in total throughput S t/s (i.e. all tokens / total time) across the batch sizes tested with the llama-batched-bench benchmark. The patch was tested with Meta-Llama-3-8B and Mistral-7B models (BF16 models generated with llama-quantize from the corresponding FP32 models) on an IBM POWER10 machine. Signed-off-by: Shalini Salomi Bodapati <[email protected]>
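For reference, the "S t/s" figure quoted above is simply total tokens divided by total wall-clock time; a trivial sketch with made-up placeholder numbers (not benchmark data):

```c
// Hedged sketch of the throughput metric: S = all tokens / total time.
#include <stdio.h>

int main(void) {
    const long long n_tokens_total = 4096;  // hypothetical token count for a run
    const double    t_total_s      = 3.2;   // hypothetical total time in seconds
    printf("S = %.1f t/s\n", (double) n_tokens_total / t_total_s);
    return 0;
}
```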