Releases: ggml-org/llama.cpp

b5276

04 May 13:44
9f2da58
llama : build windows releases with dl backends (#13220)

b5275

04 May 13:36
93c4e23
CUDA: fix race condition in MMQ stream-k fixup (#13299)

b5274

04 May 13:26
8afbd96
CUDA: fix race condition in MMQ ids_dst (#13294)

b5273

04 May 06:01
8ae5ebc
vulkan: Additional type support for unary, binary, and copy (#13266)

Support f16->f32 copy.
Support f16->f16 and f32->f32 unary ops.
Support all combinations of f16/f32 for src0/src1/dst for add/sub/mul/div.
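For reference, a minimal sketch (assuming the standard ggml C API; shapes and sizes are arbitrary) of a graph that exercises these newly supported type combinations:

```c
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // f16 source, f32 destination: the newly supported f16 -> f32 copy
    struct ggml_tensor * a   = ggml_new_tensor_1d(ctx, GGML_TYPE_F16, 64);
    struct ggml_tensor * b   = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 64);
    struct ggml_tensor * cpy = ggml_cpy(ctx, a, b);

    // mixed f16/f32 operands for a binary op; whether a backend can run this
    // depends on its type coverage, which this release extends for Vulkan
    struct ggml_tensor * sum = ggml_add(ctx, a, b);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, cpy);
    ggml_build_forward_expand(gf, sum);

    ggml_free(ctx);
    return 0;
}
```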

b5272

03 May 23:40
3e959f0
imatrix: fix oob writes if src1 is not contiguous (#13286)

b5271

03 May 18:54
36667c8
clip : revert the change of BOI/EOI token for GLM-edge (⚠️ breaking c…

b5270

03 May 16:31
3bf785f
llama : Llama-3_1-Nemotron-Ultra-253B-v1 support (#12843)

b5269

02 May 19:23
1d36b36
llama : move end-user examples to tools directory (#13249)

Co-authored-by: Xuan Son Nguyen <[email protected]>

b5267

02 May 18:58
a75cb30
context : fix reorder logic (#13267)

b5266

02 May 18:36
3f3769b
ggml : Enable MMA for BF16 in llamafile_sgemm (#13148)

This patch upstreams llamafile's CPU matrix-multiplication kernels for ppc64le, using MMA builtins for the BF16 data type.

This change results in 9x-40x gains in total speed S t/s (i.e. all tokens / total time) across various batch sizes, as measured with the llama-batched-bench benchmark.

The patch was tested with the Meta-Llama-3-8B and Mistral-7B models (BF16 models generated with llama-quantize from the corresponding FP32 models) on an IBM POWER10 machine.

Signed-off-by: Shalini Salomi Bodapati <[email protected]>
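For context, a minimal sketch of the kind of computation these kernels accelerate, assuming the standard ggml C API: a mat-mul with BF16 weights and F32 activations (shapes are arbitrary, for illustration only):

```c
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 64 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // weights in BF16, activations in F32; on ppc64le the CPU backend can
    // now route this through the MMA-based llamafile_sgemm BF16 kernels
    struct ggml_tensor * w = ggml_new_tensor_2d(ctx, GGML_TYPE_BF16, 4096, 4096);
    struct ggml_tensor * x = ggml_new_tensor_2d(ctx, GGML_TYPE_F32,  4096, 8);
    struct ggml_tensor * y = ggml_mul_mat(ctx, w, x);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, y);

    ggml_free(ctx);
    return 0;
}
```

The BF16 weights here correspond to what llama-quantize emits when converting an FP32 GGUF with the BF16 output type, as described above.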