Releases · ggml-org/llama.cpp
b5276
llama : build windows releases with dl backends (#13220)
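As a rough sketch of what the dynamically loaded ("dl") backends look like from application code, the snippet below loads whatever backend libraries are found at runtime and lists them. It assumes the ggml_backend_load_all / ggml_backend_reg_* entry points declared in ggml-backend.h; it is an illustration, not part of this release's changes.

```c
// Hedged sketch: enumerate dynamically loaded ggml backends.
// Assumes the dynamic-backend API from ggml-backend.h (ggml_backend_load_all,
// ggml_backend_reg_count, ggml_backend_reg_get, ggml_backend_reg_name).
#include <stdio.h>
#include "ggml-backend.h"

int main(void) {
    // With dl backends, per-backend libraries (CUDA, Vulkan, ...) are discovered
    // and loaded at runtime instead of being linked into the main binary.
    ggml_backend_load_all();

    for (size_t i = 0; i < ggml_backend_reg_count(); ++i) {
        ggml_backend_reg_t reg = ggml_backend_reg_get(i);
        printf("loaded backend: %s\n", ggml_backend_reg_name(reg));
    }
    return 0;
}
```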
b5275
CUDA: fix race condition in MMQ stream-k fixup (#13299)
b5274
CUDA: fix race condition in MMQ ids_dst (#13294)
b5273
vulkan: Additional type support for unary, binary, and copy (#13266) Support f16->f32 copy. Support f16->f16 and f32->f32 unary ops. Support all combinations of f16/f32 for src0/src1/dst for add/sub/mul/div.
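To make the type combinations concrete, here is a minimal sketch that builds a graph containing an f16 -> f32 cast/copy, one of the cases the Vulkan backend now handles natively. It assumes the public ggml.h API (ggml_init, ggml_cast, ggml_new_graph, ggml_build_forward_expand) and omits backend scheduling and execution.

```c
// Minimal sketch, assuming the ggml.h API as of this release; no Vulkan
// dispatch shown. Builds a tiny graph with an f16 -> f32 copy/cast.
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        .mem_size   = 16 * 1024 * 1024,  // scratch for tensor metadata + data
        .mem_buffer = NULL,
        .no_alloc   = false,
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * src = ggml_new_tensor_1d(ctx, GGML_TYPE_F16, 1024);
    struct ggml_tensor * dst = ggml_cast(ctx, src, GGML_TYPE_F32); // f16 -> f32 copy

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, dst);

    // ... dispatch gf with the backend of choice (e.g. Vulkan) ...

    ggml_free(ctx);
    return 0;
}
```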
b5272
imatrix: fix oob writes if src1 is not contiguous (#13286)
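For context on this class of bug, below is an illustrative sketch (not the actual imatrix code) of stride-aware access to a possibly non-contiguous 2D F32 tensor: when src1 is not contiguous, row pointers must be computed from the byte strides nb[] rather than assuming rows are packed back to back.

```c
// Illustrative only; assumes src1 is a 2D GGML_TYPE_F32 tensor and that `out`
// has room for ne[0] * ne[1] floats.
#include <string.h>
#include "ggml.h"

static void copy_rows(const struct ggml_tensor * src1, float * out) {
    const int64_t ne0 = src1->ne[0];   // row length
    const int64_t ne1 = src1->ne[1];   // number of rows
    for (int64_t i1 = 0; i1 < ne1; ++i1) {
        // nb[1] is the byte stride between rows; for non-contiguous tensors it
        // can exceed ne0 * sizeof(float), so pointer math must use it.
        const float * row = (const float *)((const char *)src1->data + i1*src1->nb[1]);
        memcpy(out + i1*ne0, row, ne0*sizeof(float));
    }
}
```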
b5271
clip : revert the change of BOI/EOI token for GLM-edge (⚠️ breaking c…
b5270
llama : Llama-3_1-Nemotron-Ultra-253B-v1 support (#12843)
b5269
llama : move end-user examples to tools directory (#13249) Co-authored-by: Xuan Son Nguyen <[email protected]>
b5267
context : fix reorder logic (#13267) ggml-ci
b5266
ggml : Enable MMA for BF16 in llamafile_sgemm (#13148) This patch upstreams llamafile's CPU matrix-multiplication kernels for ppc64le, using MMA builtins for the BF16 data type. The change yields 9x to 40x gains in total throughput S t/s (i.e. all tokens / total time) across the batch sizes tested with the llama-batched-bench benchmark. The patch was tested with Meta-Llama-3-8B and Mistral-7B models (BF16 models generated with llama-quantize from the corresponding FP32 models) on an IBM POWER10 machine. Signed-off-by: Shalini Salomi Bodapati <[email protected]>
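For reference, the "S t/s" figure quoted above is simply total tokens divided by total wall-clock time; a trivial sketch with made-up placeholder numbers (not benchmark data):

```c
// Hedged sketch of the throughput metric: S = all tokens / total time.
#include <stdio.h>

int main(void) {
    const long long n_tokens_total = 4096;  // hypothetical token count for a run
    const double    t_total_s      = 3.2;   // hypothetical total time in seconds
    printf("S = %.1f t/s\n", (double) n_tokens_total / t_total_s);
    return 0;
}
```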