Releases: ggml-org/llama.cpp
b5302
b5301
cuda : remove nrows_x in mul_mat_q_process_tile (#13325) Signed-off-by: Xiaodong Ye <[email protected]>
b5300
examples : remove infill (#13283) ggml-ci
b5299
llama : support tie embedding for chatglm models (#13328)
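"Tie embedding" in the entry above refers to weight tying: the output (lm_head) projection reuses the token embedding matrix instead of storing a separate weight. A minimal pure-Python sketch of the idea (illustrative only, not the ChatGLM or llama.cpp implementation; class and method names are hypothetical):

```python
class TinyTiedLM:
    """Toy model with tied input/output embeddings (illustrative sketch)."""

    def __init__(self, vocab_size, dim):
        # One shared matrix W: row i is both the embedding of token i
        # and the i-th output-projection vector. Tying halves the
        # parameter count of these two layers.
        self.W = [[0.01 * (i + 1) * (j + 1) for j in range(dim)]
                  for i in range(vocab_size)]

    def embed(self, token_id):
        # Input side: look up the token's embedding row.
        return self.W[token_id]

    def logits(self, hidden):
        # Output side: project the hidden state with the SAME matrix,
        # i.e. logits = hidden @ W^T.
        return [sum(h * w for h, w in zip(hidden, row)) for row in self.W]
```

With tying, loading a checkpoint only needs the embedding tensor; the output head comes along for free.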
b5298
CUDA: mix virt/real CUDA archs for GGML_NATIVE=OFF (#13135)
b5297
clip : refactor graph builder (#13321)
* mtmd : refactor graph builder
* fix qwen2vl
* clean up siglip cgraph
* pixtral migrated
* move minicpmv to a dedicated build function
* move max_feature_layer to build_llava
* use build_attn for minicpm resampler
* fix windows build
* add comment for batch_size
* also support tinygemma3 test model
* qwen2vl does not use RMS norm
* fix qwen2vl norm (2)
b5296
sampling : make top_n_sigma no-op at <=0 or a single candidate (#13345)
b5295
sampling : don't consider -infinity values in top_n_sigma (#13344)
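The two sampling entries above (b5296 and b5295) both harden top-n-sigma filtering, which keeps only tokens whose logit lies within n standard deviations of the maximum. A hedged sketch of the technique incorporating both fixes (not llama.cpp's actual code; the function name and list-based API are assumptions for illustration):

```python
import math


def top_n_sigma(logits, n):
    """Sketch of top-n-sigma logit filtering.

    Mirrors the two fixes above:
      - no-op when n <= 0 or there is at most one candidate (b5296),
      - -inf logits (already-masked tokens) are excluded from the
        mean/stddev statistics (b5295).
    Masked-out tokens are returned as -inf.
    """
    if n <= 0.0 or len(logits) <= 1:
        return logits  # nothing to filter

    finite = [x for x in logits if x != float("-inf")]
    if len(finite) <= 1:
        return logits

    mean = sum(finite) / len(finite)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in finite) / len(finite))
    cutoff = max(finite) - n * sigma
    return [x if x >= cutoff else float("-inf") for x in logits]
```

Without the b5295 fix, a single -inf from an earlier sampler stage would drag the mean to -inf and corrupt the cutoff; excluding non-finite values keeps the statistics meaningful.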
b5293
SYCL: Disable reorder optimize by default and stop setting tensor ext…
b5292
llama : fix build_ffn without gate (#13336)
* llama : fix build_ffn without gate
* fix build on windows
* Revert "fix build on windows" (reverts commit fc420d3c7eef3481d3d2f313fef2757cb33a7c56)