Releases: ggml-org/llama.cpp
b5634
kv-cache : relax SWA masking condition (#14119) ggml-ci
b5633
server : pass default --keep argument (#14120)
b5632
kv-cache : add LLAMA_KV_CACHE_DEBUG environment variable (#14121)
b5631
vulkan: Track descriptor pools/sets per-context (#14109)
Use the same descriptor set layout for all pipelines (MAX_PARAMETER_COUNT == 8) and move it to the vk_device. Move all the descriptor pool and set tracking to the context; none of it is specific to pipelines anymore. The context has a single vector of pools, a single vector of sets, one counter to track requests, and one counter to track use.
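The per-context bookkeeping described above can be pictured as a small structure holding the pools, sets, and two counters. This is an illustrative sketch only, not the actual ggml-vulkan code: the type name, field names, and integer stand-ins for Vulkan handles are all hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for Vulkan handles, to keep the sketch self-contained.
using pool_handle = uint64_t;
using set_handle  = uint64_t;

// Sketch of per-context descriptor tracking: one vector of pools, one vector
// of sets, one counter for sets requested, and one counter for sets in use.
struct context_descriptors {
    std::vector<pool_handle> pools;
    std::vector<set_handle>  sets;
    uint32_t requested = 0; // sets handed out so far in this batch
    uint32_t used      = 0; // sets actually bound/consumed so far

    // Request a descriptor set; fabricate a new one when none are free.
    // (The real code would allocate from a pool instead.)
    set_handle request_set() {
        if (requested == sets.size()) {
            sets.push_back(static_cast<set_handle>(sets.size() + 1));
        }
        return sets[requested++];
    }

    // Reset the counters for the next batch; pools and sets are reused.
    void reset() {
        requested = 0;
        used      = 0;
    }
};
```

The point of the design is that sets survive `reset()` and are handed out again, so steady-state operation allocates nothing.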
b5630
opencl: add `mul_mv_id_q4_0_f32_8x_flat` (#14003)
b5629
kv-cache : avoid modifying recurrent cells when setting inputs (#13834)
* kv-cache : avoid modifying recurrent cells when setting inputs
* kv-cache : remove inp_s_mask
  It was replaced with equivalent and simpler functionality with rs_z (the first zeroed state) and the already-existing inp_s_copy.
* kv-cache : fix non-consecutive token pos warning for recurrent models
  The problem was apparently caused by how the tail cells were swapped.
* graph : simplify logic for recurrent state copies
* kv-cache : use cell without src refs for rs_z in recurrent cache
* llama-graph : fix recurrent state copy
  The `state_copy` shuffle assumes everything is moved at once, which is not true when `states_extra` is copied back to the cache before copying the range of states between `head` and `head + n_seqs`. This is only a problem if any of the cells in [`head`, `head + n_seqs`) have an `src` in [`head + n_seqs`, `head + n_kv`), which does happen when `n_ubatch > 1` in the `llama-parallel` example. Changing the order of the operations avoids the potential overwrite before use, although when copies are avoided (like with Mamba2), this will require further changes.
* llama-graph : rename n_state to state_size in build_recurrent_state
  This naming should reduce confusion between the state size and the number of states.
b5627
llama : support GEGLU for jina-bert-v2 (#14090)
b5625
Fixed spec timings to report accepted/tested instead of accepted/drafted (#…
b5624
sync : ggml ggml-ci
b5622
Vulkan: Don't default to CPU device (like llvmpipe), even if no other…