Releases: ggml-org/llama.cpp

b5634

11 Jun 14:43
89a184f

kv-cache : relax SWA masking condition (#14119)

ggml-ci

b5633

11 Jun 11:07
2baf077

server : pass default --keep argument (#14120)

b5632

11 Jun 10:38
7ae2932

kv-cache : add LLAMA_KV_CACHE_DEBUG environment variable (#14121)
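As a rough illustration only (the accepted values and the helper name below are assumptions, not taken from the PR), an environment-variable toggle of this kind is typically read once at startup like this:

```cpp
// Hedged sketch: one common way a debug switch like LLAMA_KV_CACHE_DEBUG is read.
// Treating any non-empty, non-"0" value as "enabled" is an illustrative
// convention, not necessarily what llama.cpp does.
#include <cstdlib>
#include <cstring>

static bool kv_cache_debug_enabled() {
    const char * v = std::getenv("LLAMA_KV_CACHE_DEBUG");
    return v != nullptr && *v != '\0' && std::strcmp(v, "0") != 0;
}
```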

b5631

11 Jun 05:51
1f7d50b

vulkan: Track descriptor pools/sets per-context (#14109)

Use the same descriptor set layout for all pipelines (MAX_PARAMETER_COUNT == 8)
and move it to the vk_device. Move all descriptor pool and set tracking to the
context, since none of it is specific to pipelines anymore. The context now holds
a single vector of pools, a single vector of sets, one counter tracking requested
sets, and one counter tracking sets in use.
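A rough sketch of what per-context descriptor bookkeeping like this can look like; the struct and function names, the pool sizing, and the storage-buffer-only pool below are illustrative assumptions, not the actual llama.cpp Vulkan backend code:

```cpp
#include <vulkan/vulkan.h>

#include <cstdint>
#include <vector>

static constexpr uint32_t MAX_PARAMETER_COUNT = 8;   // one shared set layout for all pipelines
static constexpr uint32_t SETS_PER_POOL       = 128; // illustrative pool size

// Per-context descriptor state: a single vector of pools, a single vector of
// sets, plus the two counters described in the commit message.
struct ctx_descriptor_state {
    std::vector<VkDescriptorPool> pools;
    std::vector<VkDescriptorSet>  sets;
    uint32_t sets_requested = 0;  // sets asked for while building the graph
    uint32_t sets_used      = 0;  // sets actually handed out so far
};

// Hand out the next descriptor set, growing the pool list on demand.
static VkDescriptorSet ctx_get_descriptor_set(VkDevice device,
                                              VkDescriptorSetLayout shared_layout,
                                              ctx_descriptor_state & st) {
    if (st.sets_used == st.sets.size()) {
        // Create a new pool sized for the shared layout (MAX_PARAMETER_COUNT bindings per set).
        VkDescriptorPoolSize pool_size {
            VK_DESCRIPTOR_TYPE_STORAGE_BUFFER, SETS_PER_POOL * MAX_PARAMETER_COUNT };
        VkDescriptorPoolCreateInfo pool_info { VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO };
        pool_info.maxSets       = SETS_PER_POOL;
        pool_info.poolSizeCount = 1;
        pool_info.pPoolSizes    = &pool_size;

        VkDescriptorPool pool;
        vkCreateDescriptorPool(device, &pool_info, nullptr, &pool);
        st.pools.push_back(pool);

        // Allocate a batch of sets from the new pool, all with the shared layout.
        std::vector<VkDescriptorSetLayout> layouts(SETS_PER_POOL, shared_layout);
        VkDescriptorSetAllocateInfo alloc_info { VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO };
        alloc_info.descriptorPool     = pool;
        alloc_info.descriptorSetCount = SETS_PER_POOL;
        alloc_info.pSetLayouts        = layouts.data();

        size_t old_size = st.sets.size();
        st.sets.resize(old_size + SETS_PER_POOL);
        vkAllocateDescriptorSets(device, &alloc_info, st.sets.data() + old_size);
    }
    return st.sets[st.sets_used++];
}
```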

b5630

11 Jun 00:10
4c763c8

opencl: add `mul_mv_id_q4_0_f32_8x_flat` (#14003)

b5629

10 Jun 22:45
dad5c44

kv-cache : avoid modifying recurrent cells when setting inputs (#13834)

* kv-cache : avoid modifying recurrent cells when setting inputs

* kv-cache : remove inp_s_mask

It was replaced by equivalent but simpler functionality based on
rs_z (the first zeroed state) and the already-existing inp_s_copy.

* kv-cache : fix non-consecutive token pos warning for recurrent models

The problem was apparently caused by how the tail cells were swapped.

* graph : simplify logic for recurrent state copies

* kv-cache : use cell without src refs for rs_z in recurrent cache

* llama-graph : fix recurrent state copy

The `state_copy` shuffle assumes everything is moved at once,
which is not true when `states_extra` is copied back to the cache
before copying the range of states between `head` and `head + n_seqs`.
This is only a problem if any of the cells in [`head`, `head + n_seqs`)
have a `src` in [`head + n_seqs`, `head + n_kv`),
which does happen when `n_ubatch > 1` in the `llama-parallel` example.

Changing the order of the operations avoids the potential overwrite
before use (see the sketch after this list), although when copies are
avoided (like with Mamba2), this will require further changes.

* llama-graph : rename n_state to state_size in build_recurrent_state

This naming should reduce confusion between the state size
and the number of states.
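
A minimal standalone sketch of the overwrite-before-use hazard described in the `state_copy` item above; the toy values and layout are assumptions for illustration, not llama.cpp code:

```cpp
// Cells [head, head + n_seqs) gather their state from cache[src[i]];
// cells [head + n_seqs, head + n_kv) are written back from `states_extra`.
// If the write-back happens first, a gather whose src falls in the extra
// range reads an already-overwritten value.
#include <cstdio>
#include <vector>

int main() {
    const int head = 0, n_seqs = 2, n_kv = 4;

    std::vector<int> cache        = {10, 11, 12, 13}; // per-cell recurrent state (toy values)
    std::vector<int> src          = { 2,  3};         // sources for cells [head, head + n_seqs)
    std::vector<int> states_extra = {99, 98};         // new values for cells [head + n_seqs, head + n_kv)

    // Fixed order: gather the [head, head + n_seqs) range first, while the
    // source cells are still intact...
    std::vector<int> gathered(n_seqs);
    for (int i = 0; i < n_seqs; ++i) {
        gathered[i] = cache[src[i]];                   // reads 12 and 13
    }
    // ...then write the extra states back to the cache.
    for (int i = 0; i < n_kv - (head + n_seqs); ++i) {
        cache[head + n_seqs + i] = states_extra[i];
    }
    for (int i = 0; i < n_seqs; ++i) {
        cache[head + i] = gathered[i];
    }

    // Doing the write-back first would have made the gather read 99 and 98 instead.
    printf("cells [head, head+n_seqs): %d %d\n", cache[0], cache[1]); // 12 13
    return 0;
}
```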

b5627

10 Jun 17:02
3678b83

llama : support GEGLU for jina-bert-v2 (#14090)
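For context, GEGLU is the GELU-gated member of the GLU-variant family; the standard definition (not specific to this PR) is:

$$\mathrm{GEGLU}(x) = \mathrm{GELU}(xW + b) \odot (xV + c)$$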

b5625

10 Jun 16:54
3a12db2

Fixed spec timings to: accepted/tested instead of accepted/drafted (#…

b5624

10 Jun 16:09

sync : ggml

ggml-ci

b5622

10 Jun 12:36
97340b4

Vulkan: Don't default to CPU device (like llvmpipe), even if no other…