Releases: ggml-org/llama.cpp
b5634
kv-cache : relax SWA masking condition (#14119) ggml-ci
b5633
server : pass default --keep argument (#14120)
b5632
kv-cache : add LLAMA_KV_CACHE_DEBUG environment variable (#14121)
b5631
vulkan: Track descriptor pools/sets per-context (#14109)
Use the same descriptor set layout for all pipelines (MAX_PARAMETER_COUNT == 8) and move it to the vk_device. Move all the descriptor pool and set tracking to the context; none of it is specific to pipelines anymore. The context has a single vector of pools, a single vector of sets, one counter to track requests, and one counter to track use.
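The per-context bookkeeping described above can be pictured as a small structure holding the pools, sets, and two counters. This is an illustrative sketch only, not the actual ggml-vulkan code: the type name, field names, and integer stand-ins for Vulkan handles are all hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for Vulkan handles, to keep the sketch self-contained.
using pool_handle = uint64_t;
using set_handle  = uint64_t;

// Sketch of per-context descriptor tracking: one vector of pools, one vector
// of sets, one counter for sets requested, and one counter for sets in use.
struct context_descriptors {
    std::vector<pool_handle> pools;
    std::vector<set_handle>  sets;
    uint32_t requested = 0; // sets handed out so far in this batch
    uint32_t used      = 0; // sets actually bound/consumed so far

    // Request a descriptor set; fabricate a new one when none are free.
    // (The real code would allocate from a pool instead.)
    set_handle request_set() {
        if (requested == sets.size()) {
            sets.push_back(static_cast<set_handle>(sets.size() + 1));
        }
        return sets[requested++];
    }

    // Reset the counters for the next batch; pools and sets are reused.
    void reset() {
        requested = 0;
        used      = 0;
    }
};
```

The point of the design is that sets survive `reset()` and are handed out again, so steady-state operation allocates nothing.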
b5630
opencl: add `mul_mv_id_q4_0_f32_8x_flat` (#14003)
b5629
kv-cache : avoid modifying recurrent cells when setting inputs (#13834)
* kv-cache : avoid modifying recurrent cells when setting inputs
* kv-cache : remove inp_s_mask
  It was replaced with equivalent and simpler functionality with rs_z (the first zeroed state) and the already-existing inp_s_copy.
* kv-cache : fix non-consecutive token pos warning for recurrent models
  The problem was apparently caused by how the tail cells were swapped.
* graph : simplify logic for recurrent state copies
* kv-cache : use cell without src refs for rs_z in recurrent cache
* llama-graph : fix recurrent state copy
  The `state_copy` shuffle assumes everything is moved at once, which is not true when `states_extra` is copied back to the cache before copying the range of states between `head` and `head + n_seqs`. This is only a problem if any of the cells in [`head`, `head + n_seqs`) have an `src` in [`head + n_seqs`, `head + n_kv`), which does happen when `n_ubatch > 1` in the `llama-parallel` example. Changing the order of the operations avoids the potential overwrite before use, although when copies are avoided (like with Mamba2), this will require further changes.
* llama-graph : rename n_state to state_size in build_recurrent_state
  This naming should reduce confusion between the state size and the number of states.
b5627
llama : support GEGLU for jina-bert-v2 (#14090)
b5625
Fixed spec timings to report accepted/tested instead of accepted/drafted (#…
b5624
sync : ggml ggml-ci
b5622
Vulkan: Don't default to CPU device (like llvmpipe), even if no other…