
Comparing changes

base repository: ggml-org/llama.cpp
base: master@{1day}
head repository: ggml-org/llama.cpp
compare: master
  • 15 commits
  • 28 files changed
  • 11 contributors

Commits on Nov 10, 2025

  1. benches : add eval results (#17139)

    [no ci]
    ggerganov authored Nov 10, 2025 · 15274c0
  2. cuda/vulkan : bicubic interpolation (#17022)

    * vulkan : implement upscale with bicubic interpolation
    
    * cuda : implement upscale with bicubic interpolation
    
    * tests : add ggml_interpolate with GGML_SCALE_MODE_BICUBIC to backend tests
    
    * adapt OpenCL backend to not support the OP in that case so tests don't fail
    
    * print scale mode & flags in test-backend-ops
    Acly authored Nov 10, 2025 · 1032256
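
As background for the new kernels: bicubic upscaling weights a 4×4 input neighborhood with a cubic convolution kernel, applied separably along each axis. A minimal C++ sketch of the standard Keys kernel follows; the coefficient a = -0.75 (PyTorch-style) is an assumption here, not confirmed from the CUDA/Vulkan sources.

```cpp
#include <cmath>

// Keys' cubic convolution kernel; a = -0.75 is an assumed coefficient
// (PyTorch-style), not necessarily what the new kernels use.
static float cubic_weight(float x, float a = -0.75f) {
    x = std::fabs(x);
    if (x <= 1.0f) return ((a + 2.0f) * x - (a + 3.0f)) * x * x + 1.0f;
    if (x <  2.0f) return (((x - 5.0f) * x + 8.0f) * x - 4.0f) * a;
    return 0.0f;
}

// 1D cubic interpolation of four consecutive samples around fractional
// position t in [0, 1); bicubic applies this along rows, then along the
// column of the four row results.
static float cubic_interp(const float p[4], float t) {
    return p[0] * cubic_weight(t + 1.0f)
         + p[1] * cubic_weight(t)
         + p[2] * cubic_weight(t - 1.0f)
         + p[3] * cubic_weight(t - 2.0f);
}
```
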
  3. editorconfig : ignore benches/ (#17140)

    [no ci]
    ggerganov authored Nov 10, 2025 · 9898b57
  4. mtmd: fix patch_size initialized to random value in audio models (#17128)
    
    * mtmd: fix patch_size initialized to random value in audio models
    
    * add default hparams
    ngxson authored Nov 10, 2025 · 4b13a68
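
The underlying bug class is a plain uninitialized member: audio models never assign a vision patch size, so reading the field yields an indeterminate value. In-class default initializers are the usual C++ fix; a purely illustrative struct (not the actual mtmd definition):

```cpp
#include <cstdint>

// Illustrative only: an hparams struct in the spirit of the fix, not the
// actual mtmd code.
struct clip_hparams {
    int32_t patch_size = 0; // default prevents an indeterminate read when
                            // audio models never set a vision patch size
    int32_t n_embd     = 0;
    int32_t n_layer    = 0;
};
```
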
  5. f914544
  6. arm64: add i8mm route with SVE ggml_vec_dot_q4_K_q8_K and ggml_vec_dot_q6_K_q8_K (#15277)
    
    * add i8mm route with SVE ggml_vec_dot_q4_K_q8_K and ggml_vec_dot_q6_K_q8_K
    
    * Surround SVE function with compiler directive
    
    * fix compile switch
    
    * fix coding style
    
    * ggml : fix indent
    
    ---------
    
    Co-authored-by: Georgi Gerganov <[email protected]>
    fj-y-saito and ggerganov authored Nov 10, 2025 · df70bed
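
The "surround SVE function with compiler directive" step is the usual ACLE feature-macro guard: the SVE/i8mm path is only compiled when the toolchain targets both extensions. A sketch under that assumption (the macro names are standard ACLE; the function body is elided):

```cpp
#if defined(__ARM_FEATURE_SVE) && defined(__ARM_FEATURE_MATMUL_INT8)
#include <arm_sve.h>

// SVE + i8mm dot-product path, compiled only when the target supports both
// extensions (e.g. -march=armv8.6-a+sve+i8mm); otherwise the generic route
// is used and this symbol never exists.
static void vec_dot_q4_K_q8_K_sve_i8mm(/* ... */) {
    // svmmla-based int8 matrix-multiply-accumulate implementation goes here
}
#endif
```
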
  7. c27efd2
  8. memory: Hybrid context shift (#17009)

    * feat(memory): Only fail partial erasure of recurrent tail
    
    The recurrent state is always assumed to be the state as of the last update
    from the final token in the sequence. When doing a partial erasure, if the
    range does not include the final token, the erasure can be considered a
    success since any memory used for the sequence prior to the final token
    (which is no memory) has been successfully removed.
    
    There is one potential case this doesn't address: pruning the cache to
    remove sensitive data from the context. That wouldn't work for partial
    (mid-sequence) attention-cache removal either, since the KV state is
    sequentially dependent and states at later positions would still be
    based on the sensitive data even after it is evicted, so I don't think
    this is relevant. It is worth noting, though, that the semantics of this
    change for a partial erasure in the middle of the cache are essentially
    "my context is already compressed" and not "all trace of the removed
    tokens has been removed."
    
    #16768
    Branch: HybridContextShift-16768
    
    Signed-off-by: Gabe Goodhart <[email protected]>
    
    * fix(main): Check the output of seq_rm for prefix matching
    
    This prefix matching is explicitly attempting to remove the tokens at the
    end of the sequence that don't match. This is the operation that can't be
    performed on a recurrent cache due to the state being updated in place, so
    if this removal fails, we need to clear the whole cache.
    
    #16768
    Branch: HybridContextShift-16768
    
    Signed-off-by: Gabe Goodhart <[email protected]>
    
    * fix(memory): Fix condition for partial erasure failure if p0 > pos
    
    Signed-off-by: Gabe Goodhart <[email protected]>
    
    Co-authored-by: compilade <[email protected]>
    
    * style: Fix extra parens
    
    Signed-off-by: Gabe Goodhart <[email protected]>
    
    Co-authored-by: Georgi Gerganov <[email protected]>
    
    * fix(main.cpp): Set n_matching_session_tokens to 0 on cache clear
    
    #16768
    Branch: HybridContextShift-16768
    
    Signed-off-by: Gabe Goodhart <[email protected]>
    
    ---------
    
    Signed-off-by: Gabe Goodhart <[email protected]>
    Co-authored-by: compilade <[email protected]>
    Co-authored-by: Georgi Gerganov <[email protected]>
    3 people authored Nov 10, 2025 · 0c74f32
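
A minimal sketch of the prefix-matching fix described above: try to drop the non-matching tail, and fall back to a full clear when that fails. The API names (llama_get_memory, llama_memory_seq_rm, llama_memory_clear) follow the llama.cpp memory interface but are an assumption here, as is the wrapper function itself.

```cpp
#include "llama.h"

// Hypothetical wrapper mirroring the main.cpp logic; not the actual code.
static void trim_to_matching_prefix(llama_context * ctx, size_t & n_matching_session_tokens) {
    llama_memory_t mem = llama_get_memory(ctx);

    // Remove tokens [n_matching, end) of sequence 0 so they can be re-decoded.
    if (!llama_memory_seq_rm(mem, 0, (llama_pos) n_matching_session_tokens, -1)) {
        // A recurrent state is stored in place as of the last token, so a
        // mid-sequence removal cannot succeed: clear the whole cache and
        // recompute the prompt from scratch.
        llama_memory_clear(mem, /*data =*/ true);
        n_matching_session_tokens = 0;
    }
}
```
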
  9. 85234a4
  10. f117be1
  11. ggml-cpu : inspect -march and -mcpu to find the CPU (#16333)

    Signed-off-by: Adrien Gallouët <[email protected]>
    angt authored Nov 10, 2025 · 967eb4b
  12. 13730c1
  13. cpu: skip NOPs to avoid barriers (#17133)

    * cpu: skip NOPs to avoid barriers
    
    * cpu: use ggml_op_is_empty
    max-krasnyansky authored Nov 10, 2025 · 395e286
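
The idea behind this change: ops that only re-describe tensor metadata (views, reshapes, permutes) do no work, so scheduling them between compute ops inserts needless thread barriers. A self-contained analogue of the dispatch loop; only ggml_op_is_empty is taken from the commit, the types and helpers below are illustrative stand-ins.

```cpp
#include <vector>

enum op_kind { OP_NONE, OP_VIEW, OP_RESHAPE, OP_PERMUTE, OP_MUL_MAT };

static bool op_is_empty(op_kind op) {
    // ops that only re-describe tensor metadata perform no compute
    return op == OP_NONE || op == OP_VIEW || op == OP_RESHAPE || op == OP_PERMUTE;
}

static void run_op(op_kind) { /* compute dispatch would go here */ }
static void barrier()       { /* thread sync would go here */ }

static void compute_graph(const std::vector<op_kind> & nodes) {
    for (op_kind op : nodes) {
        if (op_is_empty(op)) {
            continue; // skip the NOP and, crucially, the barrier after it
        }
        run_op(op);
        barrier(); // threads synchronize only around ops that did real work
    }
}
```
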
  14. models : move build_inp_out_ids outside loop (#17151)

    * move build_inp_out_ids outside loop
    
    * realign
    CISC authored Nov 10, 2025 · 7bef684
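
build_inp_out_ids yields the same node for every layer, so constructing it once before the per-layer loop avoids n_layer redundant graph nodes. Schematically (all names below are illustrative stand-ins for the llama.cpp graph builders, not the actual code):

```cpp
struct tensor { /* graph node placeholder */ };

static tensor * build_inp_out_ids()                 { static tensor t; return &t; }
static tensor * build_layer(tensor * cur, int)      { return cur; }
static tensor * gather_rows(tensor * cur, tensor *) { return cur; }

static tensor * build_graph(tensor * cur, int n_layer) {
    tensor * inp_out_ids = build_inp_out_ids(); // hoisted: loop-invariant

    for (int il = 0; il < n_layer; ++il) {
        cur = build_layer(cur, il);
        if (il == n_layer - 1) {
            // only the final layer's selected rows become graph outputs
            cur = gather_rows(cur, inp_out_ids);
        }
    }
    return cur;
}
```
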
  15. opencl: add fastdiv and use it in set_rows, ported from cuda (#17090)

    * opencl: add fastdiv for mm q8_0
    
    * opencl: use uint4 for fastdiv vals
    
    * opencl: use fastdiv for set_rows
    
    * opencl: do not use fastdiv for q8_0 mm
    lhez authored Nov 10, 2025 · ece0f5c
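
fastdiv replaces a runtime division by a loop-invariant divisor with a multiply-and-shift against precomputed magic values; the uint4 mentioned above packs those values for the kernel. A host-side C++ sketch of one standard variant (a Lemire-style 64-bit reciprocal) follows; the actual OpenCL port likely uses 32-bit mulhi magics, so the exact constants and packing are an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Precompute ceil(2^64 / d) once for a fixed divisor d; requires d > 1
// (d == 1 should be handled separately). Uses the GCC/Clang __int128 type.
static uint64_t fastdiv_magic(uint32_t d) {
    assert(d > 1);
    return ~uint64_t(0) / d + 1; // = ceil(2^64 / d)
}

// n / d for 32-bit n via one 64x64 -> 128 multiply and a shift.
static uint32_t fastdiv(uint32_t n, uint64_t magic) {
    return (uint32_t)(((unsigned __int128) magic * n) >> 64);
}

// n % d without a hardware divide, reusing the same magic (Lemire's fastmod).
static uint32_t fastmod(uint32_t n, uint64_t magic, uint32_t d) {
    uint64_t lowbits = magic * n; // fractional part of n/d, scaled by 2^64
    return (uint32_t)(((unsigned __int128) lowbits * d) >> 64);
}
```

In a kernel like set_rows, the divisor (a tensor stride or row count) is fixed for the whole launch, so the magic is computed once on the host and every work item pays only a multiply and a shift instead of an integer divide.
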