Tags · xek/llama.cpp

b5787

Add Conv2d for CPU (ggml-org#14388)

* Conv2D: Add CPU version

* Half decent

* Tiled approach for F32

* remove file

* Fix tests

* Support F16 operations

* add assert about size

* Review: further formatting fixes, add assert and use CPU version of fp32->fp16

Jun 30, 2025
0a5a3b5

b5785

metal : disable fast-math for some cpy kernels (ggml-org#14460)

* metal : disable fast-math for some cpy kernels

ggml-ci

* cont : disable for q4_1

ggml-ci

* cont : disable for iq4_nl

ggml-ci

Jun 30, 2025
5dd942d

b5784

ggml-cpu: sycl: Re-enable exp f16 (ggml-org#14462)

Jun 30, 2025
a7417f5

b5783

test-backend-ops : disable llama test (ggml-org#14461)

Jun 30, 2025
eb3fa29

b5782

cmake : Remove redundant include path in CMakeLists.txt (ggml-org#14452)

* Update docker.yml

Modified docker.yml so that the workflow no longer runs on a periodic schedule; to run the workflow, trigger it manually.

* Remove redundant include path in CMakeLists.txt

The parent directory '..' was removed from the include directories for the ggml-cpu-feats target, to avoid unnecessary include paths.

* Enable scheduled Docker image builds

Uncomments the workflow schedule to trigger daily Docker image rebuilds at 04:12 UTC, improving automation and keeping images up to date.

Jun 30, 2025
c839a2d

b5780

server : support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (ggml-org#13196)

* initial commit for handling extra template kwargs

* enable_thinking and assistant prefill cannot be enabled at the same time

* can set chat_template_kwargs in command line

* added doc

* fixed formatting

* add support for extra context in generic template init

* coding standard: common/chat.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* coding standard: common/chat.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* Apply suggestions from code review

coding standard: cosmetic changes

Co-authored-by: Georgi Gerganov <[email protected]>

* fix merge conflict

* chat.cpp: simplify calls to apply to ensure systematic propagation of extra_context (+ the odd existing additional_context)

* normalize environment variable name

* simplify code

* prefill cannot be used with thinking models

* compatibility with the new reasoning-budget parameter

* fix prefill for non thinking models

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Olivier Chafik <[email protected]>

Jun 29, 2025
caf5681

b5778

SYCL: disable faulty fp16 exp kernel (ggml-org#14395)

* SYCL: disable faulty fp16 CPU exponent for now

* Revert "SYCL: disable faulty fp16 CPU exponent for now"

This reverts commit ed0aab1.

* SYCL: disable faulty fp16 CPU exponent for now

* Fix logic of disabling exponent kernel

Jun 29, 2025
f47c1d7

b5777

ggml : fix unmerged GGML_FPxx_TO_FPxx refactoring (ggml-org#14443)

Jun 29, 2025
a5d1fb6

b5775

vulkan: Add fusion support for RMS_NORM+MUL (ggml-org#14366)

* vulkan: Add fusion support for RMS_NORM+MUL

- Add a use_count to ggml_tensor, so we can detect if an output is used more than once.
- Change the ggml-vulkan rms_norm shader to optionally multiply by another tensor.
- Add detection logic and basic fusion logic in ggml-vulkan.
- Add some testing support for fusion. Rather than computing one node at a time, allow
for computing the whole graph and just testing one node's results. Add rms_norm_mul tests
and enable a llama test.

* extract some common fusion logic

* fix -Winconsistent-missing-override

* move ggml_can_fuse to a common function

* build fix

* C and C++ versions of can_fuse

* move use count to the graph to avoid data races and double increments when used in multiple threads

* use hash table lookup to find node index

* change use_counts to be indexed by hash table slot

* minimize hash lookups

style fixes

* last node doesn't need single use.
fix type.
handle mul operands being swapped.

* remove redundant parameter

---------

Co-authored-by: slaren <[email protected]>

Jun 29, 2025
bd9c981

b5774

CUDA: add bf16 and f32 support to cublas_mul_mat_batched (ggml-org#14361)

* CUDA: add bf16 and f32 support to cublas_mul_mat_batched

* Review: add type traits and make function more generic

* Review: make check more explicit, add back comments, and fix formatting

* Review: fix formatting, remove useless type conversion, fix naming for bools

Jun 28, 2025
27208bf
