Tags: xek/llama.cpp

b5787

Add Conv2d for CPU (ggml-org#14388)

* Conv2D: Add CPU version

* Half decent

* Tiled approach for F32

* remove file

* Fix tests

* Support F16 operations

* add assert about size

* Review: further formatting fixes, add assert and use CPU version of fp32->fp16
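
For context, a direct CPU Conv2D over F32 data reduces to the nested loops below. This is a hedged sketch with hypothetical names, not the ggml kernel, and it omits the tiling (processing the output in cache-sized blocks) that the commit adds for F32:

```cpp
// Naive direct 2D convolution over F32 data, CHW layout, single image,
// "valid" padding, stride 1. A sketch only; the real ggml kernel is tiled
// to keep the working set in cache and also supports F16.
static void conv2d_f32(const float * src, const float * kernel, float * dst,
                       int iw, int ih, int ic,   // input  width/height/channels
                       int kw, int kh, int oc) { // kernel width/height, output channels
    const int ow = iw - kw + 1; // output width
    const int oh = ih - kh + 1; // output height

    for (int o = 0; o < oc; ++o) {
        for (int y = 0; y < oh; ++y) {
            for (int x = 0; x < ow; ++x) {
                float acc = 0.0f;
                for (int c = 0; c < ic; ++c) {
                    for (int ky = 0; ky < kh; ++ky) {
                        for (int kx = 0; kx < kw; ++kx) {
                            // src is [ic][ih][iw], kernel is [oc][ic][kh][kw]
                            acc += src[(c*ih + y + ky)*iw + x + kx]
                                 * kernel[((o*ic + c)*kh + ky)*kw + kx];
                        }
                    }
                }
                dst[(o*oh + y)*ow + x] = acc; // dst is [oc][oh][ow]
            }
        }
    }
}
```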

b5785

metal : disable fast-math for some cpy kernels (ggml-org#14460)

* metal : disable fast-math for some cpy kernels

ggml-ci

* cont : disable for q4_1

ggml-ci

* cont : disable for iq4_nl

ggml-ci
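
Why fast-math matters for these copy kernels: cpy kernels that re-quantize on the fly (as for q4_1 and iq4_nl above) depend on exact rounding, and fast-math lets the compiler reassociate and approximate floating-point operations in ways that can push a value across a rounding boundary. A rough C++ illustration of q4_1-style block quantization, not the actual Metal shader:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// q4_1-style block quantization sketch: 32 values per block, 4-bit quants
// with a per-block scale d and minimum m, so x ~ d*q + m. The divisions and
// roundings below are exactly the arithmetic that fast-math may rewrite,
// changing the quantized result. For clarity this stores one quant per byte;
// the real format packs two 4-bit values per byte.
static void quantize_q4_1_block(const float * x, uint8_t * q, float & d, float & m) {
    float vmin = x[0], vmax = x[0];
    for (int i = 1; i < 32; ++i) {
        vmin = std::min(vmin, x[i]);
        vmax = std::max(vmax, x[i]);
    }
    d = (vmax - vmin)/15.0f; // 15 = max 4-bit value
    m = vmin;
    const float id = d != 0.0f ? 1.0f/d : 0.0f;
    for (int i = 0; i < 32; ++i) {
        q[i] = (uint8_t) std::clamp((int) std::lround((x[i] - m)*id), 0, 15);
    }
}
```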

b5784

ggml-cpu: sycl: Re-enable exp f16 (ggml-org#14462)

b5783

test-backend-ops : disable llama test (ggml-org#14461)

b5782

cmake : Remove redundant include path in CMakeLists.txt (ggml-org#14452)

* Update docker.yml

Changed the contents of docker.yml so that the workflow no longer runs on a schedule; if you want to run the workflow, you can start it manually.

* Remove redundant include path in CMakeLists.txt

The parent directory '..' was removed from the include directories for the ggml-cpu-feats target to avoid unnecessary include paths.

* Enable scheduled Docker image builds

Uncomments the workflow schedule to trigger daily Docker image rebuilds at 04:12 UTC, improving automation and keeping images up to date.

b5780

server : support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (ggml-org#13196)

* initial commit for handling extra template kwargs

* enable_thinking and assistant prefill cannot be enabled at the same time

* can set chat_template_kwargs in command line

* added doc

* fixed formatting

* add support for extra context in generic template init

* coding standard: common/chat.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* coding standard: common/chat.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* Apply suggestions from code review

coding standard: cosmetic changes

Co-authored-by: Georgi Gerganov <[email protected]>

* fix merge conflict

* chat.cpp: simplify calls to apply to ensure systematic propagation of extra_context (+ the odd existing additional_context)

* normalize environment variable name

* simplify code

* prefill cannot be used with thinking models

* compatibility with the new reasoning-budget parameter

* fix prefill for non-thinking models

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Olivier Chafik <[email protected]>
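
As a usage sketch, the extra kwargs become additional variables visible to the Jinja chat template. Assuming the request field is named chat_template_kwargs, matching the commit messages above (the surrounding body shape is illustrative, not the documented API), a client payload could be built like this:

```cpp
#include <iostream>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

int main() {
    // Sketch of a chat-completion request body that passes extra variables
    // through to the Jinja chat template. Qwen3 templates check
    // enable_thinking to toggle <think> blocks.
    json body = {
        {"model", "qwen3"},
        {"messages", json::array({
            {{"role", "user"}, {"content", "Why is the sky blue?"}},
        })},
        {"chat_template_kwargs", {{"enable_thinking", false}}},
    };
    std::cout << body.dump(2) << std::endl;
    return 0;
}
```

Per the commit messages, the same kwargs can also be set from the command line; see the PR for the exact option name.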

b5778

SYCL: disable faulty fp16 exp kernel (ggml-org#14395)

* SYCL: disable faulty fp16 CPU exponent for now

* Revert "SYCL: disable faulty fp16 CPU exponent for now"

This reverts commit ed0aab1.

* SYCL: disable faulty fp16 CPU exponent for now

* Fix logic of disabling exponent kernel
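
Disabling a backend kernel typically means reporting the op as unsupported so the scheduler falls back to another implementation. A hedged sketch of that gating, with a hypothetical function name rather than the actual SYCL code:

```cpp
#include "ggml.h"

// Sketch of op-support gating: claim F16 EXP is unsupported so the
// scheduler routes it elsewhere instead of running the faulty kernel.
// Hypothetical helper; the real backend does this in its supports_op
// callback.
static bool sycl_supports_op(const struct ggml_tensor * op) {
    switch (op->op) {
        case GGML_OP_UNARY:
            if (ggml_get_unary_op(op) == GGML_UNARY_OP_EXP) {
                // faulty fp16 exp kernel: only claim support for F32
                return op->src[0]->type == GGML_TYPE_F32;
            }
            return true;
        default:
            return true;
    }
}
```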

b5777

ggml : fix unmerged GGML_FPxx_TO_FPxx refactoring (ggml-org#14443)

b5775

vulkan: Add fusion support for RMS_NORM+MUL (ggml-org#14366)

* vulkan: Add fusion support for RMS_NORM+MUL

- Add a use_count to ggml_tensor, so we can detect if an output is used more than once.
- Change the ggml-vulkan rms_norm shader to optionally multiply by another tensor.
- Add detection logic and basic fusion logic in ggml-vulkan.
- Add some testing support for fusion. Rather than computing one node at a time, allow
for computing the whole graph and just testing one node's results. Add rms_norm_mul tests
and enable a llama test.

* extract some common fusion logic

* fix -Winconsistent-missing-override

* move ggml_can_fuse to a common function

* build fix

* C and C++ versions of can_fuse

* move use count to the graph to avoid data races and double increments when used in multiple threads

* use hash table lookup to find node index

* change use_counts to be indexed by hash table slot

* minimize hash lookups

style fixes

* last node doesn't need single use.
fix type.
handle mul operands being swapped.

* remove redundant parameter

---------

Co-authored-by: slaren <[email protected]>
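
The heart of the detection reads roughly as follows: RMS_NORM immediately followed by MUL may fuse only if the MUL consumes the RMS_NORM output (as either operand, since the operands may be swapped) and the graph-level use count shows no other consumer. A hedged sketch, not the actual ggml/Vulkan code:

```cpp
#include "ggml.h"

// Hedged sketch of the RMS_NORM+MUL fusion check. The intermediate
// rms_norm result may only be fused away if the following MUL is its sole
// consumer; the commit tracks this with a per-graph use count indexed by
// hash-table slot. use_count() below stands in for that lookup and is
// hypothetical.
static bool can_fuse_rms_norm_mul(const struct ggml_tensor * norm,
                                  const struct ggml_tensor * mul,
                                  int use_count(const struct ggml_tensor *)) {
    if (norm->op != GGML_OP_RMS_NORM || mul->op != GGML_OP_MUL) {
        return false;
    }
    // the MUL must consume the RMS_NORM output, in either operand position
    if (mul->src[0] != norm && mul->src[1] != norm) {
        return false;
    }
    // no other node may read the intermediate result
    if (use_count(norm) != 1) {
        return false;
    }
    return true;
}
```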

b5774

CUDA: add bf16 and f32 support to cublas_mul_mat_batched (ggml-org#14361)

* CUDA: add bf16 and f32 support to cublas_mul_mat_batched

* Review: add type traits and make function more generic

* Review: make check more explicit, add back comments, and fix formatting

* Review: fix formatting, remove useless type conversion, fix naming for bools
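
"Add type traits" here plausibly means mapping the element type to the corresponding cudaDataType_t at compile time, so a single template covers F16, BF16 and F32 when calling cublasGemmStridedBatchedEx. A hedged sketch of that pattern, not the actual ggml-cuda code:

```cpp
#include <cuda_fp16.h>
#include <cuda_bf16.h>
#include <library_types.h> // cudaDataType_t

// Map the C++ element type to the cuBLAS data-type enum at compile time,
// so one generic batched-matmul template handles all three element types.
template <typename T> struct cuda_type_traits;

template <> struct cuda_type_traits<half> {
    static constexpr cudaDataType_t data_type = CUDA_R_16F;
};
template <> struct cuda_type_traits<__nv_bfloat16> {
    static constexpr cudaDataType_t data_type = CUDA_R_16BF;
};
template <> struct cuda_type_traits<float> {
    static constexpr cudaDataType_t data_type = CUDA_R_32F;
};

template <typename T>
static cudaDataType_t batched_mul_mat_dtype() {
    // would be passed as the Atype/Btype arguments of
    // cublasGemmStridedBatchedEx(...)
    return cuda_type_traits<T>::data_type;
}
```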