Tags: bssrdf/llama.cpp

b4681

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.

b4524

Add Jinja template support (ggml-org#11016)

* Copy minja from google/minja@58f0ca6

* Add --jinja and --chat-template-file flags

* Add missing <optional> include

* Avoid print in get_hf_chat_template.py

* No designated initializers yet

* Try and work around msvc++ non-macro max resolution quirk

* Update test_chat_completion.py

* Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template

* Refactor test-chat-template

* Test templates w/ minja

* Fix deprecation

* Add --jinja to llama-run

* Update common_chat_format_example to use minja template wrapper

* Test chat_template in e2e test

* Update utils.py

* Update test_chat_completion.py

* Update run.cpp

* Update arg.cpp

* Refactor common_chat_* functions to accept minja template + use_jinja option

* Attempt to fix linkage of LLAMA_CHATML_TEMPLATE

* Revert LLAMA_CHATML_TEMPLATE refactor

* Normalize newlines in test-chat-templates for windows tests

* Forward decl minja::chat_template to avoid eager json dep

* Flush stdout in chat template before potential crash

* Fix copy elision warning

* Rm unused optional include

* Add missing optional include to server.cpp

* Disable jinja test that has a cryptic windows failure

* minja: fix vigogne (google/minja#22)

* Apply suggestions from code review

Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>

* Finish suggested renamings

* Move chat_templates inside server_context + remove mutex

* Update --chat-template-file w/ recent change to --chat-template

* Refactor chat template validation

* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)

* Warn against missing eos / bos tokens when jinja template references them

* rename: common_chat_template[s]

* reinstate assert on chat_templates.template_default

* Update minja to google/minja@b8437df

* Update minja to google/minja#25

* Update minja from google/minja#27

* rm unused optional header

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
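The commits above wire Jinja chat-template support into llama.cpp via the minja C++ library. As a rough illustration of what such a template does, here is a minimal Python sketch using the `jinja2` package as a stand-in for minja; the ChatML-style template below is a hypothetical example, not the exact template any particular model ships:

```python
from jinja2 import Template  # jinja2 stands in for the C++ minja library

# A hypothetical ChatML-style chat template, similar in spirit to what
# models store in the tokenizer.chat_template GGUF metadata key.
chatml = Template(
    "{% for m in messages %}"
    "<|im_start|>{{ m['role'] }}\n{{ m['content'] }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)

messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi"},
]

# Render the message list into the flat prompt string the model is fed.
prompt = chatml.render(messages=messages, add_generation_prompt=True)
print(prompt)
```

With `--jinja`, the server renders incoming chat messages through the model's own template like this instead of a hard-coded format; `--chat-template-file` lets you substitute a template read from disk.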

b2703

llava : use logger in llava-cli (ggml-org#6797)

This change removes printf() logging so llava-cli is shell-scriptable.

b2699

ci: add ubuntu latest release and fix missing build number (mac & ubuntu) (ggml-org#6748)

b2251

server : add KV cache quantization options (ggml-org#5684)

b2116

metal : use autoreleasepool to avoid memory leaks (ggml-org#5437)

There appears to be a known memory leak when using
`MTLCommandBuffer`. Using `@autoreleasepool` is suggested in [1, 2].

[1] https://developer.apple.com/forums/thread/662721
[2] https://forums.developer.apple.com/forums/thread/120931

This change set wraps `ggml_metal_graph_compute` in an
`@autoreleasepool` block.

This commit addresses ggml-org#5436

b1967

android : use release cmake build type by default (ggml-org#5123)

b1803

llava-cli : don't crash if --image flag is invalid (ggml-org#4835)

This change fixes an issue where supplying `--image missing-file` would
cause a segfault due to a null pointer dereference. Such crashes can also
produce distracting output when robust crash-analysis tools are in use.
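The guard can be sketched as follows. Note this is a Python illustration of the pattern, and `load_image` is a hypothetical stand-in for llava-cli's C image loader, not the actual function:

```python
import sys

def load_image(path):
    """Hypothetical stand-in for the image loader: returns bytes or None."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except OSError:
        # Signal failure explicitly instead of handing back a bad pointer.
        return None

def main(argv):
    path = argv[1] if len(argv) > 1 else "missing-file"
    img = load_image(path)
    if img is None:
        # Fail with a clear message rather than crashing on the bad --image value.
        print(f"error: failed to load image '{path}'", file=sys.stderr)
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

The fix in the commit follows the same shape: check the loader's result before dereferencing it, and exit with a diagnostic on failure.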

b1796

ggml : fix vld1q_s8_x4 32-bit compat (ggml-org#4828)

* ggml : fix vld1q_s8_x4 32-bit compat

ggml-ci

* ggml : fix 32-bit ARM compat (cont)

ggml-ci

b1795

CUDA: faster softmax via shared memory + fp16 math (ggml-org#4742)