
Tags: philiptaron/llama.cpp


b2128

This commit was created on GitHub.com and signed with GitHub’s verified signature.
CUDA: mul_mat_vec_q tiling, refactor mul mat logic (ggml-org#5434)

Co-authored-by: slaren <[email protected]>

b1894

This commit was signed with the committer’s verified signature (Georgi Gerganov).
py : remove unnecessary hasattr (ggml-org#4903)

b1886

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
android : introduce starter project example (ggml-org#4926)

* Introduce starter project for Android

Based on examples/llama.swiftui.

* Add GitHub workflow

* Set NDK version

* Only build arm64-v8a in CI

* Sync bench code

* Rename CI prop to skip-armeabi-v7a

* Remove unused tests

b1840

llama.swiftui : update models layout (ggml-org#4826)

* Updated Models Layout

- Added a models drawer
- Added downloading directly from Hugging Face
- Load custom models from local folder
- Delete models by swiping left

* trimmed trailing white space

* Updated Models Layout

b1708

llama : add AWQ for llama, llama2, mpt, and mistral models (ggml-org#4593)

* update: awq support llama-7b model

* update: change order

* update: benchmark results for llama2-7b

* update: mistral 7b v1 benchmark

* update: support 4 models

* fix: Readme

* update: ready for PR

* update: readme

* fix: readme

* update: change order import

* black

* format code

* update: work for both mpt and awq mpt

* update: readme

* Rename to llm_build_ffn_mpt_awq

* Formatted other files

* Fixed params count

* fix: remove code

* update: more detail for mpt

* fix: readme

* fix: readme

* update: change folder architecture

* fix: common.cpp

* fix: readme

* fix: remove ggml_repeat

* update: cicd

* update: cicd

* update: remove use_awq arg

* update: readme

* llama : adapt plamo to new ffn

ggml-ci

---------

Co-authored-by: Trần Đức Nam <[email protected]>
Co-authored-by: Le Hoang Anh <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>

b1699

simplify bug issue template (ggml-org#4623)

b1696

fallback to CPU buffer if host buffer alloc fails (ggml-org#4610)

b1680

ggml : change ggml_scale to take a float instead of tensor (ggml-org#4573)

* ggml : change ggml_scale to take a float instead of tensor

* ggml : fix CPU implementation

* tests : fix test-grad0

ggml-ci

gguf-v0.4.0

gguf version 0.4.0