Tags: philiptaron/llama.cpp
CUDA: mul_mat_vec_q tiling, refactor mul mat logic (ggml-org#5434)
Co-authored-by: slaren <[email protected]>
android : introduce starter project example (ggml-org#4926)
* Introduce starter project for Android, based on examples/llama.swiftui
* Add GitHub workflow
* Set NDK version
* Only build arm64-v8a in CI
* Sync bench code
* Rename CI prop to skip-armeabi-v7a
* Remove unused tests
llama.swiftui : update models layout (ggml-org#4826)
* Added a models drawer
* Added downloading directly from Hugging Face
* Load custom models from a local folder
* Delete models by swiping left
* Trimmed trailing whitespace
llama : add AWQ for llama, llama2, mpt, and mistral models (ggml-org#4593)
* Added AWQ support for the llama-7b, llama2-7b, mistral-7b-v1, and mpt models
* Added benchmark results for llama2-7b and mistral-7b-v1
* Renamed the FFN builder to llm_build_ffn_mpt_awq
* Fixed the parameter count and removed ggml_repeat
* Changed the folder architecture; fixed common.cpp
* Removed the use_awq argument
* Formatted code (black) and updated the README and CI config
* llama : adapt plamo to new ffn
Co-authored-by: Trần Đức Nam <[email protected]>
Co-authored-by: Le Hoang Anh <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
fallback to CPU buffer if host buffer alloc fails (ggml-org#4610)