Tags: RakhithJK/llama.cpp

b4156

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
convert : XLMRoberta Type Vocab Size (ggml-org#10458)

This matches the key in common bert-based embedding models and may have a
value other than 1 in it.

Branch: XLMRobertaTypeVocabSize

Signed-off-by: Gabe Goodhart <[email protected]>
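
As a rough illustration of the point in the message above, here is a minimal sketch of reading the token type count from a model config instead of assuming it is always 1. The key name `type_vocab_size` is the one used by Hugging Face BERT-style configs; the helper `token_type_count`, the fallback of 1, and the sample configs are hypothetical and are not taken from the converter itself.

```python
def token_type_count(hparams: dict) -> int:
    """Return the number of token types declared by a model config.

    Hypothetical helper for illustration only. Falls back to 1 when the
    key is absent, which is an assumed default for this sketch.
    """
    return int(hparams.get("type_vocab_size", 1))


# Made-up config fragments, not copied from any real model.
single_type = {"model_type": "xlm-roberta", "type_vocab_size": 1}
two_types = {"model_type": "xlm-roberta", "type_vocab_size": 2}
missing_key = {"model_type": "xlm-roberta"}

for cfg in (single_type, two_types, missing_key):
    print(token_type_count(cfg))
```

Reading the value from the config keeps the exported metadata in step with models whose value is not 1.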

b3166

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
rpc : fix load/store misaligned addresses (ggml-org#7948)

b3163

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Add support for sqrt on CUDA (ggml-org#7953)

* cuda sqrt support

* enable cuda in pca

* fix comments in pca

* add test

* add sqrt to ggml_backend_cuda_supports_op

* fix test

* new line

* Use F32 sqrtf instead of F64 sqrt

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>

b3162

Verified

This commit was signed with the committer’s verified signature.
ggerganov Georgi Gerganov
cuda : fix bounds check for src0 rows in MMVQ kernel (whisper/2231)

* cuda : fix bounds check for src0 rows in MMVQ kernel

* Update ggml-cuda/mmvq.cu

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>

b3158

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
unicode : avoid char32_t (ggml-org#7957)

ggml-ci

b3156

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
ggml : fix handling of zero blocks in IQ quants (ggml-org#7955)

ggml-ci
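
The commit title does not say what went wrong with zero blocks, so the following is only a generic sketch of why an all-zero block tends to need special handling in block quantization. It uses plain symmetric absmax quantization, which is not the IQ scheme; `quantize_block`, `dequantize_block`, and the `levels` parameter are hypothetical names chosen for this example.

```python
def quantize_block(values, levels=127):
    """Symmetric absmax quantization of one block (generic scheme, not IQ)."""
    amax = max((abs(v) for v in values), default=0.0)
    if amax == 0.0:
        # Zero block: the scale would be 0 and the division below would
        # fail, so emit an all-zero block directly.
        return 0.0, [0] * len(values)
    scale = amax / levels
    return scale, [round(v / scale) for v in values]


def dequantize_block(scale, quants):
    """Reconstruct approximate values from a scale and integer quants."""
    return [scale * q for q in quants]


scale, quants = quantize_block([0.0, 0.0, 0.0, 0.0])
print(scale, quants, dequantize_block(scale, quants))
```

Without the early return, the zero block would reach `v / scale` with `scale == 0.0` and raise `ZeroDivisionError`; in C the analogous path would typically yield NaN or Inf values.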

b3154

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Vulkan Shader Refactor, Memory Debugging Option (ggml-org#7947)

* Refactor shaders, extract GLSL code from ggml_vk_generate_shaders.py into vulkan-shaders directory

* Improve debug log code

* Add memory debug output option

* Fix flake8

* Fix unnecessarily high llama-3 VRAM use

b3153

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Add `cvector-generator` example (ggml-org#7514)

* add control-vector-generator

* calc diff

* add comments

* proof-of-concept stdlib implementation

Implements PCA and file writing using mostly standard libraries. The output is recognized as a functional control vector, but outputs gibberish.

* param parsing, refactor, comments

Added basic command-line parameters for outfile and one each positive/negative prompt.

Refactored some messy code in PCA computation and GGUF exporting.

Left a bunch of comments regarding further work needed.

* example template completions

Implements an example template set built from the positive/negative prompts like the control vector Python implementation.

* add multi prompts, multi-thread for PCA

* fix mem error

* add debugs

* fix matrix transpose multiplication

you have got to be kidding me

* preliminary template/multiprompt support

model is running out of context and that ought to be fixed (segfaulting) but other than that it looks goodish

* fix zero output & param parsing, functional templating

fixed a bug where the output file had no tensor data/was all zero

fixed a bug where single hyphen flags were not being correctly parsed

implements creation of templated prompts from input (still need to adapt based on model)

* fix square_diff matmul index range and CRLF->LF line endings

fixed a logic error where square_diff would not multiply all rows

fixed a formatting error where the provided completions.txt had CRLF line endings

* add command-line args for num threads, num completions file lines, always reload model

refactored a few things and did what the commit message says on the tin

* code aestheticization

* fix compiler warnings

* in-series multithreading for prompt embedding?

added commented-out code to attempt to start implementing multithreading for embedding in main

* remove unnecessary multithreading

* interim fix memory leak

* translated everything but PCA (I think)

* tentatively translate the rest

* fix ggml errors and make new ones

at least it compiles and runs

* fix cb_eval

* temporary commit while I move dev environments

it finally outputs a functioning control vector - "functioning" in the sense that it can be loaded and it clearly has the right idea, but makes the model incoherent

* update debug statements

* pre-tokenize so we can allocate correct memory to ctx_diffs_wrapped

* update comments

* (wip) refactor

* clean up PCA ggml implementation

* fix shape of v_diff_original

* add n_batch for pca

* working version

* remember to copy back the last_eigenvector

* fix n_completions

* bring back n_completions

* default n_pca_batch to 20

* fix macos build

* add to makefile all targets

* use ggml_format_name

* add readme

* fix .editorconfig

* use ggml_backend_tensor_copy

* attempt to fix compile problem on mac

* fix compile warn

* reuse allocr

* move param parser to common

* better error handling

* clean up a bit

* add print_usage

* shorten help msg

* beautify help msg

* escape prompt by default

* change compile target to llama-cvector-generator

* typo

* disable GPU for PCA

* code style

---------

Co-authored-by: Christian Zhou-Zheng <[email protected]>
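
The log above mentions PCA and a `last_eigenvector`, which suggests an iterative eigenvector computation, although the implementation itself is not shown here. As a minimal sketch under that assumption, the following estimates the first principal direction of a small set of difference vectors by power iteration, using only the standard library. The function name, the iteration count, and the sample data are hypothetical, and the rows are used without mean-centering.

```python
import math


def top_principal_direction(rows, iterations=200):
    """Estimate the first principal direction of `rows` by power iteration.

    `rows` is a list of equal-length lists, one sample per row. The rows
    are used as given (no mean-centering), an assumption of this sketch.
    """
    dim = len(rows[0])
    # Build A = X^T X, a dim x dim symmetric matrix.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(dim)]
         for i in range(dim)]
    # Deterministic start vector; it must not be orthogonal to the answer.
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = [sum(a[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all-zero input: no direction to find
        v = [x / norm for x in w]
    return v


# Made-up difference vectors that vary mostly along the first axis.
diffs = [[2.0, 0.1], [4.0, -0.1], [-3.0, 0.05]]
print(top_principal_direction(diffs))
```

The product `X^T X` is formed explicitly here for readability; with large hidden sizes it would usually be cheaper to apply `X` and then `X^T` to the vector in each step.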

b3152

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
[SYCL] remove global variables (ggml-org#7710)

* separate DPCT helpers outside

* replace global variables with context

* remove useless extra

* update mul_mat condition

* remove duplicate buft initialization

* remove duplicate extra and global work group size

* remove useless backend check

* remove duplicated extras

* use macro for group_size and remove cuda-related

b3151

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
ci : fix macos x86 build (ggml-org#7940)

To keep using the runner that `macos-latest` used to point to, we should use `macos-12`

Potentially will fix: ggml-org#6975