ggml : various fixes #1450


Merged 1 commit on May 14, 2023

Conversation

ggerganov
Member

The ggml_rope() fixes are irrelevant for LLaMA since n_rot == (n_embd / n_head), but they make a difference for other models like GPT-J and GPT-NeoX where n_rot < (n_embd / n_head). I'm still not sure if this is the correct implementation, especially for the GPT-NeoX mode, but the results seem somewhat better than before.

The non-inplace, multi-threaded ggml_diag_mask_inf() was broken in #1428. Again, this is irrelevant for LLaMA since the forward pass uses ggml_diag_mask_inf_inplace(). Might be relevant to @xaedes

The "scratch buffers" fix might be relevant for LLaMA. See the new ggml_scratch_save() and ggml_scratch_load() functions and their usage in ggml.c: https://github.com/ggerganov/llama.cpp/blob/fixes/ggml.c#LL3925C1-L3939C1
The scratch buffers are a mechanism for reusing memory from previous ops once it is no longer needed. The current way of using them is manual and very error-prone. I will hopefully come up with something better in the future.
More info here: ggml-org/whisper.cpp#431

- `ggml_rope()`
- `ggml_diag_mask_inf()` multi-threaded
- compatibility with scratch buffers