[sync #10544] llama/ggml: add LLM training support #13105
base: master
Conversation
@JohannesGaessler This is a tentative sync - still need to wait for #12799 to get merged. In #12799, the batch management is delegated to the KV cache object, so I've updated the optimization code accordingly.
force-pushed from 780d6fb to 58115a2
This new feature is very helpful for AI beginners (such as me) to understand more of the details of hard-core AI tech. Thanks so much!
force-pushed from 58115a2 to 7e79a42
- more compact progress bar
- refactor: llama_prepare_sbatch/ubatch
- llama_save_model_to_file
- gqa_mode arg for repeat_back
- llama_opt_param_filter
- ggml_graph_dup force_grads
- refactor ggml_opt, fix test-opt
ggml-ci
@JohannesGaessler I've rebased this, and it should be good to update #10544 accordingly and merge. Let me know if something does not work as expected.
Thank you. I'll take a look when I get a chance.
original: #10544

This is a rebase of the #10544 PR by @JohannesGaessler on top of the upcoming #12799 (edit: now merged into master). The purpose is only to highlight the necessary changes that need to be applied to #10544.

Testing with:
make -j && ./bin/llama-finetune --file ./wikitext-2-raw/wiki.test.raw --model ../models/llama-3.2-3b/ggml-model-f32.gguf -c 512 -b 512 -ub 512
TODOs:
- test-backend-ops asserts because ggml_set_param asserts tensor->op == GGML_OP_NONE, but does not take into account that the tensor could be a view.