Open
Description
Hi,
I'd like to verify some understanding about gradient tensors and memory allocation. I set-up a simple tutorial to do a forward and backward pass (code below), and realized after debugging that I have to call "ggml_backend_alloc_ctx_tensors(context, backend);" after all graphs constructed so all intermediate and gradient tensors are created.
Is that how it's supposed to work? does this duplicate memory if they're allocated on main memory first before being allocated on the backend (in my case Metal).
In that case, do we need to setup this allocator: ggml_gallocr_alloc_graph(allocr, gf)?
Here's the working code: (see where the allocation of tensors is done)
ggml_backend_t backend = ggml_backend_metal_init();
ggml_init_params params = {
.mem_size = ggml_graph_overhead() * 2 + 64 * ggml_tensor_overhead(),
.mem_buffer = NULL,
.no_alloc = true,
};
ggml_context * ctx0 = ggml_init(params);
ggml_tensor * x = ggml_new_tensor_1d(ctx0, GGML_TYPE_F32, 1);
ggml_set_name(x, "x");
ggml_set_param(ctx0, x);
ggml_tensor * a = ggml_new_tensor_1d(ctx0, GGML_TYPE_F32, 1);
ggml_set_name(a, "a");
ggml_tensor * b = ggml_mul(ctx0, x, x);
ggml_set_name(b, "b = x * x");
ggml_tensor * f = ggml_mul(ctx0, b, a); ggml_set_name(f, "f = a * x * x");
ggml_set_loss(f);
ggml_gallocr_t allocr = ggml_gallocr_new(ggml_backend_get_default_buffer_type(backend));
// forward graph
ggml_cgraph * gf = ggml_new_graph_custom(ctx0, 32, true);
ggml_gallocr_alloc_graph(allocr, gf);
ggml_build_forward_expand(gf, f);
// backprop graph
ggml_cgraph * gb = ggml_graph_dup(ctx0, gf);
ggml_gallocr_alloc_graph(allocr, gb);
ggml_build_backward_expand(ctx0, ctx0, gb, false);
// allocate all tensors on the backend
// need to do all of this after building all graphs
ggml_backend_alloc_ctx_tensors(ctx0, backend);
// set initial values
float val = 2.f;
ggml_backend_tensor_set(x, &val, 0, sizeof(val));
val = 3.f;
ggml_backend_tensor_set(a, &val, 0, sizeof(val));
ggml_graph_reset(gf);
val = 1.f;
ggml_backend_tensor_set(f, &val, 0, sizeof(val));
// set gradient in lieu of a loss function
ggml_tensor * f_grad = ggml_graph_get_grad(gb, f);
if (f_grad) {
ggml_backend_tensor_set(f_grad, &val, 0, sizeof(val));
}
// compute forward
ggml_graph_compute_with_ctx(ctx0, gf, 1);
// compute backward
ggml_graph_compute_with_ctx(ctx0, gb, 1);
Thank you!
Metadata
Metadata
Assignees
Labels
No labels