gradient tensors with a non-cpu backend #1215

Open
@Rush2k

Description

Hi,

I'd like to verify my understanding of gradient tensors and memory allocation. I set up a simple example that does a forward and a backward pass (code below), and realized after some debugging that I have to call ggml_backend_alloc_ctx_tensors(ctx0, backend) after all graphs are constructed, so that all intermediate and gradient tensors get allocated.

Is that how it's supposed to work? Does this duplicate memory if the tensors are allocated in main memory first and then again on the backend (Metal, in my case)?
And in that case, do we still need to set up the graph allocator at all, i.e. ggml_gallocr_alloc_graph(allocr, gf)?
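
For comparison, my (possibly wrong) reading of the forward-only pattern in the ggml examples is to keep the weights and the graph in separate no_alloc contexts, give the weights their own backend buffer via ggml_backend_alloc_ctx_tensors, and let ggml_gallocr place only the graph's intermediate tensors. A sketch of that pattern, with my own naming (ctx_w, ctx_g):

#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"
#include "ggml-metal.h"

ggml_backend_t backend = ggml_backend_metal_init();

// context holding only the leaf/parameter tensors (metadata only, no data)
ggml_init_params wparams = {
    .mem_size   = 2 * ggml_tensor_overhead(),
    .mem_buffer = NULL,
    .no_alloc   = true,
};
ggml_context * ctx_w = ggml_init(wparams);

ggml_tensor * x = ggml_new_tensor_1d(ctx_w, GGML_TYPE_F32, 1);
ggml_tensor * a = ggml_new_tensor_1d(ctx_w, GGML_TYPE_F32, 1);

// one backend buffer for the leaf tensors only
ggml_backend_alloc_ctx_tensors(ctx_w, backend);

// separate context for the graph; its tensors stay unallocated here
ggml_init_params gparams = {
    .mem_size   = ggml_graph_overhead() + 16 * ggml_tensor_overhead(),
    .mem_buffer = NULL,
    .no_alloc   = true,
};
ggml_context * ctx_g = ggml_init(gparams);

ggml_cgraph * gf = ggml_new_graph(ctx_g);
ggml_tensor * f = ggml_mul(ctx_g, ggml_mul(ctx_g, x, x), a);
ggml_build_forward_expand(gf, f);

// gallocr places the intermediates in its own backend buffer,
// so nothing lives in main memory first
ggml_gallocr_t allocr = ggml_gallocr_new(ggml_backend_get_default_buffer_type(backend));
ggml_gallocr_alloc_graph(allocr, gf);

ggml_backend_graph_compute(backend, gf);

But that pattern doesn't obviously cover the gradient tensors created by ggml_build_backward_expand, which is where I got stuck.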

Here's the working code (note where the tensor allocation is done):

#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"
#include "ggml-metal.h"

ggml_backend_t backend = ggml_backend_metal_init();

ggml_init_params params = {
    .mem_size   = ggml_graph_overhead() * 2 + 64 * ggml_tensor_overhead(),
    .mem_buffer = NULL,
    .no_alloc   = true,
};
ggml_context * ctx0 = ggml_init(params);

ggml_tensor * x = ggml_new_tensor_1d(ctx0, GGML_TYPE_F32, 1); 
ggml_set_name(x, "x");
ggml_set_param(ctx0, x);

ggml_tensor * a = ggml_new_tensor_1d(ctx0, GGML_TYPE_F32, 1); 
ggml_set_name(a, "a");

ggml_tensor * b = ggml_mul(ctx0, x, x); 
ggml_set_name(b, "b = x * x");

ggml_tensor * f = ggml_mul(ctx0, b, a);
ggml_set_name(f, "f = a * x * x");
ggml_set_loss(f);

ggml_gallocr_t allocr = ggml_gallocr_new(ggml_backend_get_default_buffer_type(backend));

// forward graph
ggml_cgraph * gf = ggml_new_graph_custom(ctx0, 32, true);
ggml_gallocr_alloc_graph(allocr, gf);

ggml_build_forward_expand(gf, f);

// backprop graph
ggml_cgraph * gb = ggml_graph_dup(ctx0, gf);
ggml_gallocr_alloc_graph(allocr, gb);

ggml_build_backward_expand(ctx0, ctx0, gb, false);

// allocate all tensors on the backend
// need to do all of this after building all graphs
ggml_backend_alloc_ctx_tensors(ctx0, backend);

// set initial values
float val = 2.f;
ggml_backend_tensor_set(x, &val, 0, sizeof(val));

val = 3.f;
ggml_backend_tensor_set(a, &val, 0, sizeof(val));

ggml_graph_reset(gf);

val = 1.f;
ggml_backend_tensor_set(f, &val, 0, sizeof(val));

// set gradient in lieu of a loss function
ggml_tensor * f_grad   = ggml_graph_get_grad(gb, f);
if (f_grad) {
    ggml_backend_tensor_set(f_grad, &val, 0, sizeof(val));
}

// compute forward
ggml_graph_compute_with_ctx(ctx0, gf, 1);

// compute backward
ggml_graph_compute_with_ctx(ctx0, gb, 1);
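
To sanity-check, the gradient of x can be read back from the backend afterwards (assumes <stdio.h> is included); with f = a * x * x at x = 2, a = 3, df/dx = 2 * a * x should come out as 12:

// read df/dx back from the backend buffer (expect 2 * a * x = 12)
ggml_tensor * x_grad = ggml_graph_get_grad(gb, x);
if (x_grad) {
    float dx = 0.f;
    ggml_backend_tensor_get(x_grad, &dx, 0, sizeof(dx));
    printf("df/dx = %f\n", dx);
}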

Thank you!
