gradient tensors with a non-cpu backend #1215

Open
@Rush2k

Description

Hi,

I'd like to verify my understanding of gradient tensors and memory allocation. I set up a simple example that does a forward and a backward pass (code below), and realized after some debugging that I have to call ggml_backend_alloc_ctx_tensors(ctx0, backend) after all graphs are constructed, so that all intermediate and gradient tensors get allocated.

Is that how it's supposed to work? Does this duplicate memory if the tensors are allocated in main memory first and then again on the backend (Metal, in my case)?
And in that case, do we still need to set up the graph allocator at all, i.e. ggml_gallocr_alloc_graph(allocr, gf)?
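
For comparison, my (possibly wrong) reading of the forward-only pattern in the ggml examples is to keep the weights and the graph in separate no_alloc contexts, give the weights their own backend buffer via ggml_backend_alloc_ctx_tensors, and let ggml_gallocr place only the graph's intermediate tensors. A sketch of that pattern, with my own naming (ctx_w, ctx_g):

#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"
#include "ggml-metal.h"

ggml_backend_t backend = ggml_backend_metal_init();

// context holding only the leaf/parameter tensors (metadata only, no data)
ggml_init_params wparams = {
    .mem_size   = 2 * ggml_tensor_overhead(),
    .mem_buffer = NULL,
    .no_alloc   = true,
};
ggml_context * ctx_w = ggml_init(wparams);

ggml_tensor * x = ggml_new_tensor_1d(ctx_w, GGML_TYPE_F32, 1);
ggml_tensor * a = ggml_new_tensor_1d(ctx_w, GGML_TYPE_F32, 1);

// one backend buffer for the leaf tensors only
ggml_backend_alloc_ctx_tensors(ctx_w, backend);

// separate context for the graph; its tensors stay unallocated here
ggml_init_params gparams = {
    .mem_size   = ggml_graph_overhead() + 16 * ggml_tensor_overhead(),
    .mem_buffer = NULL,
    .no_alloc   = true,
};
ggml_context * ctx_g = ggml_init(gparams);

ggml_cgraph * gf = ggml_new_graph(ctx_g);
ggml_tensor * f = ggml_mul(ctx_g, ggml_mul(ctx_g, x, x), a);
ggml_build_forward_expand(gf, f);

// gallocr places the intermediates in its own backend buffer,
// so nothing lives in main memory first
ggml_gallocr_t allocr = ggml_gallocr_new(ggml_backend_get_default_buffer_type(backend));
ggml_gallocr_alloc_graph(allocr, gf);

ggml_backend_graph_compute(backend, gf);

But that pattern doesn't obviously cover the gradient tensors created by ggml_build_backward_expand, which is where I got stuck.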

Here's the working code (note where the tensor allocation is done):

#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"
#include "ggml-metal.h"

ggml_backend_t backend = ggml_backend_metal_init();

ggml_init_params params = {
    .mem_size   = ggml_graph_overhead() * 2 + 64 * ggml_tensor_overhead(),
    .mem_buffer = NULL,
    .no_alloc   = true,
};
ggml_context * ctx0 = ggml_init(params);

ggml_tensor * x = ggml_new_tensor_1d(ctx0, GGML_TYPE_F32, 1); 
ggml_set_name(x, "x");
ggml_set_param(ctx0, x);

ggml_tensor * a = ggml_new_tensor_1d(ctx0, GGML_TYPE_F32, 1); 
ggml_set_name(a, "a");

ggml_tensor * b = ggml_mul(ctx0, x, x); 
ggml_set_name(b, "b = x * x");

ggml_tensor * f = ggml_mul(ctx0, b, a);
ggml_set_name(f, "f = a * x * x");
ggml_set_loss(f);

ggml_gallocr_t allocr = ggml_gallocr_new(ggml_backend_get_default_buffer_type(backend));

// forward graph
ggml_cgraph * gf = ggml_new_graph_custom(ctx0, 32, true);
ggml_gallocr_alloc_graph(allocr, gf);

ggml_build_forward_expand(gf, f);

// backprop graph
ggml_cgraph * gb = ggml_graph_dup(ctx0, gf);
ggml_gallocr_alloc_graph(allocr, gb);

ggml_build_backward_expand(ctx0, ctx0, gb, false);

// allocate all tensors on the backend
// need to do all of this after building all graphs
ggml_backend_alloc_ctx_tensors(ctx0, backend);

// set initial values
float val = 2.f;
ggml_backend_tensor_set(x, &val, 0, sizeof(val));

val = 3.f;
ggml_backend_tensor_set(a, &val, 0, sizeof(val));

ggml_graph_reset(gf);

val = 1.f;
ggml_backend_tensor_set(f, &val, 0, sizeof(val));

// set gradient in lieu of a loss function
ggml_tensor * f_grad   = ggml_graph_get_grad(gb, f);
if (f_grad) {
    ggml_backend_tensor_set(f_grad, &val, 0, sizeof(val));
}

// compute forward
ggml_graph_compute_with_ctx(ctx0, gf, 1);

// compute backward
ggml_graph_compute_with_ctx(ctx0, gb, 1);
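
To sanity-check, the gradient of x can be read back from the backend afterwards (assumes <stdio.h> is included); with f = a * x * x at x = 2, a = 3, df/dx = 2 * a * x should come out as 12:

// read df/dx back from the backend buffer (expect 2 * a * x = 12)
ggml_tensor * x_grad = ggml_graph_get_grad(gb, x);
if (x_grad) {
    float dx = 0.f;
    ggml_backend_tensor_get(x_grad, &dx, 0, sizeof(dx));
    printf("df/dx = %f\n", dx);
}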

Thank you!
