Code example for `stop_gradient()` #580

RahulBhalley · 2024-01-29T17:32:41Z

Could someone please share a code example showing how to use stop_gradient? The MLX docs say we don't need detach as in PyTorch, but I wonder whether stop_gradient fulfils the same purpose.

Answered by awni

awni · 2024-01-29T19:00:04Z

Maintainer

import mlx.core as mx

def fun(x):
    # The second term contributes its value but is treated as a constant by grad
    return mx.exp(x) + mx.stop_gradient(mx.exp(x))

print(mx.grad(fun)(mx.array(1.0)))

This gives array(2.71828, dtype=float32).

So here you only get the gradient through the first mx.exp; the term wrapped in mx.stop_gradient keeps its value but contributes no gradient.

Compare to:

def fun(x):
  return mx.exp(x) + mx.exp(x)

print(mx.grad(fun)(mx.array(1.0)))

This gives twice the first result, array(5.43656, dtype=float32), because the gradient flows through both paths.
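
For readers without MLX at hand, the same behaviour can be reproduced with a toy forward-mode differentiator in plain Python. This is only a sketch of the semantics (value passes through, derivative is zeroed); it is not how MLX implements stop_gradient, and the names Dual, exp, stop_gradient, grad, with_stop and without_stop are all hypothetical helpers defined here for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to a single input."""
    value: float
    deriv: float

    def __add__(self, other: "Dual") -> "Dual":
        # Sum rule: derivatives add.
        return Dual(self.value + other.value, self.deriv + other.deriv)


def exp(d: Dual) -> Dual:
    # Chain rule: d/dx exp(u) = exp(u) * du/dx
    e = math.exp(d.value)
    return Dual(e, e * d.deriv)


def stop_gradient(d: Dual) -> Dual:
    # Toy stand-in: the forward value is unchanged, the derivative is dropped.
    return Dual(d.value, 0.0)


def grad(fn):
    # Seed the input derivative with 1.0 and read off the output derivative.
    def grad_fn(x: float) -> float:
        return fn(Dual(x, 1.0)).deriv
    return grad_fn


def with_stop(x: Dual) -> Dual:
    return exp(x) + stop_gradient(exp(x))


def without_stop(x: Dual) -> Dual:
    return exp(x) + exp(x)


print(grad(with_stop)(1.0))     # e: only the first exp contributes
print(grad(without_stop)(1.0))  # 2e: both paths contribute
```

Both functions return the same forward value (2e at x = 1); only the derivative differs, which matches the two MLX results quoted above.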

0 replies

stockeh · 2025-10-10T13:39:35Z

@awni It may be helpful to have a built-in no_grad decorator, something like the following:

import functools

import mlx.core as mx


def no_grad(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        out = fn(*args, **kwargs)
        if isinstance(out, mx.array):
            return mx.stop_gradient(out)
        elif isinstance(out, (tuple, list)):
            return type(out)(
                mx.stop_gradient(x) if isinstance(x, mx.array) else x for x in out
            )
        else:
            return out

    return wrapper


@no_grad
def fun(x):
    return mx.exp(x) + mx.exp(x)

print(mx.grad(fun)(mx.array(1.0)))
# array(0, dtype=float32)

5 replies

awni Oct 10, 2025
Maintainer

Just curious, what did you have in mind to use that for? The no_grad idiom is not very common in MLX because MLX doesn't keep the tape around for computing gradients by default.

For example, mx.eval(fun(x)) is just as memory efficient as mx.eval(mx.stop_gradient(fun(x))).

stockeh Oct 10, 2025

I'm reimplementing TRM (see Fig 3, https://arxiv.org/abs/2510.04871) in MLX and would like to keep the no_grad components inside the model's call, similar to the original implementation here. I wanted a context manager or decorator to achieve this. Am I thinking about this backwards with respect to best practices, or do you have any recommendations?

awni Oct 10, 2025
Maintainer

> I'm reimplementing TRM

Cool!!

awni Oct 10, 2025
Maintainer

I see. I think your decorator is fine for that use case if you split the no-grad part out into its own function. But it's also quite simple to do something like the following (which is equivalent):

        # H_cycles-1 without grad
        for _H_step in range(self.config.H_cycles-1):
            for _L_step in range(self.config.L_cycles):
                z_L = self.L_level(z_L, z_H + input_embeddings, **seq_info)
            z_H = self.L_level(z_H, z_L, **seq_info)

        # Add this line: gradients do not flow back into the cycles above
        z_H, z_L = mx.stop_gradient(z_H), mx.stop_gradient(z_L)

        # 1 with grad
        for _L_step in range(self.config.L_cycles):
            z_L = self.L_level(z_L, z_H + input_embeddings, **seq_info)
        z_H = self.L_level(z_H, z_L, **seq_info)
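
The pattern above (run all but the last cycle, cut the gradient, then run one more cycle) can be checked on a scalar toy problem in plain Python. This is a hedged sketch, not TRM or MLX code: Dual, step, stop_gradient and refine are hypothetical names, and the update z <- tanh(w * z) is just an assumed stand-in for the model's level function.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to the weight w."""
    value: float
    deriv: float


def step(z: Dual, w: Dual) -> Dual:
    # One refinement step z <- tanh(w * z), using the product and chain rules.
    pre = w.value * z.value
    dpre = w.deriv * z.value + w.value * z.deriv
    t = math.tanh(pre)
    return Dual(t, (1.0 - t * t) * dpre)


def stop_gradient(z: Dual) -> Dual:
    # Toy stand-in: keep the value, forget how it depended on w.
    return Dual(z.value, 0.0)


def refine(w_value: float, z0: float, cycles: int, truncate: bool) -> Dual:
    w = Dual(w_value, 1.0)  # differentiate with respect to w
    z = Dual(z0, 0.0)       # the initial state does not depend on w

    # cycles - 1 steps whose gradient is discarded when truncate is True
    for _ in range(cycles - 1):
        z = step(z, w)
    if truncate:
        z = stop_gradient(z)

    # 1 final step with gradient
    return step(z, w)


full = refine(0.5, 1.0, cycles=3, truncate=False)
cut = refine(0.5, 1.0, cycles=3, truncate=True)

print(full.value == cut.value)  # forward result is identical
print(full.deriv, cut.deriv)    # only the last step contributes to cut.deriv
```

With truncation, the derivative equals what the final step alone would give if its input were a constant, which is the effect the single stop_gradient line has in the snippet above.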

stockeh Oct 10, 2025

Simple indeed, that totally makes sense! Thank you.
