Conversation

@iden-kalemaj
Contributor

Summary:
We use `SimpleDistributedPerLayerOptimizer` instead of `DistributedPerLayerOptimizer`.

The latter causes an issue when switching to `register_full_backward_hook`.

The issue arises because `DistributedPerLayerOptimizer` registers per-parameter hooks in addition to the per-module hooks. During the backward pass, the per-parameter hooks fire before the per-module hooks. Per-sample gradients are computed when the per-module hooks fire, so an error occurs when the per-parameter hooks try to access per-sample gradients that have not yet been computed. PyTorch does not provide a way to force the order in which hooks fire.
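
As a minimal standalone sketch of the ordering behavior described above (not Opacus code; the toy `nn.Linear` layer and the `order` list are purely illustrative), one can register both kinds of hooks on a single layer and record when each fires:

```python
# Illustrative sketch only -- not Opacus code. Registers a per-module full
# backward hook and per-parameter (tensor) hooks on the same layer, then
# records the order in which they fire during the backward pass.
import torch
import torch.nn as nn

order = []  # illustrative: collects hook firing events

layer = nn.Linear(4, 2)

# Per-module hook: called when gradients w.r.t. the module's inputs are computed.
layer.register_full_backward_hook(
    lambda module, grad_input, grad_output: order.append("module hook")
)

# Per-parameter hooks: called when the gradient of each parameter is computed.
for name, param in layer.named_parameters():
    param.register_hook(lambda grad, name=name: order.append(f"param hook ({name})"))

x = torch.randn(3, 4, requires_grad=True)
layer(x).sum().backward()
print(order)  # reveals the relative firing order of the two hook types
```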

Differential Revision: D72420168

…-pytorch#720)

Summary:
Pull Request resolved: meta-pytorch#720

`register_backward_hook` is deprecated and may lead to errors in gradient calculation. We switch to the supported `register_full_backward_hook`.
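
As a hedged illustration of this switch (not the actual Opacus diff; the `capture_backprops` placeholder and the toy model are hypothetical), the change amounts to replacing the deprecated registration call with the supported one:

```python
# Hypothetical sketch of the API switch -- not the actual Opacus change.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))

def capture_backprops(module, grad_input, grad_output):
    # Placeholder for the kind of hook Opacus uses to capture backprops
    # for per-sample gradient computation.
    pass

# Deprecated, with ill-defined grad_input/grad_output for some modules:
# model[0].register_backward_hook(capture_backprops)

# Supported replacement with well-defined semantics:
handle = model[0].register_full_backward_hook(capture_backprops)

x = torch.randn(3, 4, requires_grad=True)
model(x).sum().backward()
handle.remove()
```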

Differential Revision: D68562558

Reviewed By: HuanyuZhang
@facebook-github-bot added the CLA Signed label Apr 3, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D72420168

iden-kalemaj added a commit to iden-kalemaj/opacus that referenced this pull request Apr 3, 2025
…ytorch#750)

Summary:
Pull Request resolved: meta-pytorch#750

We use `SimpleDistributedPerLayerOptimizer` instead of `DistributedPerLayerOptimizer`.

The latter causes an issue when switching to `register_full_backward_hook`.

The issue arises because `DistributedPerLayerOptimizer` registers per-parameter hooks in addition to the per-module hooks. During the backward pass, the per-parameter hooks fire before the per-module hooks. Per-sample gradients are computed when the per-module hooks fire, so an error occurs when the per-parameter hooks try to access per-sample gradients that have not yet been computed. PyTorch does not provide a way to force the order in which hooks fire.

Differential Revision: D72420168

@facebook-github-bot
Contributor

This pull request has been merged in 58f11ec.

Labels

CLA Signed, fb-exported, Merged
