Hi,
I would like to first thank you for open-sourcing your code for the community.
While using the code for fp16 (or AMP) training, I found something confusing at https://github.com/openai/guided-diffusion/blob/main/guided_diffusion/fp16_util.py#L202. Why do you scale the gradient only for the scalar parameters? What about the gradients of the matrix parameters?
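To make the concern concrete, here is a minimal, framework-free sketch of static loss scaling. It assumes, as the question suggests, that the parameters are flattened into two groups (scalar/vector parameters at index 0, matrix parameters at index 1) and that the loss is multiplied by `2 ** lg_loss_scale` before backprop. The helpers `scaled_grads` and `unscale_groups` are hypothetical and are not taken from `fp16_util.py`. The sketch shows that dividing only group 0 by the loss scale leaves group 1's gradients inflated by the full factor.

```python
# Framework-free sketch of static loss scaling with two parameter groups.
# Hypothetical helper names; NOT the guided_diffusion implementation.

def scaled_grads(true_grads, lg_loss_scale):
    """Simulate backprop on a loss multiplied by 2**lg_loss_scale:
    every gradient in every group comes out scaled by the same factor."""
    scale = 2.0 ** lg_loss_scale
    return [[g * scale for g in group] for group in true_grads]


def unscale_groups(grads, lg_loss_scale, group_indices):
    """Divide the gradients of the selected groups by the loss scale.
    Groups not listed in group_indices are returned unchanged."""
    inv_scale = 1.0 / (2.0 ** lg_loss_scale)
    return [
        [g * inv_scale for g in group] if i in group_indices else list(group)
        for i, group in enumerate(grads)
    ]


if __name__ == "__main__":
    # Assumed layout: group 0 = flattened scalar/vector params (biases, norm gains),
    # group 1 = flattened matrix params (conv / linear weights).
    true_grads = [[0.5, -0.25], [0.125, 1.0, -2.0]]
    lg_loss_scale = 10

    grads = scaled_grads(true_grads, lg_loss_scale)

    only_first = unscale_groups(grads, lg_loss_scale, {0})
    all_groups = unscale_groups(grads, lg_loss_scale, {0, 1})

    print(only_first[1])             # [128.0, 1024.0, -2048.0], still x 2**10
    print(all_groups == true_grads)  # True
```

Whether such a leftover factor matters in practice depends on the optimizer: with plain SGD it rescales the effective learning rate of that group, whereas Adam-style updates, which divide by a running RMS of the gradient, are largely insensitive to a constant gradient scale apart from the epsilon term.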
I sincerely look forward to your reply.