gradient scaling in fp16 training

Hi, 

I would like to first thank you for open-sourcing your code for the community. 

While using the code for fp16 (or amp) training, I found something confusing at https://github.com/openai/guided-diffusion/blob/main/guided_diffusion/fp16_util.py#L202 . Why is the gradient scaled only for the scalar parameters? What about the gradients of the matrix parameters?
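
To make the question concrete, here is a minimal sketch of the behaviour I expected from the unscaling step. It is not the repository's code: `unscale_grads`, the `log2_scale` parameter, the split into two gradient groups, and the example values are all hypothetical, and plain Python floats stand in for tensors.

```python
def unscale_grads(grad_groups, log2_scale):
    """Divide every gradient in every group by 2 ** log2_scale.

    Hypothetical helper for illustration only: each inner list stands in
    for the flattened gradients of one parameter group.
    """
    inv_scale = 1.0 / (2 ** log2_scale)
    return [[g * inv_scale for g in group] for group in grad_groups]


# Made-up scaled gradients: group 0 stands in for scalar/vector parameters,
# group 1 for matrix parameters.
scaled = [[1024.0, 2048.0], [4096.0, 512.0, 256.0]]
unscaled = unscale_grads(scaled, log2_scale=10)
print(unscaled)
```

In this sketch both groups are divided by the same loss scale before the optimizer step, which is what I assumed would happen for all parameters.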

Sincerely looking forward to your reply.



gradient scaling in fp16 training #44
