While training GRPO, I noticed that my model crashes. Its loss is 0, its grad_norm and kl are both NaN, and it completes as "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" #3930

KevinClaint · 2025-04-18T08:19:40Z

While using GRPO I ran into the situation shown in the image below. It happens quite randomly and can show up at any step during training.

KevinClaint · 2025-04-18T08:20:06Z

effortprogrammer · 2025-04-22T06:51:36Z

Did you solve the issue?

KevinClaint · 2025-04-22T12:14:20Z

I lowered my learning rate and changed the initial reward from 0 to 1e-4 (a very small value); after that it didn't happen again. The same fix has been reported to work in other issues as well:
volcengine/verl#747

In the meantime, I'm hoping that someone more knowledgeable can come up with a more general fix and explain why the error occurs.

effortprogrammer · 2025-04-23T00:34:33Z

I don't understand how you changed the initial reward from 0 to 1e-4. Can you give more context on this?

In addition, can you tell us which versions of the libraries you are currently using?

effortprogrammer · 2025-04-23T00:40:56Z

cc. @Jintao-Huang Can you tag some people that can help with this issue?

KevinClaint · 2025-04-23T04:07:34Z

Quoting effortprogrammer: "I don't understand how you changed the initial reward from 0 to 1e-4. Can you give more context on this? In addition, can you tell us which versions of the libraries you are currently using?"

For example, if the model's output doesn't contain an answer that earns a reward, the reward is computed as zero. I just changed that zero to 1e-4.
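The change described above can be sketched roughly as follows. This is only an illustration: `exact_match_reward`, `floor_reward`, and `REWARD_FLOOR` are hypothetical names, not part of ms-swift or verl, and the substring match is a placeholder for whatever the real reward function checks.

```python
from typing import List

# Small positive value used in place of an exact 0 (the 1e-4 comes from the comment above).
REWARD_FLOOR = 1e-4


def exact_match_reward(completions: List[str], answers: List[str]) -> List[float]:
    """Hypothetical reward: 1.0 if the reference answer appears in the completion, else 0.0."""
    return [1.0 if ans in comp else 0.0 for comp, ans in zip(completions, answers)]


def floor_reward(rewards: List[float], floor: float = REWARD_FLOOR) -> List[float]:
    """Replace rewards that are exactly zero with a small positive floor."""
    return [floor if r == 0.0 else r for r in rewards]


rewards = floor_reward(exact_match_reward(["the answer is 4", "!!!!"], ["4", "4"]))
print(rewards)
```

Keeping the floor in a separate wrapper means the original reward logic stays unchanged and the workaround can be removed easily once the underlying cause is understood.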

hjh0119 · 2025-04-23T05:48:00Z

What version of swift are you using? I believe the issue with NaN gradients has been fixed.

https://github.com/modelscope/ms-swift/blob/main/swift/trainers/mixin.py#L264-L281

JingMog · 2025-04-23T14:54:48Z

My swift version is 3.3.0 dev0 and I have the same problem; gradient clipping doesn't seem to work. I use DeepSpeed ZeRO-2.
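As a general note on why gradient clipping may appear not to help here: norm-based clipping only rescales gradients, so it cannot repair a value that is already NaN or infinite. The plain-Python sketch below illustrates that arithmetic under simplifying assumptions (gradients as a flat list of floats). It is not the code that ms-swift or DeepSpeed runs, and `clip_by_global_norm` and `all_finite` are hypothetical helper names.

```python
import math
from typing import List


def clip_by_global_norm(grads: List[float], max_norm: float) -> List[float]:
    """Generic norm clipping: rescale grads so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads]


def all_finite(grads: List[float]) -> bool:
    """True only if no gradient entry is NaN or infinite."""
    return all(math.isfinite(g) for g in grads)


healthy = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
with_nan = clip_by_global_norm([1.0, float("nan")], max_norm=1.0)
with_inf = clip_by_global_norm([1.0, float("inf")], max_norm=1.0)

print(all_finite(healthy))   # finite gradients are rescaled normally
print(all_finite(with_nan))  # a NaN entry survives clipping
print(all_finite(with_inf))  # an infinite entry turns into NaN (inf * 0)
```

A check like `all_finite` before the optimizer step is one way to detect the bad step and skip it rather than relying on clipping alone.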

KevinClaint · 2025-04-26T13:01:38Z

Quoting JingMog: "My swift version is 3.3.0 dev0 and I have the same problem; gradient clipping doesn't seem to work. I use DeepSpeed ZeRO-2."

My swift version is the same as yours, and I also use DeepSpeed ZeRO-2. I have found that the method mentioned above doesn't actually solve my problem. Something must be wrong, but it shows up at random, and I can't reproduce it reliably.

KevinClaint · 2025-04-26T13:03:43Z

I also found that it occurs within the first 2k steps (most often within the first 1k). After 2k steps, I have never seen it.

zhangansen · 2025-05-06T08:42:56Z

I set my temperature to 0, and the "!!!!!!" output appeared right from the start.

zhangansen · 2025-05-06T08:43:11Z

With the temperature set to any other value, it works fine.
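The thread does not establish why a temperature of 0 fails. One possible explanation, stated here only as an assumption, is that logits are divided by the temperature somewhere during sampling or log-probability computation, which is undefined at 0. The sketch below shows temperature-scaled softmax in plain Python with an explicit guard; `softmax_with_temperature` is a hypothetical helper, not an ms-swift function.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Softmax over logits / temperature, rejecting non-positive temperatures."""
    if temperature <= 0.0:
        # Dividing by 0 is undefined; tensor libraries typically yield inf/NaN here instead of raising.
        raise ValueError("temperature must be > 0; use greedy decoding instead of temperature=0")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax_with_temperature([1.0, 2.0, 3.0], temperature=1.0)
sharper = softmax_with_temperature([1.0, 2.0, 3.0], temperature=0.5)
print(probs, sharper)
```

If near-deterministic sampling is the goal, a small positive temperature keeps the computation well defined.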
