[rank0]: File "/home/ms-swift/swift/trainers/callback.py", line 95, in on_epoch_end
[rank0]: if args.max_epochs <= math.ceil(state.epoch):
[rank0]: TypeError: '<=' not supported between instances of 'NoneType' and 'int'
Train: 50%|███████████████████████████████ | 269/538 [18:24<18:24, 4.11s/it]
[rank0]:[W511 00:19:28.037565019 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
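Training was forcibly terminated partway through.
--num_train_epochs was set to 2, and two checkpoints had already been saved: checkpoint-100 and checkpoint-200.
I noticed there was a recent update. Is this error related to that update?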
#4125
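For reference, the error message puts NoneType on the left-hand side of the comparison, which suggests that args.max_epochs was None when on_epoch_end ran. Below is a minimal sketch of a defensive check. It assumes that None should mean "no epoch limit". The helper name reached_max_epochs is hypothetical and is not part of ms-swift. This is not the project's actual fix.

```python
import math
from typing import Optional


def reached_max_epochs(max_epochs: Optional[int], epoch: Optional[float]) -> bool:
    """Return True once the rounded-up epoch count reaches max_epochs.

    Hypothetical helper for illustration only. Treating None as
    "no limit / unknown" is an assumption, not confirmed behaviour.
    """
    if max_epochs is None or epoch is None:
        return False
    return max_epochs <= math.ceil(epoch)


# Reproduce the failure from the traceback: the left operand is None.
try:
    None <= math.ceil(1.0)
except TypeError as exc:
    print(exc)

# With the guard, a missing limit no longer raises.
print(reached_max_epochs(None, 1.0))
print(reached_max_epochs(2, 2.0))
```

With a guard like this, an unset max_epochs would let training continue instead of crashing at the end of the first epoch.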