Description
I only changed the config file (free_anchor_R-50-FPN_test.txt) and got the following error:
```
2020-01-16 15:49:05,738 maskrcnn_benchmark.trainer INFO: eta: 3:46:57 iter: 244400 loss: 1.7938 (1.8977) loss_retina_positive: 1.6451 (1.7404) loss_retina_negative: 0.1402 (0.1573) time: 0.1097 (0.1178) data: 0.0042 (0.0045) lr: 0.010000 max mem: 1404
2020-01-16 15:49:08,161 maskrcnn_benchmark.trainer INFO: eta: 3:46:55 iter: 244420 loss: 1.7646 (1.8977) loss_retina_positive: 1.6248 (1.7404) loss_retina_negative: 0.1239 (0.1573) time: 0.1109 (0.1178) data: 0.0041 (0.0045) lr: 0.010000 max mem: 1404
2020-01-16 15:49:10,560 maskrcnn_benchmark.trainer INFO: eta: 3:46:52 iter: 244440 loss: 1.8001 (1.8977) loss_retina_positive: 1.6412 (1.7404) loss_retina_negative: 0.1554 (0.1573) time: 0.1126 (0.1178) data: 0.0040 (0.0045) lr: 0.010000 max mem: 1404
2020-01-16 15:49:12,817 maskrcnn_benchmark.trainer INFO: eta: 3:46:50 iter: 244460 loss: 1.7907 (1.8977) loss_retina_positive: 1.6191 (1.7404) loss_retina_negative: 0.1470 (0.1573) time: 0.1076 (0.1178) data: 0.0037 (0.0045) lr: 0.010000 max mem: 1404
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1556653215914/work/aten/src/THC/THCCachingHostAllocator.cpp line=265 error=59 : device-side assert triggered
/opt/conda/conda-bld/pytorch_1556653215914/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [2887,0,0], thread: [16,0,0] Assertion *input >= 0. && *input <= 1. failed.
Traceback (most recent call last):
  File "tools/train_net.py", line 171, in <module>
    main()
  File "tools/train_net.py", line 164, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 73, in train
    arguments,
  File "/home/zz/work/FreeAnchor/maskrcnn_benchmark/engine/trainer.py", line 70, in do_train
    loss_dict_reduced = reduce_loss_dict(loss_dict)
  File "/home/zz/work/FreeAnchor/maskrcnn_benchmark/engine/trainer.py", line 28, in reduce_loss_dict
    all_losses = torch.stack(all_losses, dim=0)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1556653215914/work/aten/src/THC/THCCachingHostAllocator.cpp:265
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: device-side assert triggered (insert_events at /opt/conda/conda-bld/pytorch_1556653215914/work/c10/cuda/CUDACachingAllocator.cpp:564)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fcb2ed3fdc5 in /home/zz/anaconda3/envs/fa/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x14792 (0x7fcb2bc1c792 in /home/zz/anaconda3/envs/fa/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x50 (0x7fcb2ed2f640 in /home/zz/anaconda3/envs/fa/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x3067fb (0x7fcb2c33c7fb in /home/zz/anaconda3/envs/fa/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #4: <unknown function> + 0x14019b (0x7fcb54b2019b in /home/zz/anaconda3/envs/fa/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x3bfc84 (0x7fcb54d9fc84 in /home/zz/anaconda3/envs/fa/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x3bfcd1 (0x7fcb54d9fcd1 in /home/zz/anaconda3/envs/fa/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0x19dfce (0x56446760afce in /home/zz/anaconda3/envs/fa/bin/python)
frame #8: <unknown function> + 0x113a6b (0x564467580a6b in /home/zz/anaconda3/envs/fa/bin/python)
frame #9: <unknown function> + 0x103948 (0x564467570948 in /home/zz/anaconda3/envs/fa/bin/python)
frame #10: <unknown function> + 0x114267 (0x564467581267 in /home/zz/anaconda3/envs/fa/bin/python)
frame #11: <unknown function> + 0x11427d (0x56446758127d in /home/zz/anaconda3/envs/fa/bin/python)
frame #12: <unknown function> + 0x11427d (0x56446758127d in /home/zz/anaconda3/envs/fa/bin/python)
frame #13: <unknown function> + 0x11427d (0x56446758127d in /home/zz/anaconda3/envs/fa/bin/python)
frame #14: PyDict_SetItem + 0x502 (0x5644675cc602 in /home/zz/anaconda3/envs/fa/bin/python)
frame #15: PyDict_SetItemString + 0x4f (0x5644675cd0cf in /home/zz/anaconda3/envs/fa/bin/python)
frame #16: PyImport_Cleanup + 0x9e (0x56446760c91e in /home/zz/anaconda3/envs/fa/bin/python)
frame #17: Py_FinalizeEx + 0x67 (0x564467682367 in /home/zz/anaconda3/envs/fa/bin/python)
frame #18: <unknown function> + 0x227d93 (0x564467694d93 in /home/zz/anaconda3/envs/fa/bin/python)
frame #19: _Py_UnixMain + 0x3c (0x5644676950bc in /home/zz/anaconda3/envs/fa/bin/python)
frame #20: __libc_start_main + 0xe7 (0x7fcb651ccb97 in /lib/x86_64-linux-gnu/libc.so.6)
frame #21: <unknown function> + 0x1d0990 (0x56446763d990 in /home/zz/anaconda3/envs/fa/bin/python)
```
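The assertion that fails in `BCECriterion.cu` requires every input to binary cross-entropy to be a probability in [0, 1]. The sketch below mirrors that check in plain Python so it runs without a GPU. The helper names `find_invalid_bce_inputs` and `bce` are hypothetical and are not part of maskrcnn_benchmark or FreeAnchor. It also shows why a NaN probability trips the assert: every comparison with NaN is false. Whether this run actually produced NaNs (for example from a diverging forward pass) is an assumption, not something the log above confirms.

```python
import math
from typing import Iterable, List


def find_invalid_bce_inputs(probs: Iterable[float]) -> List[int]:
    """Hypothetical helper: return the indices of values that would fail
    the BCE kernel's `*input >= 0. && *input <= 1.` assertion."""
    bad = []
    for i, p in enumerate(probs):
        # NaN compares False against everything, so this condition is
        # False for NaN, just like the CUDA-side assertion.
        if not (p >= 0.0 and p <= 1.0):
            bad.append(i)
    return bad


def bce(p: float, y: float, eps: float = 1e-12) -> float:
    """Hypothetical elementwise binary cross-entropy
    -[y*log(p) + (1 - y)*log(1 - p)] that rejects out-of-range inputs."""
    # Same range check as the kernel; NaN also fails it.
    if not (0.0 <= p <= 1.0):
        raise ValueError(f"BCE input must be in [0, 1], got {p}")
    # Clamp away from 0 and 1 so log() never receives 0.
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
```

For example, `find_invalid_bce_inputs([0.2, float("nan"), 1.0, 1.0000001])` returns `[1, 3]`. On a CUDA tensor `p`, an equivalent check is `((p < 0) | (p > 1) | torch.isnan(p)).any()`. Because CUDA kernels run asynchronously, the assert is reported at a later synchronizing call (here `torch.stack` in `reduce_loss_dict`) rather than at the loss call itself. Setting the environment variable `CUDA_LAUNCH_BLOCKING=1` makes the traceback point at the failing operation.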