Kernel keeps crashing at the exact same point while training during the 2nd epoch (p2ch12.training.LunaTrainingApp) #17

navpreetnp7 · 2020-08-21T15:48:15Z

I am trying to train the LUNA model using the data augmentation from chapter 12. The problem is that the kernel crashes every time near the end of the 2nd epoch on the training set. The same behaviour occurs whether I run from a Jupyter notebook or the command line. When I check my resources during training (screenshot attached), there does not appear to be any shortage of RAM or GPU memory.

And here are the logs from training.

After this, the training crashes. Can you please point out what the issue might be? I am running the exact same code, except for a changed path to the subset data that I downloaded to my local machine.
I am running Windows 10 with 32 GB RAM and an 8 GB GPU.
I also tried num-workers = 4 and 6 with the same result (only slower), and decreased the batch size to 64, again with the same result.
Also, during the 2nd epoch my system seems to slow down: I experience some lag when switching tabs/windows, but Task Manager (as in the screenshot) still shows plenty of RAM left.
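Since the crash leaves no Python traceback, one possible diagnostic step is to turn on the standard-library `faulthandler` module before training starts, so that a hard crash in native code still writes out where Python was at the time. This is only a minimal sketch, not part of the book's code: the helper name `enable_crash_log` and the file name `crash_log.txt` are made up for illustration, and it may not capture anything if the process is killed from outside (for example by the OS).

```python
import faulthandler


def enable_crash_log(path):
    """Dump Python tracebacks for all threads to `path` on a fatal error.

    `path` is a hypothetical log location chosen by the caller.
    Returns the open file object; the caller must keep it alive,
    because faulthandler writes to it only at crash time.
    """
    log_file = open(path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Assumed usage, placed before the training app is started:
#
#     crash_log = enable_crash_log("crash_log.txt")
#     ... run training as usual ...
```

Note that this only covers the process it is called in. If the crash happens inside a data-loading worker process, the same call would presumably have to be made there as well (for example from a `worker_init_fn` passed to the `DataLoader`), which is an untested assumption here.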

Any help would be appreciated, as I am new to deep learning and am running a huge model for the first time. Thank you.

melhzy · 2021-02-05T09:03:26Z

The same error keeps happening on my machine.

MuhammedIkbalKARADELI · 2024-03-06T07:52:47Z

I have the same issue with a different process. Did you solve your problem? If so, could you help me with the kernel crashes?

navpreetnp7 closed this as completed Aug 22, 2020

ghost mentioned this issue Jan 20, 2021

p2ch12 (training.py)- Training stops without error #50

Open
