Kernel keeps crashing at the exact same point while training during the 2nd epoch (p2ch12.training.LunaTrainingApp) #17

Closed
navpreetnp7 opened this issue Aug 21, 2020 · 2 comments

navpreetnp7 commented Aug 21, 2020

I am trying to train the Luna model with the data augmentation from chapter 12. The issue I am facing is that the kernel crashes every time near the end of the 2nd epoch on the training set. The same behaviour occurs whether I run from a Jupyter notebook or the command line. When I check my resources during training (attached), there doesn't appear to be any memory shortage in RAM or on the GPU.

Screenshot (2)

And here are the logs from training.

Screenshot (3)

Screenshot (4)

After this the training crashes. Can you please point out what the issue might be? I am running the exact same code, except for a changed path to the subset data that I downloaded to my local machine.
I am running Windows 10 with 32 GB RAM and an 8 GB GPU.
I also tried num-workers = 4 and 6 with the same result (only slower), and decreased the batch size to 64, with the same result again.
Also, during the 2nd epoch my system seems to slow down: I experience some lag when switching tabs/windows, but Task Manager (see the screenshot) shows plenty of RAM left.

Any help would be appreciated as I am new to deep learning and I am running a huge model for the first time. Thank you.

melhzy commented Feb 5, 2021

The same error keeps happening on my machine.

MuhammedIkbalKARADELI commented Mar 6, 2024

I have the same issue with a different process. Did you solve your problem? If so, could you help me with the kernel crashes?
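
The thread never identifies a cause, so the following is only a diagnostic sketch, not a fix. When a kernel dies without a Python traceback, the standard-library `faulthandler` module can sometimes record where the interpreter was at the moment of a fatal error. The helper name `enable_crash_log` and the file name `crash.log` are hypothetical. The helper would have to be called once at the start of the training script, and it may record nothing if the operating system kills the process outright or if the crash happens inside a data-loading worker process.

```python
import faulthandler


def enable_crash_log(path):
    """Dump Python tracebacks of all threads to `path` on a fatal error.

    Returns the open file object; it must stay open for as long as
    faulthandler is enabled, so keep a reference to it.
    """
    log_file = open(path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Hypothetical usage at the top of a training script:
#     crash_log = enable_crash_log("crash.log")
#     ... start training ...
```

If the log stays empty after a crash, that in itself suggests the process was terminated from outside the interpreter rather than by an error that Python could observe.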
