Managing training
In this section, we will go through some of the common challenges that you may encounter while managing the training of DL models. This includes troubleshooting issues with saving model parameters and debugging model logic efficiently.
Saving model hyperparameters
There is often a need to save a model's hyperparameters, for reasons such as reproducibility, consistency, and the fact that some network architectures are extremely sensitive to their hyperparameters.
On more than one occasion, you may find yourself unable to load a model from its checkpoint: the load_from_checkpoint method of the LightningModule class fails with an error.
Solution
A checkpoint is nothing more than a saved state of the model. Checkpoints contain the precise values of all parameters used by the model. However, the hyperparameter arguments passed to the model's __init__ method are not saved in the checkpoint by default. Calling self.save_hyperparameters inside the __init__ method of the LightningModule stores these arguments in the checkpoint as well, so that load_from_checkpoint can rebuild the model without them being passed in again.
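As a minimal sketch, the following shows where the call to self.save_hyperparameters goes and how load_from_checkpoint can then restore the model from the checkpoint alone. The LitClassifier class, its hyperparameters, and the checkpoint path are hypothetical examples, not part of the original text:

import torch
from torch import nn
import pytorch_lightning as pl

class LitClassifier(pl.LightningModule):
    def __init__(self, input_dim: int = 784, hidden_dim: int = 128, lr: float = 1e-3):
        super().__init__()
        # Persist the __init__ arguments (input_dim, hidden_dim, lr) to self.hparams
        # so that they are written into every checkpoint.
        self.save_hyperparameters()
        self.model = nn.Sequential(
            nn.Linear(self.hparams.input_dim, self.hparams.hidden_dim),
            nn.ReLU(),
            nn.Linear(self.hparams.hidden_dim, 10),
        )

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)

# Because the hyperparameters were saved, the checkpoint alone is enough to
# rebuild the model; there is no need to pass input_dim, hidden_dim, or lr again.
model = LitClassifier.load_from_checkpoint("path/to/checkpoint.ckpt")

Without the self.save_hyperparameters call, load_from_checkpoint would only succeed if you passed the same constructor arguments explicitly, which is exactly the failure mode described above.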