ImageClassifier with ExponentialLRDecay: metrics not updated/calculated during validation #4807


Closed
gartangh opened this issue Feb 7, 2020 · 7 comments · Fixed by #5255
Labels: bug (something isn't working) · image (bugs related to image datatype tasks) · P1 (priority of the issue for triage purposes: needs to be fixed soon)


@gartangh

gartangh commented Feb 7, 2020

System information

  • OS version: Windows 10 Pro 18363
  • .NET Version: Core 2.1
  • Platform: x64
  • ML.NET version: 0.15-preview

Issue

  • What did you do?
    I am trying to train an image classifier that makes use of ExponentialLRDecay.
    I would like to see the metrics for training and validation for each epoch.

    var options = new ImageClassificationTrainer.Options()
    {
      LearningRateScheduler = new ExponentialLRDecay(),
      ValidationSetFraction = 0.1f,
      MetricsCallback = (metrics) => Console.WriteLine(metrics + $"   CrossEntropy: {metrics.Train.CrossEntropy}, LearningRate: {metrics.Train.LearningRate}")
    };
    var model = mlContext.MulticlassClassification.Trainers.ImageClassification(options);

    (I appended + $" CrossEntropy: {metrics.Train.CrossEntropy}, LearningRate: {metrics.Train.LearningRate}" because the cross-entropy and learning rate are not printed by default for the validation set.)

  • What happened?

    • The learning rate is not updated for the validation set (seen on every even row in the image).
    • The cross-entropy is not calculated for the validation set (seen on every even row in the image).
    • The learning rate is not first updated after the second epoch, as the default value of 2 for numEpochsPerDecay in ExponentialLRDecay() suggests, but already after the first (seen on the third row in the image). After that, the learning rate is correctly updated every 2 epochs. I'm not sure whether this is the expected behavior.

    (screenshot: per-epoch training/validation metrics output)

  • What did you expect?

    • I expected a decaying learning rate in the validation step, equal to the one in the training step.
    • I expected the cross-entropy to be calculated in the validation step. The model with the highest accuracy and lowest cross-entropy is the best, so if two models perform equally well in terms of accuracy, the one with the lower cross-entropy on the validation set should be picked.
    • Further, I expected the learning rate to start decaying after the 2nd epoch.
@antoniovs1029 antoniovs1029 added bug Something isn't working P1 Priority of the issue for triage purpose: Needs to be fixed soon. labels Feb 7, 2020
@gartangh

gartangh commented Feb 7, 2020

After looking into some blog posts, I found that my point about the learning rate decaying too soon might be invalid.
Typically, the learning rate indeed decays at epoch n-1 when epochs are counted from 0 and numEpochsPerDecay=n.
So with numEpochsPerDecay=2, epoch 0 has the original learning rate and epoch 1 is already decayed; with numEpochsPerDecay=10, epochs 0-8 have the original learning rate and epoch 9 is the first to be decayed.
This is not really intuitive to me, but it might be correct!
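To make that schedule concrete, here is a small self-contained sketch of a staircase exponential decay that reproduces the timing described above. The formula, initial learning rate, and decay rate are my assumptions for illustration, not ML.NET's actual ExponentialLRDecay implementation (which operates on global steps rather than whole epochs):

```csharp
using System;

class LrDecaySketch
{
    // Hypothetical staircase exponential decay, counting epochs from 0:
    //   lr(epoch) = initialLr * decayRate^floor((epoch + 1) / numEpochsPerDecay)
    // With this convention the first drop appears at epoch n - 1.
    public static double DecayedLr(double initialLr, double decayRate, int numEpochsPerDecay, int epoch)
    {
        int exponent = (epoch + 1) / numEpochsPerDecay; // integer division acts as floor
        return initialLr * Math.Pow(decayRate, exponent);
    }

    static void Main()
    {
        // Hypothetical values; ML.NET's defaults may differ.
        for (int epoch = 0; epoch < 5; epoch++)
            Console.WriteLine($"epoch {epoch}: lr = {DecayedLr(0.01, 0.5, 2, epoch)}");
        // epoch 0 keeps 0.01; epochs 1-2 get 0.005; epochs 3-4 get 0.0025.
    }
}
```

With numEpochsPerDecay=2 this yields the pattern observed above: the original rate only at epoch 0, then a decay every 2 epochs.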

@antoniovs1029 antoniovs1029 self-assigned this Feb 7, 2020
@harishsk harishsk added the image Bugs related image datatype tasks label Apr 29, 2020
@antoniovs1029 antoniovs1029 removed their assignment Jun 16, 2020
@mstfbl mstfbl self-assigned this Jun 18, 2020
@gartangh

gartangh commented Jun 18, 2020

I created a minimal working example based on the DeepLearning_ImageClassification_Training example, that reproduces the bug here.
I updated to version 1.5.0, changed the ImageClassificationTrainer.Options, and added a custom MetricsCallback.

  • The initial learning rate is still returned for the validation set. (In version 1.4.0, it did return the correct learning rate.)
  • The cross-entropy is still not calculated for the validation set.
  • The learning rate might decay as intended. (In version 1.4.0, it did decay after n epochs, instead of n-1, but that might not have been as intended.)

@mstfbl

mstfbl commented Jun 23, 2020

Hi @gartangh ,

Thank you so much for bringing this issue to our attention, and providing a good repro. I successfully replicated your repro with our local ML.NET build and confirmed the issue.

I figured out that the problem with cross-entropy during validation is that it simply isn't being updated in the validation metrics; the update is missing from the code snippet below:

if (statisticsCallback != null)
{
    metrics.Train.Epoch = epoch;
    metrics.Train.Accuracy /= metrics.Train.BatchProcessedCount;
    // Note: there is no corresponding update of metrics.Train.CrossEntropy here.
    metrics.Train.DatasetUsed = ImageClassificationMetrics.Dataset.Validation;
    statisticsCallback(metrics);
}

The reason the learning rate is not decreasing during validation is that learning rate schedulers, of which ExponentialLRDecay is one, are currently not used in validation. The null argument below for TrainAndEvaluateClassificationLayerCore is where a learning rate scheduler would go:

TrainAndEvaluateClassificationLayerCore(epoch, learningRate, featureFileStartOffset,
    metrics, labelTensorShape, featureTensorShape, batchSize,
    validationSetLabelReader, validationSetFeatureReader, labelBuffer, featuresBuffer,
    labelBufferSizeInBytes, featureBufferSizeInBytes, featureFileRecordSize, null, // <-- scheduler slot
    trainState, validationEvalRunner, featureBufferPtr, labelBufferPtr,
    (outputTensors, metrics) =>
    {
        outputTensors[0].ToScalar(ref accuracy);
        metrics.Train.Accuracy += accuracy;
        outputTensors[0].Dispose();
    });

I don't see a reason why we shouldn't support learning rate schedulers during validation, and as a result I believe the learning rate during validation should also decrease when ExponentialLRDecay is used, rather than remain fixed (currently it stays at 0.01). Is my intuition correct @antoniovs1029 @harishsk ?

I'll be making a PR to address this issue and add tests to verify the changes soon.

In addition, I see that you had to manually log the learning rate and cross-entropy to observe this bug; these metrics are not reported during validation by default:

public override string ToString()
{
    if (DatasetUsed == ImageClassificationMetrics.Dataset.Train)
        return $"Phase: Training, Dataset used: {DatasetUsed.ToString(),10}, Batch Processed Count: {BatchProcessedCount,3}, Learning Rate: {LearningRate,10} " +
            $"Epoch: {Epoch,3}, Accuracy: {Accuracy,10}, Cross-Entropy: {CrossEntropy,10}";
    else
        return $"Phase: Training, Dataset used: {DatasetUsed.ToString(),10}, Batch Processed Count: {BatchProcessedCount,3}, " +
            $"Epoch: {Epoch,3}, Accuracy: {Accuracy,10}";
}

I wonder if this gives a hint on whether or not learning rates should be updated during validation with ExponentialLRDecay. I'm in favor of printing these metrics during validation as well.

@gartangh

Hi @mstfbl ,

You're welcome. I'm happy that I could be of help.

My guess is that the main part of this issue could be solved by copying that part of the code from training to validation?

About the learning rate during validation: one option would be to remove the LearningRate field from ImageClassificationMetrics.Train during validation, as it is not actually used. It was just very confusing to me that for training, the decay was visible, while for validation, it remained fixed at the initial value.
The other option would be as you say, but then it must be ensured that the learning rate does not decay twice as fast because of the extra scheduler updates during validation.
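That double-decay concern can be illustrated with a toy step-based scheduler. This is purely hypothetical code for illustration, not ML.NET's ExponentialLRDecay implementation:

```csharp
using System;

// Toy step-based scheduler: if the same instance is also stepped during
// validation, the decay effectively runs twice as fast. Hypothetical sketch.
class StepDecay
{
    private readonly double _initialLr, _decayRate;
    private readonly int _stepsPerDecay;
    private int _step;

    public StepDecay(double initialLr, double decayRate, int stepsPerDecay)
    {
        _initialLr = initialLr;
        _decayRate = decayRate;
        _stepsPerDecay = stepsPerDecay;
    }

    // Returns the current learning rate and advances the step counter.
    public double Next() => _initialLr * Math.Pow(_decayRate, _step++ / _stepsPerDecay);
}

class Demo
{
    static void Main()
    {
        var trainOnly = new StepDecay(0.01, 0.5, 2); // stepped once per epoch
        var shared = new StepDecay(0.01, 0.5, 2);    // stepped twice per epoch
        for (int epoch = 0; epoch < 4; epoch++)
        {
            double a = trainOnly.Next();             // training step only
            double b = shared.Next(); shared.Next(); // training + validation step
            Console.WriteLine($"epoch {epoch}: train-only lr = {a}, shared lr = {b}");
        }
        // The shared scheduler ends up halving its learning rate every epoch
        // instead of every second epoch.
    }
}
```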

About your addition: if the cross-entropy is correctly updated, it should certainly be reported as well. It would be really awesome if the output signature were exactly the same, i.e. also reporting the learning rate.
Then, the ToString method could look like this:

public override string ToString()
{
    return $"Phase: Training, Dataset used: {DatasetUsed.ToString(),10}, Batch Processed Count: {BatchProcessedCount,3}, Learning Rate: {LearningRate,10} " +
        $"Epoch: {Epoch,3}, Accuracy: {Accuracy,10}, Cross-Entropy: {CrossEntropy,10}";
}

@mstfbl

mstfbl commented Jul 2, 2020

Hi @gartangh,

Thank you once again for notifying us about this issue. My now-merged PR #5255 added cross-entropy metric support for validation. I did not add learning rate decay support for validation, as no actual training is done during validation, so learning rate decay is not applicable there.

@gartangh

gartangh commented Jul 5, 2020

Hi @mstfbl

I just verified your changes by building the master branch locally.
This was the output:
(screenshot: metrics output after the fix)
This looks really good!
The cross-entropy and learning rate decay seem to work correctly.
Printing the learning rate during validation would probably be too confusing, so I understand why it is not used.

Thank you very much for your work.

@mstfbl

mstfbl commented Jul 6, 2020

Of course, anytime @gartangh ! Please feel free to make more issues/feature requests if you see any in the future, and I hope you continue to enjoy using ML.NET!

@ghost ghost locked as resolved and limited conversation to collaborators Mar 19, 2022