[Image Classification API] TensorFlow exception triggered: input ended unexpectedly in the middle of a field #4234

luisquintanilla · 2019-09-20T18:13:45Z

System information

OS version/distro: Windows 10
.NET Version (eg., dotnet --info): .NET Core 2.2

Issue

What did you do?

Tried to train an image classification DNN model using the Image Classification API on the Intel Image Classification dataset.

What happened?

The following exception was raised

While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either that the input has been truncated or that an embedded message misreported its own length.

What did you expect?

The model to train.

Source code / logs

Source Code

public static IEnumerable<ImageInput> LoadImagesFromDirectory(string folder, bool useFolderNameasLabel = true)
{
    var files = Directory.GetFiles(folder, "*",
        searchOption: SearchOption.AllDirectories);

    foreach (var file in files)
    {
        if ((Path.GetExtension(file) != ".jpg") && (Path.GetExtension(file) != ".png"))
            continue;

        var label = Path.GetFileName(file);
        if (useFolderNameasLabel)
            label = Directory.GetParent(file).Name;
        else
        {
            for (int index = 0; index < label.Length; index++)
            {
                if (!char.IsLetter(label[index]))
                {
                    label = label.Substring(0, index);
                    break;
                }
            }
        }

        yield return new ImageInput()
        {
            ImagePath = file,
            Label = label
        };

    }
}

MLContext mlContext = new MLContext();

IEnumerable<ImageInput> train = LoadImagesFromDirectory(trainRelativePath, true).Take(10).ToArray();
IEnumerable<ImageInput> test = LoadImagesFromDirectory(testRelativePath, true).Take(10).ToArray();

IDataView trainSet = mlContext.Data.LoadFromEnumerable(train);
IDataView testSet = mlContext.Data.LoadFromEnumerable(test);

var mapLabelTransform = mlContext.Transforms.Conversion.MapValueToKey
  (outputColumnName: "LabelAsKey",
   inputColumnName: "Label",
   keyOrdinality: ValueToKeyMappingEstimator.KeyOrdinality.ByValue);

var trainingPipeline = 
    mapLabelTransform
   .Append(mlContext.Model.ImageClassification(
       "ImagePath",
       "LabelAsKey",
       arch: ImageClassificationEstimator.Architecture.ResnetV2101,
       epoch: 100,
       batchSize: 150,
       metricsCallback: (metrics) => Console.WriteLine(metrics)));

ITransformer trainedModel = trainingPipeline.Fit(trainSet);

Logs

System.FormatException
  HResult=0x80131537
  Message=Tensorflow exception triggered while loading model.
  Source=Microsoft.ML.Dnn
  StackTrace:
   at Microsoft.ML.Transforms.Dnn.DnnUtils.LoadTFSessionByModelFilePath(IExceptionContext ectx, String modelFile, Boolean metaGraph)
   at Microsoft.ML.DnnCatalog.ImageClassification(ModelOperationsCatalog catalog, String featuresColumnName, String labelColumnName, String scoreColumnName, String predictedLabelColumnName, Architecture arch, Int32 epoch, Int32 batchSize, Single learningRate, ImageClassificationMetricsCallback metricsCallback, Int32 statisticFrequency, DnnFramework framework, String modelSavePath, String finalModelPrefix, IDataView validationSet, Boolean testOnTrainSet, Boolean reuseTrainSetBottleneckCachedValues, Boolean reuseValidationSetBottleneckCachedValues, String trainSetBottleneckCachedValuesFilePath, String validationSetBottleneckCachedValuesFilePath)
   at ImageClassificationAPIMLNETSample.Program.Main(String[] args) in C:\Users\luquinta.REDMOND\source\repos\ImageClassificationAPIMLNETSample\ImageClassificationAPIMLNETSample\Program.cs:line 59

Inner Exception 1:
InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either that the input has been truncated or that an embedded message misreported its own length.

Additional output to the console:

Google.Protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either that the input has been truncated or that an embedded message misreported its own length.
   at Google.Protobuf.CodedInputStream.RefillBuffer(Boolean mustSucceed)
   at Google.Protobuf.CodedInputStream.ReadRawBytes(Int32 size)
   at Google.Protobuf.CodedInputStream.ReadBytes()
   at Tensorflow.TensorProto.MergeFrom(CodedInputStream input)
   at Google.Protobuf.CodedInputStream.ReadMessage(IMessage builder)
   at Tensorflow.AttrValue.MergeFrom(CodedInputStream input)
   at Google.Protobuf.CodedInputStream.ReadMessage(IMessage builder)
   at Google.Protobuf.FieldCodec.<>c__DisplayClass16_0`1.<ForMessage>b__0(CodedInputStream input)
   at Google.Protobuf.Collections.MapField`2.Codec.MessageAdapter.MergeFrom(CodedInputStream input)
   at Google.Protobuf.CodedInputStream.ReadMessage(IMessage builder)
   at Google.Protobuf.Collections.MapField`2.AddEntriesFrom(CodedInputStream input, Codec codec)
   at Tensorflow.NodeDef.MergeFrom(CodedInputStream input)
   at Google.Protobuf.CodedInputStream.ReadMessage(IMessage builder)
   at Google.Protobuf.FieldCodec.<>c__DisplayClass16_0`1.<ForMessage>b__0(CodedInputStream input)
   at Google.Protobuf.Collections.RepeatedField`1.AddEntriesFrom(CodedInputStream input, FieldCodec`1 codec)
   at Tensorflow.GraphDef.MergeFrom(CodedInputStream input)
   at Google.Protobuf.CodedInputStream.ReadMessage(IMessage builder)
   at Tensorflow.MetaGraphDef.MergeFrom(CodedInputStream input)
   at Google.Protobuf.MessageExtensions.MergeFrom(IMessage message, Byte[] data)
   at Google.Protobuf.MessageParser`1.ParseFrom(Byte[] data)
   at Tensorflow.saver._import_meta_graph_with_return_elements(String meta_graph_or_file, Boolean clear_devices, String import_scope, String[] return_elements)
   at Microsoft.ML.Transforms.Dnn.DnnUtils.<>c__DisplayClass5_0.<LoadMetaGraph>b__0(Graph graph)
   at Tensorflow.Python.tf_with[TIn,TOut](TIn py, Func`2 action)

luisquintanilla · 2019-09-25T15:23:28Z

Using this pipeline worked.

var trainingPipeline = 
    mapLabelTransform
   .Append(mlContext.Model.ImageClassification(
       "ImagePath",
       "LabelAsKey",
       arch: ImageClassificationEstimator.Architecture.ResnetV2101,
       epoch: 100,
       batchSize: 30,
       metricsCallback: (metrics) => Console.WriteLine(metrics)));

Switching back to the original code with a batch size of 150, or using another value for that parameter, worked as well.

CESARDELATORRE · 2019-10-02T23:31:29Z

@luisquintanilla - So what exactly was causing the issue then?

zorthgo · 2019-10-06T20:14:12Z

Any idea why that message is being thrown? I tried the same pipeline as in your comment, but I am still getting that error message.

luisquintanilla · 2019-10-07T17:54:56Z

@CESARDELATORRE not sure what happened as I was not able to replicate in this instance. I have experienced the issue in other runs but there's nothing I can potentially attribute this to. Re-running the application seems to "fix" it but it's not clear what causes it in the first place so it's difficult to replicate.

CESARDELATORRE · 2019-10-07T18:38:51Z

It might make sense to hold off a bit on this issue and try the new preview API we're releasing in a few days for Image Classification since it's been evolving significantly.

codemzs · 2019-10-07T22:55:46Z

@luisquintanilla are you trying to run this in parallel with another instance of this code? Can you please provide a link to your repo with the complete sample so that we can repro it the same way you did? Also, which version of the NuGet package are you using?

@ashbhandare is working on this.

luisquintanilla · 2019-10-08T16:47:01Z

@codemzs I was only running one instance of this code.

Here is the link to the repo

These are the NuGet packages being used.

Package	Version
Microsoft.ML	1.4.0-preview
Microsoft.ML.ImageAnalytics	1.4.0-preview
Microsoft.ML.Dnn	0.16.0-preview

luisquintanilla · 2019-10-08T16:55:14Z

I think I found a way to reproduce this. It may be related to what @codemzs mentioned about running multiple instances (although not deliberately). If I run my application and stop it once it initializes, I run into this issue on subsequent runs. Deleting the bin and obj directories and re-running (without stopping) clears the issue, and the application trains a model. I suspect that training continues in the background even though the application has been stopped, which triggers the issue because multiple instances of the application are running.

ashbhandare · 2019-10-08T18:49:08Z

I have isolated the source of this error. When you run the ImageClassification pipeline for the first time, the meta graph of the model (ResnetV2101 or InceptionV3) is downloaded, and in subsequent runs it is reused. If the run is interrupted (by stopping it) while the download is in progress, the protobuf file is only partially downloaded. Reading this incomplete graph in subsequent runs throws the error. A temporary workaround is to delete the protobuf file and rerun. I'm working on a fix.
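The temporary workaround above (delete the partially downloaded file, then rerun) could be scripted roughly as follows. This is a minimal sketch, not ML.NET code: `MetaGraphCache`, `DeleteCachedFiles`, and the `*.meta` search pattern are hypothetical names chosen for illustration, and the directory that holds the cached meta graph is an assumption. The thread only shows that clearing the bin and obj directories removed the bad file.

```csharp
using System;
using System.IO;

public static class MetaGraphCache
{
    // Deletes every file directly inside 'directory' that matches 'searchPattern'
    // and returns how many files were removed. Returns 0 if the directory is missing.
    public static int DeleteCachedFiles(string directory, string searchPattern)
    {
        if (!Directory.Exists(directory))
            return 0;

        int deleted = 0;
        foreach (string file in Directory.GetFiles(directory, searchPattern, SearchOption.TopDirectoryOnly))
        {
            File.Delete(file);
            deleted++;
        }
        return deleted;
    }
}
```

A call such as `MetaGraphCache.DeleteCachedFiles(AppContext.BaseDirectory, "*.meta")` before `Fit` would force a fresh download on the next run, assuming the cached graph lives in the application's output directory under a `.meta` extension.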

* Add MaybeDownloadFile(), where check to redownload existing file if file is not of expected size.
* Changed WebClient to HttpClient, renamed function to DownloadIfNeeded.
* Added unit test.
* Fixed newline after test
* Removed asynchronous copy
* added test for InceptionV3, fixed formatting.
* Modify to call DownloadIfNeeded
* fixed unit test, minor formatting
* fix test and change after rebase
* sync to master and use LoadRawImagesBytes instead of LoadImages.
* Dispose HttpClient object and wait for task for finish.
* clean up.
* Use resource manager to download meta files.
* remove test.
* remove unused namespaces.
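The first items in the commit message describe re-downloading an existing file when its size is not the expected one. A minimal sketch of that idea follows; it is not the code from the pull request. The class name `MetaFileDownloader`, the method signatures, and the caller-supplied `expectedSize` are assumptions for illustration. Writing to a `.partial` temporary file first is an extra precaution added here, not something the commit message mentions.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class MetaFileDownloader
{
    // True when the file is missing or its length differs from the expected size,
    // which is how a truncated download from an interrupted run would show up.
    public static bool NeedsDownload(string path, long expectedSize)
    {
        var info = new FileInfo(path);
        return !info.Exists || info.Length != expectedSize;
    }

    // Downloads 'url' to 'path' only when NeedsDownload reports a missing or wrong-sized file.
    public static async Task DownloadIfNeededAsync(HttpClient client, Uri url, string path, long expectedSize)
    {
        if (!NeedsDownload(path, expectedSize))
            return;

        // Write to a temporary name so an interrupted run never leaves
        // a truncated file under the final name.
        string tempPath = path + ".partial";
        using (var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();
            using (var source = await response.Content.ReadAsStreamAsync())
            using (var target = File.Create(tempPath))
            {
                await source.CopyToAsync(target);
            }
        }

        if (new FileInfo(tempPath).Length != expectedSize)
        {
            File.Delete(tempPath);
            throw new IOException("Downloaded file does not have the expected size.");
        }

        File.Delete(path);          // no-op if the file does not exist
        File.Move(tempPath, path);  // publish the complete file under its final name
    }
}
```

With a check like this, a file left behind by a stopped run fails the size comparison on the next run and is fetched again, so the truncated protobuf is never handed to the parser.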

luisquintanilla mentioned this issue Sep 26, 2019: Problems with pipeline.fit in DeepLearning_ImageClassification_Training dotnet/machinelearning-samples#663 (Closed)

codemzs assigned ashbhandare Oct 2, 2019

ashbhandare mentioned this issue Oct 8, 2019: Use resource manager to download meta files. Fixes #4234 #4314 (Merged)

codemzs closed this as completed in #4314 Oct 28, 2019

jwood803 mentioned this issue Apr 13, 2020: a bug come from DeepNeuralNetword jwood803/MLNetExamples#15 (Open)

ghost locked as resolved and limited conversation to collaborators Mar 20, 2022
