
[Image Classification API] TensorFlow exception triggered: input ended unexpectedly in the middle of a field #4234

Closed
luisquintanilla opened this issue Sep 20, 2019 · 9 comments · Fixed by #4314

@luisquintanilla
Contributor

System information

  • OS version/distro: Windows 10
  • .NET Version (e.g., dotnet --info): .NET Core 2.2

Issue

  • What did you do?

Tried to train an image classification DNN model using the Image Classification API on the Intel Image Classification dataset.

  • What happened?

The following exception was raised:

While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either that the input has been truncated or that an embedded message misreported its own length.

  • What did you expect?

The model to train.

Source code / logs

Source Code

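public static IEnumerable<ImageInput> LoadImagesFromDirectory(string folder, bool useFolderNameAsLabel = true)
{
    var files = Directory.GetFiles(folder, "*",
        searchOption: SearchOption.AllDirectories);

    foreach (var file in files)
    {
        // Only keep .jpg and .png files (case-insensitive, so ".JPG" also matches).
        var extension = Path.GetExtension(file);
        if (!extension.Equals(".jpg", StringComparison.OrdinalIgnoreCase) &&
            !extension.Equals(".png", StringComparison.OrdinalIgnoreCase))
            continue;

        // The label is either the parent folder name or the leading letters of the file name.
        var label = Path.GetFileName(file);
        if (useFolderNameAsLabel)
            label = Directory.GetParent(file).Name;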
        else
        {
            for (int index = 0; index < label.Length; index++)
            {
                if (!char.IsLetter(label[index]))
                {
                    label = label.Substring(0, index);
                    break;
                }
            }
        }

        yield return new ImageInput()
        {
            ImagePath = file,
            Label = label
        };

    }
}
MLContext mlContext = new MLContext();

IEnumerable<ImageInput> train = LoadImagesFromDirectory(trainRelativePath, true).Take(10).ToArray();
IEnumerable<ImageInput> test = LoadImagesFromDirectory(testRelativePath, true).Take(10).ToArray();

IDataView trainSet = mlContext.Data.LoadFromEnumerable(train);
IDataView testSet = mlContext.Data.LoadFromEnumerable(test);

var mapLabelTransform = mlContext.Transforms.Conversion.MapValueToKey(
    outputColumnName: "LabelAsKey",
    inputColumnName: "Label",
    keyOrdinality: ValueToKeyMappingEstimator.KeyOrdinality.ByValue);

var trainingPipeline = 
    mapLabelTransform
   .Append(mlContext.Model.ImageClassification(
       "ImagePath",
       "LabelAsKey",
       arch: ImageClassificationEstimator.Architecture.ResnetV2101,
       epoch: 100,
       batchSize: 150,
       metricsCallback: (metrics) => Console.WriteLine(metrics)));

ITransformer trainedModel = trainingPipeline.Fit(trainSet);

Logs

System.FormatException
  HResult=0x80131537
  Message=Tensorflow exception triggered while loading model.
  Source=Microsoft.ML.Dnn
  StackTrace:
   at Microsoft.ML.Transforms.Dnn.DnnUtils.LoadTFSessionByModelFilePath(IExceptionContext ectx, String modelFile, Boolean metaGraph)
   at Microsoft.ML.DnnCatalog.ImageClassification(ModelOperationsCatalog catalog, String featuresColumnName, String labelColumnName, String scoreColumnName, String predictedLabelColumnName, Architecture arch, Int32 epoch, Int32 batchSize, Single learningRate, ImageClassificationMetricsCallback metricsCallback, Int32 statisticFrequency, DnnFramework framework, String modelSavePath, String finalModelPrefix, IDataView validationSet, Boolean testOnTrainSet, Boolean reuseTrainSetBottleneckCachedValues, Boolean reuseValidationSetBottleneckCachedValues, String trainSetBottleneckCachedValuesFilePath, String validationSetBottleneckCachedValuesFilePath)
   at ImageClassificationAPIMLNETSample.Program.Main(String[] args) in C:\Users\luquinta.REDMOND\source\repos\ImageClassificationAPIMLNETSample\ImageClassificationAPIMLNETSample\Program.cs:line 59

Inner Exception 1:
InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either that the input has been truncated or that an embedded message misreported its own length.

Additional output to the console:

Google.Protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either that the input has been truncated or that an embedded message misreported its own length.
   at Google.Protobuf.CodedInputStream.RefillBuffer(Boolean mustSucceed)
   at Google.Protobuf.CodedInputStream.ReadRawBytes(Int32 size)
   at Google.Protobuf.CodedInputStream.ReadBytes()
   at Tensorflow.TensorProto.MergeFrom(CodedInputStream input)
   at Google.Protobuf.CodedInputStream.ReadMessage(IMessage builder)
   at Tensorflow.AttrValue.MergeFrom(CodedInputStream input)
   at Google.Protobuf.CodedInputStream.ReadMessage(IMessage builder)
   at Google.Protobuf.FieldCodec.<>c__DisplayClass16_0`1.<ForMessage>b__0(CodedInputStream input)
   at Google.Protobuf.Collections.MapField`2.Codec.MessageAdapter.MergeFrom(CodedInputStream input)
   at Google.Protobuf.CodedInputStream.ReadMessage(IMessage builder)
   at Google.Protobuf.Collections.MapField`2.AddEntriesFrom(CodedInputStream input, Codec codec)
   at Tensorflow.NodeDef.MergeFrom(CodedInputStream input)
   at Google.Protobuf.CodedInputStream.ReadMessage(IMessage builder)
   at Google.Protobuf.FieldCodec.<>c__DisplayClass16_0`1.<ForMessage>b__0(CodedInputStream input)
   at Google.Protobuf.Collections.RepeatedField`1.AddEntriesFrom(CodedInputStream input, FieldCodec`1 codec)
   at Tensorflow.GraphDef.MergeFrom(CodedInputStream input)
   at Google.Protobuf.CodedInputStream.ReadMessage(IMessage builder)
   at Tensorflow.MetaGraphDef.MergeFrom(CodedInputStream input)
   at Google.Protobuf.MessageExtensions.MergeFrom(IMessage message, Byte[] data)
   at Google.Protobuf.MessageParser`1.ParseFrom(Byte[] data)
   at Tensorflow.saver._import_meta_graph_with_return_elements(String meta_graph_or_file, Boolean clear_devices, String import_scope, String[] return_elements)
   at Microsoft.ML.Transforms.Dnn.DnnUtils.<>c__DisplayClass5_0.<LoadMetaGraph>b__0(Graph graph)
   at Tensorflow.Python.tf_with[TIn,TOut](TIn py, Func`2 action)
@luisquintanilla
Contributor Author

Using this pipeline worked.

var trainingPipeline = 
    mapLabelTransform
   .Append(mlContext.Model.ImageClassification(
       "ImagePath",
       "LabelAsKey",
       arch: ImageClassificationEstimator.Architecture.ResnetV2101,
       epoch: 100,
       batchSize: 30,
       metricsCallback: (metrics) => Console.WriteLine(metrics)));

Switching back to the original code with a batch size of 150, or using another value for that parameter, worked as well.

@CESARDELATORRE
Contributor

@luisquintanilla - So what exactly was causing the issue then?

@zorthgo

zorthgo commented Oct 6, 2019

Any idea as to why that message is being thrown? I tried the same pipeline as the one in your comment, but I am still getting that error message.

@luisquintanilla
Contributor Author

luisquintanilla commented Oct 7, 2019

@CESARDELATORRE Not sure what happened, as I was not able to replicate it in this instance. I have experienced the issue in other runs, but there's nothing I can attribute it to. Re-running the application seems to "fix" it, but since it's not clear what causes it in the first place, it's difficult to replicate.

@CESARDELATORRE
Contributor

It might make sense to hold off a bit on this issue and try the new Image Classification preview API we're releasing in a few days, since it has been evolving significantly.

@codemzs
Member

codemzs commented Oct 7, 2019

@luisquintanilla are you trying to run this in parallel with another instance of this code? Can you please provide a link to your repo with the complete sample so that we can reproduce it the same way you did? Also, which version of the NuGet package are you using?

@ashbhandare is working on this.

@luisquintanilla
Contributor Author

@codemzs I was only running one instance of this code.

Here is the link to the repo

These are the NuGet packages being used.

| Package | Version |
| --- | --- |
| Microsoft.ML | 1.4.0-preview |
| Microsoft.ML.ImageAnalytics | 1.4.0-preview |
| Microsoft.ML.Dnn | 0.16.0-preview |

@luisquintanilla
Contributor Author

I think I found a way to reproduce it. It may be related to what @codemzs mentioned about running multiple instances (although not deliberately). If I run my application and stop it once it initializes, I run into this issue on subsequent runs. Deleting the bin and obj directories and re-running (without stopping) clears the issue, and the application trains a model. I suspect that the training continues in the background even though the application has been stopped, triggering the issue because multiple instances of the application are running.

@ashbhandare
Contributor

I have isolated the source of this error. When you run the ImageClassification pipeline for the first time, the meta graph of the model (ResnetV2101 or InceptionV3) is downloaded, and it is reused in subsequent runs. If the run is interrupted (by stopping it) while the download is in progress, the protobuf file is only partially downloaded. Subsequent runs then throw this error when they try to read the incomplete graph. A temporary workaround is to delete the protobuf file and rerun. I'm working on a fix.
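Why a partial download produces this particular message: the stack trace shows `TensorProto.MergeFrom` calling `CodedInputStream.ReadBytes()`, i.e. reading a length-delimited field, which protobuf encodes as a varint length followed by that many bytes. A file cut off mid-download still contains the length prefix but not all of the bytes it promises. The sketch below is a minimal, hypothetical reader (`TruncatedFieldDemo` is not part of Google.Protobuf or ML.NET) that only illustrates this wire-format idea; it is not Google.Protobuf's implementation.

```csharp
using System;
using System.IO;

// Hypothetical illustration of reading a protobuf-style length-delimited field:
// a base-128 varint length prefix followed by that many payload bytes.
public static class TruncatedFieldDemo
{
    // Reads a varint: 7 data bits per byte, high bit set means "more bytes follow".
    public static ulong ReadVarint(Stream input)
    {
        ulong result = 0;
        for (int shift = 0; shift < 64; shift += 7)
        {
            int b = input.ReadByte();
            if (b < 0)
                throw new EndOfStreamException("Input ended in the middle of a varint.");
            result |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                return result;
        }
        throw new InvalidDataException("Malformed varint.");
    }

    // Reads <varint length><payload> and fails if the stream ends before the
    // payload is complete, which is what happens with a truncated file.
    public static byte[] ReadLengthDelimited(Stream input)
    {
        ulong length = ReadVarint(input);
        if (length > int.MaxValue)
            throw new InvalidDataException("Field length too large.");

        var buffer = new byte[(int)length];
        int offset = 0;
        while (offset < buffer.Length)
        {
            int read = input.Read(buffer, offset, buffer.Length - offset);
            if (read == 0)
                throw new EndOfStreamException(
                    $"Input ended after {offset} of {length} bytes of a field.");
            offset += read;
        }
        return buffer;
    }

    // Encodes a payload as <varint length><payload>.
    public static byte[] WriteLengthDelimited(byte[] payload)
    {
        using (var ms = new MemoryStream())
        {
            ulong value = (ulong)payload.Length;
            while (value >= 0x80)
            {
                ms.WriteByte((byte)(value | 0x80)); // low 7 bits + continuation bit
                value >>= 7;
            }
            ms.WriteByte((byte)value);
            ms.Write(payload, 0, payload.Length);
            return ms.ToArray();
        }
    }
}
```

Encoding a 300-byte payload yields 302 bytes (a 2-byte length prefix plus the payload). Reading all 302 bytes succeeds, while reading only the first 150 throws, which mirrors a meta graph file that was cut off mid-download.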

codemzs pushed a commit that referenced this issue Oct 28, 2019
* Add MaybeDownloadFile(), where check to redownload existing file if file is not of expected size.

* Changed WebClient to HttpClient, renamed function to DownloadIfNeeded.

* Added unit test.

* Fixed newline after test

* Removed asynchronous copy

* added test for InceptionV3, fixed formatting.

* Modify to call DownloadIfNeeded

* fixed unit test, minor formatting

* fix test and change after rebase

* sync to master and use LoadRawImagesBytes instead of LoadImages.

* Dispose HttpClient object and wait for task to finish.

* clean up.

* Use resource manager to download meta files.

* remove test.

* remove unused namespaces.
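The first commit message describes the shape of the fix: re-download a cached file when it is not the expected size. Below is a minimal C# sketch of that idea, assuming the expected byte count is known ahead of time; `CachedDownload.EnsureFile` and its parameters are hypothetical and are not the `DownloadIfNeeded` code that was merged. As an extra safeguard not mentioned in the commits, it writes to a temporary `.partial` file and moves it into place only after the size check passes, so an interrupted run never leaves a truncated file at the final path.

```csharp
using System;
using System.IO;

// Hypothetical helper: reuse a cached file only if it has the expected size,
// otherwise delete it and download it again.
public static class CachedDownload
{
    // Returns true if the file was (re)downloaded, false if the cached copy was reused.
    // 'download' writes the remote content into the stream it is given.
    public static bool EnsureFile(string path, long expectedSize, Action<Stream> download)
    {
        var existing = new FileInfo(path);
        if (existing.Exists && existing.Length == expectedSize)
            return false; // Complete cached copy: reuse it.

        if (existing.Exists)
            existing.Delete(); // Partial leftover, e.g. from an interrupted run.

        // Download into a temporary file first.
        string tempPath = path + ".partial";
        using (var output = File.Create(tempPath))
        {
            download(output);
        }

        long actualSize = new FileInfo(tempPath).Length;
        if (actualSize != expectedSize)
        {
            File.Delete(tempPath);
            throw new IOException(
                $"Downloaded {actualSize} bytes but expected {expectedSize} for '{path}'.");
        }

        // Only a complete file ever appears at the final path.
        File.Move(tempPath, path);
        return true;
    }
}
```

With a real download, the `download` delegate could copy an `HttpClient` response into the output stream, e.g. `s => httpClient.GetStreamAsync(url).Result.CopyTo(s)`, where `httpClient` and `url` are placeholders.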
frank-dong-ms-zz pushed a commit to frank-dong-ms-zz/machinelearning that referenced this issue Nov 4, 2019
…t#4314)

ghost locked as resolved and limited conversation to collaborators Mar 20, 2022

5 participants