Skip to content

Redesign DnnCatalog methods API for ease of use and consistency. #4362

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Oct 22, 2019

Conversation

ashbhandare
Copy link
Contributor

@ashbhandare ashbhandare commented Oct 22, 2019

Fixes #4307

Initially the methods in the DnnCatalog took a lot of optional arguments which would confuse users and adding features to the ImageClassification API would require breaking changes in the future.

To be consistent with the rest of ML .Net, this change follows the pattern where we have 2 overloads:
1)An overload that takes the required parameters, and optionally the most important/common parameters to a method.
2)An overload that takes an Options object, which contains all the options to the method

This changes also removes a couple of unused options for the ImageClassification API, while making RetrainDnnModel internal to enable further testing before releasing.

WIP to get initial feedback/add necessary tests.

@ashbhandare ashbhandare requested a review from a team as a code owner October 22, 2019 00:06
@eerhardt
Copy link
Member

eerhardt commented Oct 22, 2019

@ashbhandare - please don't use WIP in the title anymore on PRs. If the PR isn't ready for review and merge, then open a Draft PR.

https://github.blog/2019-02-14-introducing-draft-pull-requests/ #Resolved

/// <summary>
/// The options for the <see cref="ImageClassificationTransformer"/>.
/// </summary>
public sealed class Options
Copy link
Member

@eerhardt eerhardt Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need 2 "Options" classes. We only need one.

This appears to be a duplicate of the ImageClassificationEstimator.Options class. One of these should be removed. #Resolved

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should happen is the nested ImageClassificationEstimator.Options class should become public, and then this one can be deleted.


In reply to: 337563694 [](ancestors = 337563694)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two classes have most members in common, except that DnnCatalog.Options has featuresColumnName, and ImageClassificationEstimator.Options() takes in InputColumns and OutputColumns. Using ImageClassificationEstimator.Options directly would change the necessary arguments that ImageClassification() would take. I can make that change if that is acceptable.


In reply to: 337568602 [](ancestors = 337568602,337563694)

};

if (!File.Exists(options.ModelLocation))
var env = CatalogUtils.GetEnvironment(catalog);
return new ImageClassificationEstimator(env, options, DnnUtils.LoadDnnModel(env, options.Arch, true));
Copy link
Member

@eerhardt eerhardt Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should just call into the ImageClassification overload below that takes an Options class. #Resolved

@@ -83,6 +83,120 @@ public static class DnnCatalog
return new DnnRetrainEstimator(env, options, DnnUtils.LoadDnnModel(env, modelPath, true));
}

/// <summary>
/// The options for the <see cref="ImageClassificationTransformer"/>.
Copy link
Member

@eerhardt eerhardt Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should say it is the options for the Estimator, not the Transformer. These options are used to create the Estimator. #Resolved

/// <param name="reuseTrainSetBottleneckCachedValues">Indicates to not re-compute cached trainset bottleneck values if already available in the bin folder.</param>
/// <param name="reuseValidationSetBottleneckCachedValues">Indicates to not re-compute validataionset cached bottleneck validationset values if already available in the bin folder.</param>
/// <param name="trainSetBottleneckCachedValuesFilePath">Indicates the file path to store trainset bottleneck values for caching.</param>
/// <param name="validationSetBottleneckCachedValuesFilePath">Indicates the file path to store validationset bottleneck values for caching.</param>
/// <remarks>
/// The support for image classification is under preview.
Copy link
Member

@eerhardt eerhardt Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codemzs - When should this remark get removed? I'm assuming we don't want to ship with this in the documentation. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the remark after talking to @codemzs offline.


In reply to: 337567053 [](ancestors = 337567053)

/// <summary>
/// The names of the model inputs.
/// </summary>
[Argument(ArgumentType.Multiple | ArgumentType.Required, HelpText = "The names of the model inputs", ShortName = "inputs", SortOrder = 1)]
Copy link
Member

@eerhardt eerhardt Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this marked with ArgumentType.Multiple? We only support a single Features column name, correct? #Resolved

/// Validation set.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "Validation set.", SortOrder = 15)]
public IDataView ValidationSet = null;
Copy link
Member

@eerhardt eerhardt Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference types are null by default. You don't need to explicitly put = null at the end here, and below. #Resolved

@@ -15,7 +15,7 @@ internal static void RunAll()
{
var sample = type.GetMethod("Example", BindingFlags.Public | BindingFlags.Static | BindingFlags.FlattenHierarchy);

if (sample != null)
if (sample != null && type.Name == "ResnetV2101TransferLearningTrainTestSplit")
{
Copy link
Contributor

@harishsk harishsk Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this get accidentally left behind? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, thank you!


In reply to: 337586213 [](ancestors = 337586213)

@ashbhandare ashbhandare changed the title WIP: Redesign DnnCatalog methods API for ease of use and consistency. Redesign DnnCatalog methods API for ease of use and consistency. Oct 22, 2019
@@ -46,7 +46,7 @@ public static class DnnCatalog
/// <remarks>
/// The support for retraining is under preview.
/// </remarks>
public static DnnRetrainEstimator RetrainDnnModel(
internal static DnnRetrainEstimator RetrainDnnModel(
Copy link
Member

@eerhardt eerhardt Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the DnnRetrainTransformer and DnnRetrainEstimator classes also be marked as internal? #Resolved

/// The names of the model input features.
/// </summary>
[Argument(ArgumentType.AtMostOnce | ArgumentType.Required, HelpText = "The names of the model inputs", ShortName = "features", SortOrder = 1)]
public string FeaturesColumnName;
Copy link
Member

@eerhardt eerhardt Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have both InputColumns and FeaturesColumnName? We should only have 1. #Resolved

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same applies for LabelColumnName.

And also for OutputColumns and ScoreColumnName.


In reply to: 337694012 [](ancestors = 337694012)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OutputColumns is an array of ScoreColumn and PredictedLabelColumn. Users can have a choice of setting either one or both of these names. I could make OutputColumns internal if that's cleaner.


In reply to: 337694812 [](ancestors = 337694812,337694012)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this API take more than one input at any point, as InputColumn is an array? I could either make InputColumn internal or get rid of one of InputColumns and FeaturesColumnName.


In reply to: 337694012 [](ancestors = 337694012)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making InputColumns and OutputColumns internal after talking with @[email protected] offline.


In reply to: 337712848 [](ancestors = 337712848,337694012)

@@ -1341,31 +1340,31 @@ public enum Dataset
/// <summary>
/// The options for the <see cref="ImageClassificationTransformer"/>.
/// </summary>
internal sealed class Options : TransformInputBase
public sealed class Options : TransformInputBase
Copy link
Member

@eerhardt eerhardt Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't inherit from TransformInputBase. That class is for data transforms, and this is a trainer. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.


In reply to: 337694597 [](ancestors = 337694597)

return false;
}

var wc = new WebClient();
Copy link
Member

@eerhardt eerhardt Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to fix this pattern. This is becoming pervasive in our repo.

It is not recommended to use WebClient. We should use HttpClient instead.

https://docs.microsoft.com/en-us/dotnet/api/system.net.webclient?view=netframework-4.8#remarks #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is included in another PR #4314, which is failing the CI test intermittently. I would like to make that change separate from this if that's okay.


In reply to: 337696066 [](ancestors = 337696066)

if (File.Exists(Path.Combine(destFolder, flag))) return;

Console.WriteLine($"Extracting.");
var task = Task.Run(() =>
Copy link
Member

@eerhardt eerhardt Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no reason to do a Task.Run here, and then Thread.Sleep later. Just call ZipFile.ExtractToDirectory inline in the method. #Resolved

@@ -6,11 +6,11 @@
using System.IO;
using System.IO.Compression;
using System.Net;
using Microsoft.ML.CommandLine;
Copy link
Member

@eerhardt eerhardt Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this using? #Resolved

}
}
}
public static ImageClassificationEstimator ImageClassification(
Copy link
Member

@eerhardt eerhardt Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have XML docs since it is a public method. #Resolved

[Argument(ArgumentType.AtMostOnce, HelpText = "Training labels.", ShortName = "label", SortOrder = 4)]
public string LabelColumn;
[Argument(ArgumentType.AtMostOnce | ArgumentType.Required, HelpText = "Training labels.", ShortName = "label", SortOrder = 4)]
public string LabelColumnName;
Copy link
Member

@eerhardt eerhardt Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between this and TensorFlowLabel? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can understand, TensorFlowLabel is the name of the TF node in the model that contains the label, while the LabelColumn is the name of the column in the IDataView. They are assigned the same value, so working on removing one of them.


In reply to: 337697817 [](ancestors = 337697817)

[Argument(ArgumentType.Multiple | ArgumentType.Required, HelpText = "The names of the model inputs", ShortName = "inputs", SortOrder = 1)]
public string[] InputColumns;
[Argument(ArgumentType.Multiple , HelpText = "The names of the model inputs", ShortName = "inputs", SortOrder = 1)]
internal string[] InputColumns;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal string[] InputColumns; [](start = 11, length = 32)

probably don't need this or Output coliumns but I will remove this in my PR where I will refactor all this code and convert to ITrainerEstimator

@codemzs
Copy link
Member

codemzs commented Oct 22, 2019

@eerhardt I need this change for my AutoML integration change and also the change where I convert this API from IEstimator to ITrainerEstimator, hence I will try to check-in as it mostly looks good and if there is any other feedback from you I will take it as part of my PR.

@codecov
Copy link

codecov bot commented Oct 22, 2019

Codecov Report

Merging #4362 into master will increase coverage by <.01%.
The diff coverage is 93.29%.

@@            Coverage Diff             @@
##           master    #4362      +/-   ##
==========================================
+ Coverage   74.61%   74.62%   +<.01%     
==========================================
  Files         882      882              
  Lines      154848   154899      +51     
  Branches    16913    16914       +1     
==========================================
+ Hits       115539   115591      +52     
+ Misses      34563    34557       -6     
- Partials     4746     4751       +5
Flag Coverage Δ
#Debug 74.62% <93.29%> (ø) ⬆️
#production 70.19% <91.04%> (ø) ⬆️
#test 89.55% <94.84%> (ø) ⬆️
Impacted Files Coverage Δ
src/Microsoft.ML.Dnn/DnnRetrainTransform.cs 57.26% <ø> (ø) ⬆️
src/Microsoft.ML.Dnn/DnnCatalog.cs 100% <100%> (+20.48%) ⬆️
src/Microsoft.ML.Dnn/DnnUtils.cs 69.4% <89.47%> (+3.31%) ⬆️
...c/Microsoft.ML.Dnn/ImageClassificationTransform.cs 88.54% <90.47%> (+0.72%) ⬆️
...cenariosWithDirectInstantiation/TensorflowTests.cs 90.52% <94.84%> (+0.04%) ⬆️
...icrosoft.ML.AutoML/Experiment/SuggestedPipeline.cs 88.17% <0%> (-4.31%) ⬇️
....ML.AutoML/PipelineSuggesters/PipelineSuggester.cs 79.83% <0%> (-4.21%) ⬇️
...rosoft.ML.AutoML/ColumnInference/TextFileSample.cs 59.6% <0%> (-2.65%) ⬇️
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs 84.7% <0%> (-0.61%) ⬇️
...ML.Transforms/Text/StopWordsRemovingTransformer.cs 86.1% <0%> (-0.16%) ⬇️
... and 2 more

Copy link
Member

@codemzs codemzs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@codemzs codemzs merged commit 5f7527f into dotnet:master Oct 22, 2019
@eerhardt
Copy link
Member

    public enum DnnFramework

@codemzs - do we still need this public enum? I don't think it is used anymore.


Refers to: src/Microsoft.ML.Dnn/ImageClassificationTransform.cs:1115 in f956a07. [](commit_id = f956a07, deletion_comment = False)

@ashbhandare ashbhandare deleted the DnnCatalogAPI branch October 23, 2019 16:44
public static ImageClassificationEstimator ImageClassification(
this ModelOperationsCatalog catalog, Options options)
{
options.EarlyStoppingCriteria = options.DisableEarlyStopping ? null : options.EarlyStoppingCriteria ?? new EarlyStopping();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this acts in a surprising behavior. What if we changed it to the following?

  • Remove options.DisableEarlyStopping.
  • Default options.EarlyStoppingCriteria = new EarlyStopping();

Then, by default the EarlyStopping is enabled to the default EarlyStopping. If someone wants to disable EarlyStopping, they set options.EarlyStoppingCriteria = null; when calling this API. If they want to change EarlyStopping to a different criteria, they can just set options.EarlyStoppingCriteria = new EarlyStopping(metric: EarlyStoppingMetric.Loss) (or whatever criteria they need).

This feels like a better API design choice because then the caller doesn't have 2 properties to deal with - DisableEarlyStopping and EarlyStoppingCriteria. There is only 1 to consider, that either they leave to the default, or set it as appropriate. If they want to disable EarlyStopping - they just set it to null.

@eerhardt
Copy link
Member

if there is any other feedback from you I will take it as part of my PR.

I have left the above 2 pieces of feedback. Can you address them in your PR? Thanks.

frank-dong-ms-zz pushed a commit to frank-dong-ms-zz/machinelearning that referenced this pull request Nov 4, 2019
…net#4362)

* WIP initial change

* Changed API design, changed tests and samples to use new API

* Combined DnnCatalog.Options and ImageClassificationEstimator.Options, addressed review comments.

* Added unit test and sample

* Removed duplicate members in Options class, addresses PR comments

* Removed preview remark for ImageClassification.
@ghost ghost locked as resolved and limited conversation to collaborators Mar 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DnnCatalog methods should use a public Options class
4 participants