Skip to content

Add public generic methods to TextLoader catalog that accept Options objects #5134

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 16, 2020

Conversation

antoniovs1029
Copy link
Member

The TextLoaderSaverCatalog contains 2 overloads of non-generic CreateTextLoader and LoadFromTextFile; one overload accepts a given set of a parameters and the other overload accepts a TexLoader.Options object.

The generic methods for CreateTextLoader<TInput> and LoadFromTextFile<TInput> only have 1 method that accepts a given set of parameters. So in this PR I add another overload for those 2 methods that also accepts an Options object.

The reason to do this is that if we add new parameters to TextLoader (for example, as done in #5125) then they're only accessible through a TextLoader.Options (so without this PR CreateTextLoader<TInput> and LoadFromTextFile<TInput> won't be able to use the new options). Notice that we can't simply add a new parameter to the existing CreateTextLoader<TInput> and LoadFromTextFile<TInput> methods, since that is considered a breaking API change, and we can't do those until ML.NET version 2.

@antoniovs1029 antoniovs1029 requested a review from a team as a code owner May 15, 2020 10:34
@antoniovs1029 antoniovs1029 requested a review from harishsk May 15, 2020 10:34
@codecov
Copy link

codecov bot commented May 15, 2020

Codecov Report

Merging #5134 into master will decrease coverage by 0.23%.
The diff coverage is 56.52%.

@@            Coverage Diff             @@
##           master    #5134      +/-   ##
==========================================
- Coverage   75.83%   75.60%   -0.24%     
==========================================
  Files         951      990      +39     
  Lines      172613   179227    +6614     
  Branches    18632    19292     +660     
==========================================
+ Hits       130904   135497    +4593     
- Misses      36529    38462    +1933     
- Partials     5180     5268      +88     
Flag Coverage Δ
#Debug 75.60% <56.52%> (-0.24%) ⬇️
#production 71.50% <56.52%> (+0.05%) ⬆️
#test 88.85% <ø> (-1.64%) ⬇️
Impacted Files Coverage Δ
....AutoML/API/RunDetails/CrossValidationRunDetail.cs 100.00% <ø> (ø)
...rc/Microsoft.ML.AutoML/API/RunDetails/RunDetail.cs 95.45% <ø> (ø)
....AutoML/ColumnInference/ColumnGroupingInference.cs 85.00% <ø> (ø)
...t.ML.AutoML/ColumnInference/ColumnTypeInference.cs 85.46% <ø> (+1.76%) ⬆️
...rosoft.ML.AutoML/ColumnInference/TextFileSample.cs 62.25% <0.00%> (ø)
...icrosoft.ML.AutoML/Experiment/SuggestedPipeline.cs 88.65% <ø> (ø)
....ML.AutoML/PipelineSuggesters/PipelineSuggester.cs 83.19% <ø> (ø)
src/Microsoft.ML.AutoML/Sweepers/Parameters.cs 84.32% <ø> (ø)
src/Microsoft.ML.AutoML/Sweepers/SmacSweeper.cs 93.47% <ø> (ø)
...plates/Azure/Model/AzureAttachImageConsumeModel.cs 0.00% <0.00%> (ø)
... and 444 more

@@ -1453,6 +1453,23 @@ void ICanSaveModel.Save(ModelSaveContext ctx)
bool trimWhitespace = Defaults.TrimWhitespace,
IMultiStreamSource dataSample = null)
{
Options options = new Options
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the plan to add readMultiLines as an option in a later PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readMultilines is already added to the TextLoader.Options class in my other PR, and it will always default as false, if its value isn't provided.

The method here, that you've pointed to, is used by this public method in the catalog. I can't add a new readMultilines parameter there precisely because it would be a breaking API change. So it doesn't make much sense to me to add a readMultilines option here, and so TextLoader.Options will simply use the default for it (which is false).

In the future, if a user (or we internally) want to use the new options (such as readMultiline), we would simply use a TextLoader.Options object and the overloads for creating TextLoaders that uses the option object. I think it's better and easier to maintain and update only the options object than an overload with several parameters that can't be changed in the public API.

Copy link
Contributor

@harishsk harishsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@antoniovs1029 antoniovs1029 merged commit b4bff87 into dotnet:master May 16, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants