[AutoML] bring AutoML API library to master #3882

Merged: 285 commits, Aug 23, 2019

@Dmitry-A (Contributor) commented Jun 19, 2019

This moves the AutoML API code from a feature branch to master. The CLI will move next.

Closes #4008

srsaggam and others added 30 commits January 28, 2019 19:46
* Added sequential grouping of columns

* added ungrouping of column option

* reverted the file
* misc fixes -- fix bug where SMAC was returning already-seen values; fix param encoding return bug in pipeline object model; nit clean-up AutoFit; return in pipeline suggester when sweeper has no next proposal; null ref fix in public object model pipeline suggester

* fix in BuildPipelineNodePropsLightGbm test, fix / use correct 'newTrainer' variable in PipelineSuggester

* SMAC perf improvement
…age sources. (dotnet#38)

* Added sequential grouping of columns

* removed nuget.config and have only props mentions the nuget sources

* reverted the file
* Added sequential grouping of columns

* reverted the file

* added infer columns label name checking

* added column detection error

* removed unused usings

* added quotes

* replace Where with Any clause

* replace Where with Any clause
* Added sequential grouping of columns

* reverted the file

* added auto params as null

* change to the update fields method
* Includes the following:
1) Final proposal for 0.1 public API surface
2) Prefeaturization
3) Splitting train data into train and validate when validation data is null
4) Providing end-to-end samples, one each for regression, binary classification, and multiclass classification

* Incorporating code review feedback
* Revert "First public api proposal (dotnet#52)"

This reverts commit e4a64cf.

* Revert "Set Nullable Auto params to null values (dotnet#50)"

This reverts commit 41c663c.
AutoFit now returns an IEnumerable, which enables several things:

* A variety of early stopping criteria can be implemented (see sample).
* Models that are no good can be discarded early, which improves memory usage efficiency (see sample).
* No callback is needed to get results back.
* Getting the best score now happens outside the API implementation; it is a simple math function that compares scores (see sample).

Also templatized the return type for better type safety throughout the code.
2) Fixing up samples to reflect it
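
The lazy-sequence design described in the commit message above can be sketched roughly as follows. This is a minimal Python illustration, not the ML.NET AutoML API: `auto_fit`, `RunResult`, `best_with_early_stop`, the `patience` rule, and the scores are all hypothetical names and values chosen for the example. It assumes a higher score is better and shows how a caller can stop early, discard weaker models, and pick the best score without any callback.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class RunResult(NamedTuple):
    """One evaluated candidate (hypothetical shape, not the ML.NET type)."""
    trainer: str
    score: float


def auto_fit(candidates: Iterable[Tuple[str, float]]) -> Iterator[RunResult]:
    """Lazily yield one result per candidate; nothing runs until the caller iterates."""
    for trainer, score in candidates:
        # A real sweep would train and evaluate a model at this point.
        yield RunResult(trainer, score)


def best_with_early_stop(
    results: Iterable[RunResult], patience: int
) -> Tuple[Optional[RunResult], int]:
    """Keep only the best result and stop after `patience` runs without improvement.

    Returns the best result and the number of results actually consumed.
    """
    best: Optional[RunResult] = None
    stale = 0
    consumed = 0
    for result in results:
        consumed += 1
        if best is None or result.score > best.score:
            best, stale = result, 0  # the previous best is dropped here
        else:
            stale += 1  # the weaker result is discarded immediately
        if stale >= patience:
            break  # early stopping: remaining candidates are never evaluated
    return best, consumed


# Made-up scores for illustration only.
SCORES = [("a", 0.70), ("b", 0.82), ("c", 0.80), ("d", 0.79), ("e", 0.95)]
best, consumed = best_with_early_stop(auto_fit(SCORES), patience=2)
```

Because the sequence is consumed lazily, the stopping rule and the score comparison both live in caller code, and candidate `e` is never evaluated in this run.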
* added global tool initial project

* removed unnecessary files, renamed files

* refactoring and added base abstract classes for trainer generator

* removed unused class

* Added classes for transforms

* added transform generate dummy classes

* more refactoring, added first transform

* more refactoring and added classes

* changed the project structure

* restructuring, added options class

* sln changes

* refactored options to different class:

* added more logic for code generation of class

* misc changes

* reverted file

* added commandline api package

* reverted sample

* added new command line api parser

* added normalization of column names

* Added command defaults and error message

* implementation of all trainers

* changed auto to null

* added all transform generators

* added error handling when args is empty and minor changes due to change in AutoML api names

* changed the name of param

* added new command line options and restructuring code

* renamed proj file and added solution

* Added code to generate usings, Fixed few bugs in the code

* added validation to the command line options

* changed project name

* Bug fixes due to API change in AutoML

* changed directory structure

* added test framework and basic tests

* added more tests

* added improvements to template and error handling

* renamed the estimator name

* fixed test case

* added comments

* added headers

* changed namespace and removed unnecessary properties from project

* Revert "changed namespace and removed unnecessary properties from project"

This reverts commit 9edae033e9845e910f663f296e168f1182b84f5f.

* fixed test cases and renamed namespaces

* cleaned up proj file

* added folder structure

* added symbols/tokens for strings

* added more tests

* review comments

* modified test cases

* review comments

* change in the exception message

* normalized line endings

* made method private static

* simplified range building /optimization

* minor fix

* added header

* added static methods in command where necessary

* nit picks

* made few methods static

* review comments

* nitpick

* remove line pragmas

* fix test case
…sks (dotnet#65)

* Added sequential grouping of columns

* reverted the file

* upgrade to v .10 and refactoring

* added null check

* fixed unit tests

* review comments

* removed the settings change

* added regions

* fixed unit tests
* Added sequential grouping of columns

* reverted the file

* changed to new API of Text Loader

* changed signature

* added params for taking additional settings

* changes to codegen params

* refactoring of templates and fixing errors
* Added sequential grouping of columns

* reverted the file

* changed to new API of Text Loader

* changed signature

* added params for taking additional settings

* changes to codegen params

* refactoring of templates and fixing errors

* added run-tests.proj and referred it in build.proj
…dation in generated code (dotnet#83)

* Added sequential grouping of columns

* reverted the file

* bug fixes, more logic to templates to support cross-validate

* formatting and fix type in consolehelper

* Added logic in templates

* revert settings
* Create test.txt

* Create test.txt

* changes needed for benchmarking

* forgot one file

* merge conflict fix

* fix build break

* back out my version of the fix for Label column issue and fix the original fix

* bogus file removal

* undo SuggestedPipeline change

* remove labelCol from pipeline suggester

* fix build break
…otnet#86)

* Added sequential grouping of columns

* reverted the file

* added calibration workaround

* removed print probability

* reverted settings
…g nuget package (dotnet#99)

* Create test.txt

* Create test.txt

* changes needed for benchmarking

* forgot one file

* merge conflict fix

* fix build break

* back out my version of the fix for Label column issue and fix the original fix

* bogus file removal

* undo SuggestedPipeline change

* remove labelCol from pipeline suggester

* fix build break

* rename AutoML to Microsoft.ML.Auto everywhere and a shot at publishing nuget package (will probably need tweaks once I try to use the pipeline)
* use dotnet-internal-temp agent for internal build

* use dotnet-internal feed
… individual transform tests (dotnet#95)

* Added sequential grouping of columns

* reverted the file

* fix usings for type convert

* added transforms tests

* review comments
)

* Added sequential grouping of columns

* reverted the file

* Added code to have unique strings

* refactoring

* minor fix

* minor fix
1) Introduce AutoFit overloads (basic and advanced)
2) AutoFit Cancellation
3) AutoFit progress callbacks

@eerhardt (Member) left a comment

This is looking pretty close. My biggest concerns are the PackageReference in the .nupkgproj and the code duplication. Also, the 2 comments on InternalsVisibleTo.

Once those get addressed, I think this will be ready to merge.

…L package version to 0.15.1 (dotnet#4071)

* bumped version

* change versions in nupkg

* revert version bump in branch props
…equals infinity. (dotnet#4073)

* bumped version

* change versions in nupkg

* revert version bump in branch props

* added infinity fix

@eerhardt (Member) commented

@Dmitry-A Any update here?

- sync block on creating test data file (failed intermittently)
- removed classes we copied over from ML.Core and fixed their uses to de-dupe and use original ML.Core versions, since we now have InternalsVisible and BestFriends
- Fixed nupkg creation to use projects instead of the public nuget version for AutoML
- Fixed a bunch of unit tests that didn't actually test what they were supposed to test, while removing cut-and-paste code and dependencies.
- A few more misc small changes

@eerhardt (Member) left a comment

The overall build structure looks good to me. I didn't really review the AutoML specific code, since I figure your team has been reviewing it as you go.

:shipit:

…assembly, removed unused references from AutoML test project
@codemzs dismissed their stale review August 22, 2019 16:53

Thanks for resolving Eric's changes. My concern was about the type system changes your team had made to the internal AutoML branch, but I don't see them here, so I'm dismissing my hold. Thank you.

@codemzs (Member) commented Aug 22, 2019

@Dmitry-A Can you please rebase to master and push to ensure the build isn't broken? We want to put these commits on top of master.

@Dmitry-A force-pushed the master branch 2 times, most recently from 4e0979f to 3db7d98 (August 22, 2019 22:59)
@codemzs merged commit e50c4d2 into dotnet:master Aug 23, 2019
@ghost locked as resolved and limited conversation to collaborators Mar 21, 2022
Successfully merging this pull request may close these issues.

[AutoML] bring AutoML API code into master from feature branch
8 participants