[AutoML] bring AutoML API library to master #3882
* Added sequential grouping of columns * added ungrouping of column option * reverted the file
* misc fixes -- fix bug where SMAC was returning already-seen values; fix param encoding return bug in pipeline object model; nit clean-up AutoFit; return in pipeline suggester when sweeper has no next proposal; null ref fix in public object model pipeline suggester * fix in BuildPipelineNodePropsLightGbm test, fix / use correct 'newTrainer' variable in PipelineSuggester * SMAC perf improvement
…age sources. (dotnet#38) * Added sequential grouping of columns * removed nuget.config and have only props mentions the nuget sources * reverted the file
* Added sequential grouping of columns * reverted the file * added infer columns label name checking * added column detection error * removed unused usings * added quotes * replace Where with Any clause * replace Where with Any clause
* Added sequential grouping of columns * reverted the file * added auto params as null * change to the update fields method
* Includes the following: 1) Final proposal for 0.1 public API surface 2) Prefeaturization 3) Splitting train data into train and validate when validation data is null 4) Providing end-to-end samples, one each for regression, binary classification, and multiclass classification * Incorporating code review feedback
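Item 3 above (splitting the training data when no validation data is supplied) can be sketched roughly as follows. This is a Python illustration only, not the Microsoft.ML.AutoML implementation: the function name `resolve_validation`, the 20% default fraction, and the fixed seed are all assumptions made for the example.

```python
import random


def resolve_validation(train, validation=None, validation_fraction=0.2, seed=0):
    """Return (train_rows, validation_rows).

    If the caller supplied validation data, both sets pass through unchanged.
    Otherwise a shuffled slice of the training rows is held out.
    The 0.2 fraction and the seed are illustrative assumptions.
    """
    if validation is not None:
        return list(train), list(validation)

    rows = list(train)
    if len(rows) < 2:
        raise ValueError("need at least two rows to split into train and validate")

    # Shuffle a copy with a local RNG so the split is reproducible.
    random.Random(seed).shuffle(rows)
    n_valid = max(1, int(len(rows) * validation_fraction))
    return rows[n_valid:], rows[:n_valid]


# Hypothetical data: ten rows, no validation set provided.
train_rows, valid_rows = resolve_validation(range(10))
```

Holding out rows only when validation data is null keeps an explicitly supplied validation set untouched.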
1) AutoFit now returns an IEnumerable. This enables several things: implementing a variety of early stopping criteria (see sample); early discard of models that are no good, which improves memory efficiency (see sample); no need to implement a callback to get results back; and getting the best score now lives outside the API implementation, as a simple math function that compares scores (see sample). Also templatized the return type for better type safety throughout the code.
2) Fixing up samples to reflect it
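The benefits listed for the IEnumerable-based return value can be illustrated with a small sketch. A Python generator stands in for the C# IEnumerable here; `auto_fit`, `RunResult`, `search`, and the candidate names and scores are hypothetical and are not the Microsoft.ML.AutoML API. The point is only that lazily produced results let the caller stop early, discard weak models, and choose the best score with ordinary comparison code.

```python
from typing import Callable, Iterable, Iterator, NamedTuple, Tuple


class RunResult(NamedTuple):
    """One evaluated candidate (hypothetical shape, for illustration only)."""
    name: str
    score: float


def auto_fit(candidates: Iterable[Tuple[str, Callable[[], float]]]) -> Iterator[RunResult]:
    """Lazily evaluate candidates; nothing runs until the caller iterates."""
    for name, train_and_score in candidates:
        yield RunResult(name, train_and_score())


def search(candidates, good_enough: float, discard_below: float):
    """Consume results one at a time, keeping only models worth keeping."""
    kept = []
    for result in auto_fit(candidates):
        if result.score < discard_below:
            continue  # early discard: the weak model is never retained
        kept.append(result)
        if result.score >= good_enough:
            break  # early stopping: remaining candidates are never trained
    # Picking the best score is plain caller-side code, not part of auto_fit.
    best = max(kept, key=lambda r: r.score) if kept else None
    return best, kept


trained = []  # records which candidates were actually evaluated


def make(name, score):
    def run():
        trained.append(name)
        return score
    return name, run


candidates = [make("a", 0.40), make("b", 0.72), make("c", 0.91), make("d", 0.95)]
best, kept = search(candidates, good_enough=0.90, discard_below=0.50)
```

In this run candidate "d" is never trained, because iteration stops as soon as "c" meets the threshold, and "a" is evaluated but not kept.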
* added global tool initial project * removed unnecessary files, renamed files * refactoring and added base abstract classes for trainer generator * removed unused class * Added classes for transforms * added transform generate dummy classes * more refactoring, added first transform * more refactoring and added classes * changed the project structure * restructuring added options class * sln changes * refactored options to different class * added more logic for code generation of class * misc changes * reverted file * added commandline api package * reverted sample * added new command line api parser * added normalization of column names * Added command defaults and error message * implementation of all trainers * changed auto to null * added all transform generators * added error handling when args is empty and minor changes due to change in AutoML api names * changed the name of param * added new command line options and restructuring code * renamed proj file and added solution * Added code to generate usings, Fixed few bugs in the code * added validation to the command line options * changed project name * Bug fixes due to API change in AutoML * changed directory structure * added test framework and basic tests * added more tests * added improvements to template and error handling * renamed the estimator name * fixed test case * added comments * added headers * changed namespace and removed unnecessary properties from project * Revert "changed namespace and removed unnecessary properties from project" This reverts commit 9edae033e9845e910f663f296e168f1182b84f5f.
* fixed test cases and renamed namespaces * cleaned up proj file * added folder structure * added symbols/tokens for strings * added more tests * review comments * modified test cases * review comments * change in the exception message * normalized line endings * made method private static * simplified range building /optimization * minor fix * added header * added static methods in command where necessary * nit picks * made few methods static * review comments * nitpick * remove line pragmas * fix test case
…sks (dotnet#65) * Added sequential grouping of columns * reverted the file * upgrade to v .10 and refactoring * added null check * fixed unit tests * review comments * removed the settings change * added regions * fixed unit tests
* Added sequential grouping of columns * reverted the file * changed to new API of Text Loader * changed signature * added params for taking additional settings * changes to codegen params * refactoring of templates and fixing errors
* Added sequential grouping of columns * reverted the file * changed to new API of Text Loader * changed signature * added params for taking additional settings * changes to codegen params * refactoring of templates and fixing errors * added run-tests.proj and referred it in build.proj
…dation in generated code (dotnet#83) * Added sequential grouping of columns * reverted the file * bug fixes, more logic to templates to support cross-validate * formatting and fix type in consolehelper * Added logic in templates * revert settings
* Create test.txt * Create test.txt * changes needed for benchmarking * forgot one file * merge conflict fix * fix build break * back out my version of the fix for Label column issue and fix the original fix * bogus file removal * undo SuggestedPipeline change * remove labelCol from pipeline suggester * fix build break
…otnet#86) * Added sequential grouping of columns * reverted the file * added calibration workaround * removed print probability * reverted settings
…g nuget package (dotnet#99) * Create test.txt * Create test.txt * changes needed for benchmarking * forgot one file * merge conflict fix * fix build break * back out my version of the fix for Label column issue and fix the original fix * bogus file removal * undo SuggestedPipeline change * remove labelCol from pipeline suggester * fix build break * rename AutoML to Microsoft.ML.Auto everywhere and a shot at publishing nuget package (will probably need tweaks once I try to use the pipeline)
* use dotnet-internal-temp agent for internal build * use dotnet-internal feed
… individual transform tests (dotnet#95) * Added sequential grouping of columns * reverted the file * fix usings for type convert * added transforms tests * review comments
1) Introduce AutoFit overloads (basic and advanced) 2) AutoFit Cancellation 3) AutoFit progress callbacks
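Items 2 and 3 (cancellation and progress callbacks) follow a common cooperative pattern, sketched below in Python. This is an illustration under assumptions, not the AutoFit signature: `auto_fit`, its parameters, and the trial names are hypothetical, and a `threading.Event` stands in for a .NET cancellation token.

```python
import threading


def auto_fit(trials, cancel: threading.Event, on_progress=None):
    """Run trials in order, reporting each result and honoring cancellation.

    Cancellation is cooperative: the flag is checked between trials,
    so a trial that has already started always runs to completion.
    """
    results = []
    for name, run in trials:
        if cancel.is_set():
            break  # caller asked to stop; skip the remaining trials
        score = run()
        results.append((name, score))
        if on_progress is not None:
            on_progress(name, score)  # progress callback, one call per finished trial
    return results


cancel = threading.Event()
seen = []


def report(name, score):
    """Hypothetical progress handler that cancels after two trials."""
    seen.append(name)
    if len(seen) == 2:
        cancel.set()


trials = [("t1", lambda: 0.1), ("t2", lambda: 0.2), ("t3", lambda: 0.3)]
results = auto_fit(trials, cancel, report)
```

Here the callback itself requests cancellation, so the third trial is never started.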
This is looking pretty close. My biggest concerns are the PackageReference
in the .nupkgproj and the code duplication, plus the two comments on InternalsVisibleTo.
Once those are addressed, I think this will be ready to merge.
…L package version to 0.15.1 (dotnet#4071) * bumped version * change versions in nupkg * revert version bump in branch props
…equals infinity. (dotnet#4073) * bumped version * change versions in nupkg * revert version bump in branch props * added infinity fix
@Dmitry-A Any update here?
- sync block on creating test data file (failed intermittently)
- removed classes we copied over from ML.Core and fixed their uses to de-dupe and use the original ML.Core versions, since we now have InternalsVisible and BestFriends
- fixed nupkg creation to use projects instead of the public NuGet version for AutoML
- fixed a bunch of unit tests that didn't actually test what they were supposed to test, while removing cut-and-paste code and dependencies
- a few more misc small changes
The overall build structure looks good to me. I didn't really review the AutoML specific code, since I figure your team has been reviewing it as you go.
…assembly, removed unused references from AutoML test project
Thanks for resolving Eric's changes. My concern was around type system changes your team had made to the internal AutoML branch, but I don't see them here, so I'm dismissing my hold. Thank you.
@Dmitry-A Can you please rebase to master and push to ensure the build isn't broken? We want to put these commits on top of master.
4e0979f to 3db7d98
This moves the AutoML API code from a feature branch to master. The CLI will move next.
Closes #4008