Commit cde0d7d

Add ML.NET Roadmap (dotnet#30)
## Add ROADMAP.md for the ML.NET project

Resolves issue: dotnet#27

* Adds ROADMAP.md to root
* Updates solution to include MD files
* Fixes broken link to LICENSE in README.md
1 parent 972f623 commit cde0d7d

3 files changed: 128 additions, 1 deletion

Microsoft.ML.sln

Lines changed: 32 additions & 0 deletions
@@ -56,6 +56,35 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Microsoft.ML.Parquet", "src
 EndProject
 Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Microsoft.ML.Sweeper", "src\Microsoft.ML.Sweeper\Microsoft.ML.Sweeper.csproj", "{55C8122D-79EA-48AB-85D0-EB551FC1C427}"
 EndProject
+Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "docs", "docs", "{E20AF96D-3F66-4065-8A89-BEE479D74536}"
+	ProjectSection(SolutionItems) = preProject
+		Documentation\README.md = Documentation\README.md
+	EndProjectSection
+EndProject
+Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "project-docs", "project-docs", "{52794B40-AB8A-41AF-9EF7-799C80D6E0BC}"
+	ProjectSection(SolutionItems) = preProject
+		Documentation\project-docs\contributing.md = Documentation\project-docs\contributing.md
+		Documentation\project-docs\developer-guide.md = Documentation\project-docs\developer-guide.md
+	EndProjectSection
+EndProject
+Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Solution Items", "Solution Items", "{76F579E4-B9D2-4A0C-A511-EEFA4B2B829F}"
+	ProjectSection(SolutionItems) = preProject
+		CONTRIBUTING.md = CONTRIBUTING.md
+		README.md = README.md
+		ROADMAP.md = ROADMAP.md
+	EndProjectSection
+EndProject
+Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "building", "building", "{DB751004-5D49-4B88-B78F-29CA9887087D}"
+	ProjectSection(SolutionItems) = preProject
+		Documentation\building\unix-instructions.md = Documentation\building\unix-instructions.md
+		Documentation\building\windows-instructions.md = Documentation\building\windows-instructions.md
+	EndProjectSection
+EndProject
+Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "specs", "specs", "{2DEFC784-F2B5-44EA-ABBB-0DCF3E689DAC}"
+	ProjectSection(SolutionItems) = preProject
+		Documentation\specs\mvp.md = Documentation\specs\mvp.md
+	EndProjectSection
+EndProject
 Global
 	GlobalSection(SolutionConfigurationPlatforms) = preSolution
 		Debug|Any CPU = Debug|Any CPU
@@ -178,6 +207,9 @@ Global
 		{B7B593C5-FB8C-4ADA-A638-5B53B47D087E} = {09EADF06-BE25-4228-AB53-95AE3E15B530}
 		{16BB1454-2108-40E5-B3A6-594654005303} = {09EADF06-BE25-4228-AB53-95AE3E15B530}
 		{55C8122D-79EA-48AB-85D0-EB551FC1C427} = {09EADF06-BE25-4228-AB53-95AE3E15B530}
+		{52794B40-AB8A-41AF-9EF7-799C80D6E0BC} = {E20AF96D-3F66-4065-8A89-BEE479D74536}
+		{DB751004-5D49-4B88-B78F-29CA9887087D} = {E20AF96D-3F66-4065-8A89-BEE479D74536}
+		{2DEFC784-F2B5-44EA-ABBB-0DCF3E689DAC} = {E20AF96D-3F66-4065-8A89-BEE479D74536}
 	EndGlobalSection
 	GlobalSection(ExtensibilityGlobals) = postSolution
 		SolutionGuid = {41165AF1-35BB-4832-A189-73060F82B01D}
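
The three added `{child} = {parent}` lines in this hunk appear to nest the new `project-docs`, `building`, and `specs` folders under the new `docs` folder. The sketch below shows one way to read that structure back out of solution text. It assumes the pairs sit in the solution's nested-projects section (the hunk does not show the section name) and that `{2150E333-8FDC-42A3-9474-1A3956D46DE8}` is the solution-folder project type. `folder_tree` and `SAMPLE` are hypothetical names introduced for illustration, not part of the commit.

```python
import re

# Project-type GUID used for solution folders (assumption stated in the lead-in).
SOLUTION_FOLDER_TYPE = "2150E333-8FDC-42A3-9474-1A3956D46DE8"

# Matches: Project("{type}") = "name", "path", "{guid}"
PROJECT_RE = re.compile(
    r'^Project\("\{(?P<type>[^}]+)\}"\) = "(?P<name>[^"]+)", "[^"]*", "\{(?P<guid>[^}]+)\}"'
)
# Matches: {child} = {parent}
NESTED_RE = re.compile(r'^\{(?P<child>[^}]+)\} = \{(?P<parent>[^}]+)\}$')


def folder_tree(sln_text):
    """Return {parent folder name: sorted child folder names} for solution folders."""
    names = {}  # GUID -> folder name, solution folders only
    pairs = []  # (child GUID, parent GUID) in file order
    for raw in sln_text.splitlines():
        line = raw.strip()
        m = PROJECT_RE.match(line)
        if m and m.group("type").upper() == SOLUTION_FOLDER_TYPE:
            names[m.group("guid").upper()] = m.group("name")
            continue
        m = NESTED_RE.match(line)
        if m:
            pairs.append((m.group("child").upper(), m.group("parent").upper()))
    tree = {}
    for child, parent in pairs:
        # Skip pairs whose endpoints are not solution folders declared in this text.
        if child in names and parent in names:
            tree.setdefault(names[parent], []).append(names[child])
    return {parent: sorted(children) for parent, children in tree.items()}


# Folder declarations and nesting pairs copied from the diff above.
SAMPLE = """
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "docs", "docs", "{E20AF96D-3F66-4065-8A89-BEE479D74536}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "project-docs", "project-docs", "{52794B40-AB8A-41AF-9EF7-799C80D6E0BC}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "building", "building", "{DB751004-5D49-4B88-B78F-29CA9887087D}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "specs", "specs", "{2DEFC784-F2B5-44EA-ABBB-0DCF3E689DAC}"
EndProject
Global
    {55C8122D-79EA-48AB-85D0-EB551FC1C427} = {09EADF06-BE25-4228-AB53-95AE3E15B530}
    {52794B40-AB8A-41AF-9EF7-799C80D6E0BC} = {E20AF96D-3F66-4065-8A89-BEE479D74536}
    {DB751004-5D49-4B88-B78F-29CA9887087D} = {E20AF96D-3F66-4065-8A89-BEE479D74536}
    {2DEFC784-F2B5-44EA-ABBB-0DCF3E689DAC} = {E20AF96D-3F66-4065-8A89-BEE479D74536}
EndGlobal
"""

if __name__ == "__main__":
    print(folder_tree(SAMPLE))
```

The `Solution Items` folder has no nesting pair in the diff, so under this reading it stays at the top level of the solution.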

README.md

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ For more information, see the [.NET Foundation Code of Conduct](https://dotnetfo
 
 ## License
 
-ML.NET is licensed under the [MIT license](LICENSE.TXT).
+ML.NET is licensed under the [MIT license](LICENSE).
 
 ## .NET Foundation
 
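
The one-line change above repairs a relative link whose target file does not exist under that name. As a rough illustration of how such links could be caught, the sketch below scans Markdown text for inline links and reports relative targets that are missing from a given set of file names. `broken_relative_links` and the file set are hypothetical, the match is an exact, case-sensitive string comparison, and the regex covers only the simple `[text](target)` form.

```python
import re

# Inline Markdown links of the simple form [text](target).
LINK_RE = re.compile(r'\[[^\]]*\]\(([^)\s]+)\)')
# Anything starting with a URI scheme, e.g. https: or mailto:
SCHEME_RE = re.compile(r'^[a-zA-Z][a-zA-Z0-9+.-]*:')


def broken_relative_links(markdown, existing_files):
    """Return relative link targets whose path is not in existing_files."""
    broken = []
    for target in LINK_RE.findall(markdown):
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue  # absolute URL or in-page anchor: out of scope here
        path = target.split("#", 1)[0]  # drop any trailing fragment
        if path not in existing_files:
            broken.append(target)
    return broken


# Hypothetical repository root listing, used only for this example.
ROOT_FILES = {"LICENSE", "README.md", "ROADMAP.md", "CONTRIBUTING.md"}

BEFORE = "ML.NET is licensed under the [MIT license](LICENSE.TXT)."
AFTER = "ML.NET is licensed under the [MIT license](LICENSE)."

if __name__ == "__main__":
    print(broken_relative_links(BEFORE, ROOT_FILES))
    print(broken_relative_links(AFTER, ROOT_FILES))
```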

ROADMAP.md

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
+# The ML.NET Roadmap
+
+The goal of the ML.NET project is to provide an easy-to-use, .NET-friendly ML platform. This document describes the tentative plan for the project in the short and long term.
+
+ML.NET is a community effort and we welcome community feedback on our plans. The best way to give feedback is to open an issue in this repo. It's always a good idea to have a discussion before embarking on a large code change to make sure effort is not duplicated.
+Many of the features listed on the roadmap already exist in the internal version of the code base. They are marked with (*). We plan to release more and more internal features to GitHub over time.
+
+In the meantime, we are looking for contributions. An easy place to start is to look at _up-for-grabs_ issues on [GitHub](https://github.com/dotnet/machinelearning/issues?q=is%3Aopen+is%3Aissue+label%3Aup-for-grabs).
+
+## Short Term
+### Training Improvements
+* Improved public API for training and inference
+* Enhanced tests and scenarios
+* Additional Learners
+    * [LibSVM](https://www.csie.ntu.edu.tw/~cjlin/libsvm/) for anomaly detection (*)
+    * [LightGBM](https://github.com/Microsoft/LightGBM) - a high-performance boosted decision tree (*)
+* Additional Learning Tasks (*)
+    * _Ranking_ - a problem where the goal is to automatically sort (rank) instances within a group based on ranked examples in the training data
+    * _Anomaly Detection_, also known as _outlier detection_ - a task to identify items, events, or observations that do not conform to an expected pattern in the dataset
+    * _Quantile Regression_ - a type of regression analysis. Whereas ordinary regression produces estimates that approximate the conditional mean of the response variable given certain values of the predictor variables, quantile regression aims at estimating either the conditional median or other quantiles of the response variable
+* Additional Data source support (*)
+    * Apache Parquet
+    * Native Binary high-performance format
+
+### Featurization Improvements
+* Text (*)
+    * Natural language text preprocessing such as tokenization, part-of-speech tagging, and sentence breaking
+    * Pre-trained text models that can be used for extracting semantic or sentiment features from text
+* Image (*)
+    * Image preprocessing such as loading, resizing, and normalization of images
+    * Image featurization, including industry-standard pre-trained ImageNet neural models, such as ResNet and AlexNet
+
+### Trained Model Management
+* Export models to [ONNX](https://github.com/onnx/models) (*)
+
+### GUI
+* Release the Model Builder tool to ease model development (*)
+* Design improvements to better adhere to Fluent design principles
+* Add a view for easier comparison of several experiments
+* Ability to select the best-performing pipeline by sweeping transforms the same way learners are swept
+
+## Longer Term
+
+### Training Improvements
+* Add more learners, perhaps including: (*)
+    * Generalized Additive Models
+    * [SymSGD](https://arxiv.org/pdf/1705.08030.pdf) - a fast linear SGD learner
+    * Factorization Machines
+    * [ProtoNN and Bonsai](https://www.microsoft.com/en-us/research/project/resource-efficient-ml-for-the-edge-and-endpoint-iot-devices/) for compact and efficient models
+* Integration with other ML packages
+    * Accord.NET
+    * etc.
+* Deep Learning Support
+    * Integrate with leading DNN package(s)
+    * Support for transfer learning
+    * Hybrid training of pipelines containing both DNN and non-DNN predictors
+* Additional ML tasks (*)
+    * _Recommendation_ - a problem that can be phrased as: "For a given user, predict the ratings this user would give to the items that they have not explicitly rated yet"
+    * _Anomaly Detection_, also known as _outlier detection_ - a task to identify items, events, or observations that do not conform to an expected pattern in the dataset. Typical examples are detecting credit card fraud, medical problems, or errors in text. Anomalies are also referred to as outliers, novelties, noise, deviations, and exceptions
+    * _Sequence Classification_ - learns from a series of examples in a sequence, where each item is assigned a distinct label, akin to a multiclass classification task
+* Additional Data source support
+    * Data from SQL Databases, such as SQL Server
+    * Data located on the cloud
+* Distributed Training
+* Easily train models on the cloud
+* Whole-pipeline optimizations for both training and inference
+* Automation of more data science tasks
+* Additional Trainers
+* Additional tasks
+
+### Featurization Improvements
+* Improved data wrangling support
+* Add auto-suggestion of training pipelines. The technology will provide intelligent ```LearningPipeline``` suggestions based on training data attributes (*)
+* Additional natural language text preprocessing
+* Time series and forecasting
+* Support for video, audio, and other data types
+
+### Trained Model Management
+* Model operationalization in the cloud
+* Model deployment on mobile platforms
+* Ability to run [ONNX](https://github.com/onnx/models) models in the ```LearningPipeline```
+* Support for the next version of ONNX
+* Model deployment to IoT devices
+
+### GUI Improvements
+* Usability improvements
+* Support of additional ML.NET features
+* Improved code generation for training and inference
+* Run the pipelines rather than just suggesting them, and present the pipelines and the metrics from each run to the user
+* Distributed runs, rather than sequential
+
+### Other
+* Support for additional languages
+* Published reproducible benchmarks against industry-leading ML toolkits on a variety of tasks and datasets
+
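
The roadmap's description of _Quantile Regression_ contrasts estimating the conditional mean with estimating the median or another quantile. A common way to make that concrete is the pinball (quantile) loss, whose minimizer over constant predictions is a sample quantile. The sketch below is a plain-Python illustration under that assumption. It is not ML.NET code, and `pinball_loss` and `best_constant` are hypothetical helper names.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss for quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


def best_constant(y_true, q):
    """Return the observed value that minimizes pinball loss as a constant prediction."""
    return min(y_true, key=lambda c: pinball_loss(y_true, [c] * len(y_true), q))


if __name__ == "__main__":
    data = [1, 2, 3, 4, 100]           # one large outlier
    print(sum(data) / len(data))       # the mean is pulled toward the outlier
    print(best_constant(data, 0.5))    # q = 0.5 recovers the median
    print(best_constant(data, 0.9))    # a high quantile lands in the upper tail
```

With `q = 0.5` the loss is half the mean absolute error, so the best constant is the median, which the outlier does not move. Raising `q` penalizes under-prediction more heavily and pushes the estimate toward the upper tail.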
