Commit abba48a

Merge pull request dotnet#1547 from shauheen/v07
Cherrypick for release 0.7
2 parents c5cef31 + 86088a1 commit abba48a

52 files changed

+1302
-129
lines changed

docs/release-notes/0.7/release-0.7.md

Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
1+
# ML.NET 0.7 Release Notes
2+
3+
Today we are excited to release ML.NET 0.7, which our algorithms strongly
4+
recommend you try out! This release enables making recommendations with
5+
matrix factorization, identifying unusual events with anomaly detection,
6+
adding custom transformations to your ML pipeline, and more! We also have a
7+
small surprise for those who work in teams that use both .NET and Python.
8+
Finally, we want to thank the many new contributors to the project since the
9+
last release!
10+
11+
### Installation
12+
13+
ML.NET supports Windows, macOS, and Linux. See [supported OS versions of .NET
14+
Core
15+
2.0](https://github.com/dotnet/core/blob/master/release-notes/2.0/2.0-supported-os.md)
16+
for more details.
17+
18+
You can install the ML.NET NuGet package from the .NET CLI using:
19+
```
20+
dotnet add package Microsoft.ML
21+
```
22+
23+
From the NuGet Package Manager Console:
24+
```
25+
Install-Package Microsoft.ML
26+
```
27+
28+
### Release Notes
29+
30+
Below are some of the highlights from this release.
31+
32+
* Added Matrix factorization for recommendation problems
33+
([#1263](https://github.com/dotnet/machinelearning/pull/1263))
34+
35+
* Matrix factorization (MF) is a common approach to recommendations when
36+
you have data on how users rated items in your catalog. For example, you
37+
might know how users rated some movies and want to recommend which other
38+
movies they are likely to watch next.
39+
* ML.NET's MF uses [LIBMF](https://github.com/cjlin1/libmf).
40+
* Example usage of MF can be found
41+
[here](https://github.com/dotnet/machinelearning/blob/d68388a1c9994a5b429b194b64b2b0782834cb78/docs/samples/Microsoft.ML.Samples/Dynamic/MatrixFactorization.cs).
42+
The example is general but you can imagine that the matrix rows
43+
correspond to users, matrix columns correspond to movies, and matrix
44+
values correspond to ratings. This matrix would be quite sparse as users
45+
have only rated a small subset of the catalog.
46+
* Note: [ML.NET
47+
0.3](https://github.com/dotnet/machinelearning/blob/d68388a1c9994a5b429b194b64b2b0782834cb78/docs/release-notes/0.3/release-0.3.md)
48+
included Field-Aware Factorization Machines (FFM) as a learner for
49+
binary classification. FFM is a generalization of MF, but there are a
50+
few differences:
51+
* FFM enables taking advantage of other information beyond the rating
52+
a user assigns to an item (e.g. movie genre, movie release date,
53+
user profile).
54+
* FFM is currently limited to binary classification (the ratings need
55+
to be converted to 0 or 1), whereas MF solves a regression problem
56+
(the ratings can be continuous numbers).
57+
* If the only information available is the user-item ratings, MF is
58+
likely to be significantly faster than FFM.
59+
* A more in-depth discussion can be found
60+
[here](https://www.csie.ntu.edu.tw/~cjlin/talks/recsys.pdf).
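To make the idea behind matrix factorization concrete, here is a minimal, self-contained sketch in plain C#. It does not use ML.NET or LIBMF and is not their algorithm: it is a toy stochastic gradient descent loop, and the data, `factorCount`, `learningRate`, and `regularization` values are all assumptions chosen for illustration. Each user and each item gets a short latent vector, and a rating is predicted as the dot product of the two.

```csharp
using System;

// Hypothetical toy data: (user, item, rating) triples. Most user-item pairs are unobserved.
var ratings = new (int User, int Item, float Rating)[]
{
    (0, 0, 5f), (0, 1, 3f), (1, 0, 4f), (1, 2, 1f), (2, 1, 2f), (2, 2, 5f),
};
const int userCount = 3, itemCount = 3, factorCount = 2; // factorCount = latent dimensions
const float learningRate = 0.05f, regularization = 0.02f; // illustrative values only

var rng = new Random(0);
float[][] userFactors = InitFactors(userCount);
float[][] itemFactors = InitFactors(itemCount);

float initialError = MeanSquaredError();
for (int epoch = 0; epoch < 500; epoch++)
{
    foreach (var (user, item, rating) in ratings)
    {
        // Move both latent vectors a small step in the direction that reduces the error.
        float error = rating - Predict(user, item);
        for (int f = 0; f < factorCount; f++)
        {
            float p = userFactors[user][f], q = itemFactors[item][f];
            userFactors[user][f] += learningRate * (error * q - regularization * p);
            itemFactors[item][f] += learningRate * (error * p - regularization * q);
        }
    }
}
float finalError = MeanSquaredError();

Console.WriteLine($"Training MSE before: {initialError:F3}, after: {finalError:F3}");
Console.WriteLine($"Predicted rating for the unobserved pair (user 0, item 2): {Predict(0, 2):F2}");

// A predicted rating is the dot product of the user's and the item's latent vectors.
float Predict(int u, int i)
{
    float sum = 0f;
    for (int f = 0; f < factorCount; f++) sum += userFactors[u][f] * itemFactors[i][f];
    return sum;
}

// Mean squared error over the observed ratings only.
float MeanSquaredError()
{
    float total = 0f;
    foreach (var (u, i, r) in ratings)
    {
        float e = r - Predict(u, i);
        total += e * e;
    }
    return total / ratings.Length;
}

// Small random positive starting values for every latent factor.
float[][] InitFactors(int count)
{
    var factors = new float[count][];
    for (int n = 0; n < count; n++)
    {
        factors[n] = new float[factorCount];
        for (int f = 0; f < factorCount; f++) factors[n][f] = (float)rng.NextDouble() * 0.1f;
    }
    return factors;
}
```

Because only the observed entries contribute to the loss, the same loop works however sparse the rating matrix is, which is the property that makes this family of methods a natural fit for recommendation data.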
61+
62+
* Enabled anomaly detection scenarios
63+
([#1254](https://github.com/dotnet/machinelearning/pull/1254))
64+
65+
* [Anomaly detection](https://en.wikipedia.org/wiki/Anomaly_detection)
66+
enables identifying unusual values or events. It is used in scenarios
67+
such as fraud detection (identifying suspicious credit card
68+
transactions) and server monitoring (identifying unusual activity).
69+
* This release includes the following anomaly detection techniques:
70+
SSAChangePointDetector, SSASpikeDetector, IidChangePointDetector, and
71+
IidSpikeDetector.
72+
* Example usage can be found
73+
[here](https://github.com/dotnet/machinelearning/blob/7fb76b026d0035d6da4d0b46bd3f2a6e3c0ce3f1/test/Microsoft.ML.TimeSeries.Tests/TimeSeriesDirectApi.cs).
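As a rough intuition for what a spike detector does, the following plain C# sketch flags a point when it lies far from the mean of the preceding window. This is a simplified stand-in (a sliding-window z-score test), not the algorithm behind `IidSpikeDetector` or `SSASpikeDetector`; the series, `windowSize`, and `threshold` are assumptions made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical series with one obvious spike at index 7.
var series = new double[] { 5.0, 5.2, 4.9, 5.1, 5.0, 4.8, 5.1, 20.0, 5.0, 4.9 };
const int windowSize = 5;     // number of preceding points used as the baseline
const double threshold = 3.0; // how many standard deviations count as "unusual"

var spikes = new List<int>();
for (int i = windowSize; i < series.Length; i++)
{
    // Mean and standard deviation of the window that ends just before point i.
    double mean = 0;
    for (int j = i - windowSize; j < i; j++) mean += series[j];
    mean /= windowSize;

    double variance = 0;
    for (int j = i - windowSize; j < i; j++) variance += (series[j] - mean) * (series[j] - mean);
    double std = Math.Sqrt(variance / windowSize);

    // Skip flat windows to avoid dividing by zero.
    if (std > 0 && Math.Abs(series[i] - mean) / std > threshold)
        spikes.Add(i);
}

Console.WriteLine($"Spike indices: {string.Join(", ", spikes)}");
```

A change point detector answers a different question (whether the level of the series has shifted and stayed shifted), so a test like this one would not be a substitute for it.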
74+
75+
* Enabled using ML.NET in Windows x86 apps
76+
([#1008](https://github.com/dotnet/machinelearning/pull/1008))
77+
78+
* ML.NET can now be used in x86 apps.
79+
* Some components that are based on external dependencies (e.g.
80+
TensorFlow) will not be available in x86. Please open an issue on GitHub
81+
for discussion if this blocks you.
82+
83+
* Added the `CustomMappingEstimator` for custom data transformations
84+
([#1406](https://github.com/dotnet/machinelearning/pull/1406))
85+
86+
* ML.NET has a wide variety of data transformations for pre-processing and
87+
featurizing data (e.g. processing text, images, categorical features,
88+
etc.).
89+
* However, there might be application-specific transformations that would
90+
be useful to do within an ML.NET pipeline (as opposed to as a
91+
pre-processing step). For example, calculating [cosine
92+
similarity](https://en.wikipedia.org/wiki/Cosine_similarity) between two
93+
text columns (after featurization) or something as simple as creating a
94+
new column that adds the values in two other columns.
95+
* An example of the `CustomMappingEstimator` can be found
96+
[here](https://github.com/dotnet/machinelearning/blob/d68388a1c9994a5b429b194b64b2b0782834cb78/test/Microsoft.ML.Tests/Transformers/CustomMappingTests.cs#L55).
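The kind of logic one might place inside a custom mapping can be sketched independently of ML.NET. The plain C# below computes the cosine similarity of two feature vectors; it does not call `CustomMappingEstimator`, and the function name `CosineSimilarity` and the sample vectors are assumptions for illustration only.

```csharp
using System;

var a = new float[] { 1f, 2f, 3f };
var b = new float[] { 2f, 4f, 6f };  // points in the same direction as a
var c = new float[] { -3f, 0f, 1f }; // orthogonal to a (dot product is 0)

double sameDirection = CosineSimilarity(a, b);
double orthogonal = CosineSimilarity(a, c);
Console.WriteLine($"cos(a, b) = {sameDirection:F4}");
Console.WriteLine($"cos(a, c) = {orthogonal:F4}");

// Cosine similarity: dot(x, y) / (|x| * |y|).
static double CosineSimilarity(float[] x, float[] y)
{
    if (x.Length != y.Length)
        throw new ArgumentException("Vectors must have the same length.");

    double dot = 0, normX = 0, normY = 0;
    for (int n = 0; n < x.Length; n++)
    {
        dot += x[n] * y[n];
        normX += x[n] * x[n];
        normY += y[n] * y[n];
    }

    // Convention chosen for this sketch: similarity with a zero vector is 0.
    if (normX == 0 || normY == 0) return 0;
    return dot / (Math.Sqrt(normX) * Math.Sqrt(normY));
}
```

Keeping such a function free of pipeline types makes it easy to unit test on its own before wiring it into a transformation.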
97+
98+
* Consolidated several API concepts in `MLContext`
99+
([#1252](https://github.com/dotnet/machinelearning/pull/1252))
100+
101+
* `MLContext` replaces `LocalEnvironment` and `ConsoleEnvironment` but
102+
also includes properties for ML tasks like
103+
`BinaryClassification`/`Regression`, various transforms/trainers, and
104+
evaluation. More information can be found in
105+
[#1098](https://github.com/dotnet/machinelearning/issues/1098).
106+
* Example usage can be found
107+
[here](https://github.com/dotnet/machinelearning/blob/d68388a1c9994a5b429b194b64b2b0782834cb78/docs/code/MlNetCookBook.md).
108+
109+
* Open sourced [NimbusML](https://github.com/microsoft/nimbusml): experimental
110+
Python bindings for ML.NET.
111+
112+
* NimbusML makes it easy for data scientists to train models in Python and
113+
hand them off to .NET developers to include in their apps and services
114+
using ML.NET.
115+
* NimbusML components easily integrate into
116+
[scikit-learn](http://scikit-learn.org/stable/) pipelines.
117+
* Note that NimbusML is an experimental project without the same support
118+
level as ML.NET.
119+
120+
### Acknowledgements
121+
122+
Shoutout to [dzban2137](https://github.com/dzban2137),
123+
[beneyal](https://github.com/beneyal),
124+
[pkulikov](https://github.com/pkulikov),
125+
[amiteshenoy](https://github.com/amiteshenoy),
126+
[DAXaholic](https://github.com/DAXaholic),
127+
[Racing5372](https://github.com/Racing5372),
128+
[ThePiranha](https://github.com/ThePiranha),
129+
[helloguo](https://github.com/helloguo),
130+
[elbruno](https://github.com/elbruno),
131+
[harshsaver](https://github.com/harshsaver),
132+
[f1x3d](https://github.com/f1x3d), [rauhs](https://github.com/rauhs),
133+
[nihitb06](https://github.com/nihitb06),
134+
[nandaleite](https://github.com/nandaleite),
135+
[timitoc](https://github.com/timitoc),
136+
[feiyun0112](https://github.com/feiyun0112),
137+
[Pielgrin](https://github.com/Pielgrin),
138+
[malik97160](https://github.com/malik97160),
139+
[Niladri24dutta](https://github.com/Niladri24dutta),
140+
[suhailsinghbains](https://github.com/suhailsinghbains),
141+
[terop](https://github.com/terop), [Matei13](https://github.com/Matei13),
142+
[JorgeAndd](https://github.com/JorgeAndd), and the ML.NET team for their
143+
contributions as part of this release!
Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
1+
// Licensed to the .NET Foundation under one or more agreements.
2+
// The .NET Foundation licenses this file to you under the MIT license.
3+
// See the LICENSE file in the project root for more information.
4+
5+
using Microsoft.ML.Runtime.Api;
6+
using Microsoft.ML.Runtime.Data;
7+
using Microsoft.ML.Trainers;
8+
using System;
9+
using System.Collections.Generic;
10+
11+
// NOTE: WHEN ADDING TO THE FILE, ALWAYS APPEND TO THE END OF IT.
12+
// If you change the existing content, check that the files referencing it in the XML documentation are still correct, as they reference
13+
// line by line.
14+
namespace Microsoft.ML.Samples.Dynamic
15+
{
16+
public partial class TrainerSamples
17+
{
18+
// The following variables define the shape of a matrix. Its shape is _synthesizedMatrixRowCount-by-_synthesizedMatrixColumnCount.
19+
// The variable _synthesizedMatrixFirstRowIndex indicates the integer that would be mapped to the first row index. If user data uses
20+
// 0-based indices for rows, _synthesizedMatrixFirstRowIndex can be set to 0. Similarly, for 1-based indices, _synthesizedMatrixFirstRowIndex
21+
// could be 1.
22+
const int _synthesizedMatrixFirstColumnIndex = 1;
23+
const int _synthesizedMatrixFirstRowIndex = 1;
24+
const int _synthesizedMatrixColumnCount = 60;
25+
const int _synthesizedMatrixRowCount = 100;
26+
27+
// A data structure used to encode a single value in a matrix.
28+
internal class MatrixElement
29+
{
30+
// Matrix column index starts from _synthesizedMatrixFirstColumnIndex and is at most
31+
// _synthesizedMatrixFirstColumnIndex + _synthesizedMatrixColumnCount - 1.
32+
// Contiguous=true means that all values between the min and max indexes are allowed.
33+
[KeyType(Contiguous = true, Count = _synthesizedMatrixColumnCount, Min = _synthesizedMatrixFirstColumnIndex)]
34+
public uint MatrixColumnIndex;
35+
// Matrix row index starts from _synthesizedMatrixFirstRowIndex and is at most
36+
// _synthesizedMatrixFirstRowIndex + _synthesizedMatrixRowCount - 1.
37+
// Contiguous=true means that all values between the min and max indexes are allowed.
38+
[KeyType(Contiguous = true, Count = _synthesizedMatrixRowCount, Min = _synthesizedMatrixFirstRowIndex)]
39+
public uint MatrixRowIndex;
40+
// The value at the column MatrixColumnIndex and row MatrixRowIndex.
41+
public float Value;
42+
}
43+
44+
// A data structure used to encode a prediction result. Compared with MatrixElement, the field Value in MatrixElement is
45+
// renamed to Score because Score is the default name of matrix factorization's output.
46+
internal class MatrixElementForScore
47+
{
48+
[KeyType(Contiguous = true, Count = _synthesizedMatrixColumnCount, Min = _synthesizedMatrixFirstColumnIndex)]
49+
public uint MatrixColumnIndex;
50+
[KeyType(Contiguous = true, Count = _synthesizedMatrixRowCount, Min = _synthesizedMatrixFirstRowIndex)]
51+
public uint MatrixRowIndex;
52+
public float Score;
53+
}
54+
55+
// This example first creates in-memory data and then uses it to train a matrix factorization model. Afterward, quality metrics are reported.
56+
public static void MatrixFactorizationInMemoryData()
57+
{
58+
// Create an in-memory matrix as a list of tuples (column index, row index, value).
59+
var dataMatrix = new List<MatrixElement>();
60+
for (uint i = _synthesizedMatrixFirstColumnIndex; i < _synthesizedMatrixFirstColumnIndex + _synthesizedMatrixColumnCount; ++i)
61+
for (uint j = _synthesizedMatrixFirstRowIndex; j < _synthesizedMatrixFirstRowIndex + _synthesizedMatrixRowCount; ++j)
62+
dataMatrix.Add(new MatrixElement() { MatrixColumnIndex = i, MatrixRowIndex = j, Value = (i + j) % 5 });
63+
64+
// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
65+
// as a catalog of available operations and as the source of randomness.
66+
var mlContext = new MLContext(seed: 0, conc: 1);
67+
68+
// Convert the in-memory matrix into an IDataView so that ML.NET components can consume it.
69+
var dataView = ComponentCreation.CreateDataView(mlContext, dataMatrix);
70+
71+
// Create a matrix factorization trainer that consumes "Value" as the training label, "MatrixColumnIndex" as the
72+
// matrix's column index, and "MatrixRowIndex" as the matrix's row index. Here nameof(...) is used to extract field
73+
// names in the MatrixElement class.
74+
var pipeline = new MatrixFactorizationTrainer(mlContext, nameof(MatrixElement.Value),
75+
nameof(MatrixElement.MatrixColumnIndex), nameof(MatrixElement.MatrixRowIndex),
76+
advancedSettings: s =>
77+
{
78+
s.NumIterations = 10;
79+
s.NumThreads = 1; // To eliminate randomness, # of threads must be 1.
80+
s.K = 32;
81+
});
82+
83+
// Train a matrix factorization model.
84+
var model = pipeline.Fit(dataView);
85+
86+
// Apply the trained model to the training set.
87+
var prediction = model.Transform(dataView);
88+
89+
// Calculate regression metrics for the prediction result.
90+
var metrics = mlContext.Regression.Evaluate(prediction,
91+
label: nameof(MatrixElement.Value), score: nameof(MatrixElementForScore.Score));
92+
93+
// Print out some metrics for checking the model's quality.
94+
Console.WriteLine($"L1 - {metrics.L1}");
95+
Console.WriteLine($"L2 - {metrics.L2}");
96+
Console.WriteLine($"LossFunction - {metrics.LossFn}");
97+
Console.WriteLine($"RMS - {metrics.Rms}");
98+
Console.WriteLine($"RSquared - {metrics.RSquared}");
99+
100+
// Create two entries for making predictions. Of course, the prediction value, Score, is unknown so it's default.
101+
// If either the row or column index is out of range (e.g., MatrixColumnIndex=99999), the prediction value will be NaN.
102+
var testMatrix = new List<MatrixElementForScore>() {
103+
new MatrixElementForScore() { MatrixColumnIndex = 1, MatrixRowIndex = 7, Score = default },
104+
new MatrixElementForScore() { MatrixColumnIndex = 3, MatrixRowIndex = 6, Score = default } };
105+
106+
// Again, convert the test data to a format supported by ML.NET.
107+
var testDataView = ComponentCreation.CreateDataView(mlContext, testMatrix);
108+
109+
// Feed the test data into the model and then iterate through all predictions.
110+
foreach (var pred in model.Transform(testDataView).AsEnumerable<MatrixElementForScore>(mlContext, false))
111+
Console.WriteLine($"Predicted value at row {pred.MatrixRowIndex} and column {pred.MatrixColumnIndex} is {pred.Score}");
112+
}
113+
}
114+
}
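The sample above prints several regression metrics. As a reading aid, the sketch below computes the usual definitions by hand in plain C#: mean absolute error for L1, mean squared error for L2, the square root of L2 for RMS, and the coefficient of determination for R-squared. It assumes these standard definitions match what the evaluator reports; the `labels` and `scores` arrays are made-up values, not output from the sample.

```csharp
using System;

// Hypothetical true values and model predictions.
var labels = new double[] { 1.0, 2.0, 3.0, 4.0 };
var scores = new double[] { 1.5, 2.0, 2.5, 5.0 };

double labelMean = 0;
foreach (double y in labels) labelMean += y;
labelMean /= labels.Length;

double absoluteSum = 0, squaredSum = 0, totalSumOfSquares = 0;
for (int n = 0; n < labels.Length; n++)
{
    double error = labels[n] - scores[n];
    absoluteSum += Math.Abs(error);
    squaredSum += error * error;
    totalSumOfSquares += (labels[n] - labelMean) * (labels[n] - labelMean);
}

double l1 = absoluteSum / labels.Length;               // mean absolute error
double l2 = squaredSum / labels.Length;                // mean squared error
double rms = Math.Sqrt(l2);                            // root mean squared error
double rSquared = 1.0 - squaredSum / totalSumOfSquares; // 1 means a perfect fit

Console.WriteLine($"L1 - {l1}");
Console.WriteLine($"L2 - {l2}");
Console.WriteLine($"RMS - {rms}");
Console.WriteLine($"RSquared - {rSquared}");
```

Lower L1, L2, and RMS values indicate a closer fit, while R-squared moves toward 1 as the predictions explain more of the variance in the labels.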
