Skip to content

Add doc for MF #1450

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Oct 31, 2018
Merged

Add doc for MF #1450

merged 13 commits into from
Oct 31, 2018

Conversation

wschin
Copy link
Member

@wschin wschin commented Oct 30, 2018

Add doc for matrix factorization trainer so that users can see what it is. Fixes #1401.

@wschin wschin added the documentation Related to documentation of ML.NET label Oct 30, 2018
@wschin wschin self-assigned this Oct 30, 2018
@wschin wschin requested review from Ivanidzo4ka and sfilipi October 30, 2018 17:10
});

// Train a matrix factorization model.
var model = pipeline.Fit(dataView);
Copy link
Member

@sfilipi sfilipi Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example is removed. I just add a pointer to that test.


In reply to: 229412969 [](ancestors = 229412969)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is not a complete example.

Copy link
Member Author

@wschin wschin Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I don't understand. Could you tell me which part is missing?

Following Senja's suggestion, a new example in Sample project is created.


In reply to: 229474507 [](ancestors = 229474507)

Assume that <i>R</i> is a m-by-n matrix and the rank of the two factor matrices are <i>P</i> (m-by-k matrix) and <i>Q</i> (n-by-k), where k is the approximation rank.
The predicted entry value at the u-th row and the v-th column in <i>R</i> would be the inner product of the u-th row of P and the v-th row of Q; that is,
<i>R</i> is approximated by <i>P</i> and <i>Q</i>. This trainer implements a stochastic gradient method for finding <i>P</i> and <i>Q</i> via minimizing
the distance between <i>R</i> and the product of <i>P</i>'s transpose and Q. For users interested in the mathematical details, please see the references below.
Copy link
Member

@sfilipi sfilipi Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use

to structure, maybe? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.


In reply to: 229413708 [](ancestors = 229413708)

<remarks>
The basic idea of matrix factorization is finding two low-rank factor marcies to apporimate the training matrix. In this module, the expected
training data is a list of tuples. Every tuple consists of a column index, a row index, and the value at the location specified by the two indexes.
For an example of the data structure you can use to encode those tuples, please see <a href='https://github.com/dotnet/machinelearning/pull/1407/files#diff-ce59b50bd87003b0ffb26912fc4a0e65R144'>MatrixElement.
Copy link
Member

@sfilipi sfilipi Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MatrixElement [](start = 92, length = 123)

I am not sure about how I feel pointing our documentation to the tests. @JRAlexander, what do yo u think? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must have code reference for this sentence and that code need to be tested. I believe a doc has to be self-contained. Because I documented that test as detailed as a doc, I'd like to reuse it here.


In reply to: 229414267 [](ancestors = 229414267)

Copy link

@JRAlexander JRAlexander Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather have an actual sample that users could use right away. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That test is the minimal end-to-end example user can use. It even includes data preparation.


In reply to: 229453835 [](ancestors = 229453835)

Copy link

@JRAlexander JRAlexander Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of the link, I would just add this from the test: // The following variables defines the shape of a matrix. Its shape is _synthesizedMatrixRowCount-by-_synthesizedMatrixColumnCount.
// The variable _synthesizedMatrixFirstRowIndex indicates the integer that would be mapped to the first row index. If user data uses
// 0-based indices for rows, _synthesizedMatrixFirstRowIndex can be set to 0. Similarly, for 1-based indices, _synthesizedMatrixFirstRowIndex
// could be 1.
#Resolved

Copy link

@JRAlexander JRAlexander Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is an explanation of what you were trying to show with the test, correct? #Resolved

Copy link
Member Author

@wschin wschin Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's about data preparation, not directly related the trainer.

[Update] Yes, you're right.


In reply to: 229475909 [](ancestors = 229475909)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better for the user to include the comment instead of the link.

Copy link
Member Author

@wschin wschin Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I will add a real C# data structure and comments here. In addition, links to code are removed.

[Update] In addition, we now have a reference to file which may be automatically converted to example by Microsoft's doc site.


In reply to: 229483731 [](ancestors = 229483731)

@sfilipi
Copy link
Member

sfilipi commented Oct 30, 2018

public static class Trainers

if you update now, you'll see this file is gone in favor of having one file per example... #Resolved


Refers to: docs/samples/Microsoft.ML.Samples/Static/Trainers.cs:22 in 728f379. [](commit_id = 728f379, deletion_comment = False)

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

// line by line.
namespace Microsoft.ML.Samples.Static
{
public partial class TrainerSamples
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TrainerSamples [](start = 25, length = 14)

in Senja code it's TrainersSamples not TrainerSamples #Resolved

Copy link
Member Author

@wschin wschin Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SDCA.cs is using TrainerSamples..


In reply to: 229806788 [](ancestors = 229806788)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Senja says up to me. TrainerSamples (this is also used in Dynamic SDCA'cs) looks better because I don't feel we need two "s" in the name.


In reply to: 229807913 [](ancestors = 229807913,229806788)

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@shauheen
Copy link
Contributor

shauheen commented Oct 31, 2018

please separate code fixes into another PR #Resolved

@wschin wschin merged commit 273e36c into dotnet:master Oct 31, 2018
@wschin wschin deleted the docmf branch October 31, 2018 21:08
@ghost ghost locked as resolved and limited conversation to collaborators Mar 27, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
documentation Related to documentation of ML.NET
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants