AutoML Add Recommendation Task #4246
…odeGen (dotnet#4043)
* add CodeGen Library
* rename namespace to ML.CodeGen.*
* cancel delay sign in CodeGen
* update based on comment
* remove useless nuget package
* add ComsumeModel class
* use consumeModel in CodeGen
* use different annotation for different target
* target to netstandard2.0
* remove console output
* adjust output result
* remove useless variable
* update CodeGen name to CodeGenerator
* rebase to latest branch
* fix bug in Normalize function
* fix test
* rename features to featuresList
* move enum out of class
* remove useless items
* remove NLog.config
* update generated CSharp file
* change wording and delete useless file
* using Uppercase in comment
…arning into dotnet-features/automl
@@ -13,6 +13,7 @@ internal enum ColumnPurpose
 TextFeature = 4,
 Weight = 5,
 ImagePath = 6,
-SamplingKey = 7
+SamplingKey = 7,
+LabelFeature, // CategoricalFeature that requires ValueToKey converter, better naming?
I think we'd want to use HashToKey (name may be off) instead of the mentioned ValueToKey, as ValueToKey will map future unseen values to NA in your test dataset; as a lesser issue, it is also slow because it takes a full pass over the dataset.
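To make the trade-off in this comment concrete, here is a minimal sketch that does not call ML.NET at all. It only mimics the two behaviors with plain collections. The class name `KeyMappingSketch`, its method names, the use of key 0 to stand for NA, and the FNV-1a hash are all assumptions chosen for illustration; the real transforms may differ in their details.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: a plain-collections imitation of the two mapping styles
// discussed in the review comment. None of this is ML.NET API.
public static class KeyMappingSketch
{
    // Dictionary-style mapping: requires one full pass over the training values.
    // Keys start at 1; key 0 is used here (by assumption) to mean "NA / missing".
    public static Dictionary<string, uint> BuildVocabulary(IEnumerable<string> trainingValues)
    {
        var vocabulary = new Dictionary<string, uint>();
        foreach (var value in trainingValues)
        {
            if (!vocabulary.ContainsKey(value))
            {
                vocabulary[value] = (uint)vocabulary.Count + 1;
            }
        }
        return vocabulary;
    }

    // Values never seen during the training pass fall back to 0 (NA).
    public static uint ValueToKey(Dictionary<string, uint> vocabulary, string value)
    {
        return vocabulary.TryGetValue(value, out var key) ? key : 0u;
    }

    // Hash-style mapping: no training pass, and every value (seen or unseen)
    // lands in a bucket in the range [1, 2^numberOfBits]. Collisions are possible.
    public static uint HashToKey(string value, int numberOfBits)
    {
        if (numberOfBits < 1 || numberOfBits > 31)
        {
            throw new ArgumentOutOfRangeException(nameof(numberOfBits));
        }

        // FNV-1a over UTF-16 code units, chosen because it is deterministic
        // across processes (string.GetHashCode is not guaranteed to be).
        uint hash = 2166136261;
        unchecked
        {
            foreach (char c in value)
            {
                hash ^= c;
                hash *= 16777619;
            }
        }

        uint mask = (1u << numberOfBits) - 1;
        return (hash & mask) + 1;
    }
}
```

With a vocabulary built from training user IDs, `ValueToKey` returns 0 for a user that appears only in the test set, while `HashToKey` still assigns that user a nonzero bucket. The price of hashing is that two distinct values can share a bucket.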
- remove Label column for recommendation: not required to set up
This is looking really good, @maryamariyan. Thanks for the great work!
LGTM
Trains Recommendation models able to predict rating for existing users
Conflicts:
pkg/Microsoft.ML.AutoML/Microsoft.ML.AutoML.nupkgproj
src/Microsoft.ML.AutoML/Microsoft.ML.AutoML.csproj
test/Microsoft.ML.AutoML.Tests/AutoFitTests.cs
test/Microsoft.ML.AutoML.Tests/ColumnInferenceTests.cs
test/Microsoft.ML.AutoML.Tests/ColumnInformationUtilTests.cs
test/Microsoft.ML.AutoML.Tests/Microsoft.ML.AutoML.Tests.csproj
test/Microsoft.ML.AutoML.Tests/TrainerExtensionsTests.cs
test/Microsoft.ML.AutoML.Tests/TransformInferenceTests.cs
test/Microsoft.ML.AutoML.Tests/UserInputValidationTests.cs
Trains Recommendation models able to predict rating for existing users
What's already done in this PR
- Recommendation task and experiment in AutoML
- MatrixFactorization as MatrixFactorizationExtension
- … (LabelFeature) and its corresponding TransformerExtension (LabelCategorical) so that AutoML can construct the pre-processing pipeline for MatrixFactorizationExtension correctly
- … AutoML.Example, and you can play with that!

What still needs to be done (feel free to CRUD)
- MatrixFactorization takes longer to train per round, and the parameter-sweeping algorithm needs to train many rounds to find the best parameters. This is time-consuming, and customers might not like that.
- CodeGen part
- … AutoML (it requires some refactoring work and shouldn't be done in this PR, but it's important)