-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Adding the initial prototype of a DatabaseLoader #4035
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Infer source index if not specified. Add initial public API. Fix bug in GetIdGetter
CC. @eerhardt |
using System.Runtime.CompilerServices; | ||
using Microsoft.ML; | ||
|
||
[assembly: InternalsVisibleTo(assemblyName: "Microsoft.ML.Tests" + PublicKey.TestValue)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of doing this, we should change the test to use the public APIs. Basically new DatabaseLoader
=> mlContext.Data.CreateDatabaseLoader
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the only comment I have left on this PR. If this is resolved, and all the builds pass, I think we will be ready to merge this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. (sorry forgot to update the comment)
This is still needed for converting from the DbType to the InternalKind to the public DataKind.
src/Microsoft.ML.Experimental/DataLoadSave/Database/DatabaseLoaderCursor.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ML.Experimental/DataLoadSave/Database/DatabaseLoader.cs
Outdated
Show resolved
Hide resolved
{ | ||
var mlContext = new MLContext(seed: 1); | ||
|
||
var connectionString = @"Server=(localdb)\mssqllocaldb;Database=EFGetStarted.AspNetCore.NewDb;Trusted_Connection=True;ConnectRetryCount=0"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will either need to disable this test, or change it so it doesn't rely on machine state/configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created a MockDbConnection
, MockDbCommand
, and MockDbDataReader
class which wrap a TextLoader
. I would guess this needs some cleanup to support more testing scenarios and to match w/e conventions ML.NET is currently using.
"SELECT SepalLength, SepalWidth, PetalLength, PetalWidth, Label FROM IrisData", | ||
connection)) | ||
{ | ||
DatabaseLoader loader = new DatabaseLoader(mlContext, new DatabaseLoader.Options() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to that way using a Columns array, when possible, can we have another test (and implementation if not ready yet) that is using a convenient data-model class?
} | ||
}); | ||
|
||
IDataView trainingData = loader.Load(() => command.ExecuteReader()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we have this foundational API we also need to add it to the MLContext catalog, so having convenient methods such as:
mlContext.Data.LoadFromDbDataReader<ModelInputData>(() => command.ExecuteReader());
or
IDataView trainingDataView = mlContext.Data.LoadFromDbTable<ModelInputData, SqlConnection>(connString: myConnString,
tableName: "TrainingDataTable");
I put a couple of comments but those asks don't need to be implemented before merging this first version. It could come later. :) |
src/Microsoft.ML.Experimental/DataLoadSave/Database/DatabaseLoaderCursor.cs
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Let's get this in and iterate on it.
This adds the initial barebones prototype for DatabaseLoader to the Microsot.ML.Experimental project.