Context.Data.CreateTextLoader<T> throws error Can't determine the number of source columns without valid data #3705

Closed
PeterPann23 opened this issue May 11, 2019 · 5 comments · Fixed by #3788
Labels: bug (Something isn't working), P0 (Priority of the issue for triage purpose: IMPORTANT, needs to be fixed right away)

Comments

@PeterPann23

System information

  • OS version/distro: Windows 10
  • .NET Version (eg., dotnet --info):
    AutoML 0.3.0
    Microsoft.ML 1.0.0.0

Issue

I thought one could map a class and load it using the attribute annotations, but I get an error about a parameter (Source) that is not available to me when I call the loader.

  • What did you do?
    I wrote a little test to play with Microsoft.ML.AutoML and test it against a label and a vector.
    I am adding my little program, which makes the call shown in this ticket.

Basically I call:

public static IDataView GetDataView<T>(MLContext mlContext, FileInfo trainingFile)
{
    var loader = mlContext.Data.CreateTextLoader<T>(separatorChar: '|', hasHeader: false);
    return loader.Load(trainingFile.FullName);
            
}
public class Data
{
     [LoadColumn(0)]
     public string Label { get; }

     [LoadColumn(1, 40_731)]
     [VectorType(40_730)]
     public float[] Features { get; }
}
  • What happened?
    I get an error stating

System.ArgumentOutOfRangeException: 'Can't determine the number of source columns without valid data
Parameter name: Source'

Stack

   at Microsoft.ML.Data.TextLoader.Bindings..ctor(TextLoader parent, Column[] cols, IMultiStreamSource headerFile, IMultiStreamSource dataSample)
   at Microsoft.ML.Data.TextLoader..ctor(IHostEnvironment env, Options options, IMultiStreamSource dataSample)
   at Microsoft.ML.Data.TextLoader.CreateTextLoader[TInput](IHostEnvironment host, Boolean hasHeader, Char separator, Boolean allowQuoting, Boolean supportSparse, Boolean trimWhitespace, IMultiStreamSource dataSample)
   at ConsoleMLWizard.Program.GetDataView[T](MLContext mlContext, FileInfo trainingFile) in C:\Users\W2307\source\repos\ConsoleMLWizard\Program.cs:line 99
   at ConsoleMLWizard.Program.Main(String[] args) in C:\Users\W2307\source\repos\ConsoleMLWizard\Program.cs:line 37
  • What did you expect?

I expected the annotations to work. I can load the data fine using:

var loader = context.Data.CreateTextLoader(options: new TextLoader.Options()
{
    Columns = new[] {
        new TextLoader.Column(name:"Label", dataKind: DataKind.String, index: 0),
        new TextLoader.Column(name:"Features",dataKind:DataKind.Single,minIndex:0,maxIndex:40731)
    },
    HasHeader = false,
    Separators = new[] { ',' },
                
});
var data = loader.Load(dataFile.FullName);

Source code / logs

@srsaggam
Member

srsaggam commented May 12, 2019

Could it be because of the _ in your
[LoadColumn(1, 40_731)]

Try [LoadColumn(1, 40731)]?

Update: My bad! That is the new digit separator notation.

@PeterPann23
Author

PeterPann23 commented May 12, 2019

Why would it? It's a valid integer literal written with the new digit separator notation. Are you not able to reproduce the error?

glebuk added the bug and ❤ Community labels May 13, 2019
artidoro self-assigned this May 16, 2019
codemzs added the P0 label May 22, 2019
@artidoro
Contributor

I think the fix is for the properties of your Data class to have both get and set accessors; yours are currently get-only. The class should be defined as follows:

public static IDataView GetDataView<T>(MLContext mlContext, FileInfo trainingFile)
{
    var loader = mlContext.Data.CreateTextLoader<T>(separatorChar: '|', hasHeader: false);
    return loader.Load(trainingFile.FullName);
            
}
public class Data
{
     [LoadColumn(0)]
     public string Label { get; set; }

     [LoadColumn(1, 40_731)]
     [VectorType(40_730)]
     public float[] Features { get; set; }
}

@artidoro
Contributor

artidoro commented May 29, 2019

Let me know if that solves the problem, and in the meantime I'll try to see if we can catch the error earlier, or improve the message.
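
Until the library reports this more clearly, a caller could catch the mistake before creating the loader. The following is only a rough sketch using plain .NET reflection: `SetterCheck`, `GetOnlyData` and `GetSetData` are hypothetical names, not part of Microsoft.ML. It assumes, as the fix above suggests, that the typed loader needs properties it can write to. The Microsoft.ML attributes are left out so the block has no dependencies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical helper, not part of Microsoft.ML: lists the public instance
// properties of a type that have no public set accessor.
public static class SetterCheck
{
    public static IReadOnlyList<string> PropertiesWithoutPublicSetter(Type type)
    {
        return type
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            // GetSetMethod() returns null when there is no public setter,
            // which is the case for a get-only auto-property.
            .Where(p => p.GetSetMethod() == null)
            .Select(p => p.Name)
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToList();
    }
}

// Stand-in for the class as originally reported (get-only properties).
public class GetOnlyData
{
    public string Label { get; }
    public float[] Features { get; }
}

// Stand-in for the corrected class (get and set accessors).
public class GetSetData
{
    public string Label { get; set; }
    public float[] Features { get; set; }
}
```

Calling `SetterCheck.PropertiesWithoutPublicSetter(typeof(GetOnlyData))` returns both property names, while the same call on `typeof(GetSetData)` returns an empty list, so a caller could throw a descriptive exception whenever the list is non-empty.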

@PeterPann23
Author

LOL what a dumb mistake to make, thanks!

ghost locked as resolved and limited conversation to collaborators Mar 21, 2022