Normalize double min and max value returns NaN #2798

Closed
@lisahua

Description

System information

  • OS version/distro: .NET 4.6.2, Windows 10
  • .NET Version (e.g., dotnet --info): ML.NET NuGet 0.10.1

Issue

  • What did you do? The input data has a double column containing (double.min, double.max, <100 random numbers from 0 to 10000>). I call:

    var normalizeColumns = numericalFeatures.Select(
        f => new NormalizingEstimator.MeanVarColumn(f.Name, fixZero: false, useCdf: false)).ToArray();
    var normalizingEstimator = this.context.Transforms.Normalize(normalizeColumns);

In transformedData.Preview, I see a feature that is NaN in every row. I use the SDCA trainer.
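To illustrate how an all-NaN column can arise, here is a minimal sketch that does not use ML.NET at all. It assumes a streaming (Welford-style) mean/variance computation, which may or may not match what NormalizingEstimator does internally; `MeanVarOverflowDemo` and `StreamingMeanVar` are hypothetical names. The point is only that the difference between double.MaxValue and double.MinValue overflows to infinity, and a later infinity minus infinity yields NaN, which then spreads to every normalized value.

```csharp
using System;
using System.Linq;

public static class MeanVarOverflowDemo
{
    // Hypothetical helper: streaming (Welford-style) mean and population variance.
    // This is an illustration, not ML.NET's actual implementation.
    public static void StreamingMeanVar(double[] values, out double mean, out double variance)
    {
        mean = 0.0;
        double m2 = 0.0;
        long n = 0;
        foreach (double x in values)
        {
            n++;
            double delta = x - mean;   // MaxValue - MinValue overflows to +Infinity
            mean += delta / n;         // mean becomes Infinity, then Infinity - Infinity = NaN
            m2 += delta * (x - mean);
        }
        variance = m2 / n;
    }

    public static void Main()
    {
        double[] column = { double.MinValue, double.MaxValue, 1.0, 2.0, 3.0 };

        double mean, variance;
        StreamingMeanVar(column, out mean, out variance);
        double std = Math.Sqrt(variance);

        // Mean-variance normalization without fixZero: (x - mean) / std.
        double[] normalized = column.Select(x => (x - mean) / std).ToArray();

        Console.WriteLine(double.IsNaN(mean));                      // True
        Console.WriteLine(normalized.All(v => double.IsNaN(v)));    // True
    }
}
```

With these inputs, every row of the normalized column is NaN, including the rows whose original values were ordinary numbers.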

  • What happened?

pipeline.Fit(transformedData) fails and throws an exception saying "train with 0 instances".

  • What did you expect?
  1. ML.NET should handle double.min and double.max in NormalizingEstimator?
  2. ML.NET should throw a more meaningful exception. "train with 0 instances" is misleading when the real cause is a feature that is all NaN; I do have 100+ rows for this feature.
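As a stopgap on the caller's side, the normalized values could be checked before training so that the failure names the offending column. The sketch below is plain C# and does not call any ML.NET API; `FeatureValidation`, `FirstNonFiniteIndex`, and `EnsureFinite` are hypothetical helpers, and it assumes the column values have already been extracted into a double[]. My guess is that the trainer skips rows with NaN features and therefore ends up with zero instances, but that is an assumption, not something confirmed here.

```csharp
using System;

public static class FeatureValidation
{
    // Hypothetical helper: index of the first NaN or infinite value, or -1 if all are finite.
    public static int FirstNonFiniteIndex(double[] values)
    {
        for (int i = 0; i < values.Length; i++)
        {
            if (double.IsNaN(values[i]) || double.IsInfinity(values[i]))
                return i;
        }
        return -1;
    }

    // Hypothetical helper: throws a descriptive exception instead of letting
    // training fail later with an unrelated message.
    public static void EnsureFinite(string columnName, double[] values)
    {
        int bad = FirstNonFiniteIndex(values);
        if (bad >= 0)
        {
            throw new InvalidOperationException(
                $"Column '{columnName}' contains a non-finite value ({values[bad]}) at row {bad} after normalization.");
        }
    }
}
```

Calling `FeatureValidation.EnsureFinite("Feature1", normalizedValues)` before Fit would surface the NaN column directly; the column name and array here are placeholders.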


Metadata

Labels: need info (this issue needs more info before triage)
