
Add FixZero for LogMeanVariance normalizer #3916


Merged
merged 4 commits into dotnet:master on Jul 1, 2019

Conversation

artidoro
Contributor

Fixes #2798.

This PR introduces the FixZero argument to the LogMeanVariance normalizer, along with related tests and a sample.

It's still a WIP because I would like to make a breaking change instead of creating an overload with the required parameter FixZero. Once the change is accepted and I set up the API Compat tool to allow the breaking change, I should be able to remove the overloads of the MLContext extensions.
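
For readers following along, here is a minimal sketch of how the new argument is meant to be used. The exact overload shape (fixZero passed right after the output column name, with that parameter name) is an assumption based on this PR's discussion, not the final committed signature:

using Microsoft.ML;
using Microsoft.ML.Data;

public class NormalizeLogMeanVarianceFixZeroSketch
{
    // Illustrative data point type; the real sample defines its own DataPoint class.
    private class DataPoint
    {
        [VectorType(5)]
        public float[] Features { get; set; }
    }

    public static void Example()
    {
        var mlContext = new MLContext();
        var samples = new[]
        {
            new DataPoint { Features = new float[5] { 1, 1, 3, 0, float.MaxValue } },
            new DataPoint { Features = new float[5] { 2, 2, 2, 0, float.MaxValue } }
        };
        var data = mlContext.Data.LoadFromEnumerable(samples);

        // Assumed overload: the second argument fixes zero, useCdf selects the CDF output.
        var pipeline = mlContext.Transforms.NormalizeLogMeanVariance("Features", fixZero: true, useCdf: true);
        var transformer = pipeline.Fit(data);
        var transformedData = transformer.Transform(data);
    }
}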

@artidoro artidoro self-assigned this Jun 26, 2019
new DataPoint(){ Features = new float[4] { 2, 2, 2, 0} },
new DataPoint(){ Features = new float[4] { 0, 0, 1, 0} },
new DataPoint(){ Features = new float[4] {-1,-1,-1, 1} }
new DataPoint(){ Features = new float[5] { 1, 1, 3, 0, float.MaxValue } },
Member

@wschin wschin Jun 27, 2019

Is this change related to FixZero? #Resolved

Contributor Author

It's related to the original issue posted by Lisa. They were encountering an issue with the MeanVariance normalizer, which does not handle data containing values like float.MaxValue or float.MinValue. I think it would be useful to show in the related sample that the LogMeanVariance normalizer accepts this data.


In reply to: 298254220 [](ancestors = 298254220)
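
To make that point concrete, a small illustrative snippet (not the normalizer's actual implementation): a variance computed on the raw values overflows for inputs near float.MaxValue, while the statistics of log(x) stay finite.

using System;

float x = float.MaxValue;
Console.WriteLine(x * x);       // Overflows to float.PositiveInfinity, so a raw-value variance turns into Infinity/NaN.
Console.WriteLine(Math.Log(x)); // Roughly 88.72, so the mean and variance of log(x) remain finite.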

// Uses the cumulative distribution function (CDF) as output.
var normalize = mlContext.Transforms.NormalizeLogMeanVariance(true, "Features", useCdf: true);

// NormalizeLogMeanVariance normalizes the data based on the computed mean and variance of the logarithm of the data.
Member

@wschin wschin Jun 27, 2019

"NormalizeLogMeanVariance normalizes the data based on the computed mean and variance of the logarithm of the data." doesn't tell much more than the line below. We need some equations here.

(log(x) - Mean(log(x))) / Var(log(x))

maybe? #Resolved

Contributor Author

Yes, that would be nice. I can add that here.


In reply to: 298254677 [](ancestors = 298254677)

Contributor Author

As I told you the other day, I am still concerned about that equation. I am going to do a separate PR where I add the equations. (Same for the documentation comment in the catalog that you mentioned.)
There is no documentation on how to derive them, so I am going through the math again and trying to understand the rationale.


In reply to: 298311132 [](ancestors = 298311132,298254677)
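
For reference while that verification is pending: the CDF mapping printed by the sample further down in this PR has the form

y = 0.5 * (1 + ERF((log(x) - Mean(log(x))) / (StdDev(log(x)) * sqrt(2))))

that is, it divides by the standard deviation of log(x) rather than by the variance suggested above; whether that is the right way to document it is exactly what is being double-checked.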

{
    public class NormalizeLogMeanVarianceFixZero
    {
        public static void Example()
Member

@wschin wschin Jun 27, 2019

You have two independent cases in Example(...). Could you split them into two files? #Pending

Contributor Author

I have followed the same pattern as the MeanVariance normalizer and LogMeanVariance normalizer samples. Would you prefer that I split them all into two files? Or should I just split this one? Or keep it like the others for consistency?


In reply to: 298257288 [](ancestors = 298257288)

Member

Separating them is more reader-friendly.

// If we have multiple column transformations, we need to pass the index of the InputOutputColumnPair.
var transformParams = normalizeTransform.GetNormalizerModelParameters(0) as CdfNormalizerModelParameters<ImmutableArray<float>>;
Console.WriteLine("The 1-index value in resulting array would be produced by:");
Console.WriteLine($"y = 0.5 * (1 + ERF((Math.Log(x) - {transformParams.Mean[1]}) / ({transformParams.StandardDeviation[1]} * sqrt(2))))");
Member

@wschin wschin Jun 27, 2019

We have an equation for columnFixZero now! May we have the same thing for column? #Resolved

Contributor Author

I just copied the equation from the LogMeanVariance normalizer sample :)
There is an equation for both settings of FixZero and for both Cdf and no Cdf. So I think it's complete.


In reply to: 298258190 [](ancestors = 298258190)
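
For completeness, here is a sketch of the non-CDF side, mirroring the existing LogMeanVariance sample. It assumes the non-CDF variant exposes affine parameters (Scale and Offset), with Offset empty when zero is fixed, and that normalizeNoCdfTransform is the fitted transformer created with useCdf: false:

using System;
using System.Collections.Immutable;
using static Microsoft.ML.Transforms.NormalizingTransformer;

// Retrieve the affine parameters for the first column pair and print the mapping for slot 1.
var noCdfParams = normalizeNoCdfTransform.GetNormalizerModelParameters(0)
    as AffineNormalizerModelParameters<ImmutableArray<float>>;
var offset = noCdfParams.Offset.Length == 0 ? 0 : noCdfParams.Offset[1];
Console.WriteLine($"The 1-index value in resulting array would be produced by: y = (x - ({offset})) * {noCdfParams.Scale[1]}");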


// ERF is https://en.wikipedia.org/wiki/Error_function.
// Expected output:
// The 1-index value in resulting array would be produced by:
Member

@wschin wschin Jun 27, 2019

What does 1 - index mean? #Resolved

Contributor Author

I can rephrase here, but it just means that we show the normalization parameter values for the slot in the array with index 1.


In reply to: 298258499 [](ancestors = 298258499)
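
One possible rephrasing, sketched as code: print the parameters for every slot so the index is explicit (this assumes transformParams is the CdfNormalizerModelParameters retrieved in the quoted lines above):

for (int i = 0; i < transformParams.Mean.Length; i++)
    Console.WriteLine($"Slot {i}: mean = {transformParams.Mean[i]}, standard deviation = {transformParams.StandardDeviation[i]}");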

/// <param name="maximumExampleCount">Maximum number of examples used to train the normalizer.</param>
/// <param name="useCdf">Whether to use CDF as the output.</param>
/// <example>
/// <format type="text/markdown">
Member

@wschin wschin Jun 27, 2019

Users would love equations in the documentation. #Resolved

Contributor Author

I will add the equations here as well.


In reply to: 298260450 [](ancestors = 298260450)

Contributor Author

@artidoro artidoro Jun 28, 2019

I am a little concerned about the equations; yesterday I think I saw an issue. I would like to make sure I understand all the steps, so I will add the equations in a separate PR and check this in for now.


In reply to: 298310313 [](ancestors = 298310313,298260450)

@artidoro artidoro changed the title from "WIP: Add FixZero for LogMeanVariance normalizer" to "Add FixZero for LogMeanVariance normalizer" on Jun 28, 2019
@artidoro
Contributor Author

/// * Binning - Bucketizes the data in each row and performs a linear rescale based on the calculated bins.

Add equations for the log mean variance normalizer,
for both the CDF and non-CDF cases.


Refers to: src/Microsoft.ML.Data/Transforms/Normalizer.cs:46 in 39b4002. [](commit_id = 39b4002, deletion_comment = False)
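
As a placeholder until that separate PR, the two mappings as printed by the sketches and sample output earlier in this conversation (still subject to the verification discussed above) presumably look like:

with CDF:    y = 0.5 * (1 + ERF((log(x) - Mean(log(x))) / (StdDev(log(x)) * sqrt(2))))
without CDF: y = (x - offset) * scale

where how offset and scale are derived from the log-domain statistics is exactly what remains to be documented.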

Member

@codemzs codemzs left a comment

We reviewed this in person and it looks good to me. As discussed, you will be verifying that there isn't any mathematical error in the computations.

Member

@wschin wschin left a comment

My only comment is to add equations to those normalizers.

@artidoro artidoro merged commit f67aab5 into dotnet:master Jul 1, 2019
Dmitry-A pushed a commit to Dmitry-A/machinelearning that referenced this pull request Jul 24, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 21, 2022
Successfully merging this pull request may close these issues.

Normalize double min and max value returns NaN
3 participants