-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Add FixZero for LogMeanVariance normalizer #3916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
new DataPoint(){ Features = new float[4] { 2, 2, 2, 0} }, | ||
new DataPoint(){ Features = new float[4] { 0, 0, 1, 0} }, | ||
new DataPoint(){ Features = new float[4] {-1,-1,-1, 1} } | ||
new DataPoint(){ Features = new float[5] { 1, 1, 3, 0, float.MaxValue } }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change related to FixZero
? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's related to the original issue posted by Lisa. They were encountering an issue with the MeanVariance normalizer which does not handle data that looks like float.MaxValue
, float.MinValue
. I think it would be useful to show that LogMeanVariance normalizer accepts this data in the related sample.
In reply to: 298254220 [](ancestors = 298254220)
// Uses Cumulative distribution function as output. | ||
var normalize = mlContext.Transforms.NormalizeLogMeanVariance(true, "Features", useCdf: true); | ||
|
||
// NormalizeLogMeanVariance normalizes the data based on the computed mean and variance of the logarithm of the data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NormalizeLogMeanVariance normalizes the data based on the computed mean and variance of the logarithm of the data.
doesn't tell much more than the line below. We need some equations here.
( log(x) - Mean( Log(x) ) ) / Var( log(x) )
maybe? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I told you the other day I am still concerned about that equation. I am going to do a separate PR where I add the equations. (Same for the documentation comment in the catalog that you had).
There is no documentation on how to derive them so I am going through the math again and trying to understand the rational.
In reply to: 298311132 [](ancestors = 298311132,298254677)
{ | ||
public class NormalizeLogMeanVarianceFixZero | ||
{ | ||
public static void Example() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have two independent cases in Example(...)
. Could you split them into two files? #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have followed the same pattern as the MeanVariance
normalizer and LogMeanVariance
normalizer sample. Would you prefer that I split them all into two files? Or should I just split this one? Or keep it like the others for consistency?
In reply to: 298257288 [](ancestors = 298257288)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separating them is more reader-friendly.
// If we have multiple columns transformations we need to pass index of InputOutputColumnPair. | ||
var transformParams = normalizeTransform.GetNormalizerModelParameters(0) as CdfNormalizerModelParameters<ImmutableArray<float>>; | ||
Console.WriteLine("The 1-index value in resulting array would be produce by:"); | ||
Console.WriteLine($"y = 0.5* (1 + ERF((Math.Log(x)- {transformParams.Mean[1]}) / ({transformParams.StandardDeviation[1]} * sqrt(2)))"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have equation for columnFixZero
now! May we have the same thing for column
? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just copied the equation from the LogMeanVariance normalizer sample :)
There is an equation for both settings of FixZero and for both Cdf and no Cdf. So I think it's complete.
In reply to: 298258190 [](ancestors = 298258190)
|
||
// ERF is https://en.wikipedia.org/wiki/Error_function. | ||
// Expected output: | ||
// The 1 - index value in resulting array would be produce by: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does 1 - index
mean? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can rephrase here but it just means that we show the normalization parameter values for the slot in the array with index 1.
In reply to: 298258499 [](ancestors = 298258499)
/// <param name="maximumExampleCount">Maximum number of examples used to train the normalizer.</param> | ||
/// <param name="useCdf">Whether to use CDF as the output.</param> | ||
/// <example> | ||
/// <format type="text/markdown"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uses would love equations in documentations. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a little concerned about the equations. Yesterday I think I saw an issue. I would like to make sure I understand all the steps and will add those in a separate PR and check this in for now.
In reply to: 298310313 [](ancestors = 298310313,298260450)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We reviewed this in person and it looks good to me. As discussed you will be verifying there isn't any mathematical error in the computations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only comment is to add equations to those normalizers.
Fixes #2798.
This PR introduces the
FixZero
argument to theLogMeanVariance
normalizer, a relative tests and a sample.It's still in WIP because I would like to make a breaking change instead of creating an overload with required parameter
FixZero
. As soon as the change is accepted and I set up the API Compat tool to accept the breaking change, I should be able to remove the overloads to theMLContext
extensions.