Problem with ML.NET RobustScaler #5237
Comments
Hi @CBrauer, thank you for reporting this issue. I see that you are using outdated libraries in your codebase. For example:
Please check out the current ML.NET API to view more, and check if you obtain the same extra column for "vwapGain" with
Hi @CBrauer,

To address your point 1., what I meant by your usage of earlier libraries is that you are using functions like

To address your point 2., we did indeed add

machinelearning/src/Microsoft.ML.Transforms/NormalizerCatalog.cs
Lines 326 to 383 in 4f90006
It is weird that you are not seeing the declared

To address your point 3., I do not know exactly what you mean here, but I believe I understand why you are seeing two "vwapGain" columns. The first "vwapGain" column you are seeing is hidden. A hidden column is only accessible by providing its specific index in the output schema, which is exactly how you are accessing this column. Hidden columns are there by design, and the logic behind them is explained in detail here. In short, the

For context, when two or more columns have the same name, the column with the highest index is visible, and the other column(s) are marked as "hidden". If you use a

To explain my point above, I have added the following snippet of code in your

    nSpaces = new int[nColumns];
    for (var k = 0; k < nColumns; k++)
    {
        // Print each column's IsHidden flag, padded to line up under its header.
        var isHidden = trainPreview.Schema[k].IsHidden;
        for (var j = 0; j < maxCharInHeaderName - isHidden.ToString().Length + 1; j++)
        {
            Console.Write(" ");
        }
        Console.Write("isHidden: {0}", isHidden);
        nSpaces[k] = maxCharInHeaderName - isHidden.ToString().Length + 1;
    }
    Console.Write("\n");

Here's the output with my added snippet:

As you can see, the first "vwapGain" column is hidden, while the second "vwapGain" column is not, which matches the logic explained above. So, in summary, the extra "vwapGain" column you are referring to is not a problem, but an intentional design choice.

To address your point 4., I am not doing contract work for Microsoft, but I am confused as to exactly which errors you are referring to and what complaint you have. As I have done in this specific comment, I am happy to explain any other points you do not yet understand in ML.NET, and/or point you to the right resources. However, I have explained why you are seeing two "vwapGain" columns (1 hidden, 1 visible), how you are accessing the hidden column through its index (which is the only way to access this column), and that this hidden column is intended and by design, so this issue will remain closed. The non-visibility of
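The hidden-column behaviour described above can be reproduced outside the original program. What follows is only a minimal sketch, not the reporter's code: the `Row` class, its sample values, and the `HiddenColumnDemo` name are hypothetical, and it assumes Microsoft.ML 1.5 or later (where `NormalizeRobustScaling` is available) and that the scaler's output column is given the same name as its input column.

```csharp
using System;
using Microsoft.ML;

// Hypothetical input type for illustration; not taken from the original issue.
public class Row
{
    public float vwapGain { get; set; }
}

public static class HiddenColumnDemo
{
    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // Made-up sample values, only so the normalizer has something to fit.
        var rows = new[]
        {
            new Row { vwapGain = 1f },
            new Row { vwapGain = 2f },
            new Row { vwapGain = 3f },
            new Row { vwapGain = 4f },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // Assumption: output name == input name, so the transformed schema
        // ends up with two columns that are both called "vwapGain".
        var pipeline = mlContext.Transforms.NormalizeRobustScaling("vwapGain", "vwapGain");
        IDataView transformed = pipeline.Fit(data).Transform(data);

        // Enumerate the schema by index, which is the only way to reach hidden columns.
        DataViewSchema schema = transformed.Schema;
        for (int i = 0; i < schema.Count; i++)
        {
            Console.WriteLine("index {0}: {1}, IsHidden = {2}",
                i, schema[i].Name, schema[i].IsHidden);
        }

        // Looking a column up by name returns the visible (non-hidden) column.
        DataViewSchema.Column visible = schema["vwapGain"];
        Console.WriteLine("By-name lookup resolves to index {0}", visible.Index);
    }
}
```

If the rule described above holds, the lower-index "vwapGain" entry should report `IsHidden = True` and the by-name lookup should resolve to the higher index. Writing the scaled values to a different output name (for example a hypothetical "vwapGainScaled") would be one way to avoid the duplicate name altogether.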
Excellent explanations. I appreciate it.
System information
Source code
Program output. Notice that RobustScaler produced an extra column for "vwapGain"
Source code
My test program looks like:
Charles