
Calculate ReduceSum row by row in ONNX model from OneVsAllTrainer #4904


Merged
merged 1 commit into dotnet:master on Mar 2, 2020

Conversation

antoniovs1029
Member

There's a bug in the ONNX models exported from OneVsAllTrainers that have OutputFormula = OutputFormula.Softmax. (Note that, to the best of my knowledge, only a LightGBM multiclass trainer with useSoftmax = true would have such an OutputFormula.)

The problem was that the Softmax (specifically its ReduceSum step) summed over the whole input batch instead of computing a separate sum for each row. This PR fixes that.
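The effect can be sketched numerically. A minimal numpy illustration of the two reductions (the arrays and names here are illustrative, not from the PR):

```python
import numpy as np

# A batch of 2 rows with 3 class scores each.
scores = np.array([[1.0, 2.0, 3.0],
                   [1.0, 2.0, 3.0]])
exp = np.exp(scores)

# Buggy export: ReduceSum over *all* elements, so every row is
# divided by the sum of the entire batch.
batch_sum = exp.sum(keepdims=True)        # shape (1, 1)
buggy = exp / batch_sum                   # each row sums to 0.5, not 1.0

# Fixed export: ReduceSum along axis 1, one sum per row.
row_sum = exp.sum(axis=1, keepdims=True)  # shape (2, 1)
fixed = exp / row_sum                     # each row sums to 1.0

print(buggy.sum(axis=1))  # [0.5 0.5]
print(fixed.sum(axis=1))  # [1. 1.]
```

With a one-row batch the two reductions coincide, which is why the bug only shows up with multiple rows.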

Notice that this error didn't surface in our tests, since the OnnxTransformer, which applies the ONNX model, actually processes one row at a time, so the batch always consisted of a single row. The problem appeared when using the model directly through the OnnxRuntime API (without the OnnxTransformer).

@antoniovs1029 antoniovs1029 requested a review from a team as a code owner February 29, 2020 00:19
Contributor

@Lynx1820 Lynx1820 left a comment


Is it possible to add a test for this?

@@ -928,6 +928,8 @@ public override bool SaveAsOnnx(OnnxContext ctx, string[] outputNames, string fe
var sumOutput = ctx.AddIntermediateVariable(NumberDataViewType.Single, "SumOutput");
var sumNode = ctx.CreateNode(opType, expOutput, sumOutput, ctx.GetNodeName(opType), "");
sumNode.AddAttribute("keepdims", 1);
long[] list = { 1 };
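For context on the diff: in ONNX (opset < 13), ReduceSum takes its reduction axes as an `axes` attribute, and `keepdims=1` preserves the reduced dimension so the following division broadcasts per row. A small sketch mimicking those semantics in numpy (illustrative, not the PR's C# code):

```python
import numpy as np

def reduce_sum(data, axes=None, keepdims=1):
    """Mimics ONNX ReduceSum (opset < 13), where `axes` is an
    attribute; axes=None reduces over all dimensions."""
    axis = tuple(axes) if axes is not None else None
    return data.sum(axis=axis, keepdims=bool(keepdims))

x = np.ones((2, 3))
print(reduce_sum(x).shape)            # no axes: (1, 1) -- sums the whole batch
print(reduce_sum(x, axes=[1]).shape)  # axes=[1]: (2, 1) -- one sum per row
```

The `long[] list = { 1 };` in the diff supplies that `axes=[1]` value, restricting the sum to each row's class dimension.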
Contributor


What about the graph itself when run with Softmax? Does it still look broken?

Member Author


Yes, the Sigmoid nodes (which are actually never needed for LightGBM multiclass with softmax) are still there, with their outputs not connected to anything.

I did try to find a way to eliminate them, but it seems various changes would be needed in the OneVsAll trainer and also in the calibrators' SaveAsOnnx methods.

I didn't look much into it, since I didn't know if it was worth it: solving that won't have any effect on the model's output, and it actually isn't related to the issue I am addressing in this PR (if anything, it makes the ONNX model architecture uglier and somewhat confusing, but nothing else).

Do you want me to fix that problem? In this PR?

Contributor


Yes, please fix the problem. I will leave it to you whether you want to fix this in a separate PR or this PR.

Member Author


Ok, I will address that in a separate PR.

@antoniovs1029
Member Author

Is it possible to add a test for this?

@Lynx1820 I believe the OnnxTransformer only works with batches of one row at a time, so I don't see how I can test this issue in the ML.NET tests.

If anything, there's already a test for multiclass trainers which exercises LightGBM with useSoftmax = true, and that test still passes with my changes.

@harishsk
Contributor

It is okay to leave this without a test for now, because the changes required to have OnnxTransformer support batch processing are much larger; we will address that later. As long as all the current tests pass, it is okay.

@antoniovs1029 antoniovs1029 merged commit 179343a into dotnet:master Mar 2, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 19, 2022