Calculate ReduceSum row by row in ONNX model from OneVsAllTrainer #4904
Conversation
Is it possible to add a test for this?
@@ -928,6 +928,8 @@ public override bool SaveAsOnnx(OnnxContext ctx, string[] outputNames, string fe
    var sumOutput = ctx.AddIntermediateVariable(NumberDataViewType.Single, "SumOutput");
    var sumNode = ctx.CreateNode(opType, expOutput, sumOutput, ctx.GetNodeName(opType), "");
    sumNode.AddAttribute("keepdims", 1);
    long[] list = { 1 };
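The `list = { 1 }` above supplies the `axes` attribute for the `ReduceSum` node. A minimal numpy sketch (not the actual ML.NET code) of why that axis matters:

```python
import numpy as np

# Two rows of class scores, as a batch of shape (2, 3).
batch = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
exp = np.exp(batch)

# Without axes, ReduceSum collapses the whole batch into one scalar.
sum_all = exp.sum(keepdims=True)           # shape (1, 1)

# With axes = [1] (and keepdims = 1), each row is summed separately.
sum_rows = exp.sum(axis=1, keepdims=True)  # shape (2, 1)

# Dividing by the per-row sums gives a valid softmax per row.
probs = exp / sum_rows
print(probs.sum(axis=1))  # each row sums to 1.0
```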
What about the graph itself when run with SoftMax? Does it still look broken?
Yes, the sigmoid nodes (which are actually never needed for LightGBM multiclass with softmax) are still there, without their output connected to anything.
I did try to find a way to eliminate them, but it seems that various changes would be needed in the OneVsAll trainer and also in the calibrators' SaveAsOnnx methods.
I didn't look much into it, since I didn't know if it was worth it: solving that won't have any effect on the model's output, and it isn't actually related to the issue I am addressing in this PR (if anything, it makes the ONNX model architecture uglier and somewhat confusing, but nothing else).
Do you want me to fix that problem? In this PR?
Yes, please fix the problem. I will leave it to you whether you want to fix this in a separate PR or this PR.
Ok, I will address that in a separate PR.
@Lynx1820 I believe the OnnxTransformer only works with batches of one row at a time, so I don't see how I can test this issue in the ML.NET tests. If anything, there's already a test for multiclass trainers, which tests LightGBM with useSoftmax = true, and that test still passes even with my changes.
It is okay to leave this without a test for now, because the changes required for OnnxTransformer to support batch processing are much larger. We will address this later.
There's a bug with the ONNX models exported from OneVsAllTrainers that have `OutputFormula = OutputFormula.Softmax`. (Notice that, to the best of my knowledge, only a LightGBM multiclass trainer that had `useSoftmax = true` would have such an `OutputFormula`.)

The problem was that the SoftMax (particularly the `ReduceSum` part of it) would be applied by summing over the whole input batch, instead of computing a separate sum for each row. This PR fixes that.

Notice that this error wasn't surfaced by our tests, since the `OnnxTransformer`, which applies the ONNX model, actually processes one row at a time, so the batch always consists of a single row. The problem only appeared when using the model directly with the OnnxRuntime API (without the `OnnxTransformer`).
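The reason single-row batches mask the bug can be shown with a small numpy sketch (an illustration of the described behavior, not the actual exported graph): for one row, summing the whole batch and summing per row give the same denominator, so both variants agree; with two or more rows they diverge.

```python
import numpy as np

def softmax_whole_batch(x):
    # Buggy behavior: one ReduceSum over the entire batch.
    e = np.exp(x)
    return e / e.sum(keepdims=True)

def softmax_per_row(x):
    # Fixed behavior: ReduceSum with axes = [1], keepdims = 1.
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

one_row = np.array([[1.0, 2.0, 3.0]])
print(np.allclose(softmax_whole_batch(one_row), softmax_per_row(one_row)))   # True

two_rows = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(np.allclose(softmax_whole_batch(two_rows), softmax_per_row(two_rows)))  # False
```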