Skip to content

Fixed OnnxTransformer output column mapping. #5192

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 14 additions & 3 deletions src/Microsoft.ML.OnnxTransformer/OnnxTransform.cs
Original file line number Diff line number Diff line change
Expand Up @@ -340,6 +340,17 @@ private static IEnumerable<int> AdjustDimensions(OnnxShape shape)
return new[] { 1 };
}

/// <summary>
/// In the case that the ML.Net user wants a subset of columns or lists the columns in a different order then specified in the ONNX model,
/// we need to map from the ML.Net dataview column index to the ONNX model output index. This method does that mapping.
/// </summary>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This is mapping the dataview column index to onnx output index.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

/// <param name="iinfo">The index of the ML.Net column requested.</param>
/// <returns>The index of ONNX output.</returns>
internal int MapDataViewColumnToOnnxOutputTensor(int iinfo)
{
return Model.ModelInfo.OutputNames.IndexOf(Outputs[iinfo]);
}

private sealed class Mapper : MapperBase
{
private readonly OnnxTransformer _parent;
Expand Down Expand Up @@ -468,7 +479,7 @@ protected override Delegate MakeGetter(DataViewRow input, int iinfo, Func<int, b

var activeOutputColNames = _parent.Outputs.Where((x, i) => activeOutput(i)).ToArray();

if (_parent.Model.ModelInfo.OutputsInfo[iinfo].DataViewType is VectorDataViewType vectorType)
if (_parent.Model.ModelInfo.OutputsInfo[_parent.MapDataViewColumnToOnnxOutputTensor(iinfo)].DataViewType is VectorDataViewType vectorType)
{
var elemRawType = vectorType.ItemType.RawType;
var srcNamedValueGetters = GetNamedOnnxValueGetters(input, _inputColIndices, _inputOnnxTypes, _inputTensorShapes);
Expand All @@ -479,7 +490,7 @@ protected override Delegate MakeGetter(DataViewRow input, int iinfo, Func<int, b
}
else
{
var type = _parent.Model.ModelInfo.OutputsInfo[iinfo].DataViewType.RawType;
var type = _parent.Model.ModelInfo.OutputsInfo[_parent.MapDataViewColumnToOnnxOutputTensor(iinfo)].DataViewType.RawType;
var srcNamedValueGetters = GetNamedOnnxValueGetters(input, _inputColIndices, _inputOnnxTypes, _inputTensorShapes);
return Utils.MarshalInvoke(MakeObjectGetter<int>, type, input, iinfo, srcNamedValueGetters, activeOutputColNames);
}
Expand Down Expand Up @@ -567,7 +578,7 @@ private Delegate MakeObjectGetter<T>(DataViewRow input, int iinfo, INamedOnnxVal
UpdateCacheIfNeeded(input.Position, srcNamedValueGetters, activeOutputColNames, outputCache);
var namedOnnxValue = outputCache.Outputs[_parent.Outputs[iinfo]];
var trueValue = namedOnnxValue.AsEnumerable<NamedOnnxValue>().Select(value => value.AsDictionary<string, float>());
var caster = _parent.Model.ModelInfo.OutputsInfo[iinfo].Caster;
var caster = _parent.Model.ModelInfo.OutputsInfo[_parent.MapDataViewColumnToOnnxOutputTensor(iinfo)].Caster;
dst = (T)caster(namedOnnxValue);
};
return valueGetter;
Expand Down
54 changes: 54 additions & 0 deletions test/Microsoft.ML.OnnxTransformerTest/OnnxTransformTests.cs
Original file line number Diff line number Diff line change
Expand Up @@ -332,6 +332,60 @@ public void OnnxModelMultiInput()
}
}

[OnnxFact]
public void OnnxModelOutputDifferentOrder()
{
Comment on lines +335 to +337
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can refactor both tests into one, as they're very similar. And include an [InlineData(true) with OnnxModelOutputDifferentOrder(bool checkSubset)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll combine them. I dont think the [InlineData] is needed since I'm testing the output ordering and not the checkSubset` itself though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misunderstood what you were saying first. I get it now. The issue with that is that with the subset I dont compare all the columns and with the full column set I do. I just combined the tests and moved the subset pipeline and checking below the full one.

var modelFile = Path.Combine(Directory.GetCurrentDirectory(), "twoinput", "twoinput.onnx");

var dataView = ML.Data.LoadFromEnumerable(
new TestDataMulti[] {
new TestDataMulti()
{
ina = new float[] {1,2,3,4,5},
inb = new float[] {1,2,3,4,5}
}
});
// The model returns the output columns in the order outa, outb. We are doing the opposite here, making sure the name mapping is correct.
var onnx = ML.Transforms.ApplyOnnxModel(new[] { "outb", "outa" }, new[] { "ina", "inb" }, modelFile).Fit(dataView).Transform(dataView);

var outaCol = onnx.Schema["outa"];
var outbCol = onnx.Schema["outb"];
using (var curs = onnx.GetRowCursor(outaCol, onnx.Schema["outb"]))
{
var getScoresa = curs.GetGetter<VBuffer<float>>(outaCol);
var getScoresb = curs.GetGetter<VBuffer<float>>(outbCol);
var buffera = default(VBuffer<float>);
var bufferb = default(VBuffer<float>);

while (curs.MoveNext())
{
getScoresa(ref buffera);
getScoresb(ref bufferb);
Assert.Equal(5, buffera.Length);
Assert.Equal(5, bufferb.Length);
Assert.Equal(0, buffera.GetValues().ToArray().Sum());
Assert.Equal(30, bufferb.GetValues().ToArray().Sum());
}
}

// The model returns the output columns in the order outa, outb. We are doing only a subset, outb, to make sure the mapping works.
onnx = ML.Transforms.ApplyOnnxModel(new[] { "outb" }, new[] { "ina", "inb" }, modelFile).Fit(dataView).Transform(dataView);

outbCol = onnx.Schema["outb"];
using (var curs = onnx.GetRowCursor(outbCol))
{
var getScoresb = curs.GetGetter<VBuffer<float>>(outbCol);
var bufferb = default(VBuffer<float>);

while (curs.MoveNext())
{
getScoresb(ref bufferb);
Assert.Equal(5, bufferb.Length);
Assert.Equal(30, bufferb.GetValues().ToArray().Sum());
}
}
}

[OnnxFact]
public void TestUnknownDimensions()
{
Expand Down