
Adding KeyToValueTransformers before finishing exporting to ONNX #4841


Merged · 18 commits · Feb 25, 2020

Conversation

antoniovs1029
Member

@antoniovs1029 antoniovs1029 commented Feb 14, 2020

When NimbusML deals with category Pandas columns, it automatically converts them into KeyDataViewType columns when sending the input IDataView to ML.NET. On the way back, it converts those columns back to category columns. This is done without using KeyToValueTransformer or ValueToKeyTransformer, so those transformers aren't actually part of the pipeline of the model that NimbusML uses. Another NimbusML behavior is that for some classifiers it actually adds a KeyToValueTransformer to the pipeline, but it doesn't add a ValueToKeyTransformer at the end, because it relies on its own mechanism for converting the output KeyDataViewType columns back to the original column type.

In general these behaviors aren't a problem, but they cause trouble when exporting models to ONNX. Since the models created by NimbusML don't have the appropriate ValueToKeyTransformers inside them to map values back from keys, the generated ONNX models won't have the appropriate LabelEncoders to do that mapping.
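The key/value mapping being discussed can be illustrated with a small, library-free Python sketch (hypothetical function names; ML.NET's ValueToKeyMappingTransformer and KeyToValueMappingTransformer implement the same idea over IDataView columns, with the dictionary stored as a KeyValues annotation):

```python
# Minimal sketch of what value-to-key and key-to-value mapping do conceptually.
# ML.NET keys are 1-based indices into a dictionary of observed values.

def value_to_key(values):
    """Map each value to a 1-based key; return the keys plus the dictionary."""
    vocab = []
    index = {}
    keys = []
    for v in values:
        if v not in index:
            vocab.append(v)
            index[v] = len(vocab)  # 1-based, like KeyDataViewType
        keys.append(index[v])
    return keys, vocab

def key_to_value(keys, vocab):
    """Invert the mapping: recover the original values from the keys."""
    return [vocab[k - 1] for k in keys]

labels = ["cat", "dog", "cat", "bird"]
keys, vocab = value_to_key(labels)
# Without a key-to-value step at the end, a consumer only ever sees `keys`;
# this is the situation the exported ONNX model is in without a LabelEncoder.
assert key_to_value(keys, vocab) == labels
```

Without the final key-to-value step, an ONNX consumer receives only the integer keys, which is exactly the symptom this PR addresses.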

In this PR I add KeyToValueTransformers at the end of the transforms list used by SaveOnnxCommand, so that the resulting ONNX model can map values back from keys even when the NimbusML pipeline doesn't include the transformers to map them back. This is done for the two main cases that were identified to affect NimbusML:

  1. For NimbusML pipelines that end in a binary or multiclass classifier, which outputs a PredictedLabel column of KeyDataViewType.
  2. For scenarios where the model doesn't touch some input columns that became KeyDataViewType when NimbusML sent them to ML.NET. In these scenarios, those input columns are expected to appear in the output untouched, so that NimbusML can automatically map them back. But when using ONNX models generated by NimbusML without the changes in this PR, the output columns in this scenario are no longer KeyDataViewType and contain only keys. So for this scenario it is also necessary to add the corresponding KeyToValueTransformers; that way the user gets back the values they expect, although the conversion is done by the added transformers rather than by NimbusML itself. Columns that were categorical in NimbusML, then became KeyDataViewType, and remained untouched by the pipeline are called "pass-through" columns in the changes of this PR and in other related discussions.
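The pass-through detection described in case 2 can be sketched as a simple set intersection (hypothetical column names; the PR's actual code does this over the pipeline's schemas with `IntersectWith`):

```python
# Hypothetical sketch of identifying "pass-through" columns: key-typed
# input columns that also appear, untouched, in the pipeline's output.
# These are the columns that need an added key-to-value mapping on export.

input_key_columns = {"Education", "Workclass", "Label"}    # key-typed inputs
output_columns = {"Education", "Workclass", "Score", "PredictedLabel"}

# Only count columns that were key-typed inputs AND survive to the output.
pass_through = output_columns & input_key_columns
assert pass_through == {"Education", "Workclass"}
```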

This is intended as a temporary solution, to unblock Azure AutoML's use of NimbusML. Ideally these transformers would be added in the NimbusML pipeline, but it's currently not possible to add transforms after predictors in NimbusML, and doing so would also require changing how NimbusML converts between IDataView and Pandas DataFrames. So for the time being the solution here was found to be the best.

Fixes microsoft/NimbusML#431 and microsoft/NimbusML#426

@antoniovs1029 antoniovs1029 requested a review from a team as a code owner February 14, 2020 02:14
Comment on lines 306 to 310

var xf = new KeyToValueMappingTransformer(Host, "PredictedLabel").Transform(scorePipe);
var output = new CommonOutputs.TransformOutput { Model = new TransformModelImpl(Host, xf, scorePipe), OutputData = xf };
end = output.OutputData;
transforms.AddLast(end as ITransformCanSaveOnnx);
Member Author


So far, ignore everything in this PR except for these lines, where I add the KeyToValueMapping at the end of the transforms list.

Here I hardcoded the name of the PredictedLabel column, but I suppose we could simply retrieve the KeyDataViewType columns of the scorePipe dataview, and then add a KeyToValueMappingTransformer for each KeyDataViewType column we want to map.
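The suggestion above, scanning the output schema for key-typed columns instead of hardcoding "PredictedLabel", can be sketched schematically (hypothetical schema representation; the real code would walk scorePipe.Schema and test each column's type against KeyDataViewType):

```python
# Hypothetical sketch: collect every key-typed column from an output schema
# so a key-to-value mapping can be appended for each of them.
schema = [
    ("PredictedLabel", "key"),
    ("Score", "float"),
    ("Education", "key"),
]

columns_to_map = [name for name, kind in schema if kind == "key"]
assert columns_to_map == ["PredictedLabel", "Education"]
```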

Contributor


Ok, reading it again now. I like this idea because I think this code path runs only in the Nimbus scenario. (We should verify that this is the case.) If that is true, then this may unblock us for now. But I would like to be able to solve this for the non-Nimbus scenario as well. If this works for now, we can create a bug for the non-Nimbus scenario and fix it later.



Member Author

@antoniovs1029 antoniovs1029 Feb 14, 2020


Yes, my understanding is that the Run() method where I added my changes only gets executed when running the following entrypoint:

[TlcModule.EntryPoint(Name = "Models.OnnxConverter", Desc = "Converts the model to ONNX format.", UserName = "ONNX Converter.")]
public static Output Apply(IHostEnvironment env, Arguments input)
{
    new SaveOnnxCommand(env, input).Run();
    return new Output();
}

So any tool that has access to that entrypoint will be able to run it. I am unsure whether NimbusML is the only one that has it (I wonder if MAML and AutoML/ModelBuilder have access to it). If other tools have access to it, then perhaps adding an overload of Run() with a flag such as "addMappingAtTheEnd = false" might be a good idea.

On the other hand, I am not sure how to confirm that this Run() method only gets called through that entrypoint. I might ask you offline about how to do that.




The command is runnable from MAML.

I added a comment in the NimbusML issue. I am not sure that we should be adding transforms to the ONNX export that do not exist in the model. If the goal is consistency between the results when scoring in NimbusML and in ONNX, to me it makes more sense to change NimbusML to make the KeyToValue transform a part of the model (otherwise you will get consistency between NimbusML and ONNX, but not with ML.NET...).



Member Author

@antoniovs1029 antoniovs1029 Feb 19, 2020


I did try to add the KeyToValue transform to the pipeline from NimbusML. The problem is that the way NimbusML creates pipelines and runs predictions on them doesn't allow me to add a transform after the predictor. I tried a couple of approaches to make it work, but I couldn't manage to do it on the NimbusML side alone. My conclusion from those attempts was that adding the KeyToValue transform from the NimbusML side would also require changing the way ML.NET executes graphs created in NimbusML.

I believe @ganik agreed that trying to change it from NimbusML (with the changes that would be required in ML.NET) would require major changes in the codebase, so it was better to go with the approach I am using here in this PR as a temporary solution.

If you have something in mind to change it on the NimbusML side without requiring major changes, please let me know. Thanks!




if (rawPred.PredictionKind == PredictionKind.BinaryClassification || rawPred.PredictionKind == PredictionKind.MulticlassClassification)
{
    if (scorePipe.Schema.GetColumnOrNull("PredictedLabel")?.Type is KeyDataViewType)


> `is KeyDataViewType`

This still doesn't mean that the KeyToValueMappingTransformer will work - there can be key type columns that don't have key value annotations. I think there is a utility method called something like HasKeyNames that you can use.

Member Author


The HasKeyValues method requires me to specify the keyValueItemType. Does anyone know if there's a way to avoid having to specify it, since here I don't care about the keyValueItemType as long as the column has key value annotations? @ganik @harishsk (If I leave it null, it defaults to TextDataViewType.)

public static bool HasKeyValues(this DataViewSchema.Column column, PrimitiveDataViewType keyValueItemType = null)
{
    // False if type is neither a key type nor a vector of key types.
    if (!(column.Type.GetItemType() is KeyDataViewType keyType))
        return false;
    if (keyValueItemType == null)
        keyValueItemType = TextDataViewType.Instance;
    var metaColumn = column.Annotations.Schema.GetColumnOrNull(AnnotationUtils.Kinds.KeyValues);
    return
        metaColumn != null
        && metaColumn.Value.Type is VectorDataViewType vectorType
        && keyType.Count == (ulong)vectorType.Size
        && keyValueItemType.Equals(vectorType.ItemType);
}

Member


How about you do something like:

ReadOnlyMemory<char> tmp = default;
if (input.Data.Schema.TryGetAnnotation(TextDataViewType.Instance, AnnotationUtils.Kinds.ScoreValueKind, i, ref tmp) && ReadOnlyMemoryUtils.EqualsStr(AnnotationUtils.Const.ScoreValueKind.PredictedLabel, tmp))
{


In reply to: 382339571 [](ancestors = 382339571)

Member


More general advice: search for "ScoreValueKind.PredictedLabel" and see how we check for such a column in other places.



Contributor


You can look at the implementation of HasKeyValues and filter out the checks that rely on keyValueItemType, e.g.

            if (!(column.Type.GetItemType() is KeyDataViewType keyType))
                return false;
            var metaColumn = column.Annotations.Schema.GetColumnOrNull(AnnotationUtils.Kinds.KeyValues);
            return
                metaColumn != null
                && metaColumn.Value.Type is VectorDataViewType vectorType
                && keyType.Count == (ulong)vectorType.Size;



Member Author


I've decided to go with @harishsk's suggestion. Please let me know, @yaeldekel, if you know of another way of doing this.



@harishsk
Contributor

I am in the process of adding SlotNames annotations to the exported ONNX model. If that works, we should try to use the same technique to add KeyValues annotations as well.

Member

@najeeb-kazmi najeeb-kazmi left a comment


LGTM pending the discussion on key type columns having key values.

Plus one comment on using the default column name.

@antoniovs1029
Member Author

I am running into some problems when using this fix with BinaryClassification models coming from NimbusML. For some reason the exported ONNX model cannot be loaded back; it throws an exception when attempting to do so. It seems that, for some reason, exporting KeyToValueTransformers to ONNX after a binary classification pipeline is the problem.

The error message looks as follows:

Microsoft.ML.OnnxRuntime.OnnxRuntimeException: '[ErrorCode:Fail] Load model from C:\Users\anvelazq\AppData\Local\Temp\tmpxaxztxa2.onnx failed:Node:Identity9 Output:PredictedLabel.output [ShapeInferenceError] Mismatch between number of source and target dimensions. Source=0 Target=2'
Estimator successfully exported to ONNX.

Notice that I get a similar error message when exporting to ONNX an ML.NET OVA model (without NimbusML) that is followed by a KeyToValueTransformer. I wonder if there's a connection with the fact that OVA uses binary classifiers.

On the other hand, I haven't been able to reproduce the above error message using only ML.NET (without NimbusML), because apparently there's no way to train binary classification trainers with a Label column of KeyDataViewType (exceptions are thrown if I attempt this)... it seems they only work with Boolean Label columns. I don't know the reason for this, since NimbusML actually converts non-Boolean labels to KeyDataViewType behind the scenes, and the BinaryClassificationTrainer works with them as expected. Apparently different code paths are followed in each case, which stops ML.NET from being capable of doing the same as NimbusML. I wonder if this difference in design has anything to do with the fact that the resulting ONNX model is, for some reason, not well formed, causing the exception above.

@antoniovs1029
Member Author

> I am running into some problems when using this fix with BinaryClassification models coming from NimbusML. For some reason the exported ONNX model cannot be loaded back, as it throws an exception when attempting to do so.

This problem, along with problems with OVA models using this PR, has been fixed by #4878 😃


passThroughColumnNames.IntersectWith(inputKeyDataViewTypeColumnsNames); // Only count those columns that were in the input of the pipeline

foreach (var name in passThroughColumnNames)


> `passThroughColumnNames`

Here is another edge case: If the input has a categorical column, say A, and the pipeline has a CopyTransform that copies A into a column named B.
The way it is written now, we will not apply key-to-value on B. Should we?
On one hand, we should, because A and B should be copies of each other, so it does not make sense if the ONNX model has different types and values for them... On the other hand, I don't see an easy rule that would help decide when we do need to apply key-to-value to B (when it is the output of Copy), versus when we don't (for example, when it is the output of ToKey).


Similar issue with Concat: say the input has two categorical columns A and B, and the pipeline has a concat transform that concatenates them to produce a column C...



Member Author


Ideally I think we should apply the KeyToValueTransformer to B. But, as you say, I can't think of an easy way to decide when to do this when it's the output of Copy. So I'm not sure whether we should handle this particular edge case.

Thoughts, @ganik , @harishsk ?

Member


Agreed, that is a bit of a strange side effect. Perhaps in a later iteration we can get more details on how AutoML prefers this. Maybe we won't need to add KeyToValue for pass-through columns at all and can let them surface as keys. For now we can document this.

@antoniovs1029 antoniovs1029 changed the title [Draft] Adding KeyToValueTransformers before finishing exporting to ONNX Adding KeyToValueTransformers before finishing exporting to ONNX Feb 25, 2020
Member

@ganik ganik left a comment


:shipit:

@antoniovs1029 antoniovs1029 merged commit 4493397 into dotnet:master Feb 25, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 19, 2022
Successfully merging this pull request may close these issues.

ONNX model for FastLinearClassifier predictions are 1 based vs 0 based