
New ONNX converter interface and some tests #2013


Merged: 23 commits merged into dotnet:master on Jan 6, 2019

Conversation


@wschin wschin commented Jan 3, 2019

This PR creates an ONNX converter targeting ITransformer. A full example is added as a test: we train a pipeline (normalization + SDCA) using the new APIs, convert the trained model, and call ONNX Runtime to evaluate it. In addition, we recreate the existing ONNX conversion tests using the new APIs to ensure at least identical test coverage after removing the ONNX tests that used the legacy APIs.

Work item summary:

  1. Create a new ONNX converter interface that can handle ITransformer.
  2. Remove OnnxTest.cs.
  3. Recreate the ONNX conversions from OnnxTest.cs using the new APIs.
  4. Add extra end-to-end tests. We train an ML.NET model, convert it to ONNX format, and call ONNX Runtime to execute it. The results produced by ONNX Runtime and ML.NET's original model are compared.
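Sketched in C#, the end-to-end test in item 4 has roughly the following shape (a hedged sketch only: the trainer and converter names below are illustrative of the API of that era, not the PR's exact code):

```csharp
// Illustrative sketch of the end-to-end test: train, convert, then
// compare ML.NET's scores with ONNX Runtime's. Names are approximate.
var mlContext = new MLContext(seed: 1);

// Train a small pipeline: feature normalization followed by SDCA.
var pipeline = mlContext.Transforms.Normalize("Features")
    .Append(mlContext.BinaryClassification.Trainers.StochasticDualCoordinateAscent());
var model = pipeline.Fit(trainData);

// Convert the fitted ITransformer into an ONNX ModelProto.
var onnxModel = mlContext.Model.ConvertToOnnx(model, trainData);

// Serialize the protobuf and hand it to ONNX Runtime for scoring;
// the test then asserts the two sets of scores agree within tolerance.
using (var file = File.Create("model.onnx"))
    onnxModel.WriteTo(file);
```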

@wschin wschin changed the title [WIP] Prototype of new ONNX converter and an end-to-end test Prototype of new ONNX converter and two end-to-end tests Jan 4, 2019
@wschin wschin requested review from codemzs and artidoro January 4, 2019 04:39

namespace Microsoft.ML.Model.Onnx
{
public class TransformerChainOnnxConverter

@TomFinley TomFinley Jan 4, 2019


public class

static class #Resolved


@wschin wschin Jan 4, 2019


Done. This class is renamed to ProtabilityCatalog.



{
public class TransformerChainOnnxConverter
{
public static ModelProto Convert<T>(TransformerChain<T> chain, IDataView inputData, HashSet<string> inputColumnNamesToDrop=null, HashSet<string> outputColumnNamesToDrop=null) where T : class, ITransformer

@TomFinley TomFinley Jan 4, 2019


TransformerChain<T> chain

Is there a reason to do this? If you wanted specialized behavior for the transformer chain, would it be sufficient to just check if it is a transformer chain in the other method, and not make this public?

The presence of a specialized overload for transformer chain specifically makes me suspect that you have different behavior for it. But this makes me wonder: since a TransformerChain<T> is also an ITransformer, does that mean that if I were to just feed it in as such below, it would not work, or would have a flaw of some kind? In which case we'd say the method below is buggy, since it purports to work over any ITransformer. But if it does work, then why have a separate method at all? So I'd put that logic (if indeed any must exist) on the second method, and not expose implementation-specific type polymorphism to the user, since they shouldn't have to care about that. #Resolved
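A minimal sketch of the shape being suggested here (hypothetical helper names, not the PR's code): keep a single public entry point over ITransformer and bury any chain-specific logic inside it:

```csharp
public static ModelProto Convert(ITransformer transform, IDataView inputData)
{
    // If TransformerChain really needs special treatment, detect it here
    // instead of exposing a separate public overload. (TransformerChain<T>
    // implements IEnumerable<ITransformer>, which sidesteps matching the
    // open generic directly.)
    if (transform is IEnumerable<ITransformer> chain)
    {
        // Chain-specific handling, if any, stays an implementation detail.
    }
    return ConvertCore(transform, inputData); // hypothetical shared path
}
```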

Member Author


I will delete the version for TransformerChain<T>.




codemzs commented Jan 4, 2019

Thanks, @wschin. The main issue I see is that you are testing only on Windows and not on other platforms. You are also not checking that the produced ONNX JSON file matches the baseline JSON file; without this we will lose the code coverage we have with the current tests. Is there a way we can use the existing SaveOnnxCommand by creating a wrapper around it? I ask because it can convert an IDataView and a model zip file to ONNX, whereas your current implementation only converts the in-memory pipeline and does not even allow converting to a JSON representation.

@TomFinley, what are your thoughts?

}

[BestFriend]
internal static ModelProto ConvertTransformListToOnnxModel(OnnxContextImpl ctx, IDataView inputData, IDataView outputData,

@TomFinley TomFinley Jan 4, 2019


Why [BestFriend]? I don't see that you're exposing this elsewhere. #Resolved

Member Author


I will remove it. Thanks.



{
public class TransformerChainOnnxConverter
{
public static ModelProto Convert<T>(TransformerChain<T> chain, IDataView inputData, HashSet<string> inputColumnNamesToDrop=null, HashSet<string> outputColumnNamesToDrop=null) where T : class, ITransformer
Contributor


ModelProto

Our typical approach for new capabilities like this is, as much as possible, to make operations like this discoverable. How do we currently expose saving an ITransformer? Through ModelOperationsCatalog. This should likewise have been an extension method on ModelOperationsCatalog (and, like all other things like this, in the namespace Microsoft.ML). That way this functionality becomes discoverable through MLContext, which is our strategy for things like this so far.
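The suggested shape, sketched out (class and method names here are illustrative, not the final API):

```csharp
// Living in namespace Microsoft.ML so it lights up on mlContext.Model.
namespace Microsoft.ML
{
    public static class OnnxExportExtensions
    {
        // Discoverable as mlContext.Model.ConvertToOnnx(...).
        public static ModelProto ConvertToOnnx(
            this ModelOperationsCatalog catalog, ITransformer transform, IDataView inputData)
            => OnnxConverter.Convert(transform, inputData); // hypothetical core converter
    }
}
```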


@wschin wschin Jan 4, 2019


A fix is pushed. Please take a look at ProtabilityCatalog.cs and ModelOperationsCatalog.cs in Iteration 4.



{
public class TransformerChainOnnxConverter
{
public static ModelProto Convert<T>(TransformerChain<T> chain, IDataView inputData, HashSet<string> inputColumnNamesToDrop=null, HashSet<string> outputColumnNamesToDrop=null) where T : class, ITransformer
Contributor


HashSet<string> inputColumnNamesToDrop = null, HashSet<string> outputColumnNamesToDrop

While I know why we have these from the command's perspective (since that kind of has to be a one-stop shop), having them in the API is incorrect, since it complicates the call enormously. If someone wants to drop a column on inputData, they should just do that the way they do literally everywhere else in the API: by dropping that column. Similarly, if someone does not want to include the transform outputs produced, they ought to add something to the transform chain (again, dropping columns) to limit the output to only those columns they care about. At least, so I think.
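Under that suggestion, a caller who wants fewer output columns would shape the pipeline itself, roughly as below (a sketch; DropColumns is the standard column-dropping transform, but its exact interplay with the converter here is an assumption):

```csharp
// Rather than passing outputColumnNamesToDrop to the converter, append a
// column-dropping transform so the chain's output is already trimmed.
var trimmedPipeline = originalPipeline
    .Append(mlContext.Transforms.DropColumns("DebugColumn", "TempFeatures"));
var model = trimmedPipeline.Fit(data);

// The converter now only ever sees the columns the caller cares about.
var onnxModel = mlContext.Model.ConvertToOnnx(model, data);
```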

Member Author


Sounds good. I will make those parameters inaccessible to users. On the other hand, I am not sure I can reproduce some old tests without the two arguments, so keeping an internal version with the two arguments might be a backup plan.




wschin commented Jan 4, 2019

Those tests have no JSON file that can be reused, because (1) no such test existed before, or (2) the JSON file changed because the random seed mechanisms differ between the new and old contexts (so it's impossible to reproduce the model used before). If you really want JSON files, we will have to (1) accept those tests first, (2) produce JSON files from them, and (3) make other tests for comparing those text outputs. Honestly, comparing text outputs does not make a lot of sense to me, because the correctness of a conversion can only be determined by its output values.

This is a wrapper around SaveOnnxCommand.

I don't understand the need for JSON. We create a protobuf object, so as long as the protobuf package is available, the user can convert the output model to text naturally.




codemzs commented Jan 4, 2019

I will keep this simple and outline the goals for this work:

  1. We do NOT want to regress code coverage.
  2. We want our tests to run on Windows, macOS, and Linux, just as the previous ONNX conversion tests ran.
  3. You are not exercising the code that exports a model to JSON format, which is done at
    public void BinaryClassificationFastTreeSaveModelToOnnxTest()

I can tell with a lot of confidence that this PR regresses the current code coverage, and I will be happy to show you tomorrow with a code coverage tool.




wschin commented Jan 4, 2019

My argument doesn't conflict with yours. It's OK to add JSON, but I will have to check in those tests before doing any JSON. Those models need to be tested by those Windows-only tests before being saved as JSON. That also means we will have at least two batches of PRs. Notice that the old tests are not reproducible.

[Update] Let me try to add a JSON comparison.




TomFinley commented Jan 4, 2019

My argument doesn't conflict with yours. It's OK to add JSON, but I will have to check in those tests before doing any JSON. Those models need to be tested by those Windows-only tests before being saved as JSON. That also means we will have at least two batches of PRs. Notice that the old tests are not reproducible.

Hi @wschin and @codemzs, there's something I don't fully appreciate here; maybe you can help me. As I understand it, we are producing a ModelProto object as part of the test. Is it not so? And we would expect the result of this to be deterministic, I hope? The simplest way we can detect whether something changes is to take this ModelProto, export it as JSON or YAML or some other easily interpreted textual format (which I believe protobuf provides facilities for), and baseline that, so that if something changes, we can see at a glance, without having to do a lot of work, what has changed (whether due to a failure of our expectations, a change in the ONNX library and protobuf defs, or some other issue). So, with that plan, is there something stopping us from doing that right now?

I often think we use baseline testing where it really isn't appropriate, but in this case, where we have this other complex piece of machinery we're trying to interoperate with, and this complex object we're producing, many parts of which can change in ways invisible to us, a baseline test makes me feel less nervous. It is hard for me even to anticipate all the ways in which a change in ONNX could break our scenario, and the simplest way I can think of to avoid being surprised is to just baseline the darn thing.

(Even better, perhaps, would be to run the thing through the ONNX transform, now that we have it, to see if it produces the same results. 😄 But one thing at a time I guess.)

We should probably not have export as JSON in the API itself though, since we are writing an API and the most natural way to export as JSON or whatever is to use existing facilities for handling this through protobuf.
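For instance, with the Google.Protobuf package the baseline file could be produced without adding any new API surface (a sketch; it assumes the ModelProto class is a standard protobuf-generated type):

```csharp
using System.IO;
using Google.Protobuf;

// Use protobuf's own JSON formatter on the converter's output, then
// diff the text against a checked-in baseline file in the test.
string json = new JsonFormatter(JsonFormatter.Settings.Default).Format(modelProto);
File.WriteAllText("BinaryClassification.onnx.json", json);
```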


wschin commented Jan 4, 2019

Text comparison added.



/// The catalog of model portability operations. Member functions of this class are able to convert the associated object to a portable format,
/// so that the fitted pipeline can easily be deployed to other platforms. Currently, the only supported format is ONNX (https://github.com/onnx/onnx).
/// </summary>
public sealed class PortabilityTransforms : SubCatalogBase
Contributor


PortabilityTransforms

I am not sure I like this introduction of a new sort of catalog. Why would we do this? Why not just put it on model operations directly? As written, this will be a totally empty object most of the time, until the user includes and references the ONNX NuGet; then it will be populated. I mean, could we just push it onto the model operations catalog directly? Why not do that? It's an operation on a model; why do we need to subdivide things endlessly? How many model portability extensions do we actually expect to have altogether, even if people were to include all of them? One? Two? Why bother? Also, the one extension method that was written for it is not a transform, so even if we wanted a separate catalog for this (we don't), it would be misnamed, but that is a moot point since I'm not certain we want it anyway.

Member Author


Two changes are made:
(1) Remove PortabilityTransforms.
(2) Move SaveToOnnx to be a direct extension on ModelOperationsCatalog.



@@ -0,0 +1,32 @@
using System.Collections.Generic;
using Microsoft.ML.Core.Data;

@TomFinley TomFinley Jan 4, 2019


Don't forget header #Resolved

Member Author


No problem.



/// <param name="transform">The <see cref="ITransformer"/> that will be converted into ONNX format.</param>
/// <param name="inputData">The input of the specified transform.</param>
/// <returns>The ONNX model, as a <see cref="ModelProto"/>.</returns>
public static ModelProto ConvertToOnnx(this ModelOperationsCatalog.PortabilityTransforms catalog, ITransformer transform, IDataView inputData)
Contributor


ModelOperationsCatalog

As mentioned elsewhere, just on ModelOperationsCatalog itself.

Member Author


No problem.



/// <summary>
/// Convert the specified <see cref="ITransformer"/> to ONNX format. Note that ONNX uses Google's Protobuf so the returned value is a Protobuf object.
/// </summary>
/// <param name="catalog">A field in <see cref="MLContext"/> which this function associated with.</param>

@TomFinley TomFinley Jan 4, 2019


which this function associated with

I don't know that people get to see the `this` parameter that often in documentation, but nonetheless we probably want to clean up the grammar here a little.

Member Author


We now use "The class that <see cref="ConvertToOnnx(ModelOperationsCatalog, ITransformer, IDataView)"/> is attached to."



public static ModelProto ConvertToOnnx(this ModelOperationsCatalog.PortabilityTransforms catalog, ITransformer transform, IDataView inputData)
{
var env = new MLContext(seed: 1);
var ctx = new OnnxContextImpl(env, "model", "ML.NET", "0", 0, "com.microsoft", OnnxVersion.Stable);
Contributor


com.microsoft

I am not an expert on ONNX, but is this some sort of namespace? When this functionality was meant for internal Microsoft consumption, having this as a default may have been appropriate, but when shipped as ML.NET (which involves, of course, many non-Microsoft users), should it not be something the user specifies? Or is its role something I don't fully appreciate? (It could be that com.microsoft is entirely appropriate here; I'd just like to understand why if so, because that seems odd. If it is appropriate, surely a comment to that effect would be desirable.)


@wschin wschin Jan 4, 2019


It should be "ai.onnx.ml". Sorry about causing confusion. Currently, we don't have anything Microsoft-specific.

[Update] I was wrong. Combining the model name, model version, and model domain, we should be able to uniquely identify an ONNX model. Here, "com.microsoft" is a model domain. Should we use "ml.net" again?



Contributor


No idea, since I have no idea what it is for. 😄 But perhaps there are experts elsewhere we can ask to decide on the right answer.

Member Author


Because ML.NET is not Microsoft-specific, I just changed it to machinelearning.dotnet, so that machinelearning.dotnet + ML.NET converter version + model name can serve as a unique ID for a model.



@@ -0,0 +1,225 @@
using System;
using System.IO;

@TomFinley TomFinley Jan 4, 2019


Don't forget your header! #Resolved

@wschin wschin self-assigned this Jan 6, 2019
@wschin wschin changed the title Prototype of new ONNX converter and two end-to-end tests Prototype of new ONNX converter and some tests Jan 6, 2019
@wschin wschin changed the title Prototype of new ONNX converter and some tests New ONNX converter interface and some tests Jan 6, 2019
@codemzs codemzs merged commit faffd17 into dotnet:master Jan 6, 2019
@wschin wschin deleted the upgrade-onnx-cvt branch January 6, 2019 05:39

wschin commented Jan 6, 2019

Thank you, Zeeshan.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 25, 2022