Skip to content

ONNX Conversion for Hashing Transformer #4596

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 19 commits into from

Conversation

KsenijaS
Copy link
Member

Here is the draft for Onnx Conversion for Hashing transformer.
Please not that some test will not pass because onnx only support string and uint for murmur hash3
and also needs to add mask and useOrderedHashing to be compatible with ML.NET
Further more there is a bug in ORT regarding UTF 8 encoding, (they fixed it in this PR: microsoft/onnxruntime#2697), but I think the next release is in about 2 months

@KsenijaS KsenijaS changed the title Hash no span ONNX Conversion for Hashing Transformer Dec 19, 2019
verWrittenCur: 0x00010002, // Invert hash key values, hash fix
verReadableCur: 0x00010002,
//verWrittenCur: 0x00010002, // Invert hash key values, hash fix
verWrittenCur: 0x00010003,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0x00010003 [](start = 31, length = 10)

Please add a comment explaining why the version is changing (similar to the line above).

if (!(srcType is VectorDataViewType vectorType))
return ComposeGetterOne(input, iinfo, srcCol, srcType);
return ComposeGetterVec(input, iinfo, srcCol, vectorType);
if (GetVersionInfo().VerWrittenCur == 0x00010002)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetVersionInfo [](start = 16, length = 14)

This will always return 0x00010003.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you need is to add a field to HashingTransformer that has the version. That field will be initialized to be 0x00010003 in the constructors in lines 166 and 177, and according to the ModelLoadContext's version info in the constructor in line 267.


In reply to: 387066236 [](ancestors = 387066236)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

field GetVersionInfo().VerWrittenCur will always return 0x00010003 for newly created models, this was meant for already existed models and those models will not be able to be converted to ONNX

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetVersionInfo() calls the method in line 121, so it will return 0x00010003 even if the transformer was created from a model with a different version. When the GetGetterCore method is called you no longer have access to the model that created the transformer (or, it might not have been created from a serialized model at all), which is why you need to add a field to store the version, and initialize it in all the constructors. In the


In reply to: 396068943 [](ancestors = 396068943)

@@ -834,6 +1083,66 @@ private void AddMetaKeyValues(int i, DataViewSchema.Annotations.Builder builder)
}

protected override Delegate MakeGetter(DataViewRow input, int iinfo, Func<int, bool> activeOutput, out Action disposer) => _parent.GetGetterCore(input, iinfo, out disposer);

private bool SaveAsOnnxCore(OnnxContext ctx, int iinfo, string srcVariable, string dstVariable)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SaveAsOnnxCore [](start = 25, length = 14)

Should NumberOfBits and UseOrderedHashing influence the output here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, mask is dependant on NumberofBits, and we aslo need UseOrderedHashing, but we can get that info from _parent._columns[iinfo]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I don't see this information being used anywhere in SaveOnnxCore.


In reply to: 391310537 [](ancestors = 391310537)


IDataView data = mlContext.Data.LoadFromEnumerable(samples);

var hashEstimator = new HashingEstimator(Env, "Education");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HashingEstimator [](start = 36, length = 16)

Please add tests that test HashingEstimator with non-default arguments.

OnnxNode murmurNode;

opType = "MurmurHash3";
if (_types[iinfo].RawType == typeof(KeyDataViewType))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the output for this transformer is always uint32 for the tests in your branch. Do you have a test where this code get's called?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't have.

@harishsk
Copy link
Contributor

This was replaced by #5013. Closing.

@harishsk harishsk closed this May 12, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants