-
Notifications
You must be signed in to change notification settings - Fork 1.9k
ONNX Conversion for Hashing Transformer #4596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
verWrittenCur: 0x00010002, // Invert hash key values, hash fix | ||
verReadableCur: 0x00010002, | ||
//verWrittenCur: 0x00010002, // Invert hash key values, hash fix | ||
verWrittenCur: 0x00010003, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0x00010003 [](start = 31, length = 10)
Please add a comment explaining why the version is changing (similar to the line above).
if (!(srcType is VectorDataViewType vectorType)) | ||
return ComposeGetterOne(input, iinfo, srcCol, srcType); | ||
return ComposeGetterVec(input, iinfo, srcCol, vectorType); | ||
if (GetVersionInfo().VerWrittenCur == 0x00010002) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetVersionInfo [](start = 16, length = 14)
This will always return 0x00010003
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you need is to add a field to HashingTransformer
that has the version. That field will be initialized to be 0x00010003
in the constructors in lines 166 and 177, and according to the ModelLoadContext
's version info in the constructor in line 267.
In reply to: 387066236 [](ancestors = 387066236)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
field GetVersionInfo().VerWrittenCur will always return 0x00010003 for newly created models, this was meant for already existed models and those models will not be able to be converted to ONNX
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetVersionInfo()
calls the method in line 121, so it will return 0x00010003
even if the transformer was created from a model with a different version. When the GetGetterCore
method is called you no longer have access to the model that created the transformer (or, it might not have been created from a serialized model at all), which is why you need to add a field to store the version, and initialize it in all the constructors. In the
In reply to: 396068943 [](ancestors = 396068943)
@@ -834,6 +1083,66 @@ private void AddMetaKeyValues(int i, DataViewSchema.Annotations.Builder builder) | |||
} | |||
|
|||
protected override Delegate MakeGetter(DataViewRow input, int iinfo, Func<int, bool> activeOutput, out Action disposer) => _parent.GetGetterCore(input, iinfo, out disposer); | |||
|
|||
private bool SaveAsOnnxCore(OnnxContext ctx, int iinfo, string srcVariable, string dstVariable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SaveAsOnnxCore [](start = 25, length = 14)
Should NumberOfBits
and UseOrderedHashing
influence the output here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, mask is dependant on NumberofBits, and we aslo need UseOrderedHashing, but we can get that info from _parent._columns[iinfo]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I don't see this information being used anywhere in SaveOnnxCore
.
In reply to: 391310537 [](ancestors = 391310537)
|
||
IDataView data = mlContext.Data.LoadFromEnumerable(samples); | ||
|
||
var hashEstimator = new HashingEstimator(Env, "Education"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HashingEstimator [](start = 36, length = 16)
Please add tests that test HashingEstimator
with non-default arguments.
OnnxNode murmurNode; | ||
|
||
opType = "MurmurHash3"; | ||
if (_types[iinfo].RawType == typeof(KeyDataViewType)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the output for this transformer is always uint32 for the tests in your branch. Do you have a test where this code get's called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I don't have.
This was replaced by #5013. Closing. |
Here is the draft for Onnx Conversion for Hashing transformer.
Please not that some test will not pass because onnx only support string and uint for murmur hash3
and also needs to add mask and useOrderedHashing to be compatible with ML.NET
Further more there is a bug in ORT regarding UTF 8 encoding, (they fixed it in this PR: microsoft/onnxruntime#2697), but I think the next release is in about 2 months