-
Notifications
You must be signed in to change notification settings - Fork 1.9k
IndexOutOfRange Exception in KeyToVector transformer #2681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -573,7 +573,6 @@ private ValueGetter<VBuffer<float>> MakeGetterInd(DataViewRow input, int iinfo) | |||
if (key >= (uint)size) | |||
continue; | |||
editor.Values[count] = 1; | |||
editor.Indices[count++] = slot * size + (int)key; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also appears that there are tests failing in CI with this change.
Maybe the right change is to use
machinelearning/src/Microsoft.ML.Core/Data/VBufferEditor.cs
Lines 42 to 44 in 01a362b
/// <param name="requireIndicesOnDense"> | |
/// True means to ensure the Indices buffer is available, even if the buffer will be dense. | |
/// </param> |
Codecov Report
@@ Coverage Diff @@
## master #2681 +/- ##
==========================================
- Coverage 72.78% 71.69% -1.09%
==========================================
Files 808 810 +2
Lines 145588 142487 -3101
Branches 16250 16112 -138
==========================================
- Hits 105964 102163 -3801
- Misses 35202 35898 +696
- Partials 4422 4426 +4
|
Hi @Ivanidzo4ka, is this relevant to #2185? It seems to be touching much of the same code I noted in that issue. I mentioned you there but perhaps you did not see. At least, I do not see that there is a comment. Maybe you can review and see that all the issues I observed are actually fixed by this? I see some would be, but might be nice to be complete. |
@@ -1273,7 +1273,7 @@ public SchemaShape GetOutputSchema(SchemaShape inputSchema) | |||
metadata.Add(slotMeta); | |||
if (colInfo.InvertHash != 0) | |||
metadata.Add(new SchemaShape.Column(AnnotationUtils.Kinds.KeyValues, SchemaShape.Column.VectorKind.Vector, TextDataViewType.Instance, false)); | |||
result[colInfo.Name] = new SchemaShape.Column(colInfo.Name, col.ItemType is VectorType ? SchemaShape.Column.VectorKind.Vector : SchemaShape.Column.VectorKind.Scalar, NumberDataViewType.UInt32, true, new SchemaShape(metadata)); | |||
result[colInfo.Name] = new SchemaShape.Column(colInfo.Name, col.Kind, NumberDataViewType.UInt32, true, new SchemaShape(metadata)); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean input shape must be scalar? If yes, we need to throw when encountering a vector. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. It means we accept vectors, scalars and varvectors
In reply to: 263166153 [](ancestors = 263166153)
@@ -466,7 +466,9 @@ public override SchemaShape GetOutputSchema(SchemaShape inputSchema) | |||
metadata.Add(new SchemaShape.Column(AnnotationUtils.Kinds.SlotNames, SchemaShape.Column.VectorKind.Vector, keyMeta.ItemType, false)); | |||
if (col.Kind == SchemaShape.Column.VectorKind.Scalar) | |||
metadata.Add(new SchemaShape.Column(AnnotationUtils.Kinds.IsNormalized, SchemaShape.Column.VectorKind.Scalar, BooleanDataViewType.Instance, false)); | |||
result[colInfo.outputColumnName] = new SchemaShape.Column(colInfo.outputColumnName, SchemaShape.Column.VectorKind.Vector, NumberDataViewType.Single, false, new SchemaShape(metadata)); | |||
result[colInfo.outputColumnName] = new SchemaShape.Column(colInfo.outputColumnName, | |||
col.Kind == SchemaShape.Column.VectorKind.VariableVector ? SchemaShape.Column.VectorKind.VariableVector : SchemaShape.Column.VectorKind.Vector, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
col [](start = 20, length = 3)
This is key-to-vector mapping. Can the input column be a vector-valued?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact a vector of keys is one of the most important scenarios (e.g., bag of words).
@@ -1273,7 +1273,7 @@ public SchemaShape GetOutputSchema(SchemaShape inputSchema) | |||
metadata.Add(slotMeta); | |||
if (colInfo.InvertHash != 0) | |||
metadata.Add(new SchemaShape.Column(AnnotationUtils.Kinds.KeyValues, SchemaShape.Column.VectorKind.Vector, TextDataViewType.Instance, false)); | |||
result[colInfo.Name] = new SchemaShape.Column(colInfo.Name, col.ItemType is VectorType ? SchemaShape.Column.VectorKind.Vector : SchemaShape.Column.VectorKind.Scalar, NumberDataViewType.UInt32, true, new SchemaShape(metadata)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
col.ItemType is VectorType ? SchemaShape.Column.VectorKind.Vector : SchemaShape.Column.VectorKind [](start = 76, length = 97)
Geez, the original author sure got confused.
metadata.Add(new SchemaShape.Column(AnnotationUtils.Kinds.SlotNames, SchemaShape.Column.VectorKind.Vector, keyMeta.ItemType, false)); | ||
if (!colInfo.Bag && (col.Kind == SchemaShape.Column.VectorKind.Scalar || col.Kind == SchemaShape.Column.VectorKind.Vector)) | ||
metadata.Add(new SchemaShape.Column(AnnotationUtils.Kinds.CategoricalSlotRanges, SchemaShape.Column.VectorKind.Vector, NumberDataViewType.Int32, false)); | ||
if (!colInfo.Bag || (col.Kind == SchemaShape.Column.VectorKind.Scalar)) | ||
metadata.Add(new SchemaShape.Column(AnnotationUtils.Kinds.IsNormalized, SchemaShape.Column.VectorKind.Scalar, BooleanDataViewType.Instance, false)); | ||
|
||
result[colInfo.Name] = new SchemaShape.Column(colInfo.Name, SchemaShape.Column.VectorKind.Vector, NumberDataViewType.Single, false, new SchemaShape(metadata)); | ||
result[colInfo.Name] = new SchemaShape.Column(colInfo.Name, | ||
col.Kind == SchemaShape.Column.VectorKind.VariableVector && !colInfo.Bag ? SchemaShape.Column.VectorKind.VariableVector : SchemaShape.Column.VectorKind.Vector, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
col.Kind == SchemaShape.Column.VectorKind.VariableVector && !colInfo.Bag ? SchemaShape.Column.VectorKind.VariableVector : SchemaShape.Column.VectorKind.Vector [](start = 20, length = 158)
Um... yeah.if you wanted to explode out this condition a bit with a bunch of clarifying comments I wouldn't object to this becoming more than a one-line statement. I found this slightly hard to unpack. (Correct, I think, just hard to unpack.)
Hi, @Ivanidzo4ka, wondering if you've had a chance to give this any thought or attention. |
@Ivanidzo4ka - what's the status of this PR? Can it be updated? |
@eerhardt I've update it. |
@@ -1,14 +0,0 @@ | |||
#@ TextLoader{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the SingleDebug
versions of these two baseline files be deleted as well?
new InputOutputColumnPair("TermA", "A"), | ||
new InputOutputColumnPair("TermB", "B"), | ||
new InputOutputColumnPair("TermC", "C") | ||
new OneHotEncodingEstimator.ColumnOptions("CatA", "A"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OneHotEncodingEstimator.ColumnOptions
is an internal class. Can we instead write the tests using the public API?
closing this PR since #3763 has these changes in it. I had initially created that PR not knowing there was this PR but then @yaeldekel pointed out this PR also takes care of output schema so I took those changes. I need to check-in my PR as it is blocking 1P customer. |
Fixes #2678